LeanMachineLearning exposition

3.3.ย SequentialLearning.StationaryEnv๐Ÿ”—

Oblivious and stationary environments

An oblivious environment is an environment in which the distribution of the next feedback depends only on the last action (and not on the past history). If the kernel that gives the distribution of the next feedback given the last action is the same at every time step, then we say that the environment is stationary.

Main definitions

We define a Prop-valued typeclass IsObliviousEnv to express that an environment is oblivious, and we define two constructors for oblivious environments.

Typeclass and related definitions:

  • IsObliviousEnv env: the environment env is oblivious.

  • feedbackCondAction env n: the kernel representing the conditional distribution of the feedback given the action at time n in an oblivious environment env.

Constructors for oblivious environments:

  • obliviousEnv ฮฝ: an oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner, and is given by a sequence of Markov kernels ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ.

  • stationaryEnv ฮฝ: a stationary environment, in which the distribution of the next feedback depends only on the last action (and not on the past history), and is given by a Markov kernel ฮฝ : Kernel ๐“ ๐“จ.

Module LeanMachineLearning.SequentialLearning.StationaryEnv contains 26 exposed declarations.

IsObliviousEnv๐Ÿ”—

Type ClassLearning.IsObliviousEnv

An environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.

๐Ÿ”—type class
Learning.IsObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : Prop
Learning.IsObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : Prop

Code

class IsObliviousEnv (env : Environment ๐“ ๐“จ) : Prop where
  exists_eq_prodMkLeft : โˆƒ ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ, (โˆ€ n, IsMarkovKernel (ฮฝ n)) โˆง
    (env.ฮฝ0 = ฮฝ 0) โˆง (โˆ€ n, env.feedback n = (ฮฝ (n + 1)).prodMkLeft _)
Type uses (1)
Used by (13)

Actions: Source ยท Open Issue

feedbackCondAction๐Ÿ”—

DefinitionLearning.feedbackCondAction

The kernel representing the conditional distribution of the feedback given the action at time n in an oblivious environment.

๐Ÿ”—def
Learning.feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [h_obl : IsObliviousEnv env] (n : โ„•) : ProbabilityTheory.Kernel ๐“ ๐“จ
Learning.feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [h_obl : IsObliviousEnv env] (n : โ„•) : ProbabilityTheory.Kernel ๐“ ๐“จ

Code

noncomputable
def feedbackCondAction (env : Environment ๐“ ๐“จ) [h_obl : IsObliviousEnv env] (n : โ„•) : Kernel ๐“ ๐“จ :=
  h_obl.exists_eq_prodMkLeft.choose n
Type uses (2)
Used by (12)

Actions: Source ยท Open Issue

instIsMarkovKernelFeedbackCondAction๐Ÿ”—

InstanceLearning.instIsMarkovKernelFeedbackCondAction

No docstring.

๐Ÿ”—theorem
Learning.instIsMarkovKernelFeedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) : ProbabilityTheory.IsMarkovKernel (feedbackCondAction env n)
Learning.instIsMarkovKernelFeedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) : ProbabilityTheory.IsMarkovKernel (feedbackCondAction env n)

Code

instance (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) :
    IsMarkovKernel (feedbackCondAction env n)
Type uses (3)
Used by (2)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.1 n

ฮฝ0_eq_feedbackCondAction๐Ÿ”—

LemmaLearning.ฮฝ0_eq_feedbackCondAction

No docstring.

๐Ÿ”—theorem
Learning.ฮฝ0_eq_feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] : Environment.ฮฝ0 env = feedbackCondAction env 0
Learning.ฮฝ0_eq_feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] : Environment.ฮฝ0 env = feedbackCondAction env 0

Code

lemma ฮฝ0_eq_feedbackCondAction (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] :
    env.ฮฝ0 = feedbackCondAction env 0
Type uses (3)
Used by (2)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.2.1

feedback_eq_feedbackCondAction๐Ÿ”—

LemmaLearning.feedback_eq_feedbackCondAction

No docstring.

๐Ÿ”—theorem
Learning.feedback_eq_feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) : Environment.feedback env n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (feedbackCondAction env (n + 1))
Learning.feedback_eq_feedbackCondAction.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) : Environment.feedback env n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (feedbackCondAction env (n + 1))

Code

lemma feedback_eq_feedbackCondAction (env : Environment ๐“ ๐“จ) [IsObliviousEnv env] (n : โ„•) :
    env.feedback n = (feedbackCondAction env (n + 1)).prodMkLeft _
Type uses (3)
Used by (3)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.2.2 n

hasCondDistrib_feedback๐Ÿ”—

LemmaLearning.IsObliviousEnv.hasCondDistrib_feedback

No docstring.

๐Ÿ”—theorem
Learning.IsObliviousEnv.hasCondDistrib_feedback.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P
Learning.IsObliviousEnv.hasCondDistrib_feedback.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P

Code

lemma hasCondDistrib_feedback [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) :
    HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P
Type uses (5)
Body uses (6)
Used by (3)

Actions: Source ยท Open Issue

Proof
by
  have hA := h.measurable_action
  have hY := h.measurable_feedback
  cases n with
  | zero => rw [โ† ฮฝ0_eq_feedbackCondAction]; exact h.hasCondDistrib_feedback_zero
  | succ n =>
    refine โŸจby fun_prop, ?_โŸฉ
    have h_eq := (h.hasCondDistrib_feedback n).map_eq
    have : P.map (A (n + 1)) =
        (P.map (fun x โ†ฆ (history A Y n x, A (n + 1) x))).snd := by
      rw [Measure.snd_map_prodMk (by fun_prop)]
    simp only [feedback_eq_feedbackCondAction] at h_eq
    rw [this, โ† Measure.snd_prodAssoc_compProd_prodMkLeft, โ† h_eq,
      Measure.snd_map_prodMk (by fun_prop), Measure.map_map (by fun_prop) (by fun_prop)]
    congr

condIndepFun_feedback_history_action๐Ÿ”—

LemmaLearning.IsObliviousEnv.condIndepFun_feedback_history_action

The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

๐Ÿ”—theorem
Learning.IsObliviousEnv.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (history A Y n) P
Learning.IsObliviousEnv.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (history A Y n) P

Code

lemma condIndepFun_feedback_history_action [StandardBorelSpace ฮฉ]
        [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) :
    Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action _ ; P] history A Y n
Type uses (5)
Body uses (6)
Used by (2)

Actions: Source ยท Open Issue

Proof
by
  have hA := h.measurable_action
  have hY := h.measurable_feedback
  refine condIndepFun_of_exists_condDistrib_prod_ae_eq_prodMkLeft
    (ฮท := feedbackCondAction env (n + 1))
    (by fun_prop) (by fun_prop) (by fun_prop) ?_
  refine HasCondDistrib.condDistrib_eq ?_
  rw [โ† feedback_eq_feedbackCondAction]
  exact h.hasCondDistrib_feedback n

condIndepFun_feedback_history_action_action๐Ÿ”—

LemmaLearning.IsObliviousEnv.condIndepFun_feedback_history_action_action

No docstring.

๐Ÿ”—theorem
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P

Code

lemma condIndepFun_feedback_history_action_action [StandardBorelSpace ฮฉ]
    [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) :
    Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action (n + 1); P]
      (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, A (n + 1) ฯ‰))
Type uses (5)
Body uses (3)
Used by (2)

Actions: Source ยท Open Issue

Proof
by
  have h_indep : Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action (n + 1); P] history A Y n :=
    condIndepFun_feedback_history_action h n
  have hA := h.measurable_action
  have hY := h.measurable_feedback
  exact h_indep.prod_right (by fun_prop) (by fun_prop) (by fun_prop)

condIndepFun_feedback_history_action_action'๐Ÿ”—

LemmaLearning.IsObliviousEnv.condIndepFun_feedback_history_action_action'

No docstring.

๐Ÿ”—theorem
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) (hn : n โ‰  0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โ‹ฏ (Y n) (fun ฯ‰ => (history A Y (n - 1) ฯ‰, A n ฯ‰)) P
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) (hn : n โ‰  0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โ‹ฏ (Y n) (fun ฯ‰ => (history A Y (n - 1) ฯ‰, A n ฯ‰)) P

Code

lemma condIndepFun_feedback_history_action_action' [StandardBorelSpace ฮฉ]
        [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) (hn : n โ‰  0) :
    Y n โŸ‚แตข[A n, h.measurable_action n; P] (fun ฯ‰ โ†ฆ (history A Y (n - 1) ฯ‰, A n ฯ‰))
Type uses (5)
Body uses (1)
Used by (1)

Actions: Source ยท Open Issue

Proof
by
  have := condIndepFun_feedback_history_action_action h (n - 1)
  grind

obliviousEnv๐Ÿ”—

DefinitionLearning.obliviousEnv

An oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.

๐Ÿ”—def
Learning.obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐“ ๐“จ
Learning.obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐“ ๐“จ

Code

def obliviousEnv (ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ) [โˆ€ n, IsMarkovKernel (ฮฝ n)] : Environment ๐“ ๐“จ where
  feedback n := (ฮฝ (n + 1)).prodMkLeft _
  ฮฝ0 := ฮฝ 0
Type uses (1)
Used by (10)

Actions: Source ยท Open Issue

obliviousEnv_feedback๐Ÿ”—

LemmaLearning.obliviousEnv_feedback

No docstring.

๐Ÿ”—theorem
Learning.obliviousEnv_feedback.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (ฮฝ (n + 1))
Learning.obliviousEnv_feedback.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (ฮฝ (n + 1))

Code

theorem obliviousEnv_feedback : โˆ€ {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ)
  [inst : โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•),
  (Learning.obliviousEnv ฮฝ).feedback n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (ฮฝ (n + 1))
Type uses (2)
Used by (2)

Actions: Source ยท Open Issue

Proof
@[simps]

obliviousEnv_ฮฝ0๐Ÿ”—

LemmaLearning.obliviousEnv_ฮฝ0

No docstring.

๐Ÿ”—theorem
Learning.obliviousEnv_ฮฝ0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0
Learning.obliviousEnv_ฮฝ0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0

Code

theorem obliviousEnv_ฮฝ0 : โˆ€ {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ)
  [inst : โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)], (Learning.obliviousEnv ฮฝ).ฮฝ0 = ฮฝ 0
Type uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
@[simps]

feedback_obliviousEnv๐Ÿ”—

LemmaLearning.feedback_obliviousEnv

No docstring.

๐Ÿ”—theorem
Learning.feedback_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (ฮฝ (n + 1))
Learning.feedback_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (ฮฝ (n + 1))

Code

lemma feedback_obliviousEnv (ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ) [โˆ€ n, IsMarkovKernel (ฮฝ n)] (n : โ„•) :
    (obliviousEnv ฮฝ).feedback n = (ฮฝ (n + 1)).prodMkLeft _
Type uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
by simp [obliviousEnv]

ฮฝ0_obliviousEnv๐Ÿ”—

LemmaLearning.ฮฝ0_obliviousEnv

No docstring.

๐Ÿ”—theorem
Learning.ฮฝ0_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0
Learning.ฮฝ0_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0

Code

lemma ฮฝ0_obliviousEnv (ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ) [โˆ€ n, IsMarkovKernel (ฮฝ n)] :
    (obliviousEnv ฮฝ).ฮฝ0 = ฮฝ 0
Type uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
by simp [obliviousEnv]

instIsObliviousEnvObliviousEnv๐Ÿ”—

InstanceLearning.instIsObliviousEnvObliviousEnv

No docstring.

๐Ÿ”—theorem
Learning.instIsObliviousEnvObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : IsObliviousEnv (obliviousEnv ฮฝ)
Learning.instIsObliviousEnvObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : IsObliviousEnv (obliviousEnv ฮฝ)

Code

instance (ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ) [โˆ€ n, IsMarkovKernel (ฮฝ n)] :
    IsObliviousEnv (obliviousEnv ฮฝ) where
  exists_eq_prodMkLeft
Type uses (2)
Body uses (1)
Used by (1)

Actions: Source ยท Open Issue

Proof
โŸจfun n โ†ฆ ฮฝ n, inferInstance,rfl, fun _ โ†ฆ rflโŸฉ

feedbackCondAction_obliviousEnv๐Ÿ”—

LemmaLearning.feedbackCondAction_obliviousEnv

No docstring.

๐Ÿ”—theorem
Learning.feedbackCondAction_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [hฮฝ : โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ n
Learning.feedbackCondAction_obliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : โ„• โ†’ ProbabilityTheory.Kernel ๐“ ๐“จ) [hฮฝ : โˆ€ (n : โ„•), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ„•) : feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ n

Code

lemma feedbackCondAction_obliviousEnv (ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ) [hฮฝ : โˆ€ n, IsMarkovKernel (ฮฝ n)]
    (n : โ„•) :
    feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ n
Type uses (3)
Body uses (6)
Used by (2)

Actions: Source ยท Open Issue

Proof
by
  rcases isEmpty_or_nonempty ๐“ with h๐“ | h๐“
  ยท ext a : 1
    exact h๐“.elim a
  rcases isEmpty_or_nonempty ๐“จ with hR | hR
  ยท refine absurd (hฮฝ 0) ?_
    simp only [Subsingleton.eq_zero ฮฝ, Pi.zero_apply]
    exact Kernel.not_isMarkovKernel_zero
  have : Nonempty (Iic n โ†’ ๐“ ร— ๐“จ) := โŸจfun _ โ†ฆ (h๐“.some, hR.some)โŸฉ
  have h_eq_zero := ฮฝ0_eq_feedbackCondAction (obliviousEnv ฮฝ)
  have h_eq := feedback_eq_feedbackCondAction (obliviousEnv ฮฝ) (n - 1)
  cases n with
  | zero => exact h_eq_zero.symm
  | succ n =>
    simp only [Nat.add_one_sub_one, obliviousEnv_feedback, add_tsub_cancel_right] at h_eq
    rw [โ† Kernel.prodMkLeft_inj (ฮณ := Iic n โ†’ ๐“ ร— ๐“จ)]
    exact h_eq.symm

stationaryEnv๐Ÿ”—

DefinitionLearning.stationaryEnv

A stationary environment, in which the distribution of the next feedback depends only on the last action.

๐Ÿ”—def
Learning.stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment ๐“ ๐“จ
Learning.stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment ๐“ ๐“จ

Code

def stationaryEnv (ฮฝ : Kernel ๐“ ๐“จ) [IsMarkovKernel ฮฝ] : Environment ๐“ ๐“จ := obliviousEnv fun _ โ†ฆ ฮฝ
Type uses (1)
Body uses (1)
Used by (81)

Actions: Source ยท Open Issue

feedback_stationaryEnv๐Ÿ”—

LemmaLearning.feedback_stationaryEnv

No docstring.

๐Ÿ”—theorem
Learning.feedback_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ„•) : Environment.feedback (stationaryEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ฮฝ
Learning.feedback_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ„•) : Environment.feedback (stationaryEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ฮฝ

Code

lemma feedback_stationaryEnv (ฮฝ : Kernel ๐“ ๐“จ) [IsMarkovKernel ฮฝ] (n : โ„•) :
    (stationaryEnv ฮฝ).feedback n = ฮฝ.prodMkLeft _
Type uses (2)
Body uses (1)

Actions: Source ยท Open Issue

Proof
by simp [stationaryEnv]

ฮฝ0_stationaryEnv๐Ÿ”—

LemmaLearning.ฮฝ0_stationaryEnv

No docstring.

๐Ÿ”—theorem
Learning.ฮฝ0_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment.ฮฝ0 (stationaryEnv ฮฝ) = ฮฝ
Learning.ฮฝ0_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment.ฮฝ0 (stationaryEnv ฮฝ) = ฮฝ

Code

lemma ฮฝ0_stationaryEnv (ฮฝ : Kernel ๐“ ๐“จ) [IsMarkovKernel ฮฝ] : (stationaryEnv ฮฝ).ฮฝ0 = ฮฝ
Type uses (2)
Body uses (1)

Actions: Source ยท Open Issue

Proof
by
  simp [stationaryEnv]

instIsObliviousEnvStationaryEnv๐Ÿ”—

InstanceLearning.instIsObliviousEnvStationaryEnv

No docstring.

๐Ÿ”—theorem
Learning.instIsObliviousEnvStationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ)
Learning.instIsObliviousEnvStationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ)

Code

instance (ฮฝ : Kernel ๐“ ๐“จ) [IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ) where
  exists_eq_prodMkLeft
Type uses (2)
Body uses (1)
Used by (5)

Actions: Source ยท Open Issue

Proof
โŸจfun _ โ†ฆ ฮฝ, inferInstance, rfl, fun _ โ†ฆ rflโŸฉ

feedbackCondAction_stationaryEnv๐Ÿ”—

LemmaLearning.feedbackCondAction_stationaryEnv

No docstring.

๐Ÿ”—theorem
Learning.feedbackCondAction_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [hฮฝ : ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ„•) : feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝ
Learning.feedbackCondAction_stationaryEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ) [hฮฝ : ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ„•) : feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝ

Code

lemma feedbackCondAction_stationaryEnv (ฮฝ : Kernel ๐“ ๐“จ) [hฮฝ : IsMarkovKernel ฮฝ] (n : โ„•) :
    feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝ
Type uses (3)
Body uses (1)
Used by (1)

Actions: Source ยท Open Issue

Proof
feedbackCondAction_obliviousEnv _ _

hasCondDistrib_feedback_stationaryEnv๐Ÿ”—

LemmaLearning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv

The conditional distribution of the feedback at time n given the action at time n is ฮฝ.

๐Ÿ”—theorem
Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.HasCondDistrib (Y n) (A n) ฮฝ P
Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.HasCondDistrib (Y n) (A n) ฮฝ P

Code

lemma hasCondDistrib_feedback_stationaryEnv
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) :
    HasCondDistrib (Y n) (A n) ฮฝ P
Type uses (3)
Body uses (4)
Used by (1)

Actions: Source ยท Open Issue

Proof
by
  simpa using IsObliviousEnv.hasCondDistrib_feedback h n

condDistrib_feedback_stationaryEnv๐Ÿ”—

LemmaLearning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv

The conditional distribution of the feedback at time n given the action at time n is ฮฝ.

๐Ÿ”—theorem
Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : โ‡‘๐“›[Y n | A n; P] =แต[MeasureTheory.Measure.map (A n) P] โ‡‘ฮฝ
Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : โ‡‘๐“›[Y n | A n; P] =แต[MeasureTheory.Measure.map (A n) P] โ‡‘ฮฝ

Code

lemma condDistrib_feedback_stationaryEnv [StandardBorelSpace ๐“จ] [Nonempty ๐“จ]
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) :
    condDistrib (Y n) (A n) P =แต[P.map (A n)] ฮฝ
Type uses (3)
Body uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
(hasCondDistrib_feedback_stationaryEnv h n).condDistrib_eq

condIndepFun_feedback_history_action๐Ÿ”—

LemmaLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action

The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

๐Ÿ”—theorem
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (history A Y n) P
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (history A Y n) P

Code

lemma condIndepFun_feedback_history_action [StandardBorelSpace ฮฉ]
    [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ]
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) :
    Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action _ ; P] history A Y n
Type uses (5)
Body uses (2)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.condIndepFun_feedback_history_action h n

condIndepFun_feedback_history_action_action๐Ÿ”—

LemmaLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action

No docstring.

๐Ÿ”—theorem
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P

Code

lemma condIndepFun_feedback_history_action_action [StandardBorelSpace ฮฉ]
    [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ]
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) :
    Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action (n + 1); P]
      (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, A (n + 1) ฯ‰))
Type uses (5)
Body uses (2)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.condIndepFun_feedback_history_action_action h n

condIndepFun_feedback_history_action_action'๐Ÿ”—

LemmaLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'

No docstring.

๐Ÿ”—theorem
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) (hn : n โ‰  0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โ‹ฏ (Y n) (fun ฯ‰ => (history A Y (n - 1) ฯ‰, A n ฯ‰)) P
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {ฮฝ : ProbabilityTheory.Kernel ๐“ ๐“จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) (hn : n โ‰  0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โ‹ฏ (Y n) (fun ฯ‰ => (history A Y (n - 1) ฯ‰, A n ฯ‰)) P

Code

lemma condIndepFun_feedback_history_action_action' [StandardBorelSpace ฮฉ]
    [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ]
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ„•) (hn : n โ‰  0) :
    Y n โŸ‚แตข[A n, h.measurable_action n; P] (fun ฯ‰ โ†ฆ (history A Y (n - 1) ฯ‰, A n ฯ‰))
Type uses (5)
Body uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
IsObliviousEnv.condIndepFun_feedback_history_action_action' h n hn
  1. Learning.IsObliviousEnv
  2. Learning.feedbackCondAction
  3. Learning.instIsMarkovKernelFeedbackCondAction
  4. Learning.ฮฝ0_eq_feedbackCondAction
  5. Learning.feedback_eq_feedbackCondAction
  6. Learning.IsObliviousEnv.hasCondDistrib_feedback
  7. Learning.IsObliviousEnv.condIndepFun_feedback_history_action
  8. Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action
  9. Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action'
  10. Learning.obliviousEnv
  11. Learning.obliviousEnv_feedback
  12. Learning.obliviousEnv_ฮฝ0
  13. Learning.feedback_obliviousEnv
  14. Learning.ฮฝ0_obliviousEnv
  15. Learning.instIsObliviousEnvObliviousEnv
  16. Learning.feedbackCondAction_obliviousEnv
  17. Learning.stationaryEnv
  18. Learning.feedback_stationaryEnv
  19. Learning.ฮฝ0_stationaryEnv
  20. Learning.instIsObliviousEnvStationaryEnv
  21. Learning.feedbackCondAction_stationaryEnv
  22. Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv
  23. Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv
  24. Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action
  25. Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action
  26. Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'