3.3.ย SequentialLearning.StationaryEnv
Oblivious and stationary environments
An oblivious environment is an environment in which the distribution of the next feedback depends only on the last action (and not on the past history). If the kernel that gives the distribution of the next feedback given the last action is the same at every time step, then we say that the environment is stationary.
Main definitions
We define a Prop-valued typeclass IsObliviousEnv to express that an environment is oblivious,
and we define two constructors for oblivious environments.
Typeclass and related definitions:
-
IsObliviousEnv env: the environmentenvis oblivious. -
feedbackCondAction env n: the kernel representing the conditional distribution of the feedback given the action at timenin an oblivious environmentenv.
Constructors for oblivious environments:
-
obliviousEnv ฮฝ: an oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner, and is given by a sequence of Markov kernelsฮฝ : โ โ Kernel ๐ ๐จ. -
stationaryEnv ฮฝ: a stationary environment, in which the distribution of the next feedback depends only on the last action (and not on the past history), and is given by a Markov kernelฮฝ : Kernel ๐ ๐จ.
Module LeanMachineLearning.SequentialLearning.StationaryEnv contains 26 exposed declarations.
IsObliviousEnv๐
Learning.IsObliviousEnvAn environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.
Learning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : PropLearning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : Prop
Code
class IsObliviousEnv (env : Environment ๐ ๐จ) : Prop where
exists_eq_prodMkLeft : โ ฮฝ : โ โ Kernel ๐ ๐จ, (โ n, IsMarkovKernel (ฮฝ n)) โง
(env.ฮฝ0 = ฮฝ 0) โง (โ n, env.feedback n = (ฮฝ (n + 1)).prodMkLeft _)Type uses (1)
Used by (13)
Actions: Source ยท Open Issue
feedbackCondAction๐
Learning.feedbackCondAction
The kernel representing the conditional distribution of the feedback given the action
at time n in an oblivious environment.
Learning.feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : ProbabilityTheory.Kernel ๐ ๐จLearning.feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : ProbabilityTheory.Kernel ๐ ๐จ
Code
noncomputable def feedbackCondAction (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : Kernel ๐ ๐จ := h_obl.exists_eq_prodMkLeft.choose n
Type uses (2)
Used by (12)
Actions: Source ยท Open Issue
instIsMarkovKernelFeedbackCondAction๐
Learning.instIsMarkovKernelFeedbackCondActionNo docstring.
Learning.instIsMarkovKernelFeedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) : ProbabilityTheory.IsMarkovKernel (feedbackCondAction env n)Learning.instIsMarkovKernelFeedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) : ProbabilityTheory.IsMarkovKernel (feedbackCondAction env n)
Code
instance (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) :
IsMarkovKernel (feedbackCondAction env n)Type uses (3)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.1 n
ฮฝ0_eq_feedbackCondAction๐
Learning.ฮฝ0_eq_feedbackCondActionNo docstring.
Learning.ฮฝ0_eq_feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] : Environment.ฮฝ0 env = feedbackCondAction env 0Learning.ฮฝ0_eq_feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] : Environment.ฮฝ0 env = feedbackCondAction env 0
Code
lemma ฮฝ0_eq_feedbackCondAction (env : Environment ๐ ๐จ) [IsObliviousEnv env] :
env.ฮฝ0 = feedbackCondAction env 0Type uses (3)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.2.1
feedback_eq_feedbackCondAction๐
Learning.feedback_eq_feedbackCondActionNo docstring.
Learning.feedback_eq_feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) : Environment.feedback env n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (feedbackCondAction env (n + 1))Learning.feedback_eq_feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) : Environment.feedback env n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (feedbackCondAction env (n + 1))
Code
lemma feedback_eq_feedbackCondAction (env : Environment ๐ ๐จ) [IsObliviousEnv env] (n : โ) :
env.feedback n = (feedbackCondAction env (n + 1)).prodMkLeft _Type uses (3)
Used by (3)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.exists_eq_prodMkLeft.choose_spec.2.2 n
hasCondDistrib_feedback๐
Learning.IsObliviousEnv.hasCondDistrib_feedbackNo docstring.
Learning.IsObliviousEnv.hasCondDistrib_feedback.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.HasCondDistrib (Y n) (A n) (feedbackCondAction env n) PLearning.IsObliviousEnv.hasCondDistrib_feedback.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P
Code
lemma hasCondDistrib_feedback [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) :
HasCondDistrib (Y n) (A n) (feedbackCondAction env n) PType uses (5)
Body uses (6)
Used by (3)
Actions: Source ยท Open Issue
Proof
by
have hA := h.measurable_action
have hY := h.measurable_feedback
cases n with
| zero => rw [โ ฮฝ0_eq_feedbackCondAction]; exact h.hasCondDistrib_feedback_zero
| succ n =>
refine โจby fun_prop, ?_โฉ
have h_eq := (h.hasCondDistrib_feedback n).map_eq
have : P.map (A (n + 1)) =
(P.map (fun x โฆ (history A Y n x, A (n + 1) x))).snd := by
rw [Measure.snd_map_prodMk (by fun_prop)]
simp only [feedback_eq_feedbackCondAction] at h_eq
rw [this, โ Measure.snd_prodAssoc_compProd_prodMkLeft, โ h_eq,
Measure.snd_map_prodMk (by fun_prop), Measure.map_map (by fun_prop) (by fun_prop)]
congr
condIndepFun_feedback_history_action๐
Learning.IsObliviousEnv.condIndepFun_feedback_history_action
The feedback at time n + 1 is conditionally independent of the history up to time n
given the action at time n + 1.
Learning.IsObliviousEnv.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (history A Y n) PLearning.IsObliviousEnv.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (history A Y n) P
Code
lemma condIndepFun_feedback_history_action [StandardBorelSpace ฮฉ]
[IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) :
Y (n + 1) โแตข[A (n + 1), h.measurable_action _ ; P] history A Y nType uses (5)
Body uses (6)
Actions: Source ยท Open Issue
Proof
by
have hA := h.measurable_action
have hY := h.measurable_feedback
refine condIndepFun_of_exists_condDistrib_prod_ae_eq_prodMkLeft
(ฮท := feedbackCondAction env (n + 1))
(by fun_prop) (by fun_prop) (by fun_prop) ?_
refine HasCondDistrib.condDistrib_eq ?_
rw [โ feedback_eq_feedbackCondAction]
exact h.hasCondDistrib_feedback n
condIndepFun_feedback_history_action_action๐
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_actionNo docstring.
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (fun ฯ => (history A Y n ฯ, A (n + 1) ฯ)) PLearning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (fun ฯ => (history A Y n ฯ, A (n + 1) ฯ)) P
Code
lemma condIndepFun_feedback_history_action_action [StandardBorelSpace ฮฉ]
[IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) :
Y (n + 1) โแตข[A (n + 1), h.measurable_action (n + 1); P]
(fun ฯ โฆ (history A Y n ฯ, A (n + 1) ฯ))Type uses (5)
Body uses (3)
Used by (2)
Actions: Source ยท Open Issue
Proof
by
have h_indep : Y (n + 1) โแตข[A (n + 1), h.measurable_action (n + 1); P] history A Y n :=
condIndepFun_feedback_history_action h n
have hA := h.measurable_action
have hY := h.measurable_feedback
exact h_indep.prod_right (by fun_prop) (by fun_prop) (by fun_prop)
condIndepFun_feedback_history_action_action'๐
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action'No docstring.
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) (hn : n โ 0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โฏ (Y n) (fun ฯ => (history A Y (n - 1) ฯ, A n ฯ)) PLearning.IsObliviousEnv.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {env : Environment ๐ ๐จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) (hn : n โ 0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โฏ (Y n) (fun ฯ => (history A Y (n - 1) ฯ, A n ฯ)) P
Code
lemma condIndepFun_feedback_history_action_action' [StandardBorelSpace ฮฉ]
[IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ) (hn : n โ 0) :
Y n โแตข[A n, h.measurable_action n; P] (fun ฯ โฆ (history A Y (n - 1) ฯ, A n ฯ))Type uses (5)
Body uses (1)
Used by (1)
Actions: Source ยท Open Issue
Proof
by have := condIndepFun_feedback_history_action_action h (n - 1) grind
obliviousEnv๐
Learning.obliviousEnvAn oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.
Learning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จLearning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ
Code
def obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ where feedback n := (ฮฝ (n + 1)).prodMkLeft _ ฮฝ0 := ฮฝ 0
Type uses (1)
Used by (10)
Actions: Source ยท Open Issue
obliviousEnv_feedback๐
Learning.obliviousEnv_feedbackNo docstring.
Learning.obliviousEnv_feedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (ฮฝ (n + 1))Learning.obliviousEnv_feedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (ฮฝ (n + 1))
Code
theorem obliviousEnv_feedback : โ {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ)
[inst : โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ),
(Learning.obliviousEnv ฮฝ).feedback n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (ฮฝ (n + 1))Type uses (2)
Actions: Source ยท Open Issue
Proof
@[simps]
obliviousEnv_ฮฝ0๐
Learning.obliviousEnv_ฮฝ0No docstring.
Learning.obliviousEnv_ฮฝ0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0Learning.obliviousEnv_ฮฝ0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0
Code
theorem obliviousEnv_ฮฝ0 : โ {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ)
[inst : โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)], (Learning.obliviousEnv ฮฝ).ฮฝ0 = ฮฝ 0Type uses (2)
Used by (1)
Actions: Source ยท Open Issue
Proof
@[simps]
feedback_obliviousEnv๐
Learning.feedback_obliviousEnvNo docstring.
Learning.feedback_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (ฮฝ (n + 1))Learning.feedback_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : Environment.feedback (obliviousEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (ฮฝ (n + 1))
Code
lemma feedback_obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] (n : โ) :
(obliviousEnv ฮฝ).feedback n = (ฮฝ (n + 1)).prodMkLeft _Type uses (2)
Used by (1)
Actions: Source ยท Open Issue
Proof
by simp [obliviousEnv]
ฮฝ0_obliviousEnv๐
Learning.ฮฝ0_obliviousEnvNo docstring.
Learning.ฮฝ0_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0Learning.ฮฝ0_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment.ฮฝ0 (obliviousEnv ฮฝ) = ฮฝ 0
Code
lemma ฮฝ0_obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] :
(obliviousEnv ฮฝ).ฮฝ0 = ฮฝ 0Type uses (2)
Used by (1)
Actions: Source ยท Open Issue
Proof
by simp [obliviousEnv]
instIsObliviousEnvObliviousEnv๐
Learning.instIsObliviousEnvObliviousEnvNo docstring.
Learning.instIsObliviousEnvObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : IsObliviousEnv (obliviousEnv ฮฝ)Learning.instIsObliviousEnvObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : IsObliviousEnv (obliviousEnv ฮฝ)
Code
instance (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] :
IsObliviousEnv (obliviousEnv ฮฝ) where
exists_eq_prodMkLeftType uses (2)
Body uses (1)
Used by (1)
Actions: Source ยท Open Issue
Proof
โจfun n โฆ ฮฝ n, inferInstance,rfl, fun _ โฆ rflโฉ
feedbackCondAction_obliviousEnv๐
Learning.feedbackCondAction_obliviousEnvNo docstring.
Learning.feedbackCondAction_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [hฮฝ : โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ nLearning.feedbackCondAction_obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [hฮฝ : โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] (n : โ) : feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ n
Code
lemma feedbackCondAction_obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [hฮฝ : โ n, IsMarkovKernel (ฮฝ n)]
(n : โ) :
feedbackCondAction (obliviousEnv ฮฝ) n = ฮฝ nType uses (3)
Body uses (6)
Actions: Source ยท Open Issue
Proof
by
rcases isEmpty_or_nonempty ๐ with h๐ | h๐
ยท ext a : 1
exact h๐.elim a
rcases isEmpty_or_nonempty ๐จ with hR | hR
ยท refine absurd (hฮฝ 0) ?_
simp only [Subsingleton.eq_zero ฮฝ, Pi.zero_apply]
exact Kernel.not_isMarkovKernel_zero
have : Nonempty (Iic n โ ๐ ร ๐จ) := โจfun _ โฆ (h๐.some, hR.some)โฉ
have h_eq_zero := ฮฝ0_eq_feedbackCondAction (obliviousEnv ฮฝ)
have h_eq := feedback_eq_feedbackCondAction (obliviousEnv ฮฝ) (n - 1)
cases n with
| zero => exact h_eq_zero.symm
| succ n =>
simp only [Nat.add_one_sub_one, obliviousEnv_feedback, add_tsub_cancel_right] at h_eq
rw [โ Kernel.prodMkLeft_inj (ฮณ := Iic n โ ๐ ร ๐จ)]
exact h_eq.symm
stationaryEnv๐
Learning.stationaryEnvA stationary environment, in which the distribution of the next feedback depends only on the last action.
Learning.stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment ๐ ๐จLearning.stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment ๐ ๐จ
Code
def stationaryEnv (ฮฝ : Kernel ๐ ๐จ) [IsMarkovKernel ฮฝ] : Environment ๐ ๐จ := obliviousEnv fun _ โฆ ฮฝ
Type uses (1)
Body uses (1)
Used by (81)
Actions: Source ยท Open Issue
feedback_stationaryEnv๐
Learning.feedback_stationaryEnvNo docstring.
Learning.feedback_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ) : Environment.feedback (stationaryEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ฮฝLearning.feedback_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ) : Environment.feedback (stationaryEnv ฮฝ) n = ProbabilityTheory.Kernel.prodMkLeft (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ฮฝ
Code
lemma feedback_stationaryEnv (ฮฝ : Kernel ๐ ๐จ) [IsMarkovKernel ฮฝ] (n : โ) :
(stationaryEnv ฮฝ).feedback n = ฮฝ.prodMkLeft _Type uses (2)
Body uses (1)
Actions: Source ยท Open Issue
Proof
by simp [stationaryEnv]
ฮฝ0_stationaryEnv๐
Learning.ฮฝ0_stationaryEnvNo docstring.
Learning.ฮฝ0_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment.ฮฝ0 (stationaryEnv ฮฝ) = ฮฝLearning.ฮฝ0_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : Environment.ฮฝ0 (stationaryEnv ฮฝ) = ฮฝ
Code
lemma ฮฝ0_stationaryEnv (ฮฝ : Kernel ๐ ๐จ) [IsMarkovKernel ฮฝ] : (stationaryEnv ฮฝ).ฮฝ0 = ฮฝ
Type uses (2)
Body uses (1)
Actions: Source ยท Open Issue
Proof
by simp [stationaryEnv]
instIsObliviousEnvStationaryEnv๐
Learning.instIsObliviousEnvStationaryEnvNo docstring.
Learning.instIsObliviousEnvStationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ)Learning.instIsObliviousEnvStationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [ProbabilityTheory.IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ)
Code
instance (ฮฝ : Kernel ๐ ๐จ) [IsMarkovKernel ฮฝ] : IsObliviousEnv (stationaryEnv ฮฝ) where exists_eq_prodMkLeft
Type uses (2)
Body uses (1)
Used by (5)
Actions: Source ยท Open Issue
Proof
โจfun _ โฆ ฮฝ, inferInstance, rfl, fun _ โฆ rflโฉ
feedbackCondAction_stationaryEnv๐
Learning.feedbackCondAction_stationaryEnvNo docstring.
Learning.feedbackCondAction_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [hฮฝ : ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ) : feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝLearning.feedbackCondAction_stationaryEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ) [hฮฝ : ProbabilityTheory.IsMarkovKernel ฮฝ] (n : โ) : feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝ
Code
lemma feedbackCondAction_stationaryEnv (ฮฝ : Kernel ๐ ๐จ) [hฮฝ : IsMarkovKernel ฮฝ] (n : โ) :
feedbackCondAction (stationaryEnv ฮฝ) n = ฮฝType uses (3)
Body uses (1)
Used by (1)
Actions: Source ยท Open Issue
Proof
feedbackCondAction_obliviousEnv _ _
hasCondDistrib_feedback_stationaryEnv๐
Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv
The conditional distribution of the feedback at time n given the action at time n is ฮฝ.
Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.HasCondDistrib (Y n) (A n) ฮฝ PLearning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.HasCondDistrib (Y n) (A n) ฮฝ P
Code
lemma hasCondDistrib_feedback_stationaryEnv
(h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) :
HasCondDistrib (Y n) (A n) ฮฝ PType uses (3)
Body uses (4)
Used by (1)
Actions: Source ยท Open Issue
Proof
by simpa using IsObliviousEnv.hasCondDistrib_feedback h n
condDistrib_feedback_stationaryEnv๐
Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv
The conditional distribution of the feedback at time n given the action at time n is ฮฝ.
Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : โ๐[Y n | A n; P] =แต[MeasureTheory.Measure.map (A n) P] โฮฝLearning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : โ๐[Y n | A n; P] =แต[MeasureTheory.Measure.map (A n) P] โฮฝ
Code
lemma condDistrib_feedback_stationaryEnv [StandardBorelSpace ๐จ] [Nonempty ๐จ]
(h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) :
condDistrib (Y n) (A n) P =แต[P.map (A n)] ฮฝType uses (3)
Body uses (2)
Used by (1)
Actions: Source ยท Open Issue
Proof
(hasCondDistrib_feedback_stationaryEnv h n).condDistrib_eq
condIndepFun_feedback_history_action๐
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action
The feedback at time n + 1 is conditionally independent of the history up to time n
given the action at time n + 1.
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (history A Y n) PLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (history A Y n) P
Code
lemma condIndepFun_feedback_history_action [StandardBorelSpace ฮฉ]
[StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ]
(h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) :
Y (n + 1) โแตข[A (n + 1), h.measurable_action _ ; P] history A Y nType uses (5)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.condIndepFun_feedback_history_action h n
condIndepFun_feedback_history_action_action๐
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_actionNo docstring.
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (fun ฯ => (history A Y n ฯ, A (n + 1) ฯ)) PLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โฏ (Y (n + 1)) (fun ฯ => (history A Y n ฯ, A (n + 1) ฯ)) P
Code
lemma condIndepFun_feedback_history_action_action [StandardBorelSpace ฮฉ]
[StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ]
(h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) :
Y (n + 1) โแตข[A (n + 1), h.measurable_action (n + 1); P]
(fun ฯ โฆ (history A Y n ฯ, A (n + 1) ฯ))Type uses (5)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.condIndepFun_feedback_history_action_action h n
condIndepFun_feedback_history_action_action'๐
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'No docstring.
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) (hn : n โ 0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โฏ (Y n) (fun ฯ => (history A Y (n - 1) ฯ, A n ฯ)) PLearning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐ ๐จ} {ฮฝ : ProbabilityTheory.Kernel ๐ ๐จ} [ProbabilityTheory.IsMarkovKernel ฮฝ] {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ โ ฮฉ โ ๐} {Y : โ โ ฮฉ โ ๐จ} [StandardBorelSpace ฮฉ] [StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ] (h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) (hn : n โ 0) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) โฏ (Y n) (fun ฯ => (history A Y (n - 1) ฯ, A n ฯ)) P
Code
lemma condIndepFun_feedback_history_action_action' [StandardBorelSpace ฮฉ]
[StandardBorelSpace ๐] [Nonempty ๐] [StandardBorelSpace ๐จ] [Nonempty ๐จ]
(h : IsAlgEnvSeq A Y alg (stationaryEnv ฮฝ) P) (n : โ) (hn : n โ 0) :
Y n โแตข[A n, h.measurable_action n; P] (fun ฯ โฆ (history A Y (n - 1) ฯ, A n ฯ))Type uses (5)
Used by (1)
Actions: Source ยท Open Issue
Proof
IsObliviousEnv.condIndepFun_feedback_history_action_action' h n hn
-
Learning.IsObliviousEnv -
Learning.feedbackCondAction -
Learning.instIsMarkovKernelFeedbackCondAction -
Learning.ฮฝ0_eq_feedbackCondAction -
Learning.feedback_eq_feedbackCondAction -
Learning.IsObliviousEnv.hasCondDistrib_feedback -
Learning.IsObliviousEnv.condIndepFun_feedback_history_action -
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action -
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action' -
Learning.obliviousEnv -
Learning.obliviousEnv_feedback -
Learning.obliviousEnv_ฮฝ0 -
Learning.feedback_obliviousEnv -
Learning.ฮฝ0_obliviousEnv -
Learning.instIsObliviousEnvObliviousEnv -
Learning.feedbackCondAction_obliviousEnv -
Learning.stationaryEnv -
Learning.feedback_stationaryEnv -
Learning.ฮฝ0_stationaryEnv -
Learning.instIsObliviousEnvStationaryEnv -
Learning.feedbackCondAction_stationaryEnv -
Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv -
Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv -
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action -
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action -
Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action'