LeanMachineLearning.SequentialLearning.StationaryEnv

Oblivious and stationary environments

An oblivious environment is an environment in which the distribution of the next feedback depends only on the last action (and not on the past history). If the kernel that gives the distribution of the next feedback given the last action is the same at every time step, then we say that the environment is stationary.

Main definitions

We define a Prop-valued typeclass IsObliviousEnv to express that an environment is oblivious, and we define two constructors for oblivious environments.

Typeclass and related definitions:

IsObliviousEnv env: the environment env is oblivious.
feedbackCondAction env n: the kernel representing the conditional distribution of the feedback given the action at time n in an oblivious environment env.

Constructors for oblivious environments:

obliviousEnv ν: an oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner, and is given by a sequence of Markov kernels ν : ℕ → Kernel 𝓐 𝓨.
stationaryEnv ν: a stationary environment, in which the distribution of the next feedback depends only on the last action (and not on the past history or on the time step), and is given by a single Markov kernel ν : Kernel 𝓐 𝓨.
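
As a quick orientation, the two constructors can be checked against the typeclass. This is a minimal sketch, assuming the module path shown at the top of this page is importable as written; it uses only names declared on this page and has not been compiled against a particular version of the library.

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory

variable {𝓐 𝓨 : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨]

-- A stationary environment built from a single Markov kernel is oblivious;
-- the instance is found by typeclass resolution.
example (ν : Kernel 𝓐 𝓨) [IsMarkovKernel ν] :
    IsObliviousEnv (stationaryEnv ν) :=
  inferInstance

-- The same holds for a time-dependent sequence of Markov kernels.
example (ν : ℕ → Kernel 𝓐 𝓨) [∀ n, IsMarkovKernel (ν n)] :
    IsObliviousEnv (obliviousEnv ν) :=
  inferInstance
```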

class Learning.IsObliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) :

Prop

An environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.

exists_eq_prodMkLeft : ∃ (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨), (∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)) ∧ env.ν0 = ν 0 ∧ ∀ (n : ℕ), env.feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1))
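
The single field of the class can be read off an instance directly. The sketch below assumes the usual projection notation for class fields applies here; the instance name h is introduced only for illustration.

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory

variable {𝓐 𝓨 : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨]

-- Unpack the defining property: a sequence of Markov kernels `ν` such that
-- `ν 0` is the initial feedback kernel and `ν (n + 1)`, with the history
-- argument ignored via `prodMkLeft`, is the feedback kernel at step `n`.
example (env : Environment 𝓐 𝓨) [h : IsObliviousEnv env] :
    ∃ ν : ℕ → Kernel 𝓐 𝓨, (∀ n, IsMarkovKernel (ν n)) ∧ env.ν0 = ν 0 ∧
      ∀ n, env.feedback n =
        Kernel.prodMkLeft (Finset.Iic n → 𝓐 × 𝓨) (ν (n + 1)) :=
  h.exists_eq_prodMkLeft
```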

noncomputable def Learning.feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [h_obl : IsObliviousEnv env] (n : ℕ) :

ProbabilityTheory.Kernel 𝓐 𝓨

The kernel representing the conditional distribution of the feedback given the action at time n in an oblivious environment.

Equations

Learning.feedbackCondAction env n = ⋯.choose n

instance Learning.instIsMarkovKernelFeedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] (n : ℕ) :

ProbabilityTheory.IsMarkovKernel (feedbackCondAction env n)

theorem Learning.ν0_eq_feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] :

env.ν0 = feedbackCondAction env 0

theorem Learning.feedback_eq_feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] (n : ℕ) :

env.feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (feedbackCondAction env (n + 1))
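
The two theorems above are the intended way to replace the fields of an oblivious environment by feedbackCondAction, rather than unfolding the choose in its definition. A hedged sketch of both uses, with the same import assumption as the module name at the top of this page:

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory

variable {𝓐 𝓨 : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨]

-- The initial feedback kernel is the time-0 kernel.
example (env : Environment 𝓐 𝓨) [IsObliviousEnv env] :
    env.ν0 = feedbackCondAction env 0 :=
  ν0_eq_feedbackCondAction env

-- Note the index shift: `env.feedback n` produces the feedback at time
-- `n + 1`, so it is expressed through `feedbackCondAction env (n + 1)`.
example (env : Environment 𝓐 𝓨) [IsObliviousEnv env] (n : ℕ) :
    env.feedback n =
      Kernel.prodMkLeft (Finset.Iic n → 𝓐 × 𝓨) (feedbackCondAction env (n + 1)) :=
  feedback_eq_feedbackCondAction env n
```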

theorem Learning.IsObliviousEnv.hasCondDistrib_feedback {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ℕ) :

ProbabilityTheory.HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P
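
A sketch of how this theorem might be applied, restating its hypotheses as instance arguments. It assumes Algorithm, Environment and IsAlgEnvSeq live in the Learning namespace, as the declaration names on this page suggest.

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory MeasureTheory

variable {𝓐 𝓨 Ω : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨] [MeasurableSpace Ω]
  [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨]

-- For any algorithm interacting with an oblivious environment, the feedback
-- at time `n` has conditional distribution `feedbackCondAction env n` given
-- the action at time `n`. Only finiteness of `P` is required here.
example {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} [IsObliviousEnv env]
    {P : Measure Ω} [IsFiniteMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨}
    (h : IsAlgEnvSeq A Y alg env P) (n : ℕ) :
    HasCondDistrib (Y n) (A n) (feedbackCondAction env n) P :=
  IsObliviousEnv.hasCondDistrib_feedback h n
```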

theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ℕ) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) ⋯ (Y (n + 1)) (history A Y n) P

The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ℕ) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) ⋯ (Y (n + 1)) (fun (ω : Ω) => (history A Y n ω, A (n + 1) ω)) P

theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action' {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ℕ) (hn : n ≠ 0) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) ⋯ (Y n) (fun (ω : Ω) => (history A Y (n - 1) ω, A n ω)) P
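
Judging from the statements alone, the primed variant reads as the unprimed one re-indexed from n + 1 to n. The hypothesis hn : n ≠ 0 matters because subtraction on ℕ is truncated, so n - 1 + 1 = n fails at n = 0. The arithmetic fact behind the re-indexing can be checked in isolation (this is an illustration, not the library's proof):

```lean
-- Natural-number subtraction is truncated, so the identity needs `n ≠ 0`.
example (n : ℕ) (hn : n ≠ 0) : n - 1 + 1 = n := by
  omega

-- At `n = 0` the left-hand side evaluates to `1`, not `0`.
example : (0 : ℕ) - 1 + 1 = 1 := by
  decide
```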

def Learning.obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] :

Environment 𝓐 𝓨

An oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.

Equations

Learning.obliviousEnv ν = { feedback := fun (n : ℕ) => ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1)), h_feedback := ⋯, ν0 := ν 0, hp0 := ⋯ }

@[simp]

theorem Learning.obliviousEnv_ν0 {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] :

(obliviousEnv ν).ν0 = ν 0

@[simp]

theorem Learning.obliviousEnv_feedback {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ℕ) :

(obliviousEnv ν).feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1))

theorem Learning.feedback_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ℕ) :

(obliviousEnv ν).feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1))

theorem Learning.ν0_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] :

(obliviousEnv ν).ν0 = ν 0

instance Learning.instIsObliviousEnvObliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] :

IsObliviousEnv (obliviousEnv ν)

@[simp]

theorem Learning.feedbackCondAction_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨) [hν : ∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ℕ) :

feedbackCondAction (obliviousEnv ν) n = ν n
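
Together, these lemmas say that obliviousEnv ν can be used without unfolding its definition: the initial kernel is ν 0 and the kernel extracted by feedbackCondAction at time n is ν n. A minimal sketch, under the same import assumption as the module name at the top of this page:

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory

variable {𝓐 𝓨 : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨]

-- The kernel recovered from the typeclass is the one used to build
-- the environment.
example (ν : ℕ → Kernel 𝓐 𝓨) [∀ n, IsMarkovKernel (ν n)] (n : ℕ) :
    feedbackCondAction (obliviousEnv ν) n = ν n :=
  feedbackCondAction_obliviousEnv ν n

-- The initial feedback kernel is `ν 0`.
example (ν : ℕ → Kernel 𝓐 𝓨) [∀ n, IsMarkovKernel (ν n)] :
    (obliviousEnv ν).ν0 = ν 0 :=
  obliviousEnv_ν0 ν
```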

def Learning.stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] :

Environment 𝓐 𝓨

A stationary environment, in which the distribution of the next feedback depends only on the last action and is given by the same kernel at every time step.

Equations

Learning.stationaryEnv ν = Learning.obliviousEnv fun (x : ℕ) => ν

@[simp]

theorem Learning.feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] (n : ℕ) :

(stationaryEnv ν).feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) ν

@[simp]

theorem Learning.ν0_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] :

(stationaryEnv ν).ν0 = ν

instance Learning.instIsObliviousEnvStationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] :

IsObliviousEnv (stationaryEnv ν)

@[simp]

theorem Learning.feedbackCondAction_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [hν : ProbabilityTheory.IsMarkovKernel ν] (n : ℕ) :

feedbackCondAction (stationaryEnv ν) n = ν
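
For a stationary environment, every time index yields the same kernel ν. The sketch below restates the lemmas above as examples; the term proofs could likely be replaced by simp, since these lemmas are tagged @[simp], but that has not been checked here.

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory

variable {𝓐 𝓨 : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨]

-- The initial feedback kernel of a stationary environment is `ν`.
example (ν : Kernel 𝓐 𝓨) [IsMarkovKernel ν] : (stationaryEnv ν).ν0 = ν :=
  ν0_stationaryEnv ν

-- At every step, the feedback kernel is `ν` with the history ignored.
example (ν : Kernel 𝓐 𝓨) [IsMarkovKernel ν] (n : ℕ) :
    (stationaryEnv ν).feedback n =
      Kernel.prodMkLeft (Finset.Iic n → 𝓐 × 𝓨) ν :=
  feedback_stationaryEnv ν n

-- The kernel extracted by `feedbackCondAction` does not depend on `n`.
example (ν : Kernel 𝓐 𝓨) [IsMarkovKernel ν] (m n : ℕ) :
    feedbackCondAction (stationaryEnv ν) m = feedbackCondAction (stationaryEnv ν) n := by
  rw [feedbackCondAction_stationaryEnv, feedbackCondAction_stationaryEnv]
```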

theorem Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) :

ProbabilityTheory.HasCondDistrib (Y n) (A n) ν P

The conditional distribution of the feedback at time n given the action at time n is ν.

theorem Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) :

⇑𝓛[Y n | A n; P] =ᵐ[MeasureTheory.Measure.map (A n) P] ⇑ν

The conditional distribution of the feedback at time n given the action at time n is almost everywhere equal to ν (with respect to the law of the action at time n).
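
A sketch of how the HasCondDistrib form might be used for a stationary environment. Unlike the oblivious version, the statements in this part of the page assume IsProbabilityMeasure P. The names A, Y, alg and P are placeholders, as in the theorem statements.

```lean
import LeanMachineLearning.SequentialLearning.StationaryEnv

open Learning ProbabilityTheory MeasureTheory

variable {𝓐 𝓨 Ω : Type*} [MeasurableSpace 𝓐] [MeasurableSpace 𝓨] [MeasurableSpace Ω]
  [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨]

-- Whatever the algorithm does, the feedback at time `n` given the action at
-- time `n` is distributed according to the fixed kernel `ν`.
example {alg : Algorithm 𝓐 𝓨} {ν : Kernel 𝓐 𝓨} [IsMarkovKernel ν]
    {P : Measure Ω} [IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨}
    (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) :
    HasCondDistrib (Y n) (A n) ν P :=
  IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv h n
```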

theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) ⋯ (Y (n + 1)) (history A Y n) P

The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) ⋯ (Y (n + 1)) (fun (ω : Ω) => (history A Y n ω, A (n + 1) ω)) P

theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action' {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → 𝓐} {Y : ℕ → Ω → 𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ℕ) (hn : n ≠ 0) :

ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) ⋯ (Y n) (fun (ω : Ω) => (history A Y (n - 1) ω, A n ω)) P