Documentation

LeanMachineLearning.SequentialLearning.StationaryEnv

Oblivious and stationary environments #

An oblivious environment is an environment in which the distribution of the next feedback depends only on the last action (and not on the past history). If the kernel that gives the distribution of the next feedback given the last action is the same at every time step, then we say that the environment is stationary.

Main definitions #

We define a Prop-valued typeclass IsObliviousEnv to express that an environment is oblivious, and we define two constructors for oblivious environments.

Typeclass and related definitions:

Constructors for oblivious environments:

class Learning.IsObliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) :

An environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.

Instances
    noncomputable def Learning.feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [h_obl : IsObliviousEnv env] (n : ) :

    The kernel representing the conditional distribution of the feedback given the action at time n in an oblivious environment.

    Equations
    Instances For
      instance Learning.instIsMarkovKernelFeedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] (n : ) :
      theorem Learning.ν0_eq_feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] :
      theorem Learning.feedback_eq_feedbackCondAction {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (env : Environment 𝓐 𝓨) [IsObliviousEnv env] (n : ) :
      theorem Learning.IsObliviousEnv.hasCondDistrib_feedback {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ) :
      theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ) :

      The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

      theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ) :
      ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) (Y (n + 1)) (fun (ω : Ω) => (history A Y n ω, A (n + 1) ω)) P
      theorem Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action' {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {env : Environment 𝓐 𝓨} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : ) (hn : n 0) :
      ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) (Y n) (fun (ω : Ω) => (history A Y (n - 1) ω, A n ω)) P
      def Learning.obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] :
      Environment 𝓐 𝓨

      An oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.

      Equations
      Instances For
        @[simp]
        theorem Learning.obliviousEnv_ν0 {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] :
        (obliviousEnv ν).ν0 = ν 0
        @[simp]
        theorem Learning.obliviousEnv_feedback {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ) :
        (obliviousEnv ν).feedback n = ProbabilityTheory.Kernel.prodMkLeft ((Finset.Iic n)𝓐 × 𝓨) (ν (n + 1))
        theorem Learning.feedback_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ) :
        (obliviousEnv ν).feedback n = ProbabilityTheory.Kernel.prodMkLeft ((Finset.Iic n)𝓐 × 𝓨) (ν (n + 1))
        theorem Learning.ν0_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] :
        (obliviousEnv ν).ν0 = ν 0
        instance Learning.instIsObliviousEnvObliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] :
        @[simp]
        theorem Learning.feedbackCondAction_obliviousEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ : ∀ (n : ), ProbabilityTheory.IsMarkovKernel (ν n)] (n : ) :
        def Learning.stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] :
        Environment 𝓐 𝓨

        A stationary environment, in which the distribution of the next feedback depends only on the last action.

        Equations
        Instances For
          @[simp]
          theorem Learning.feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] (n : ) :
          @[simp]
          theorem Learning.ν0_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ProbabilityTheory.IsMarkovKernel ν] :
          @[simp]
          theorem Learning.feedbackCondAction_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} (ν : ProbabilityTheory.Kernel 𝓐 𝓨) [ : ProbabilityTheory.IsMarkovKernel ν] (n : ) :
          theorem Learning.IsAlgEnvSeq.hasCondDistrib_feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω𝓐} {Y : Ω𝓨} (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ) :

          The conditional distribution of the feedback at time n given the action at time n is ν.

          theorem Learning.IsAlgEnvSeq.condDistrib_feedback_stationaryEnv {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω𝓐} {Y : Ω𝓨} (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ) :
          𝓛[Y n | A n; P] =ᵐ[MeasureTheory.Measure.map (A n) P] ν

          The conditional distribution of the feedback at time n given the action at time n is ν.

          theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ) :

          The feedback at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.

          theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ) :
          ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) (Y (n + 1)) (fun (ω : Ω) => (history A Y n ω, A (n + 1) ω)) P
          theorem Learning.IsAlgEnvSeq.condIndepFun_feedback_history_action_action' {𝓐 : Type u_1} {𝓨 : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {m𝓨 : MeasurableSpace 𝓨} {Ω : Type u_3} { : MeasurableSpace Ω} [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace 𝓨] [Nonempty 𝓨] {alg : Algorithm 𝓐 𝓨} {ν : ProbabilityTheory.Kernel 𝓐 𝓨} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω𝓐} {Y : Ω𝓨} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A Y alg (stationaryEnv ν) P) (n : ) (hn : n 0) :
          ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) (Y n) (fun (ω : Ω) => (history A Y (n - 1) ω, A n ω)) P