Documentation

LeanMachineLearning.SequentialLearning.EvaluationEnv

Function evaluation environments #

We define two environments, onlineEvalEnv and evalEnv, where the feedback is given by evaluating a measurable function at the chosen action. The first one allows the function to change at every time step, while the second one uses a fixed function at every time step.

Main definitions #

They both satisfy the typeclasses IsObliviousEnv and IsDeterministicEnv.

Main statements #

noncomputable def Learning.onlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (g : โ„• โ†’ ๐“ โ†’ ๐“จ) (hg : โˆ€ (n : โ„•), Measurable (g n)) :
Environment ๐“ ๐“จ

The evaluation environment where the feedback is given by evaluating a fixed measurable function f at the chosen action.

Equations
Instances For
    instance Learning.instIsObliviousEnvOnlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} :
    instance Learning.instIsDeterministicEnvOnlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} :
    @[simp]
    theorem Learning.feedbackCondAction_onlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} (n : โ„•) :
    @[simp]
    theorem Learning.feedbackFunZero_onlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} [MeasurableSpace.SeparatesPoints ๐“จ] :
    @[simp]
    theorem Learning.feedbackFun_onlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} [MeasurableSpace.SeparatesPoints ๐“จ] (n : โ„•) :
    feedbackFun (onlineEvalEnv g hg) n = fun (p : (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ร— ๐“) => g (n + 1) p.2
    theorem Learning.hascondDistrib_feedback_onlineEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P) (n : โ„•) :
    theorem Learning.feedback_onlineEvalEnv_ae_eq_eval_action {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P) (n : โ„•) :
    Y n =แต[P] g n โˆ˜ A n
    theorem Learning.forall_feedback_onlineEvalEnv_ae_eq_eval_action {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {g : โ„• โ†’ ๐“ โ†’ ๐“จ} {hg : โˆ€ (n : โ„•), Measurable (g n)} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P) :
    โˆ€แต (ฯ‰ : ฮฉ) โˆ‚P, โˆ€ (n : โ„•), Y n ฯ‰ = g n (A n ฯ‰)
    noncomputable def Learning.evalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (f : ๐“ โ†’ ๐“จ) (hf : Measurable f) :
    Environment ๐“ ๐“จ

    The evaluation environment where the feedback is given by evaluating a fixed measurable function f at the chosen action.

    Equations
    Instances For
      instance Learning.instIsObliviousEnvEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} :
      instance Learning.instIsDeterministicEnvEvalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} :
      @[simp]
      theorem Learning.feedbackCondAction_evalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} (n : โ„•) :
      @[simp]
      theorem Learning.feedbackFunZero_evalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} [MeasurableSpace.SeparatesPoints ๐“จ] :
      @[simp]
      theorem Learning.feedbackFun_evalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} [MeasurableSpace.SeparatesPoints ๐“จ] (n : โ„•) :
      feedbackFun (evalEnv f hf) n = fun (p : (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ร— ๐“) => f p.2
      theorem Learning.hascondDistrib_feedback_evalEnv {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (evalEnv f hf) P) (n : โ„•) :
      theorem Learning.feedback_evalEnv_ae_eq_eval_action {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (evalEnv f hf) P) (n : โ„•) :
      Y n =แต[P] f โˆ˜ A n
      theorem Learning.forall_feedback_evalEnv_ae_eq_eval_action {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} (h : IsAlgEnvSeq A Y alg (evalEnv f hf) P) :
      โˆ€แต (ฯ‰ : ฮฉ) โˆ‚P, โˆ€ (n : โ„•), Y n ฯ‰ = f (A n ฯ‰)
      theorem Learning.feedback_evalEnv_ae_eq_eval_action_comp {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {f : ๐“ โ†’ ๐“จ} {hf : Measurable f} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} {ฮฒ : Type u_4} (h : IsAlgEnvSeq A Y alg (evalEnv f hf) P) {n : โ„•} (g : (โ†ฅ(Finset.Iic n) โ†’ ๐“จ) โ†’ ฮฒ) :
      โˆ€แต (ฯ‰ : ฮฉ) โˆ‚P, (g fun (i : โ†ฅ(Finset.Iic n)) => Y (โ†‘i) ฯ‰) = g fun (i : โ†ฅ(Finset.Iic n)) => f (A (โ†‘i) ฯ‰)