Documentation

LeanMachineLearning.Online.Bandit.RewardByCountMeasure

Laws of stepsUntil and rewardByCount #

theorem Bandits.hasLaw_Z {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] (a : 𝓐) (m : β„•) :
ProbabilityTheory.HasLaw (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => Ο‰.2 m a) (Ξ½ a) (P.prod (streamMeasure Ξ½))

Law of Y conditioned on the event s.

Equations
  • One or more equations did not get rendered due to their size.
Instances For

    Law of Y conditioned on the event that X is in s.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For

      Law of Y conditioned on the event that X equals x.

      Equations
      • One or more equations did not get rendered due to their size.
      Instances For
        theorem Bandits.condDistrib_reward'' {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (n : β„•) :
        ⇑𝓛[fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => R n Ο‰.1 | fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => A n Ο‰.1; P.prod (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => A n Ο‰.1) (P.prod (streamMeasure Ξ½))] ⇑ν
        theorem Bandits.reward_cond_action {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hΞΌa : (MeasureTheory.Measure.map (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => A n Ο‰.1) (P.prod (streamMeasure Ξ½))) {a} β‰  0) :
        𝓛[fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => R n Ο‰.1 | fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => A n Ο‰.1 in {a}; P.prod (streamMeasure Ξ½)] = Ξ½ a
        theorem Bandits.condIndepFun_reward_stepsUntil_action' {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) :
        ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) β‹― (R n) ({Ο‰ : Ξ© | Learning.stepsUntil A a m Ο‰ = ↑n}.indicator fun (x : Ξ©) => 1) P
        theorem Bandits.condIndepFun_reward_stepsUntil_action {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) :
        ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => A n Ο‰.1) m𝓐) β‹― (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => R n Ο‰.1) ({Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ) | Learning.stepsUntil A a m Ο‰.1 = ↑n}.indicator fun (x : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => 1) (P.prod (streamMeasure Ξ½))
        theorem Bandits.reward_cond_stepsUntil {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) (hm : m β‰  0) (hΞΌn : (P.prod (streamMeasure Ξ½)) ((fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => Learning.stepsUntil A a m Ο‰.1) ⁻¹' {↑n}) β‰  0) :
        𝓛[fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => R n Ο‰.1 | fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => Learning.stepsUntil A a m Ο‰.1 in {↑n}; P.prod (streamMeasure Ξ½)] = Ξ½ a
        theorem Bandits.condDistrib_rewardByCount_stepsUntil {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) :
        ⇑𝓛[Learning.rewardByCount A R a m | fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => Learning.stepsUntil A a m Ο‰.1; P.prod (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun (Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ)) => Learning.stepsUntil A a m Ο‰.1) (P.prod (streamMeasure Ξ½))] ⇑(ProbabilityTheory.Kernel.const β„•βˆž (Ξ½ a))

        The conditional distribution of the reward received at the m-th pull of action a given the time at which number of pulls is m is the constant kernel with value Ξ½ a.

        theorem Bandits.hasLaw_rewardByCount {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) :

        The reward received at the m-th pull of action a has law Ξ½ a.

        theorem Bandits.identDistrib_rewardByCount {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) (hm : m β‰  0) :
        theorem Bandits.identDistrib_rewardByCount_id {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hn : n β‰  0) :
        theorem Bandits.identDistrib_rewardByCount_eval {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] [StandardBorelSpace 𝓐] [Nonempty 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) :
        ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (fun (Ο‰ : β„• β†’ 𝓐 β†’ ℝ) => Ο‰ m a) (P.prod (streamMeasure Ξ½)) (streamMeasure Ξ½)