LeanMachineLearning exposition

2.9.Β Online.Bandit.RewardByCountMeasureπŸ”—

Laws of stepsUntil and rewardByCount

Module LeanMachineLearning.Online.Bandit.RewardByCountMeasure contains 14 exposed declarations.

hasLaw_ZπŸ”—

LemmaBandits.hasLaw_Z

No docstring.

πŸ”—theorem
Bandits.hasLaw_Z.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] (a : 𝓐) (m : β„•) : ProbabilityTheory.HasLaw (fun Ο‰ => Prod.snd Ο‰ m a) (Ξ½ a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))
Bandits.hasLaw_Z.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] (a : 𝓐) (m : β„•) : ProbabilityTheory.HasLaw (fun Ο‰ => Prod.snd Ο‰ m a) (Ξ½ a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))

Code

lemma hasLaw_Z (a : 𝓐) (m : β„•) :
  HasLaw (fun Ο‰ ↦ Ο‰.2 m a) (Ξ½ a) 𝔓 where
  map_eq
Type uses (1)
Body uses (1)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
    calc (𝔓).map (fun Ο‰ ↦ Ο‰.2 m a)
    _ = ((𝔓).snd).map (fun Ο‰ ↦ Ο‰ m a) := by
      rw [Measure.snd, Measure.map_map (by fun_prop) (by fun_prop)]
      rfl
    _ = (streamMeasure Ξ½).map (fun Ο‰ ↦ Ο‰ m a) := by simp
    _ = ((Measure.infinitePi fun _ ↦ Measure.infinitePi Ξ½).map (fun Ο‰ ↦ Ο‰ m)).map
        (fun Ο‰ ↦ Ο‰ a) := by
      rw [streamMeasure, Measure.map_map (by fun_prop) (by fun_prop)]
      rfl
    _ = Ξ½ a := by simp_rw [(measurePreserving_eval_infinitePi _ _).map_eq]

term𝓛[_|_;_]πŸ”—

DefinitionBandits.Β«term𝓛[_|_;_]Β»

Law of Y conditioned on the event s.

πŸ”—def
Bandits.Β«term𝓛[_|_;_]Β» : Lean.ParserDescr
Bandits.Β«term𝓛[_|_;_]Β» : Lean.ParserDescr

Code

notation "𝓛[" Y " | " s "; " ΞΌ "]" => Measure.map Y (ΞΌ[|s])

Actions: Source Β· Open Issue

term𝓛[_|_In_;_]πŸ”—

DefinitionBandits.Β«term𝓛[_|_In_;_]Β»

Law of Y conditioned on the event that X is in s.

πŸ”—def
Bandits.Β«term𝓛[_|_In_;_]Β» : Lean.ParserDescr
Bandits.Β«term𝓛[_|_In_;_]Β» : Lean.ParserDescr

Code

notation "𝓛[" Y " | " X " in " s "; " ΞΌ "]" => Measure.map Y (ΞΌ[|X ⁻¹' s])

Actions: Source Β· Open Issue

term𝓛[_|_←_;_]πŸ”—

DefinitionBandits.Β«term𝓛[_|_←_;_]Β»

Law of Y conditioned on the event that X equals x.

πŸ”—def
Bandits.Β«term𝓛[_|_←_;_]Β» : Lean.ParserDescr
Bandits.Β«term𝓛[_|_←_;_]Β» : Lean.ParserDescr

Code

notation "𝓛[" Y " | " X " ← " x "; " ΞΌ "]" => Measure.map Y (ΞΌ[|X ⁻¹' {x}])

Actions: Source Β· Open Issue

condDistrib_reward''πŸ”—

LemmaBandits.condDistrib_reward''

No docstring.

πŸ”—theorem
Bandits.condDistrib_reward''.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (n : β„•) : ⇑𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => A n (Prod.fst Ο‰); MeasureTheory.Measure.prod P (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun Ο‰ => A n (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))] ⇑ν
Bandits.condDistrib_reward''.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (n : β„•) : ⇑𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => A n (Prod.fst Ο‰); MeasureTheory.Measure.prod P (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun Ο‰ => A n (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))] ⇑ν

Code

lemma condDistrib_reward'' [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (n : β„•) :
    𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ A n Ο‰.1; 𝔓] =ᡐ[(𝔓).map (fun Ο‰ ↦ A n Ο‰.1)] Ξ½
Type uses (5)
Body uses (2)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  have h_ra' : 𝓛[R n | A n; P] =ᡐ[P.map (A n)] Ξ½ := h.condDistrib_feedback_stationaryEnv n
  have h_law : (𝔓).map (fun Ο‰ ↦ A n Ο‰.1) = P.map (A n) := by
    change ((𝔓).map (A n ∘ Prod.fst)) = _
    rw [← Measure.map_map (by fun_prop) (by fun_prop), ← Measure.fst, Measure.fst_prod]
  rw [h_law]
  have h_prod : 𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ A n Ο‰.1; 𝔓]
      =ᡐ[P.map (A n)] 𝓛[R n | A n; P] :=
    condDistrib_fst_prod _ (by fun_prop) _
  filter_upwards [h_ra', h_prod] with Ο‰ h_eq h_prod
  rw [h_prod, h_eq]

reward_cond_actionπŸ”—

LemmaBandits.reward_cond_action

No docstring.

πŸ”—theorem
Bandits.reward_cond_action.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hΞΌa : (MeasureTheory.Measure.map (fun Ο‰ => A n (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))) {a} β‰  0) : 𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => A n (Prod.fst Ο‰) in {a}; MeasureTheory.Measure.prod P (streamMeasure Ξ½)] = Ξ½ a
Bandits.reward_cond_action.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hΞΌa : (MeasureTheory.Measure.map (fun Ο‰ => A n (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))) {a} β‰  0) : 𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => A n (Prod.fst Ο‰) in {a}; MeasureTheory.Measure.prod P (streamMeasure Ξ½)] = Ξ½ a

Code

lemma reward_cond_action [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•)
    (hΞΌa : (𝔓).map (fun Ο‰ ↦ A n Ο‰.1) {a} β‰  0) :
    𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ A n Ο‰.1 ← a; 𝔓] = Ξ½ a
Type uses (4)
Body uses (5)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  have h_ra : 𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ A n Ο‰.1; 𝔓] =ᡐ[(𝔓).map (fun Ο‰ ↦ A n Ο‰.1)] Ξ½ :=
    condDistrib_reward'' h n
  have h_eq := condDistrib_ae_eq_cond (ΞΌ := 𝔓)
    (X := fun Ο‰ ↦ A n Ο‰.1) (Y := fun Ο‰ ↦ R n Ο‰.1) (by fun_prop) (by fun_prop)
  rw [Filter.EventuallyEq, ae_iff_of_countable] at h_ra h_eq
  specialize h_ra a hΞΌa
  specialize h_eq a hΞΌa
  rw [h_ra] at h_eq
  exact h_eq.symm

condIndepFun_reward_stepsUntil_action'πŸ”—

LemmaBandits.condIndepFun_reward_stepsUntil_action'

No docstring.

πŸ”—theorem
Bandits.condIndepFun_reward_stepsUntil_action'.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) β‹― (R n) (Set.indicator {Ο‰ | Learning.stepsUntil A a m Ο‰ = ↑n} fun x => 1) P
Bandits.condIndepFun_reward_stepsUntil_action'.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) β‹― (R n) (Set.indicator {Ο‰ | Learning.stepsUntil A a m Ο‰ = ↑n} fun x => 1) P

Code

lemma condIndepFun_reward_stepsUntil_action' [StandardBorelSpace Ξ©]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) :
    R n βŸ‚α΅’[A n, h.measurable_action n; P] {Ο‰ | stepsUntil A a m Ο‰ = ↑n}.indicator (fun _ ↦ 1)
Type uses (5)
Body uses (6)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  -- the indicator of `stepsUntil ... = n` is a function of `hist (n-1)` and `action n`.
  -- It thus suffices to use the independence of `reward n` and `hist (n-1)` conditionally
  -- on `action n`.
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  by_cases hn : n = 0
  Β· have h_indep : R 0 βŸ‚α΅’[A 0, hA 0; P] A 0 :=
      condIndepFun_self_right (by fun_prop) (by fun_prop)
    simp only [hn]
    refine h_indep.of_measurable_right (hX := hA 0) ?_
    exact measurable_comap_indicator_stepsUntil_eq_zero a m
  Β· have h_indep : R n βŸ‚α΅’[A n, hA n; P] fun Ο‰ ↦ (history A R (n - 1) Ο‰, A n Ο‰) :=
      IsAlgEnvSeq.condIndepFun_feedback_history_action_action' h n (by grind)
    refine h_indep.of_measurable_right (hX := hA n) ?_
    exact measurable_comap_indicator_stepsUntil_eq hA hR a m n

condIndepFun_reward_stepsUntil_actionπŸ”—

LemmaBandits.condIndepFun_reward_stepsUntil_action

No docstring.

πŸ”—theorem
Bandits.condIndepFun_reward_stepsUntil_action.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun Ο‰ => A n (Prod.fst Ο‰)) m𝓐) β‹― (fun Ο‰ => R n (Prod.fst Ο‰)) (Set.indicator {Ο‰ | Learning.stepsUntil A a m (Prod.fst Ο‰) = ↑n} fun x => 1) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))
Bandits.condIndepFun_reward_stepsUntil_action.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun Ο‰ => A n (Prod.fst Ο‰)) m𝓐) β‹― (fun Ο‰ => R n (Prod.fst Ο‰)) (Set.indicator {Ο‰ | Learning.stepsUntil A a m (Prod.fst Ο‰) = ↑n} fun x => 1) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))

Code

lemma condIndepFun_reward_stepsUntil_action [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P)
    (a : 𝓐) (m n : β„•) :
    CondIndepFun (m𝓐.comap (fun Ο‰ ↦ A n Ο‰.1)) ((h.measurable_action n).comp measurable_fst).comap_le
      (fun Ο‰ ↦ R n Ο‰.1) ({Ο‰ | stepsUntil A a m Ο‰.1 = ↑n}.indicator (fun _ ↦ 1)) 𝔓
Type uses (7)
Body uses (4)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  exact condIndepFun_fst_prod (Ξ½ := streamMeasure Ξ½)
    (measurable_indicator_stepsUntil_eq hA hR a m n) (by fun_prop) (by fun_prop)
    (condIndepFun_reward_stepsUntil_action' h a m n)

reward_cond_stepsUntilπŸ”—

LemmaBandits.reward_cond_stepsUntil

No docstring.

πŸ”—theorem
Bandits.reward_cond_stepsUntil.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) (hm : m β‰  0) (hΞΌn : (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) ((fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰)) ⁻¹' {↑n}) β‰  0) : 𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰) in {↑n}; MeasureTheory.Measure.prod P (streamMeasure Ξ½)] = Ξ½ a
Bandits.reward_cond_stepsUntil.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•) (hm : m β‰  0) (hΞΌn : (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) ((fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰)) ⁻¹' {↑n}) β‰  0) : 𝓛[fun Ο‰ => R n (Prod.fst Ο‰) | fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰) in {↑n}; MeasureTheory.Measure.prod P (streamMeasure Ξ½)] = Ξ½ a

Code

lemma reward_cond_stepsUntil [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (m n : β„•)
    (hm : m β‰  0) (hΞΌn : 𝔓 ((fun Ο‰ ↦ stepsUntil A a m Ο‰.1) ⁻¹' {↑n}) β‰  0) :
    𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ stepsUntil A a m Ο‰.1 ← ↑n; 𝔓] = Ξ½ a
Type uses (5)
Body uses (8)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  have hΞΌna :
      𝔓 ((fun Ο‰ ↦ stepsUntil A a m Ο‰.1) ⁻¹' {↑n} ∩ (fun Ο‰ ↦ A n Ο‰.1) ⁻¹' {a}) β‰  0 := by
    suffices ((fun Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ) ↦
          stepsUntil A a m Ο‰.1) ⁻¹' {↑n} ∩ (fun Ο‰ ↦ A n Ο‰.1) ⁻¹' {a})
        = (fun Ο‰ ↦ stepsUntil A a m Ο‰.1) ⁻¹' {↑n} by simpa [this] using hΞΌn
    ext Ο‰
    simp only [Set.mem_inter_iff, Set.mem_preimage, Set.mem_singleton_iff, and_iff_left_iff_imp]
    exact action_eq_of_stepsUntil_eq_coe hm
  have hΞΌa : (𝔓).map (fun Ο‰ ↦ A n Ο‰.1) {a} β‰  0 := by
    rw [Measure.map_apply (by fun_prop) (measurableSet_singleton _)]
    refine fun h_zero ↦ hΞΌn (measure_mono_null (fun Ο‰ ↦ ?_) h_zero)
    simp only [Set.mem_preimage, Set.mem_singleton_iff]
    exact action_eq_of_stepsUntil_eq_coe hm
  calc 𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ stepsUntil A a m Ο‰.1 ← (n : β„•βˆž); 𝔓]
  _ = (𝔓[|(fun Ο‰ ↦ stepsUntil A a m Ο‰.1) ⁻¹' {↑n} ∩ (fun Ο‰ ↦ A n Ο‰.1) ⁻¹' {a}]).map
      (fun Ο‰ ↦ R n Ο‰.1) := by
    congr with Ο‰
    simp only [Set.mem_preimage, Set.mem_singleton_iff, Set.mem_inter_iff, iff_self_and]
    exact action_eq_of_stepsUntil_eq_coe hm
  _ = (𝔓[|(fun Ο‰ ↦ A n Ο‰.1) ⁻¹' {a}
      ∩ {Ο‰ : Ξ© Γ— (β„• β†’ 𝓐 β†’ ℝ) | stepsUntil A a m Ο‰.1 = ↑n}.indicator 1 ⁻¹' {1} ]).map
      (fun Ο‰ ↦ R n Ο‰.1) := by
    congr 2 with Ο‰
    simp only [Set.mem_inter_iff, Set.mem_preimage, Set.mem_singleton_iff, Set.indicator_apply,
      Set.mem_setOf_eq, Pi.one_apply, ite_eq_left_iff, zero_ne_one, imp_false, Decidable.not_not]
    rw [and_comm]
  _ = 𝓛[fun Ο‰ ↦ R n Ο‰.1 | fun Ο‰ ↦ A n Ο‰.1 ← a; 𝔓] := by
    rw [cond_of_condIndepFun (by fun_prop)]
    Β· exact condIndepFun_reward_stepsUntil_action h a m n
    Β· refine measurable_one.indicator ?_
      exact measurableSet_eq_fun (by fun_prop) (by fun_prop)
    Β· fun_prop
    Β· convert hΞΌna using 2
      rw [Set.inter_comm]
      congr 1 with Ο‰
      simp [Set.indicator_apply]
  _ = Ξ½ a := reward_cond_action h a n hΞΌa

condDistrib_rewardByCount_stepsUntilπŸ”—

LemmaBandits.condDistrib_rewardByCount_stepsUntil

The conditional distribution of the reward received at the m-th pull of action a given the time at which number of pulls is m is the constant kernel with value Ξ½ a.

πŸ”—theorem
Bandits.condDistrib_rewardByCount_stepsUntil.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) : ⇑𝓛[Learning.rewardByCount A R a m | fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰); MeasureTheory.Measure.prod P (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))] ⇑(ProbabilityTheory.Kernel.const β„•βˆž (Ξ½ a))
Bandits.condDistrib_rewardByCount_stepsUntil.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) : ⇑𝓛[Learning.rewardByCount A R a m | fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰); MeasureTheory.Measure.prod P (streamMeasure Ξ½)] =ᡐ[MeasureTheory.Measure.map (fun Ο‰ => Learning.stepsUntil A a m (Prod.fst Ο‰)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))] ⇑(ProbabilityTheory.Kernel.const β„•βˆž (Ξ½ a))

Code

lemma condDistrib_rewardByCount_stepsUntil [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) :
    condDistrib (rewardByCount A R a m) (fun Ο‰ ↦ stepsUntil A a m Ο‰.1) 𝔓
      =ᡐ[(𝔓).map (fun Ο‰ ↦ stepsUntil A a m Ο‰.1)] Kernel.const _ (Ξ½ a)
Type uses (7)
Body uses (10)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hA := h.measurable_action
  have hR := h.measurable_feedback
  refine (condDistrib_ae_eq_cond (ΞΌ := 𝔓)
    (X := fun Ο‰ ↦ stepsUntil A a m Ο‰.1) (by fun_prop) (by fun_prop)).trans ?_
  rw [Filter.EventuallyEq, ae_iff_of_countable]
  intro n hn
  simp only [Kernel.const_apply]
  cases n with
  | top =>
    rw [Measure.map_congr (g := fun Ο‰ ↦ Ο‰.2 m a)]
    swap
    Β· refine ae_cond_of_forall_mem ((measurableSet_singleton _).preimage (by fun_prop)) ?_
      simp only [Set.mem_preimage, Set.mem_singleton_iff]
      exact fun Ο‰ ↦ rewardByCount_of_stepsUntil_eq_top
    rw [cond_of_indepFun _ (by fun_prop) (by fun_prop) (measurableSet_singleton _)]
    Β· exact (hasLaw_Z a m).map_eq
    Β· rwa [Measure.map_apply (by fun_prop) (measurableSet_singleton _)] at hn
    Β· exact indepFun_prod (X := fun Ο‰ : Ξ© ↦ stepsUntil A a m Ο‰)
        (Y := fun Ο‰ : β„• β†’ 𝓐 β†’ ℝ ↦ Ο‰ m a) (by fun_prop) (by fun_prop)
  | coe n =>
    rw [Measure.map_congr (g := fun Ο‰ ↦ R n Ο‰.1)]
    swap
    Β· refine ae_cond_of_forall_mem ((measurableSet_singleton _).preimage (by fun_prop)) ?_
      simp only [Set.mem_preimage, Set.mem_singleton_iff]
      exact fun Ο‰ ↦ rewardByCount_of_stepsUntil_eq_coe
    refine reward_cond_stepsUntil h a m n hm ?_
    rwa [Measure.map_apply (by fun_prop) (measurableSet_singleton _)] at hn

hasLaw_rewardByCountπŸ”—

LemmaBandits.hasLaw_rewardByCount

The reward received at the m-th pull of action a has law Ξ½ a.

πŸ”—theorem
Bandits.hasLaw_rewardByCount.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) : ProbabilityTheory.HasLaw (Learning.rewardByCount A R a m) (Ξ½ a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))
Bandits.hasLaw_rewardByCount.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) : ProbabilityTheory.HasLaw (Learning.rewardByCount A R a m) (Ξ½ a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))

Code

lemma hasLaw_rewardByCount [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (m : β„•) (hm : m β‰  0) :
    HasLaw (rewardByCount A R a m) (Ξ½ a) 𝔓 where
  aemeasurable
Type uses (5)
Body uses (7)
Used by (2)

Actions: Source Β· Open Issue

Proof
(measurable_rewardByCount h.measurable_action h.measurable_feedback a m).aemeasurable
  map_eq := by
    have hA := h.measurable_action
    have hR := h.measurable_feedback
    have h_condDistrib :
        condDistrib (rewardByCount A R a m) (fun Ο‰ ↦ stepsUntil A a m Ο‰.1) 𝔓
        =ᡐ[(𝔓).map (fun Ο‰ ↦ stepsUntil A a m Ο‰.1)]
          Kernel.const _ (Ξ½ a) := condDistrib_rewardByCount_stepsUntil h a m hm
    calc (𝔓).map (rewardByCount A R a m)
    _ = (condDistrib (rewardByCount A R a m) (fun Ο‰ ↦ stepsUntil A a m Ο‰.1) 𝔓)
        βˆ˜β‚˜ ((𝔓).map (fun Ο‰ ↦ stepsUntil A a m Ο‰.1)) := by
      rw [condDistrib_comp_map (by fun_prop) (by fun_prop)]
    _ = (Kernel.const _ (Ξ½ a)) βˆ˜β‚˜ ((𝔓).map (fun Ο‰ ↦ stepsUntil A a m Ο‰.1)) :=
      Measure.comp_congr h_condDistrib
    _ = Ξ½ a := by
      have : IsProbabilityMeasure ((𝔓).map (fun Ο‰ ↦ stepsUntil A a m Ο‰.1)) :=
        Measure.isProbabilityMeasure_map (by fun_prop)
      simp

identDistrib_rewardByCountπŸ”—

LemmaBandits.identDistrib_rewardByCount

No docstring.

πŸ”—theorem
Bandits.identDistrib_rewardByCount.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) (hm : m β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (Learning.rewardByCount A R a m) (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))
Bandits.identDistrib_rewardByCount.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) (hm : m β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (Learning.rewardByCount A R a m) (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (MeasureTheory.Measure.prod P (streamMeasure Ξ½))

Code

lemma identDistrib_rewardByCount [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•)
    (hn : n β‰  0) (hm : m β‰  0) :
    IdentDistrib (rewardByCount A R a n) (rewardByCount A R a m) 𝔓 𝔓 where
  aemeasurable_fst
Type uses (5)
Body uses (4)

Actions: Source Β· Open Issue

Proof
(measurable_rewardByCount h.measurable_action h.measurable_feedback a n).aemeasurable
  aemeasurable_snd :=
    (measurable_rewardByCount h.measurable_action h.measurable_feedback a m).aemeasurable
  map_eq := by rw [(hasLaw_rewardByCount h a n hn).map_eq, (hasLaw_rewardByCount h a m hm).map_eq]

identDistrib_rewardByCount_idπŸ”—

LemmaBandits.identDistrib_rewardByCount_id

No docstring.

πŸ”—theorem
Bandits.identDistrib_rewardByCount_id.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hn : n β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) id (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (Ξ½ a)
Bandits.identDistrib_rewardByCount_id.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hn : n β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) id (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (Ξ½ a)

Code

lemma identDistrib_rewardByCount_id [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (n : β„•) (hn : n β‰  0) :
    IdentDistrib (rewardByCount A R a n) id 𝔓 (Ξ½ a) where
  aemeasurable_fst
Type uses (5)
Body uses (4)
Used by (1)

Actions: Source Β· Open Issue

Proof
(measurable_rewardByCount h.measurable_action h.measurable_feedback a n).aemeasurable
  aemeasurable_snd := Measurable.aemeasurable <| by fun_prop
  map_eq := by rw [(hasLaw_rewardByCount h a n hn).map_eq, Measure.map_id]

identDistrib_rewardByCount_evalπŸ”—

LemmaBandits.identDistrib_rewardByCount_eval

No docstring.

πŸ”—theorem
Bandits.identDistrib_rewardByCount_eval.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (fun Ο‰ => Ο‰ m a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (streamMeasure Ξ½)
Bandits.identDistrib_rewardByCount_eval.{u_1, u_2} {𝓐 : Type u_1} {Ξ© : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mΞ© : MeasurableSpace Ξ©} [DecidableEq 𝓐] {A : β„• β†’ Ξ© β†’ 𝓐} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm 𝓐 ℝ} {Ξ½ : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel Ξ½] [StandardBorelSpace 𝓐] [Nonempty 𝓐] [StandardBorelSpace Ξ©] [Countable 𝓐] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) : ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (fun Ο‰ => Ο‰ m a) (MeasureTheory.Measure.prod P (streamMeasure Ξ½)) (streamMeasure Ξ½)

Code

lemma identDistrib_rewardByCount_eval [StandardBorelSpace Ξ©] [Countable 𝓐]
    (h : IsAlgEnvSeq A R alg (stationaryEnv Ξ½) P) (a : 𝓐) (n m : β„•) (hn : n β‰  0) :
    IdentDistrib (rewardByCount A R a n) (fun Ο‰ ↦ Ο‰ m a) 𝔓 (streamMeasure Ξ½)
Type uses (5)
Body uses (2)

Actions: Source Β· Open Issue

Proof
(identDistrib_rewardByCount_id h a n hn).trans
    (identDistrib_eval_eval_id_streamMeasure Ξ½ m a).symm
  1. Bandits.hasLaw_Z
  2. Bandits.Β«term𝓛[_|_;_]Β»
  3. Bandits.Β«term𝓛[_|_In_;_]Β»
  4. Bandits.Β«term𝓛[_|_←_;_]Β»
  5. Bandits.condDistrib_reward''
  6. Bandits.reward_cond_action
  7. Bandits.condIndepFun_reward_stepsUntil_action'
  8. Bandits.condIndepFun_reward_stepsUntil_action
  9. Bandits.reward_cond_stepsUntil
  10. Bandits.condDistrib_rewardByCount_stepsUntil
  11. Bandits.hasLaw_rewardByCount
  12. Bandits.identDistrib_rewardByCount
  13. Bandits.identDistrib_rewardByCount_id
  14. Bandits.identDistrib_rewardByCount_eval