LeanMachineLearning exposition

Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

prob_empMean_sub_actionMean_ge_le๐Ÿ”—

LemmaLearning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le

No docstring.

๐Ÿ”—theorem
Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le.{u_1, u_2} {๐“” : Type u_1} {ฮฉ : Type u_2} [MeasurableSpace ๐“”] [MeasurableSpace ฮฉ] {K : โ„•} [Nonempty (Fin K)] {Q : MeasureTheory.Measure ๐“”} {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {alg : Algorithm (Fin K) โ„} {E : ฮฉ โ†’ ๐“”} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯƒ2 : NNReal} (hฯƒ2 : 0 < ฯƒ2) (hs : โˆ€ (e : ๐“”) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โˆซ (x : โ„), id x โˆ‚ฮบ (e, a)) ฯƒ2 (ฮบ (e, a))) {ฮด : โ„} (hฮด : 0 < ฮด) (n : โ„•) : P {ฯ‰ | โˆƒ t < n, โˆƒ a, pullCount A a t ฯ‰ โ‰  0 โˆง โˆš(2 * โ†‘ฯƒ2 * Real.log (1 / ฮด) / โ†‘(pullCount A a t ฯ‰)) โ‰ค empMean A R a t ฯ‰ - actionMean ฮบ E a ฯ‰} โ‰ค ENNReal.ofReal (โ†‘K * (โ†‘n - 1) * ฮด)
Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le.{u_1, u_2} {๐“” : Type u_1} {ฮฉ : Type u_2} [MeasurableSpace ๐“”] [MeasurableSpace ฮฉ] {K : โ„•} [Nonempty (Fin K)] {Q : MeasureTheory.Measure ๐“”} {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {alg : Algorithm (Fin K) โ„} {E : ฮฉ โ†’ ๐“”} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯƒ2 : NNReal} (hฯƒ2 : 0 < ฯƒ2) (hs : โˆ€ (e : ๐“”) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โˆซ (x : โ„), id x โˆ‚ฮบ (e, a)) ฯƒ2 (ฮบ (e, a))) {ฮด : โ„} (hฮด : 0 < ฮด) (n : โ„•) : P {ฯ‰ | โˆƒ t < n, โˆƒ a, pullCount A a t ฯ‰ โ‰  0 โˆง โˆš(2 * โ†‘ฯƒ2 * Real.log (1 / ฮด) / โ†‘(pullCount A a t ฯ‰)) โ‰ค empMean A R a t ฯ‰ - actionMean ฮบ E a ฯ‰} โ‰ค ENNReal.ofReal (โ†‘K * (โ†‘n - 1) * ฮด)

Code

lemma prob_empMean_sub_actionMean_ge_le (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯƒ2 : โ„โ‰ฅ0}
    (hฯƒ2 : 0 < ฯƒ2) (hs : โˆ€ e a, HasSubgaussianMGF (fun x โ†ฆ x - (ฮบ (e, a))[id]) ฯƒ2 (ฮบ (e, a)))
    {ฮด : โ„} (hฮด : 0 < ฮด) (n : โ„•) :
    P {ฯ‰ | โˆƒ t < n, โˆƒ a, pullCount A a t ฯ‰ โ‰  0 โˆง
      โˆš(2 * ฯƒ2 * Real.log (1 / ฮด) / pullCount A a t ฯ‰) โ‰ค empMean A R a t ฯ‰ - actionMean ฮบ E a ฯ‰}
      โ‰ค ENNReal.ofReal (K * (n - 1) * ฮด)
Type uses (5)
Body uses (14)
Used by (1)

Actions: Source ยท Open Issue

Proof
by
  have := h.measurable_param
  have := h.measurable_action
  have := h.measurable_feedback
  let S := {(e, ฯ„) | โˆƒ a, โˆƒ t < n, pullCount IT.action a t ฯ„ โ‰  0 โˆง
    โˆš(2 * pullCount IT.action a t ฯ„ * ฯƒ2 * Real.log (1 / ฮด)) โ‰ค
      sumRewards IT.action IT.feedback a t ฯ„ - pullCount IT.action a t ฯ„ * actionMean ฮบ id a e}
  calc
    _ โ‰ค (P.map (fun ฯ‰ โ†ฆ (E ฯ‰, trajectory A R ฯ‰))) S := by
        rw [Measure.map_apply (by fun_prop) (by measurability)]
        apply measure_mono
        intro ฯ‰ โŸจt, ht, a, hpc, hleโŸฉ
        rw [empMean] at hle
        exact โŸจa, t, ht, hpc, sqrt_two_mul_le_sub hpc hleโŸฉ
    _ = (P.map E โŠ—โ‚˜ condDistrib (trajectory A R) E P) S := by
        rw [โ† compProd_map_condDistrib (by fun_prop)]
    _ = โˆซโป e, condDistrib (trajectory A R) E P e (Prod.mk e โปยน' S) โˆ‚(P.map E) :=
        Measure.compProd_apply (by measurability)
    _ โ‰ค โˆซโป e, ENNReal.ofReal (Fintype.card (Fin K) * (n - 1) * ฮด) โˆ‚(P.map E) := by
        apply lintegral_mono_ae
        rw [h.hasLaw_env.map_eq]
        filter_upwards [h.ae_IsAlgEnvSeq] with e he
        exact Bandits.prob_sumRewards_sub_pullCount_mul_ge_le_of_Fintype hฯƒ2 (hs e) he hฮด
    _ = ENNReal.ofReal (K * (n - 1) * ฮด) := by
      simp [Measure.map_apply h.measurable_param]

Dependency graph

Type dependencies (5)

Algorithm๐Ÿ”—

StructureLearning.Algorithm

A stochastic, sequential algorithm.

๐Ÿ”—structure
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)

Code

structure Algorithm (๐“ ๐“จ : Type*) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] where
  /-- Policy or sampling rule: distribution of the next action. -/
  policy : (n : โ„•) โ†’ Kernel (Iic n โ†’ ๐“ ร— ๐“จ) ๐“
  /-- The policy is a Markov kernel. -/
  [h_policy : โˆ€ n, IsMarkovKernel (policy n)]
  /-- Distribution of the first action. -/
  p0 : Measure ๐“
  /-- The first action distribution is a probability measure. -/
  [hp0 : IsProbabilityMeasure p0]
Used by (216)

Actions: Source ยท Open Issue

IsBayesAlgEnvSeq๐Ÿ”—

StructureLearning.IsBayesAlgEnvSeq

IsBayesAlgEnvSeq Q ฮบ alg E A Y P states that there is a measure P : Measure ฮฉ such that the parameter E : ฮฉ โ†’ ๐“” has law Q and that the sequences of actions A : โ„• โ†’ ฮฉ โ†’ ๐“ and feedbacks Y : โ„• โ†’ ฮฉ โ†’ ๐“จ are generated by the algorithm alg : Algorithm ๐“ ๐“จ interacting with an underlying environment that depends on E and ฮบ (stationaryEnv (ฮบ.sectR (E ฯ‰))).

๐Ÿ”—structure
Learning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop
Learning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop

Code

structure IsBayesAlgEnvSeq
    (Q : Measure ๐“”) (ฮบ : Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ)
    (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ)
    (P : Measure ฮฉ) [IsFiniteMeasure P] : Prop where
  measurable_param : Measurable E := by fun_prop
  measurable_action n : Measurable (A n) := by fun_prop
  measurable_feedback n : Measurable (Y n) := by fun_prop
  hasLaw_env : HasLaw E Q P
  hasCondDistrib_action_zero : HasCondDistrib (A 0) E (Kernel.const _ alg.p0) P
  hasCondDistrib_feedback_zero : HasCondDistrib (Y 0) (fun ฯ‰ โ†ฆ (E ฯ‰, A 0 ฯ‰)) ฮบ P
  hasCondDistrib_action n :
    HasCondDistrib (A (n + 1)) (fun ฯ‰ โ†ฆ (E ฯ‰, history A Y n ฯ‰))
      ((alg.policy n).prodMkLeft _) P
  hasCondDistrib_feedback n :
    HasCondDistrib (Y (n + 1)) (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, E ฯ‰, A (n + 1) ฯ‰))
      (ฮบ.prodMkLeft _) P
Type uses (2)
Used by (22)

Actions: Source ยท Open Issue

pullCount๐Ÿ”—

DefinitionLearning.pullCount

Number of times action a was chosen up to time t (excluding t).

๐Ÿ”—def
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•

Code

noncomputable
def pullCount (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„• :=
  #(filter (fun s โ†ฆ A s ฯ‰ = a) (range t))
Used by (146)

Actions: Source ยท Open Issue

empMean๐Ÿ”—

DefinitionLearning.empMean

Empirical mean reward obtained when pulling action a up to time t (exclusive).

๐Ÿ”—def
Learning.empMean.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.empMean.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def empMean (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  sumRewards A R' a t ฯ‰ / pullCount A a t ฯ‰
Body uses (2)
Used by (34)

Actions: Source ยท Open Issue

actionMean๐Ÿ”—

DefinitionLearning.IsBayesAlgEnvSeq.actionMean

A random variable that gives the mean feedback of action a.

๐Ÿ”—def
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def actionMean (ฮบ : Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„ := (ฮบ (E ฯ‰, a))[id]
Used by (12)

Actions: Source ยท Open Issue

All dependencies, transitively (2)

history๐Ÿ”—

DefinitionLearning.history

History of the algorithm-environment sequence up to time n.

๐Ÿ”—def
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ

Code

def history (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : Iic n โ†’ ๐“ ร— ๐“จ :=
  fun i โ†ฆ (A i ฯ‰, Y i ฯ‰)
Used by (72)

Actions: Source ยท Open Issue

sumRewards๐Ÿ”—

DefinitionLearning.sumRewards

Sum of rewards obtained when pulling action a up to time t (exclusive).

๐Ÿ”—def
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

def sumRewards (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  โˆ‘ s โˆˆ range t, if A s ฯ‰ = a then R' s ฯ‰ else 0
Used by (44)

Actions: Source ยท Open Issue