LeanMachineLearning exposition

Bandits.pullCount_rate_tendsto_of_sublinear_regret๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

pullCount_rate_tendsto_of_sublinear_regret๐Ÿ”—

LemmaBandits.pullCount_rate_tendsto_of_sublinear_regret

If the regret is sublinear, the rate of suboptimal arm pulls tends to zero.

๐Ÿ”—theorem
Bandits.pullCount_rate_tendsto_of_sublinear_regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} [DecidableEq ๐“] {m๐“ : MeasurableSpace ๐“} {ฮฝ : ProbabilityTheory.Kernel ๐“ โ„} {A : โ„• โ†’ ฮฉ โ†’ ๐“} {ฯ‰ : ฮฉ} {a : ๐“} [Finite ๐“] (hr : (fun x => regret ฮฝ A x ฯ‰) =o[Filter.atTop] fun t => โ†‘t) (hg : 0 < gap ฮฝ a) : Filter.Tendsto (fun t => โ†‘(Learning.pullCount A a t ฯ‰) / โ†‘t) Filter.atTop (nhds 0)
Bandits.pullCount_rate_tendsto_of_sublinear_regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} [DecidableEq ๐“] {m๐“ : MeasurableSpace ๐“} {ฮฝ : ProbabilityTheory.Kernel ๐“ โ„} {A : โ„• โ†’ ฮฉ โ†’ ๐“} {ฯ‰ : ฮฉ} {a : ๐“} [Finite ๐“] (hr : (fun x => regret ฮฝ A x ฯ‰) =o[Filter.atTop] fun t => โ†‘t) (hg : 0 < gap ฮฝ a) : Filter.Tendsto (fun t => โ†‘(Learning.pullCount A a t ฯ‰) / โ†‘t) Filter.atTop (nhds 0)

Code

lemma pullCount_rate_tendsto_of_sublinear_regret [Finite ๐“]
    (hr : (regret ฮฝ A ยท ฯ‰) =o[atTop] fun t โ†ฆ (t : โ„)) (hg : 0 < gap ฮฝ a) :
    Tendsto (fun t โ†ฆ (pullCount A a t ฯ‰ : โ„) / t) atTop (nhds 0)
Type uses (3)
Body uses (3)

Actions: Source ยท Open Issue

Proof
by
  have := Fintype.ofFinite ๐“
  have hb (t : โ„•) : (pullCount A a t ฯ‰ : โ„) * gap ฮฝ a โ‰ค regret ฮฝ A t ฯ‰ := by
    rw [regret_eq_sum_pullCount_mul_gap]
    exact single_le_sum (f := fun a โ†ฆ pullCount A a t ฯ‰ * gap ฮฝ a)
      (fun _ _ โ†ฆ mul_nonneg (Nat.cast_nonneg _) gap_nonneg) (mem_univ a)
  have hb' (t : โ„•) : (pullCount A a t ฯ‰ : โ„) / t โ‰ค regret ฮฝ A t ฯ‰ / t / gap ฮฝ a := by
    obtain ht | ht := eq_or_ne t 0
    ยท simp [ht]
    ยท calc (pullCount A a t ฯ‰ : โ„) / t
          = pullCount A a t ฯ‰ * gap ฮฝ a / gap ฮฝ a / t := by field_simp
        _ โ‰ค regret ฮฝ A t ฯ‰ / gap ฮฝ a / t := by gcongr; exact hb t
        _ = regret ฮฝ A t ฯ‰ / t / gap ฮฝ a := by ring
  apply squeeze_zero' (Eventually.of_forall fun _ โ†ฆ by positivity) (Eventually.of_forall hb')
  simpa using hr.tendsto_div_nhds_zero.div_const (gap ฮฝ a)

Dependency graph

Type dependencies (3)

regret๐Ÿ”—

DefinitionBandits.regret

Regret of a sequence of pulls k : โ„• โ†’ ๐“ at time t for the reward kernel ฮฝ ; Kernel ๐“ โ„.

๐Ÿ”—def
Bandits.regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Bandits.regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def regret (ฮฝ : Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  t * (โจ† a, (ฮฝ a)[id]) - โˆ‘ s โˆˆ range t, (ฮฝ (A s ฯ‰))[id]
Used by (11)

Actions: Source ยท Open Issue

gap๐Ÿ”—

DefinitionBandits.gap

Gap of an action a: difference between the highest mean of the actions and the mean of a.

๐Ÿ”—def
Bandits.gap.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (a : ๐“) : โ„
Bandits.gap.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (a : ๐“) : โ„

Code

noncomputable
def gap (ฮฝ : Kernel ๐“ โ„) (a : ๐“) : โ„ := (โจ† i, (ฮฝ i)[id]) - (ฮฝ a)[id]
Used by (27)

Actions: Source ยท Open Issue

pullCount๐Ÿ”—

DefinitionLearning.pullCount

Number of times action a was chosen up to time t (excluding t).

๐Ÿ”—def
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•

Code

noncomputable
def pullCount (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„• :=
  #(filter (fun s โ†ฆ A s ฯ‰ = a) (range t))
Used by (146)

Actions: Source ยท Open Issue