LeanMachineLearning exposition

Bandits.UCB.gap_arm_le_two_mul_ucbWidth๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

gap_arm_le_two_mul_ucbWidth๐Ÿ”—

LemmaBandits.UCB.gap_arm_le_two_mul_ucbWidth

No docstring.

๐Ÿ”—theorem
Bandits.UCB.gap_arm_le_two_mul_ucbWidth.{u_1} {K : โ„•} {c : โ„} {ฮฝ : ProbabilityTheory.Kernel (Fin K) โ„} {ฮฉ : Type u_1} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {n : โ„•} {ฯ‰ : ฮฉ} [Nonempty (Fin K)] (h_best : โˆซ (x : โ„), id x โˆ‚ฮฝ (bestArm ฮฝ) โ‰ค Learning.empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰) (h_arm : Learning.empMean A R (A n ฯ‰) n ฯ‰ - ucbWidth A c (A n ฯ‰) n ฯ‰ โ‰ค โˆซ (x : โ„), id x โˆ‚ฮฝ (A n ฯ‰)) (h_le : Learning.empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰ โ‰ค Learning.empMean A R (A n ฯ‰) n ฯ‰ + ucbWidth A c (A n ฯ‰) n ฯ‰) : gap ฮฝ (A n ฯ‰) โ‰ค 2 * ucbWidth A c (A n ฯ‰) n ฯ‰
Bandits.UCB.gap_arm_le_two_mul_ucbWidth.{u_1} {K : โ„•} {c : โ„} {ฮฝ : ProbabilityTheory.Kernel (Fin K) โ„} {ฮฉ : Type u_1} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {n : โ„•} {ฯ‰ : ฮฉ} [Nonempty (Fin K)] (h_best : โˆซ (x : โ„), id x โˆ‚ฮฝ (bestArm ฮฝ) โ‰ค Learning.empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰) (h_arm : Learning.empMean A R (A n ฯ‰) n ฯ‰ - ucbWidth A c (A n ฯ‰) n ฯ‰ โ‰ค โˆซ (x : โ„), id x โˆ‚ฮฝ (A n ฯ‰)) (h_le : Learning.empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰ โ‰ค Learning.empMean A R (A n ฯ‰) n ฯ‰ + ucbWidth A c (A n ฯ‰) n ฯ‰) : gap ฮฝ (A n ฯ‰) โ‰ค 2 * ucbWidth A c (A n ฯ‰) n ฯ‰

Code

lemma gap_arm_le_two_mul_ucbWidth [Nonempty (Fin K)]
    (h_best : (ฮฝ (bestArm ฮฝ))[id] โ‰ค empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰)
    (h_arm : empMean A R (A n ฯ‰) n ฯ‰ - ucbWidth A c (A n ฯ‰) n ฯ‰ โ‰ค (ฮฝ (A n ฯ‰))[id])
    (h_le : empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰ โ‰ค
      empMean A R (A n ฯ‰) n ฯ‰ + ucbWidth A c (A n ฯ‰) n ฯ‰) :
    gap ฮฝ (A n ฯ‰) โ‰ค 2 * ucbWidth A c (A n ฯ‰) n ฯ‰
Type uses (4)
Body uses (1)
Used by (1)

Actions: Source ยท Open Issue

Proof
by
  rw [gap_eq_bestArm_sub, sub_le_iff_le_add']
  calc (ฮฝ (bestArm ฮฝ))[id]
  _ โ‰ค empMean A R (bestArm ฮฝ) n ฯ‰ + ucbWidth A c (bestArm ฮฝ) n ฯ‰ := h_best
  _ โ‰ค empMean A R (A n ฯ‰) n ฯ‰ + ucbWidth A c (A n ฯ‰) n ฯ‰ := h_le
  _ โ‰ค (ฮฝ (A n ฯ‰))[id] + 2 * ucbWidth A c (A n ฯ‰) n ฯ‰ := by
    rw [two_mul, โ† add_assoc]
    gcongr
    rwa [sub_le_iff_le_add] at h_arm

Dependency graph

Type dependencies (4)

bestArm๐Ÿ”—

DefinitionBandits.bestArm

action with the highest mean.

๐Ÿ”—def
Bandits.bestArm.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} [Fintype ๐“] [Nonempty ๐“] (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) : ๐“
Bandits.bestArm.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} [Fintype ๐“] [Nonempty ๐“] (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) : ๐“

Code

noncomputable def bestArm (ฮฝ : Kernel ๐“ โ„) : ๐“ :=
  (exists_max_image univ (fun a โ†ฆ (ฮฝ a)[id]) (univ_nonempty_iff.mpr inferInstance)).choose
Used by (18)

Actions: Source ยท Open Issue

empMean๐Ÿ”—

DefinitionLearning.empMean

Empirical mean reward obtained when pulling action a up to time t (exclusive).

๐Ÿ”—def
Learning.empMean.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.empMean.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def empMean (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  sumRewards A R' a t ฯ‰ / pullCount A a t ฯ‰
Body uses (2)
Used by (34)

Actions: Source ยท Open Issue

ucbWidth๐Ÿ”—

DefinitionBandits.UCB.ucbWidth

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

๐Ÿ”—def
Bandits.UCB.ucbWidth.{u_1} {K : โ„•} {ฮฉ : Type u_1} (A : โ„• โ†’ ฮฉ โ†’ Fin K) (c : โ„) (a : Fin K) (n : โ„•) (ฯ‰ : ฮฉ) : โ„
Bandits.UCB.ucbWidth.{u_1} {K : โ„•} {ฮฉ : Type u_1} (A : โ„• โ†’ ฮฉ โ†’ Fin K) (c : โ„) (a : Fin K) (n : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable def ucbWidth (A : โ„• โ†’ ฮฉ โ†’ Fin K) (c : โ„) (a : Fin K) (n : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  โˆš(2 * c * log (n + 1) / pullCount A a n ฯ‰)
Body uses (1)
Used by (16)

Actions: Source ยท Open Issue

gap๐Ÿ”—

DefinitionBandits.gap

Gap of an action a: difference between the highest mean of the actions and the mean of a.

๐Ÿ”—def
Bandits.gap.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (a : ๐“) : โ„
Bandits.gap.{u_1} {๐“ : Type u_1} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (a : ๐“) : โ„

Code

noncomputable
def gap (ฮฝ : Kernel ๐“ โ„) (a : ๐“) : โ„ := (โจ† i, (ฮฝ i)[id]) - (ฮฝ a)[id]
Used by (27)

Actions: Source ยท Open Issue

All dependencies, transitively (2)

sumRewards๐Ÿ”—

DefinitionLearning.sumRewards

Sum of rewards obtained when pulling action a up to time t (exclusive).

๐Ÿ”—def
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

def sumRewards (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  โˆ‘ s โˆˆ range t, if A s ฯ‰ = a then R' s ฯ‰ else 0
Used by (44)

Actions: Source ยท Open Issue

pullCount๐Ÿ”—

DefinitionLearning.pullCount

Number of times action a was chosen up to time t (excluding t).

๐Ÿ”—def
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•

Code

noncomputable
def pullCount (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„• :=
  #(filter (fun s โ†ฆ A s ฯ‰ = a) (range t))
Used by (146)

Actions: Source ยท Open Issue