LeanMachineLearning exposition

Bandits.ClippedUCB.measurable_uncurry_ucb'🔗

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

measurable_uncurry_ucb'🔗

LemmaBandits.ClippedUCB.measurable_uncurry_ucb'

No docstring.

🔗theorem
Bandits.ClippedUCB.measurable_uncurry_ucb' {K : } {l u σ2 δ : } {n : } : Measurable fun p => ucb' n (Prod.fst p) l u σ2 δ (Prod.snd p)
Bandits.ClippedUCB.measurable_uncurry_ucb' {K : } {l u σ2 δ : } {n : } : Measurable fun p => ucb' n (Prod.fst p) l u σ2 δ (Prod.snd p)

Code

lemma measurable_uncurry_ucb' {n : ℕ} :
    Measurable (fun p : (Iic n → Fin K × ℝ) × Fin K ↦ ucb' n p.1 l u σ2 δ p.2)
Type uses (1)
Body uses (5)
Used by (1)

Actions: Source · Open Issue

Proof
Measurable.ite (by measurability) (by fun_prop) (by fun_prop)

Dependency graph

Type dependencies (1)

ucb'🔗

DefinitionBandits.ClippedUCB.ucb'

Clipped upper confidence bound (history-based version).

🔗def
Bandits.ClippedUCB.ucb' {K : } (n : ) (h : (Finset.Iic n) Fin K × ) (l u σ2 δ : ) (a : Fin K) :
Bandits.ClippedUCB.ucb' {K : } (n : ) (h : (Finset.Iic n) Fin K × ) (l u σ2 δ : ) (a : Fin K) :

Code

noncomputable
def ucb' (n : ℕ) (h : Iic n → Fin K × ℝ) (l u σ2 δ : ℝ) (a : Fin K) : ℝ :=
  if pullCount' n h a = 0 then u
  else max l (min u (empMean' n h a + √(2 * σ2 * Real.log (1 / δ) / (pullCount' n h a))))
Body uses (2)
Used by (3)

Actions: Source · Open Issue

All dependencies, transitively (3)

pullCount'🔗

DefinitionLearning.pullCount'

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

🔗def
Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × R) (a : 𝓐) :
Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × R) (a : 𝓐) :

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}
Used by (29)

Actions: Source · Open Issue

sumRewards'🔗

DefinitionLearning.sumRewards'

Sum of rewards of arm a up to (and including) time n.

🔗def
Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :
Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0
Used by (9)

Actions: Source · Open Issue

empMean'🔗

DefinitionLearning.empMean'

Empirical mean of arm a at time n.

🔗def
Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :
Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)
Body uses (2)
Used by (18)

Actions: Source · Open Issue