LeanMachineLearning exposition

Bandits.UCB.nextArm🔗

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

nextArm🔗

DefinitionBandits.UCB.nextArm

Arm pulled by the UCB algorithm at time n + 1.

🔗def
Bandits.UCB.nextArm {K : } (hK : 0 < K) (c : ) (n : ) (h : (Finset.Iic n) Fin K × ) : Fin K
Bandits.UCB.nextArm {K : } (hK : 0 < K) (c : ) (n : ) (h : (Finset.Iic n) Fin K × ) : Fin K

Code

noncomputable
def UCB.nextArm (hK : 0 < K) (c : ℝ) (n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  if n < K - 1 then RoundRobin.nextAction hK n else
  argmax (fun a ↦ empMean' n h a + ucbWidth' c n h a)
Body uses (4)
Used by (7)

Actions: Source · Open Issue

Dependency graph

All dependencies, transitively (8)

nextAction🔗

DefinitionLearning.RoundRobin.nextAction

Action chosen by the Round-Robin algorithm at time n + 1. This is action (n + 1) % K.

🔗def
Learning.RoundRobin.nextAction {K : } (hK : 0 < K) (n : ) : Fin K
Learning.RoundRobin.nextAction {K : } (hK : 0 < K) (n : ) : Fin K

Code

noncomputable
def RoundRobin.nextAction (hK : 0 < K) (n : ℕ) : Fin K := ⟨(n + 1) % K, Nat.mod_lt _ hK⟩
Used by (14)

Actions: Source · Open Issue

max🔗

DefinitionFunction.max

The maximum value of a tuple.

🔗def
Function.max.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : α
Function.max.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : α

Code

abbrev max : α := univ.sup' univ_nonempty f
Used by (8)

Actions: Source · Open Issue

exists_argmax🔗

Lemmaexists_argmax

No docstring.

🔗theorem
exists_argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : i, f i = Function.max f
exists_argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : i, f i = Function.max f

Code

lemma exists_argmax : ∃ i, f i = f.max
Type uses (1)
Used by (3)

Actions: Source · Open Issue

Proof
by
  obtain ⟨i, -, hi⟩ := Finset.exists_mem_eq_sup' (by simp : Finset.univ.Nonempty) f
  exact ⟨i, hi.symm⟩

argmax🔗

Definitionargmax

The index of the maximum value of a tuple.

🔗def
argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : ι
argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι α) : ι

Code

noncomputable def argmax := (exists_argmax f).choose
Body uses (2)
Used by (17)

Actions: Source · Open Issue

sumRewards'🔗

DefinitionLearning.sumRewards'

Sum of rewards of arm a up to (and including) time n.

🔗def
Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :
Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0
Used by (9)

Actions: Source · Open Issue

pullCount'🔗

DefinitionLearning.pullCount'

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

🔗def
Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × R) (a : 𝓐) :
Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × R) (a : 𝓐) :

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}
Used by (29)

Actions: Source · Open Issue

empMean'🔗

DefinitionLearning.empMean'

Empirical mean of arm a at time n.

🔗def
Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :
Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ) (h : (Finset.Iic n) 𝓐 × ) (a : 𝓐) :

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)
Body uses (2)
Used by (18)

Actions: Source · Open Issue

ucbWidth'🔗

DefinitionBandits.ucbWidth'

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

🔗def
Bandits.ucbWidth' {K : } (c : ) (n : ) (h : (Finset.Iic n) Fin K × ) (a : Fin K) :
Bandits.ucbWidth' {K : } (c : ) (n : ) (h : (Finset.Iic n) Fin K × ) (a : Fin K) :

Code

noncomputable def ucbWidth' (c : ℝ) (n : ℕ) (h : Iic n → Fin K × ℝ) (a : Fin K) : ℝ :=
  √(2 * c * log (n + 2) / pullCount' n h a)
Body uses (1)
Used by (6)

Actions: Source · Open Issue