LeanMachineLearning exposition

Bandits.ETC.measurable_nextArm

This page shows the declaration's own card first, then its dependency graph, then one card per dependency (type dependencies first, followed by the rest of the transitive closure). For a theorem, the graph and the dependency cards follow only the dependencies of its statement: its proof is replaced by sorry, because what it proves does not depend on how it is proved. For every other kind of declaration, both the type and the body (or value) are followed, since later declarations build on their content.


measurable_nextArm

Lemma Bandits.ETC.measurable_nextArm

The next arm pulled by ETC is chosen in a measurable way.

theorem
Bandits.ETC.measurable_nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)

Code

lemma ETC.measurable_nextArm (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)
Type uses (1)
Body uses (6)
Used by (3)


Proof
by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  unfold nextArm
  simp only [dite_eq_ite]
  refine Measurable.ite (by simp) (by fun_prop) ?_
  refine Measurable.ite (by simp) ?_ (by fun_prop)
  fun_prop
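
The two refine Measurable.ite steps mirror the two nested if-then-else branches in the definition of nextArm. As a reminder of the shape of that Mathlib lemma, here is a generic sketch; the types α and β, the predicate p and the functions f and g are placeholders chosen for illustration and do not come from this page. A pointwise if-then-else is measurable when the set where the condition holds is measurable and both branches are measurable. In the proof above, the conditions n < K * m - 1 and n = K * m - 1 do not depend on the history h, so each condition set is either the whole space or empty, which is presumably why by simp closes those goals.

```lean
import Mathlib

-- Generic instance of `Measurable.ite`; all names here are placeholders.
example {α β : Type*} [MeasurableSpace α] [MeasurableSpace β]
    {p : α → Prop} [DecidablePred p] {f g : α → β}
    (hp : MeasurableSet {a | p a}) (hf : Measurable f) (hg : Measurable g) :
    Measurable fun a => if p a then f a else g a :=
  Measurable.ite hp hf hg
```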

Dependency graph

Type dependencies (1)

nextArm

Definition Bandits.ETC.nextArm

Arm pulled by the ETC algorithm at time n + 1. For n < K * m - 1, this is arm (n + 1) % K. For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase. For n ≥ K * m, this is the same arm as at time n.

def
Bandits.ETC.nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) (h : ↥(Finset.Iic n) → Fin K × ℝ) : Fin K

Code

noncomputable
def ETC.nextArm (hK : 0 < K) (m n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  if hn : n < K * m - 1 then RoundRobin.nextAction hK n
  else
    if hn_eq : n = K * m - 1 then argmax (empMean' n h)
    else (h ⟨n, by simp⟩).1
Body uses (3)
Used by (6)
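
To make the three cases of nextArm concrete, the following sketch classifies a time n by which branch applies. The type Phase and the function phase are hypothetical helpers written for this illustration; they are not part of the library, and they only reproduce the case split, not the choice of arm.

```lean
-- Hypothetical helper, not part of the library: names the three cases of `nextArm`.
inductive Phase
  | explore  -- n < K * m - 1: round-robin, arm (n + 1) % K
  | commit   -- n = K * m - 1: pick the arm with the highest empirical mean
  | stay     -- n ≥ K * m: pull the same arm as at time n
  deriving DecidableEq, Repr

-- Same case split as in `nextArm`, on plain natural numbers.
def phase (K m n : Nat) : Phase :=
  if n < K * m - 1 then Phase.explore
  else if n = K * m - 1 then Phase.commit
  else Phase.stay

-- With K = 2 arms and m = 3 pulls per arm, K * m - 1 = 5.
example : phase 2 3 4 = Phase.explore := by decide
example : phase 2 3 5 = Phase.commit := by decide
example : phase 2 3 6 = Phase.stay := by decide
```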


All dependencies, transitively (7)

nextAction

Definition Learning.RoundRobin.nextAction

Action chosen by the Round-Robin algorithm at time n + 1. This is action (n + 1) % K.

def
Learning.RoundRobin.nextAction {K : ℕ} (hK : 0 < K) (n : ℕ) : Fin K

Code

noncomputable
def RoundRobin.nextAction (hK : 0 < K) (n : ℕ) : Fin K := ⟨(n + 1) % K, Nat.mod_lt _ hK⟩
Used by (14)
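
As a quick check of the formula (n + 1) % K, here is a standalone sketch. The name rrNext is hypothetical: it copies the formula above so that the snippet does not depend on the library.

```lean
-- Hypothetical standalone copy of the round-robin formula.
def rrNext (K : Nat) (hK : 0 < K) (n : Nat) : Fin K :=
  ⟨(n + 1) % K, Nat.mod_lt _ hK⟩

-- With K = 3, times n = 0, 1, 2, 3 give actions 1, 2, 0, 1.
example : (rrNext 3 (by decide) 0).val = 1 := rfl
example : (rrNext 3 (by decide) 1).val = 2 := rfl
example : (rrNext 3 (by decide) 2).val = 0 := rfl
example : (rrNext 3 (by decide) 3).val = 1 := rfl
```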


max

Definition Function.max

The maximum value of a tuple.

def
Function.max.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι → α) : α

Code

abbrev max : α := univ.sup' univ_nonempty f
Used by (8)


exists_argmax

Lemma exists_argmax

No docstring.

theorem
exists_argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι → α) : ∃ i, f i = Function.max f

Code

lemma exists_argmax : ∃ i, f i = f.max
Type uses (1)
Used by (3)


Proof
by
  obtain ⟨i, -, hi⟩ := Finset.exists_mem_eq_sup' (by simp : Finset.univ.Nonempty) f
  exact ⟨i, hi.symm⟩

argmax

Definition argmax

The index of the maximum value of a tuple.

def
argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι → α) : ι

Code

noncomputable def argmax := (exists_argmax f).choose
Body uses (2)
Used by (17)
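
Because argmax is obtained through Exists.choose, the definition does not say which index is returned when several indices attain the maximum; what can be relied on is the defining property. A minimal sketch of that property follows, assuming the declarations shown on this page (argmax, exists_argmax, Function.max) are in scope. The import path is not shown here, and the library may state this fact under its own name and in a different form.

```lean
-- Assumes `argmax`, `exists_argmax` and `Function.max` from this page are in scope.
-- `argmax f` unfolds to `(exists_argmax f).choose`, so `choose_spec` applies.
example {ι α : Type*} [LinearOrder α] [Fintype ι] [Nonempty ι] (f : ι → α) :
    f (argmax f) = Function.max f :=
  (exists_argmax f).choose_spec
```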


sumRewards'

Definition Learning.sumRewards'

Sum of rewards of arm a up to (and including) time n.

def
Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0
Used by (9)


pullCount'

Definition Learning.pullCount'

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

def
Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × R) (a : 𝓐) : ℕ

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}
Used by (29)


empMean'

Definition Learning.empMean'

Empirical mean of arm a at time n.

def
Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)
  (sumRewards' n h a) / (pullCount' n h a)
Body uses (2)
Used by (18)
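
One consequence of defining the empirical mean as a plain quotient: if arm a does not occur in the history h, the pull count is 0 and the quotient is a division by zero in ℝ, which Mathlib defines to be 0. The sketch below only checks that general fact about real division; it does not use the library's definitions.

```lean
import Mathlib

-- Division by zero in ℝ evaluates to 0 in Mathlib, so an arm with
-- zero pulls gets empirical mean 0 under a definition of this shape.
example (x : ℝ) : x / ((0 : ℕ) : ℝ) = 0 := by simp
```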
