`Bandits.ETC.measurable_nextArm`

This page shows the declaration's own card, followed by its dependency graph and a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards follow only the dependencies of its statement: the proof is replaced by `sorry`, because what a theorem proves does not depend on how it is proved. For every other kind of declaration, both the type and the body (value) are followed, since later declarations build on their content.

`measurable_nextArm`

Lemma Bandits.ETC.measurable_nextArm

Details

The next arm pulled by ETC is chosen in a measurable way.

theorem

Bandits.ETC.measurable_nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) :
  Measurable (nextArm hK m n)

Code

lemma ETC.measurable_nextArm (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)

Type uses (1)

nextArm

Body uses (6)

Used by (3)

Proof

by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  unfold nextArm
  simp only [dite_eq_ite]
  refine Measurable.ite (by simp) (by fun_prop) ?_
  refine Measurable.ite (by simp) ?_ (by fun_prop)
  fun_prop
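
Both `Measurable.ite` steps in the proof discharge their first argument with `by simp`. A plausible reading is that the conditions `n < K * m - 1` and `n = K * m - 1` do not mention the history `h`, so the set on which each condition holds is either the whole space or empty. The following is a minimal sketch of that pattern in isolation, assuming Mathlib's `Measurable.ite` and `MeasurableSet.const`; the names `c`, `f` and `g` are placeholders and do not come from the project.

```lean
import Mathlib

-- Sketch only: `c`, `f`, `g` are hypothetical placeholders.
-- The condition `c` does not depend on `x`, so `{x | c}` is a constant set
-- and therefore measurable.
example {α : Type*} [MeasurableSpace α] (c : Prop) [Decidable c]
    (f g : α → ℝ) (hf : Measurable f) (hg : Measurable g) :
    Measurable (fun x => if c then f x else g x) :=
  Measurable.ite (MeasurableSet.const c) hf hg
```

In `measurable_nextArm` the same shape appears twice, nested, with the measurability of the three branches left to `fun_prop`.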

Dependency graph

Type dependencies (1)

`nextArm`

Definition Bandits.ETC.nextArm

Details

Arm pulled by the ETC algorithm at time n + 1. For n < K * m - 1, this is arm (n + 1) % K. For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase. For n ≥ K * m, this is the same arm as at time n.

def

Bandits.ETC.nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ) : Fin K

Code

noncomputable
def ETC.nextArm (hK : 0 < K) (m n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  if hn : n < K * m - 1 then RoundRobin.nextAction hK n
  else
    if hn_eq : n = K * m - 1 then argmax (empMean' n h)
    else (h ⟨n, by simp⟩).1

Body uses (3)

Used by (6)
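
The three-way case split in `nextArm` can be checked on small numbers. The sketch below uses only core Lean and reproduces the branch conditions, not the arms themselves; `Phase` and `phase` are hypothetical names introduced for illustration and are not part of the project.

```lean
-- Hypothetical names: `Phase` and `phase` are illustrative only.
inductive Phase where
  | explore  -- round-robin over the arms
  | commit   -- pick the arm with the highest empirical mean
  | stay     -- repeat the arm pulled at time `n`

/-- Which branch of the case split applies at time `n`,
for `K` arms and exploration parameter `m`. -/
def phase (K m n : Nat) : Phase :=
  if n < K * m - 1 then .explore
  else if n = K * m - 1 then .commit
  else .stay

-- With K = 2 and m = 3 the threshold is 2 * 3 - 1 = 5.
example : phase 2 3 4 = .explore := rfl
example : phase 2 3 5 = .commit := rfl
example : phase 2 3 6 = .stay := rfl

-- Natural-number subtraction truncates at zero, so with m = 0 the
-- threshold is 0 and time 0 already falls in the second branch.
example : phase 2 0 0 = .commit := rfl
```

The last example is worth keeping in mind when reading statements about `nextArm`: `K * m - 1` is a truncated subtraction on `ℕ`, so the case `m = 0` does not make the threshold negative.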

All dependencies, transitively (7)

`nextAction`

Definition Learning.RoundRobin.nextAction

Details

Action chosen by the Round-Robin algorithm at time n + 1. This is action (n + 1) % K.

def

Learning.RoundRobin.nextAction {K : ℕ} (hK : 0 < K) (n : ℕ) : Fin K

Code

noncomputable
def RoundRobin.nextAction (hK : 0 < K) (n : ℕ) : Fin K := ⟨(n + 1) % K, Nat.mod_lt _ hK⟩

Used by (14)
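
To see the round-robin cycle concretely, here is a small sketch using only core Lean. `nextActionSketch` is a hypothetical standalone copy of the definition above (with `Nat` written out instead of `ℕ`), introduced so the examples do not depend on the project.

```lean
-- Hypothetical standalone copy of the definition, using only core Lean.
def nextActionSketch {K : Nat} (hK : 0 < K) (n : Nat) : Fin K :=
  ⟨(n + 1) % K, Nat.mod_lt _ hK⟩

-- With K = 3, the actions chosen after times 0, 1, 2, 3 cycle as 1, 2, 0, 1.
example : (nextActionSketch (K := 3) (by decide) 0).val = 1 := rfl
example : (nextActionSketch (K := 3) (by decide) 1).val = 2 := rfl
example : (nextActionSketch (K := 3) (by decide) 2).val = 0 := rfl
example : (nextActionSketch (K := 3) (by decide) 3).val = 1 := rfl
```

The hypothesis `hK : 0 < K` is only used to supply the bound `(n + 1) % K < K` required by `Fin K`.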

`max`

Definition Function.max

Details

The maximum value of a tuple.

def

Function.max.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α]
  [Fintype ι] [Nonempty ι] (f : ι → α) : α

Code

abbrev max : α := univ.sup' univ_nonempty f

Used by (8)

`exists_argmax`

Lemma exists_argmax

Details

No docstring.

theorem

exists_argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α]
  [Fintype ι] [Nonempty ι] (f : ι → α) : ∃ i, f i = Function.max f

Code

lemma exists_argmax : ∃ i, f i = f.max

Type uses (1)

max

Used by (3)

Proof

by
  obtain ⟨i, -, hi⟩ := Finset.exists_mem_eq_sup' (by simp : Finset.univ.Nonempty) f
  exact ⟨i, hi.symm⟩

`argmax`

Definition argmax

Details

The index of the maximum value of a tuple.

def

argmax.{u_1, u_2} {ι : Type u_1} {α : Type u_2} [LinearOrder α]
  [Fintype ι] [Nonempty ι] (f : ι → α) : ι

Code

noncomputable def argmax := (exists_argmax f).choose

Body uses (2)

Used by (17)
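
Because `argmax` is defined through `Exists.choose`, the only way to reason about it is through the matching `choose_spec`. The sketch below rebuilds the construction under hypothetical names (`exists_argmax_sketch`, `argmaxSketch`) and with the maximum written out as `Finset.univ.sup'`, assuming Mathlib; it is meant to illustrate the pattern, not to reproduce the project's lemmas.

```lean
import Mathlib

variable {ι α : Type*} [LinearOrder α] [Fintype ι] [Nonempty ι]

-- Hypothetical copy of `exists_argmax`, with the maximum spelled out.
theorem exists_argmax_sketch (f : ι → α) :
    ∃ i, f i = Finset.univ.sup' Finset.univ_nonempty f := by
  obtain ⟨i, -, hi⟩ := Finset.exists_mem_eq_sup' Finset.univ_nonempty f
  exact ⟨i, hi.symm⟩

-- Hypothetical copy of `argmax`: some index attaining the maximum.
noncomputable def argmaxSketch (f : ι → α) : ι :=
  (exists_argmax_sketch f).choose

-- The defining property comes from `choose_spec`.
example (f : ι → α) :
    f (argmaxSketch f) = Finset.univ.sup' Finset.univ_nonempty f :=
  (exists_argmax_sketch f).choose_spec
```

Note that nothing here says which maximiser is chosen when several indices attain the maximum; the definition only guarantees that the chosen index attains it.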

`sumRewards'`

Definition Learning.sumRewards'

Details

Sum of rewards of arm a up to (and including) time n.

def

Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0

Used by (9)

`pullCount'`

Definition Learning.pullCount'

Details

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

def

Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2}
  [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × R) (a : 𝓐) : ℕ

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}

Used by (29)
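
The definitions of `sumRewards'` and `pullCount'` both scan the history `h` and select the entries whose arm equals `a`. A computable analogue on lists makes this easy to check by hand. The sketch below uses only core Lean; `pullCountList`, `sumRewardsList` and `hist` are hypothetical names, and arms and rewards are simplified to `Nat` and `Int`, whereas the project works with `Iic n → 𝓐 × ℝ`.

```lean
-- Hypothetical list-based analogues; arms are `Nat` and rewards are `Int`.
def pullCountList (h : List (Nat × Int)) (a : Nat) : Nat :=
  (h.filter (fun s => s.1 == a)).length

def sumRewardsList (h : List (Nat × Int)) (a : Nat) : Int :=
  h.foldl (fun acc s => if s.1 == a then acc + s.2 else acc) 0

-- A three-step history: arm 0 (reward 1), arm 1 (reward 5), arm 0 (reward 2).
def hist : List (Nat × Int) := [(0, 1), (1, 5), (0, 2)]

example : pullCountList hist 0 = 2 := by decide
example : sumRewardsList hist 0 = 3 := by decide

-- An arm that never appears has count 0 and reward sum 0.
example : pullCountList hist 2 = 0 := by decide
example : sumRewardsList hist 2 = 0 := by decide
```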

`empMean'`

Definition Learning.empMean'

Details

Empirical mean of arm a at time n.

def

Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)

Body uses (2)

Used by (18)
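
`empMean'` divides by `pullCount'`, which can be zero for an arm that has not been pulled. In Mathlib, division by zero in a field such as `ℝ` is defined to return `0`, so under this definition such an arm would have empirical mean `0` rather than an undefined value. A short check of the underlying convention, assuming Mathlib's `div_zero`:

```lean
import Mathlib

-- Division by zero in `ℝ` evaluates to `0` in Mathlib.
example (x : ℝ) : x / 0 = 0 := div_zero x

-- The same holds when the denominator is a zero count cast from `ℕ` to `ℝ`.
example (s : ℝ) : s / ((0 : ℕ) : ℝ) = 0 := by simp
```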
