`Bandits.ClippedUCB.ucb'`

This page shows the declaration's own card, then its dependency graph, then a card for each dependency (type dependencies first, followed by the rest of the transitive closure). For a theorem, the graph and the dependency cards follow only the dependencies of its statement: the proof is replaced by sorry, because what a theorem proves does not depend on how it is proved. For every other kind of declaration, both the type and the body/value are followed, since their content is part of what later declarations build on.


`ucb'`

Definition Bandits.ClippedUCB.ucb'

Details

Clipped upper confidence bound (history-based version).

def

Bandits.ClippedUCB.ucb' {K : ℕ} (n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ) (l u σ2 δ : ℝ) (a : Fin K) : ℝ

Code

noncomputable
def ucb' (n : ℕ) (h : Iic n → Fin K × ℝ) (l u σ2 δ : ℝ) (a : Fin K) : ℝ :=
  if pullCount' n h a = 0 then u
  else max l (min u (empMean' n h a + √(2 * σ2 * Real.log (1 / δ) / (pullCount' n h a))))
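
The clipping in the `else` branch keeps the value inside `[l, u]` whenever `l ≤ u`. A minimal sketch of that fact, assuming only Mathlib; `x` is an illustrative stand-in for the unclipped quantity `empMean' n h a + √(...)` and is not a name from the source:

```lean
import Mathlib

-- Clipping with `max l (min u x)` stays in `[l, u]` when `l ≤ u`.
-- `x` is a hypothetical placeholder for the unclipped confidence bound.
example (l u x : ℝ) (hlu : l ≤ u) :
    l ≤ max l (min u x) ∧ max l (min u x) ≤ u :=
  ⟨le_max_left _ _, max_le hlu (min_le_left _ _)⟩
```

Since the `pullCount' n h a = 0` branch returns `u` directly, the same assumption `l ≤ u` covers both branches of `ucb'`.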

Body uses (2)

Used by (3)


Dependency graph

All dependencies, transitively (3)

`pullCount'`

Definition Learning.pullCount'

Details

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

def

Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2}
  [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × R) (a : 𝓐) : ℕ

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}

Used by (29)


`sumRewards'`

Definition Learning.sumRewards'

Details

Sum of rewards of arm a up to (and including) time n.

def

Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0
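
The sum adds the reward at every time step where arm a was chosen and contributes 0 otherwise, which amounts to summing only over the steps where a was chosen. A sketch of that equivalence for a generic finite index type, assuming only Mathlib; `ι`, `p`, and `f` are illustrative names, not from the source:

```lean
import Mathlib

-- Summing `if p s then f s else 0` over all indices equals summing `f`
-- over the indices that satisfy `p`.
-- In `sumRewards'`, `p s` would be `(h s).1 = a` and `f s` would be `(h s).2`.
example {ι : Type*} [Fintype ι] (p : ι → Prop) [DecidablePred p] (f : ι → ℝ) :
    ∑ s, (if p s then f s else 0) = ∑ s ∈ Finset.univ.filter p, f s :=
  (Finset.sum_filter p f).symm
```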

Used by (9)


`empMean'`

Definition Learning.empMean'

Details

Empirical mean of arm a at time n.

def

Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)
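
When arm a has never been pulled, `pullCount' n h a` is 0 and the quotient divides by zero. In Mathlib, division by zero on ℝ is defined to be 0, so the empirical mean is 0 in that case. A minimal check, assuming only Mathlib; `x` is an illustrative stand-in for `sumRewards' n h a`:

```lean
import Mathlib

-- Dividing by the cast of the natural number 0 yields 0 in ℝ.
-- `x` is a hypothetical placeholder for the reward sum.
example (x : ℝ) : x / ((0 : ℕ) : ℝ) = 0 := by simp
```

This convention is why `ucb'` treats the zero-pull case in a separate branch and returns `u` there.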

Body uses (2)

Used by (18)

