`Bandits.ClippedUCB.measurable_uncurry_ucb'`🔗

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

`measurable_uncurry_ucb'`🔗

LemmaBandits.ClippedUCB.measurable_uncurry_ucb'

Details

No docstring.

theorem

Bandits.ClippedUCB.measurable_uncurry_ucb' {K : ℕ} {l u σ2 δ : ℝ}
  {n : ℕ} :
  Measurable fun p => ucb' n (Prod.fst p) l u σ2 δ (Prod.snd p)
Bandits.ClippedUCB.measurable_uncurry_ucb'
  {K : ℕ} {l u σ2 δ : ℝ} {n : ℕ} :
  Measurable fun p =>
    ucb' n (Prod.fst p) l u σ2 δ
      (Prod.snd p)

Code

lemma measurable_uncurry_ucb' {n : ℕ} :
    Measurable (fun p : (Iic n → Fin K × ℝ) × Fin K ↦ ucb' n p.1 l u σ2 δ p.2)

Type uses (1)

ucb'

Body uses (5)

Used by (1)

integral_ucb_action_eq_integral_ucb_bestAction

Actions: Source · Open Issue

Proof

Measurable.ite (by measurability) (by fun_prop) (by fun_prop)

Dependency graph

Type dependencies (1)

`ucb'`🔗

DefinitionBandits.ClippedUCB.ucb'

Details

Clipped upper confidence bound (history-based version).

def

Bandits.ClippedUCB.ucb' {K : ℕ} (n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ) (l u σ2 δ : ℝ) (a : Fin K) : ℝ
Bandits.ClippedUCB.ucb' {K : ℕ} (n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ)
  (l u σ2 δ : ℝ) (a : Fin K) : ℝ

Code

noncomputable
def ucb' (n : ℕ) (h : Iic n → Fin K × ℝ) (l u σ2 δ : ℝ) (a : Fin K) : ℝ :=
  if pullCount' n h a = 0 then u
  else max l (min u (empMean' n h a + √(2 * σ2 * Real.log (1 / δ) / (pullCount' n h a))))

Body uses (2)

Used by (3)

Actions: Source · Open Issue

All dependencies, transitively (3)

`pullCount'`🔗

DefinitionLearning.pullCount'

Details

Number of pulls of arm a up to (and including) time n. This is the number of entries in h in which the arm is a.

def

Learning.pullCount'.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2}
  [DecidableEq 𝓐] (n : ℕ) (h : ↥(Finset.Iic n) → 𝓐 × R) (a : 𝓐) : ℕ
Learning.pullCount'.{u_1, u_2}
  {𝓐 : Type u_1} {R : Type u_2}
  [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × R) (a : 𝓐) :
  ℕ

Code

noncomputable
def pullCount' (n : ℕ) (h : Iic n → 𝓐 × R) (a : 𝓐) := #{s | (h s).1 = a}

Used by (29)

Actions: Source · Open Issue

`sumRewards'`🔗

DefinitionLearning.sumRewards'

Details

Sum of rewards of arm a up to (and including) time n.

def

Learning.sumRewards'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ
Learning.sumRewards'.{u_1} {𝓐 : Type u_1}
  [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) :
  ℝ

Code

noncomputable
def sumRewards' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  ∑ s, if (h s).1 = a then (h s).2 else 0

Used by (9)

Actions: Source · Open Issue

`empMean'`🔗

DefinitionLearning.empMean'

Details

Empirical mean of arm a at time n.

def

Learning.empMean'.{u_1} {𝓐 : Type u_1} [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) : ℝ
Learning.empMean'.{u_1} {𝓐 : Type u_1}
  [DecidableEq 𝓐] (n : ℕ)
  (h : ↥(Finset.Iic n) → 𝓐 × ℝ) (a : 𝓐) :
  ℝ

Code

noncomputable
def empMean' (n : ℕ) (h : Iic n → 𝓐 × ℝ) (a : 𝓐) :=
  (sumRewards' n h a) / (pullCount' n h a)

Body uses (2)

Used by (18)

Actions: Source · Open Issue

Bandits.ClippedUCB.measurable_uncurry_ucb'🔗

measurable_uncurry_ucb'🔗

ucb'🔗

pullCount'🔗

sumRewards'🔗

empMean'🔗

`Bandits.ClippedUCB.measurable_uncurry_ucb'`🔗

`measurable_uncurry_ucb'`🔗

`ucb'`🔗

`pullCount'`🔗

`sumRewards'`🔗

`empMean'`🔗