`Bandits.ClippedUCB.sum_ucb_sub_mean_le`🔗

This page shows the declaration's own card, then its dependency graph, then one card per dependency: type dependencies first, followed by the rest of the transitive closure. For a theorem, the graph and the dependency cards follow only the dependencies of its statement; the proof is replaced by `sorry`, since what a theorem proves does not depend on how it is proved. For every other kind of declaration, both the type and the body (value) are followed, because their content is part of what later declarations build on.

`sum_ucb_sub_mean_le`🔗

Lemma Bandits.ClippedUCB.sum_ucb_sub_mean_le

Details

No docstring.

theorem

Bandits.ClippedUCB.sum_ucb_sub_mean_le.{u_1} {K : ℕ} {l u σ2 δ : ℝ}
  {Ω : Type u_1} {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {n : ℕ} {ω : Ω}
  (μ : Fin K → ℝ) (hμ : ∀ (a : Fin K), μ a ∈ Set.Icc l u) (hi : l ≤ u)
  (hc :
    ∀ s < n,
      Learning.pullCount A (A s ω) s ω ≠ 0 →
        Learning.empMean A R (A s ω) s ω - μ (A s ω) <
          √(2 * σ2 * Real.log (1 / δ) /
              ↑(Learning.pullCount A (A s ω) s ω))) :
  ∑ s ∈ Finset.range n, (ucb A R l u σ2 δ (A s ω) s ω - μ (A s ω)) ≤
    (u - l) * ↑K + 4 * √(2 * σ2 * Real.log (1 / δ) * ↑K * ↑n)

Code

lemma sum_ucb_sub_mean_le {n : ℕ} {ω : Ω} (μ : Fin K → ℝ) (hμ : ∀ a, μ a ∈ Set.Icc l u) (hi : l ≤ u)
    (hc : ∀ s < n, pullCount A (A s ω) s ω ≠ 0 → empMean A R (A s ω) s ω - μ (A s ω)
      < √(2 * σ2 * Real.log (1 / δ) / (pullCount A (A s ω) s ω))) :
    ∑ s ∈ range n, (ucb A R l u σ2 δ (A s ω) s ω - μ (A s ω))
      ≤ (u - l) * K + 4 * √(2 * σ2 * Real.log (1 / δ) * K * n)
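
In conventional notation the statement can be read as follows. This is a paraphrase, not part of the source: $N_a(s)$, $\hat\mu_a(s)$ and $U_a(s)$ are shorthand introduced here for `pullCount A a s ω`, `empMean A R a s ω` and `ucb A R l u σ2 δ a s ω`, and $\sigma^2$ stands for the real parameter `σ2`, on which the Lean statement imposes no sign condition.

```latex
% Hypotheses hμ and hi: \mu_a \in [l, u] for every arm a, and l \le u.
% Hypothesis hc: for every s < n with N_{A_s}(s) \ne 0,
\[
  \hat\mu_{A_s}(s) - \mu_{A_s}
    < \sqrt{\frac{2\sigma^2 \log(1/\delta)}{N_{A_s}(s)}} .
\]
% Conclusion:
\[
  \sum_{s=0}^{n-1} \bigl( U_{A_s}(s) - \mu_{A_s} \bigr)
    \le (u - l)\,K + 4 \sqrt{2\sigma^2 \log(1/\delta)\, K\, n} .
\]
```

The first term accounts for rounds in which the chosen arm has not been pulled before; the second accounts for all remaining rounds.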

Type uses (3)

Body uses (3)

Used by (1)

integral_sum_range_ucb_action_sub_actionMean_action_le

Proof

by
  let S₀ := {s ∈ range n | pullCount A (A s ω) s ω = 0}
  let S₁ := {s ∈ range n | pullCount A (A s ω) s ω ≠ 0}
  have hu : S₀ ∪ S₁ = range n := filter_union_filter_not_eq _ _
  have hd : Disjoint S₀ S₁ := disjoint_filter_filter_not _ _ _
  rw [← hu, sum_union hd]
  gcongr
  · calc ∑ s ∈ S₀, (ucb A R l u σ2 δ (A s ω) s ω - μ (A s ω))
        ≤ ∑ s ∈ S₀, (u - l) :=
          have (s : ℕ) : ucb A R l u σ2 δ (A s ω) s ω ∈ Set.Icc l u := ucb_mem_Icc hi
          sum_le_sum (by grind)
      _ = ∑ s ∈ range n, if pullCount A (A s ω) s ω = 0 then (u - l) else 0 := by
          rw [sum_filter]
      _ = ∑ a, ∑ j ∈ range (pullCount A a n ω), if j = 0 then (u - l) else 0 :=
          sum_comp_pullCount (fun j => if j = 0 then (u - l) else 0) n ω
      _ ≤ ∑ a, (u - l) := by
          gcongr
          rw [sum_ite_eq']
          grind
      _ = (u - l) * K := by
          rw [Fin.sum_const, nsmul_eq_mul, mul_comm]
  · calc ∑ s ∈ S₁, (ucb A R l u σ2 δ (A s ω) s ω - μ (A s ω))
          ≤ ∑ s ∈ S₁, 2 * √(2 * σ2 * Real.log (1 / δ) / (pullCount A (A s ω) s ω)) := by
            gcongr with s hs
            unfold ucb
            have : 0 ≤ √(2 * σ2 * Real.log (1 / δ) / (pullCount A (A s ω) s ω)) := by positivity
            grind
        _ ≤ ∑ s ∈ range n, 2 * √(2 * σ2 * Real.log (1 / δ) / (pullCount A (A s ω) s ω)) :=
            sum_le_sum_of_subset_of_nonneg (filter_subset _ _) (fun _ _ _ => by positivity)
        _ = 2 * √(2 * σ2 * Real.log (1 / δ)) * ∑ s ∈ range n, (1 / √(pullCount A (A s ω) s ω)) := by
            rw [mul_sum]
            congr with s
            rw [Real.sqrt_div' _ (by positivity)]
            ring
        _ = 2 * √(2 * σ2 * Real.log (1 / δ)) *
              ∑ a, ∑ j ∈ range (pullCount A a n ω), (1 / √j) := by
            rw [sum_comp_pullCount (fun j => 1 / √j)]
        _ ≤ 2 * √(2 * σ2 * Real.log (1 / δ)) * (2 * ∑ a, √(pullCount A a n ω)) := by -- loose
            rw [mul_sum _ _ 2]
            gcongr with a
            by_cases ha : pullCount A a n ω = 0
            · simp [ha]
            · have hi := sum_inv_sqrt_le (Nat.pos_of_ne_zero ha)
              rw [sum_range_succ] at hi
              have : 0 ≤ 1 / √(pullCount A a n ω) := by positivity
              linarith
        _ ≤ 2 * √(2 * σ2 * Real.log (1 / δ)) * (2 * √(K * ∑ a, (pullCount A a n ω))) := by
            gcongr
            have h := sum_sqrt_le Finset.univ (fun a => Nat.cast_nonneg (pullCount A a n ω))
            rw [Finset.card_fin] at h
            exact_mod_cast h
        _ = 2 * √(2 * σ2 * Real.log (1 / δ)) * (2 * √(K * n)) := by
            congr
            exact sum_pullCount (ω := ω)
        _ = 4 * √(2 * σ2 * Real.log (1 / δ) * K * n) := by
            ring_nf
            rw [← Real.sqrt_mul' _ (by positivity)]
            ring_nf
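
The `calc` chains above can be summarised as the following sketch. It is a paraphrase under assumed shorthand, not part of the source: $N_a(s)$ and $U_a(s)$ stand for `pullCount A a s ω` and `ucb A R l u σ2 δ a s ω`, $\sigma^2$ for `σ2`, and $c$ for $\sqrt{2\sigma^2\log(1/\delta)}$.

```latex
% S_0 = \{ s < n : N_{A_s}(s) = 0 \}, \qquad S_1 = \{ s < n : N_{A_s}(s) \ne 0 \},
% c = \sqrt{2\sigma^2 \log(1/\delta)}.
\begin{align*}
\sum_{s \in S_0} \bigl(U_{A_s}(s) - \mu_{A_s}\bigr)
  &\le \sum_{s \in S_0} (u - l)
   = \sum_{a} \sum_{j < N_a(n)} (u - l)\,\mathbf{1}[j = 0]
   \le \sum_{a} (u - l) = (u - l)\,K, \\
\sum_{s \in S_1} \bigl(U_{A_s}(s) - \mu_{A_s}\bigr)
  &\le \sum_{s < n} 2\sqrt{\frac{2\sigma^2 \log(1/\delta)}{N_{A_s}(s)}}
   = 2c \sum_{a} \sum_{j < N_a(n)} \frac{1}{\sqrt{j}} \\
  &\le 2c \cdot 2 \sum_{a} \sqrt{N_a(n)}
   \le 4c \sqrt{K \sum_{a} N_a(n)}
   = 4c \sqrt{K n}
   = 4\sqrt{2\sigma^2 \log(1/\delta)\, K\, n}.
\end{align*}
```

As used in the proof, `sum_comp_pullCount` trades a sum over time steps for a sum over arms and pull indices; the $j = 0$ term of the inner sum is $1/\sqrt{0} = 0$ under Lean's division convention. The step through `sum_inv_sqrt_le` contributes one factor of 2 and is the one the source marks as loose, `sum_sqrt_le` supplies the bound $\sum_a \sqrt{N_a(n)} \le \sqrt{K \sum_a N_a(n)}$, and `sum_pullCount` gives $\sum_a N_a(n) = n$.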

Dependency graph

Type dependencies (3)

`pullCount`🔗

Definition Learning.pullCount

Details

Number of times action a was chosen up to time t (excluding t).

def

Learning.pullCount.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ

Code

noncomputable
def pullCount (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ :=
  #(filter (fun s ↦ A s ω = a) (range t))
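
Written as a formula, the definition counts matching time steps strictly before $t$. The symbol $N_a(t)$ is shorthand introduced here, and the action sequence in the example is hypothetical.

```latex
\[
  N_a(t)(\omega) \;=\;
    \bigl|\{\, s \in \{0, \dots, t-1\} : A_s(\omega) = a \,\}\bigr|
\]
% Example (hypothetical sequence): A_0 = a,\ A_1 = b,\ A_2 = a,\ A_3 = a gives
% N_a(0) = 0, \quad N_a(1) = 1, \quad N_a(3) = 2, \quad N_a(4) = 3.
```

Because time $t$ itself is excluded, $N_a(0) = 0$ for every action, and the count at the moment of an arm's first pull is still 0.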

Used by (146)

`empMean`🔗

Definition Learning.empMean

Details

Empirical mean reward obtained when pulling action a up to time t (exclusive).

def

Learning.empMean.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ)
  (ω : Ω) : ℝ

Code

noncomputable
def empMean (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ) (ω : Ω) : ℝ :=
  sumRewards A R' a t ω / pullCount A a t ω
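
As a formula, with $S_a(t)$ and $N_a(t)$ as shorthand (introduced here) for `sumRewards A R' a t ω` and `pullCount A a t ω`:

```latex
\[
  \hat\mu_a(t)(\omega) \;=\; \frac{S_a(t)(\omega)}{N_a(t)(\omega)},
  \qquad
  \hat\mu_a(t)(\omega) = 0 \ \text{ when } N_a(t)(\omega) = 0 .
\]
```

The second clause is not a separate case in the definition: it follows from the Mathlib convention that division by zero in $\mathbb{R}$ returns 0, so the empirical mean of an arm that has never been pulled is 0 rather than undefined.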

Body uses (2)

Used by (34)

`ucb`🔗

Definition Bandits.ClippedUCB.ucb

Details

Clipped upper confidence bound used in the regret analysis of Thompson sampling.

def

Bandits.ClippedUCB.ucb.{u_1} {K : ℕ} {Ω : Type u_1} (A : ℕ → Ω → Fin K)
  (R : ℕ → Ω → ℝ) (l u σ2 δ : ℝ) (a : Fin K) (n : ℕ) (ω : Ω) : ℝ

Code

noncomputable
def ucb (A : ℕ → Ω → Fin K) (R : ℕ → Ω → ℝ) (l u σ2 δ : ℝ) (a : Fin K) (n : ℕ) (ω : Ω) : ℝ :=
  if pullCount A a n ω = 0 then u
  else max l (min u (empMean A R a n ω + √(2 * σ2 * Real.log (1 / δ) / (pullCount A a n ω))))
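
The same definition as a piecewise formula, with $N_a(n)$ and $\hat\mu_a(n)$ as shorthand (introduced here) for `pullCount A a n ω` and `empMean A R a n ω`, and $\sigma^2$ standing for the parameter `σ2`:

```latex
\[
  U_a(n) =
  \begin{cases}
    u, & N_a(n) = 0, \\[4pt]
    \max\Bigl(l,\ \min\Bigl(u,\ \hat\mu_a(n)
      + \sqrt{\tfrac{2\sigma^2 \log(1/\delta)}{N_a(n)}}\Bigr)\Bigr),
      & N_a(n) \ne 0 .
  \end{cases}
\]
```

When $l \le u$, both branches land in $[l, u]$: the first is $u$ itself, and the second is at least $l$ and at most $\max(l, u) = u$. An unpulled arm therefore receives the most optimistic value allowed by the clipping.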

Body uses (2)

Used by (12)

All dependencies, transitively (1)

`sumRewards`🔗

Definition Learning.sumRewards

Details

Sum of rewards obtained when pulling action a up to time t (exclusive).

def

Learning.sumRewards.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ)
  (ω : Ω) : ℝ

Code

def sumRewards (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ) (ω : Ω) : ℝ :=
  ∑ s ∈ range t, if A s ω = a then R' s ω else 0
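
As a formula, the `if ... then ... else 0` becomes an indicator. The symbol $S_a(t)$ is shorthand introduced here, and the data in the example are hypothetical.

```latex
\[
  S_a(t)(\omega) \;=\; \sum_{s=0}^{t-1} R'_s(\omega)\,\mathbf{1}[A_s(\omega) = a]
\]
% Example (hypothetical data): actions a, b, a with rewards 1, 5, 3 give
% S_a(3) = 1 + 3 = 4, \quad S_b(3) = 5, \quad S_a(1) = 1.
```

In this example arm $a$ has been pulled twice before time 3, so the corresponding empirical mean is $4 / 2 = 2$.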

Used by (44)
