LeanMachineLearning exposition

2.6.Β Online.Bandit.Algorithms.Regret.BayesRegretTSπŸ”—

Bayesian regret of Thompson sampling

This file provides a Bayesian regret upper bound (integral_regret_le) for Thompson sampling under the assumption (among others) that it has the correct prior over environments.

The Bayesian regret upper bound relies on a clipped upper confidence bound whose definition and properties are also given in this file.

Main definitions

  • ucb A R l u Οƒ2 Ξ΄ a n : clipped upper confidence bound used in the regret analysis of Thompson sampling for a sequence of actions A : β„• β†’ Ξ© β†’ Fin K, rewards R : β„• β†’ Ξ© β†’ ℝ, reward lower bound l : ℝ, reward upper bound u : ℝ, sub-Gaussian variance proxy Οƒ2 : ℝ, confidence parameter Ξ΄ : ℝ, action a : Fin K, and time n : β„•.

  • ucb' n h l u Οƒ2 Ξ΄ a: clipped upper confidence bound for action a : Fin K at time n : β„• given the history h : Iic n β†’ Fin K Γ— ℝ (rather than the entire sequences of actions and rewards).

Main results

  • integral_regret_le: if Thompson sampling has the correct prior over environments and every environment has K actions, each of which has a corresponding reward between l and u that is sub-Gaussian with variance proxy Οƒ2 after its mean is subtracted, then the Bayesian regret at time n is at most (2 * K + 1) * (u - l) + 8 * √(Οƒ2 * K * n * Real.log n).

Module LeanMachineLearning.Online.Bandit.Algorithms.Regret.BayesRegretTS contains 15 exposed declarations.

ucbπŸ”—

DefinitionBandits.ClippedUCB.ucb

Clipped upper confidence bound used in the regret analysis of Thompson sampling.

πŸ”—def
Bandits.ClippedUCB.ucb.{u_1} {K : β„•} {Ξ© : Type u_1} (A : β„• β†’ Ξ© β†’ Fin K) (R : β„• β†’ Ξ© β†’ ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) (n : β„•) (Ο‰ : Ξ©) : ℝ
Bandits.ClippedUCB.ucb.{u_1} {K : β„•} {Ξ© : Type u_1} (A : β„• β†’ Ξ© β†’ Fin K) (R : β„• β†’ Ξ© β†’ ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) (n : β„•) (Ο‰ : Ξ©) : ℝ

Code

noncomputable
def ucb (A : β„• β†’ Ξ© β†’ Fin K) (R : β„• β†’ Ξ© β†’ ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) (n : β„•) (Ο‰ : Ξ©) : ℝ :=
  if pullCount A a n Ο‰ = 0 then u
  else max l (min u (empMean A R a n Ο‰ + √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A a n Ο‰))))
Body uses (2)
Used by (12)

Actions: Source Β· Open Issue

ucb_zeroπŸ”—

LemmaBandits.ClippedUCB.ucb_zero

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.ucb_zero.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {a : Fin K} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a 0 Ο‰ = u
Bandits.ClippedUCB.ucb_zero.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {a : Fin K} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a 0 Ο‰ = u

Code

lemma ucb_zero {a : Fin K} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a 0 Ο‰ = u
Type uses (1)
Body uses (3)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  simp [ucb]

ucb_mem_IccπŸ”—

LemmaBandits.ClippedUCB.ucb_mem_Icc

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.ucb_mem_Icc.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} (h : l ≀ u) {a : Fin K} {n : β„•} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a n Ο‰ ∈ Set.Icc l u
Bandits.ClippedUCB.ucb_mem_Icc.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} (h : l ≀ u) {a : Fin K} {n : β„•} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a n Ο‰ ∈ Set.Icc l u

Code

lemma ucb_mem_Icc (h : l ≀ u) {a : Fin K} {n : β„•} {Ο‰ : Ξ©} :
    ucb A R l u Οƒ2 Ξ΄ a n Ο‰ ∈ Set.Icc l u
Type uses (1)
Body uses (2)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  unfold ucb
  grind

measurable_ucbπŸ”—

LemmaBandits.ClippedUCB.measurable_ucb

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.measurable_ucb.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] {a : Fin K} {n : β„•} (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) : Measurable (ucb A R l u Οƒ2 Ξ΄ a n)
Bandits.ClippedUCB.measurable_ucb.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] {a : Fin K} {n : β„•} (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) : Measurable (ucb A R l u Οƒ2 Ξ΄ a n)

Code

lemma measurable_ucb [MeasurableSpace Ξ©] {a : Fin K} {n : β„•} (hA : βˆ€ t, Measurable (A t))
    (hR : βˆ€ t, Measurable (R t)) : Measurable (ucb A R l u Οƒ2 Ξ΄ a n)
Type uses (1)
Body uses (4)
Used by (1)

Actions: Source Β· Open Issue

Proof
Measurable.ite (by measurability) (by fun_prop) (by fun_prop)

measurable_uncurry_ucb_compπŸ”—

LemmaBandits.ClippedUCB.measurable_uncurry_ucb_comp

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.measurable_uncurry_ucb_comp.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•} (hg : Measurable g) : Measurable fun Ο‰ => ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰
Bandits.ClippedUCB.measurable_uncurry_ucb_comp.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•} (hg : Measurable g) : Measurable fun Ο‰ => ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰

Code

lemma measurable_uncurry_ucb_comp [MeasurableSpace Ξ©] (hA : βˆ€ t, Measurable (A t))
    (hR : βˆ€ t, Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•}
    (hg : Measurable g) : Measurable (fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰)
Type uses (1)
Body uses (1)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  change Measurable ((fun aΟ‰ ↦ ucb A R l u Οƒ2 Ξ΄ aΟ‰.1 (g aΟ‰.2) aΟ‰.2) ∘ fun Ο‰ ↦ (f Ο‰, Ο‰))
  apply Measurable.comp _ (by fun_prop)
  apply measurable_from_prod_countable_right
  intro a
  change Measurable ((fun tΟ‰ ↦ ucb A R l u Οƒ2 Ξ΄ a tΟ‰.1 tΟ‰.2) ∘ fun Ο‰ ↦ (g Ο‰, Ο‰))
  apply Measurable.comp _ (by fun_prop)
  exact measurable_from_prod_countable_right (fun _ ↦ measurable_ucb hA hR)

integrable_uncurry_ucb_compπŸ”—

LemmaBandits.ClippedUCB.integrable_uncurry_ucb_comp

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.integrable_uncurry_ucb_comp.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•} (hg : Measurable g) {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] : MeasureTheory.Integrable (fun Ο‰ => ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰) P
Bandits.ClippedUCB.integrable_uncurry_ucb_comp.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [MeasurableSpace Ξ©] (hA : βˆ€ (t : β„•), Measurable (A t)) (hR : βˆ€ (t : β„•), Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•} (hg : Measurable g) {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] : MeasureTheory.Integrable (fun Ο‰ => ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰) P

Code

lemma integrable_uncurry_ucb_comp [MeasurableSpace Ξ©] (hA : βˆ€ t, Measurable (A t))
    (hR : βˆ€ t, Measurable (R t)) {f : Ξ© β†’ Fin K} (hf : Measurable f) {g : Ξ© β†’ β„•}
    (hg : Measurable g) {P : Measure Ξ©} [IsFiniteMeasure P] :
    Integrable (fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (f Ο‰) (g Ο‰) Ο‰) P
Type uses (1)
Body uses (3)
Used by (3)

Actions: Source Β· Open Issue

Proof
by
  refine ⟨(measurable_uncurry_ucb_comp hA hR hf hg).aestronglyMeasurable, ?_⟩
  apply HasFiniteIntegral.of_bounded (C := max |l| |u|)
  filter_upwards with Ο‰
  rw [Real.norm_eq_abs]
  unfold ucb
  grind

ucb'πŸ”—

DefinitionBandits.ClippedUCB.ucb'

Clipped upper confidence bound (history-based version).

πŸ”—def
Bandits.ClippedUCB.ucb' {K : β„•} (n : β„•) (h : β†₯(Finset.Iic n) β†’ Fin K Γ— ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) : ℝ
Bandits.ClippedUCB.ucb' {K : β„•} (n : β„•) (h : β†₯(Finset.Iic n) β†’ Fin K Γ— ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) : ℝ

Code

noncomputable
def ucb' (n : β„•) (h : Iic n β†’ Fin K Γ— ℝ) (l u Οƒ2 Ξ΄ : ℝ) (a : Fin K) : ℝ :=
  if pullCount' n h a = 0 then u
  else max l (min u (empMean' n h a + √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount' n h a))))
Body uses (2)
Used by (3)

Actions: Source Β· Open Issue

measurable_uncurry_ucb'πŸ”—

LemmaBandits.ClippedUCB.measurable_uncurry_ucb'

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.measurable_uncurry_ucb' {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {n : β„•} : Measurable fun p => ucb' n (Prod.fst p) l u Οƒ2 Ξ΄ (Prod.snd p)
Bandits.ClippedUCB.measurable_uncurry_ucb' {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {n : β„•} : Measurable fun p => ucb' n (Prod.fst p) l u Οƒ2 Ξ΄ (Prod.snd p)

Code

lemma measurable_uncurry_ucb' {n : β„•} :
    Measurable (fun p : (Iic n β†’ Fin K Γ— ℝ) Γ— Fin K ↦ ucb' n p.1 l u Οƒ2 Ξ΄ p.2)
Type uses (1)
Body uses (5)
Used by (1)

Actions: Source Β· Open Issue

Proof
Measurable.ite (by measurability) (by fun_prop) (by fun_prop)

ucb_succ_eq_ucb'πŸ”—

LemmaBandits.ClippedUCB.ucb_succ_eq_ucb'

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.ucb_succ_eq_ucb'.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {a : Fin K} {n : β„•} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a (n + 1) Ο‰ = ucb' n (Learning.history A R n Ο‰) l u Οƒ2 Ξ΄ a
Bandits.ClippedUCB.ucb_succ_eq_ucb'.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {a : Fin K} {n : β„•} {Ο‰ : Ξ©} : ucb A R l u Οƒ2 Ξ΄ a (n + 1) Ο‰ = ucb' n (Learning.history A R n Ο‰) l u Οƒ2 Ξ΄ a

Code

lemma ucb_succ_eq_ucb' {a : Fin K} {n : β„•} {Ο‰ : Ξ©} :
    ucb A R l u Οƒ2 Ξ΄ a (n + 1) Ο‰ = ucb' n (history A R n Ο‰) l u Οƒ2 Ξ΄ a
Type uses (3)
Body uses (6)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hp : pullCount A a (n + 1) Ο‰ = pullCount' n (history A R n Ο‰) a :=
    pullCount_add_one_eq_pullCount'
  have he : empMean A R a (n + 1) Ο‰ = empMean' n (history A R n Ο‰) a :=
    empMean_add_one_eq_empMean'
  rw [ucb, ucb', hp, he]

sum_ucb_sub_mean_leπŸ”—

LemmaBandits.ClippedUCB.sum_ucb_sub_mean_le

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.sum_ucb_sub_mean_le.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {n : β„•} {Ο‰ : Ξ©} (ΞΌ : Fin K β†’ ℝ) (hΞΌ : βˆ€ (a : Fin K), ΞΌ a ∈ Set.Icc l u) (hi : l ≀ u) (hc : βˆ€ s < n, Learning.pullCount A (A s Ο‰) s Ο‰ β‰  0 β†’ Learning.empMean A R (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰) < √(2 * Οƒ2 * Real.log (1 / Ξ΄) / ↑(Learning.pullCount A (A s Ο‰) s Ο‰))) : βˆ‘ s ∈ Finset.range n, (ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰)) ≀ (u - l) * ↑K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * ↑K * ↑n)
Bandits.ClippedUCB.sum_ucb_sub_mean_le.{u_1} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {n : β„•} {Ο‰ : Ξ©} (ΞΌ : Fin K β†’ ℝ) (hΞΌ : βˆ€ (a : Fin K), ΞΌ a ∈ Set.Icc l u) (hi : l ≀ u) (hc : βˆ€ s < n, Learning.pullCount A (A s Ο‰) s Ο‰ β‰  0 β†’ Learning.empMean A R (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰) < √(2 * Οƒ2 * Real.log (1 / Ξ΄) / ↑(Learning.pullCount A (A s Ο‰) s Ο‰))) : βˆ‘ s ∈ Finset.range n, (ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰)) ≀ (u - l) * ↑K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * ↑K * ↑n)

Code

lemma sum_ucb_sub_mean_le {n : β„•} {Ο‰ : Ξ©} (ΞΌ : Fin K β†’ ℝ) (hΞΌ : βˆ€ a, ΞΌ a ∈ Set.Icc l u) (hi : l ≀ u)
    (hc : βˆ€ s < n, pullCount A (A s Ο‰) s Ο‰ β‰  0 β†’ empMean A R (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰)
      < √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο‰) s Ο‰))) :
    βˆ‘ s ∈ range n, (ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰))
      ≀ (u - l) * K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n)
Type uses (3)
Body uses (3)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  let Sβ‚€ := {s ∈ range n | pullCount A (A s Ο‰) s Ο‰ = 0}
  let S₁ := {s ∈ range n | pullCount A (A s Ο‰) s Ο‰ β‰  0}
  have hu : Sβ‚€ βˆͺ S₁ = range n := filter_union_filter_not_eq _ _
  have hd : Disjoint Sβ‚€ S₁ := disjoint_filter_filter_not _ _ _
  rw [← hu, sum_union hd]
  gcongr
  Β· calc βˆ‘ s ∈ Sβ‚€, (ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰))
        ≀ βˆ‘ s ∈ Sβ‚€, (u - l) :=
          have (s : β„•) : ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ ∈ Set.Icc l u := ucb_mem_Icc hi
          sum_le_sum (by grind)
      _ = βˆ‘ s ∈ range n, if pullCount A (A s Ο‰) s Ο‰ = 0 then (u - l) else 0 := by
          rw [sum_filter]
      _ = βˆ‘ a, βˆ‘ j ∈ range (pullCount A a n Ο‰), if j = 0 then (u - l) else 0 :=
          sum_comp_pullCount (fun j => if j = 0 then (u - l) else 0) n Ο‰
      _ ≀ βˆ‘ a, (u - l) := by
          gcongr
          rw [sum_ite_eq']
          grind
      _ = (u - l) * K := by
          rw [Fin.sum_const, nsmul_eq_mul, mul_comm]
  Β· calc βˆ‘ s ∈ S₁, (ucb A R l u Οƒ2 Ξ΄ (A s Ο‰) s Ο‰ - ΞΌ (A s Ο‰))
          ≀ βˆ‘ s ∈ S₁, 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο‰) s Ο‰)) := by
            gcongr with s hs
            unfold ucb
            have : 0 ≀ √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο‰) s Ο‰)) := by positivity
            grind
        _ ≀ βˆ‘ s ∈ range n, 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο‰) s Ο‰)) :=
            sum_le_sum_of_subset_of_nonneg (filter_subset _ _) (fun _ _ _ => by positivity)
        _ = 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄)) * βˆ‘ s ∈ range n, (1 / √(pullCount A (A s Ο‰) s Ο‰)) := by
            rw [mul_sum]
            congr with s
            rw [Real.sqrt_div' _ (by positivity)]
            ring
        _ = 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄)) *
              βˆ‘ a, βˆ‘ j ∈ range (pullCount A a n Ο‰), (1 / √j) := by
            rw [sum_comp_pullCount (fun j => 1 / √j)]
        _ ≀ 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄)) * (2 * βˆ‘ a, √(pullCount A a n Ο‰)) := by -- loose
            rw [mul_sum _ _ 2]
            gcongr with a
            by_cases ha : pullCount A a n Ο‰ = 0
            Β· simp [ha]
            Β· have hi := sum_inv_sqrt_le (Nat.pos_of_ne_zero ha)
              rw [sum_range_succ] at hi
              have : 0 ≀ 1 / √(pullCount A a n Ο‰) := by positivity
              linarith
        _ ≀ 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄)) * (2 * √(K * βˆ‘ a, (pullCount A a n Ο‰))) := by
            gcongr
            have h := sum_sqrt_le Finset.univ (fun a => Nat.cast_nonneg (pullCount A a n Ο‰))
            rw [Finset.card_fin] at h
            exact_mod_cast h
        _ = 2 * √(2 * Οƒ2 * Real.log (1 / Ξ΄)) * (2 * √(K * n)) := by
            congr
            exact sum_pullCount (Ο‰ := Ο‰)
        _ = 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n) := by
            ring_nf
            rw [← Real.sqrt_mul' _ (by positivity)]
            ring_nf

integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_leπŸ”—

LemmaBandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le.{u_1, u_2} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] {E : Ξ© β†’ 𝓔} {Q : MeasureTheory.Measure 𝓔} {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) ℝ} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) Ο‰ - ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) t Ο‰)) x βˆ‚P ≀ (u - l) * (↑n - 1) * ↑n * Ξ΄
Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le.{u_1, u_2} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] {E : Ξ© β†’ 𝓔} {Q : MeasureTheory.Measure 𝓔} {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) ℝ} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) Ο‰ - ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) t Ο‰)) x βˆ‚P ≀ (u - l) * (↑n - 1) * ↑n * Ξ΄

Code

lemma integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le {alg : Algorithm (Fin K) ℝ}
    (h : IsBayesAlgEnvSeq Q ΞΊ alg E A R P)
    (hlu : l ≀ u) (hm : βˆ€ e a, (ΞΊ (e, a))[id] ∈ (Set.Icc l u)) (hΟƒ2 : 0 < Οƒ2)
    (hs : βˆ€ e a, HasSubgaussianMGF (fun x ↦ x - (ΞΊ (e, a))[id]) βŸ¨Οƒ2, hΟƒ2.le⟩ (ΞΊ (e, a)))
    (hΞ΄ : 0 < Ξ΄) (n : β„•) :
    P[fun Ο‰ ↦ βˆ‘ t ∈ range n,
      (actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ - ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰)] ≀
        (u - l) * (n - 1) * n * Ξ΄
Type uses (5)
Body uses (9)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  by_cases hn : n = 0
  Β· simp [hn]
  let F := {Ο‰ | βˆƒ t < n, pullCount A (bestAction ΞΊ E Ο‰) t Ο‰ β‰  0 ∧
    empMean A R (bestAction ΞΊ E Ο‰) t Ο‰ - actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ ≀
      -√(2 * Οƒ2 * Real.log (1 / Ξ΄) / (pullCount A (bestAction ΞΊ E Ο‰) t Ο‰))}
  have := h.measurable_action
  have := h.measurable_param
  have := h.measurable_feedback
  have hF : MeasurableSet F := by measurability
  have : Integrable (fun Ο‰ ↦ actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰) P :=
      IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp (by fun_prop) (by fun_prop) hm
  calc
    _ ≀ ∫ Ο‰ in F, βˆ‘ t ∈ range n,
          (actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ - ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰) βˆ‚P := by
        rw [← integral_add_compl hF (by fun_prop)]
        apply add_le_of_nonpos_right
        apply setIntegral_nonpos hF.compl
        intro ω hω
        apply sum_nonpos
        intro t ht
        rw [Set.mem_compl_iff, Set.mem_setOf_eq] at hω
        push Not at hω
        grind [hω t (mem_range.mp ht), ucb, actionMean]
    _ ≀ ∫ Ο‰ in F, βˆ‘ t ∈ range n, (u - l) βˆ‚P := by
        apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
          (Integrable.integrableOn (by fun_prop)) hF
        intro ω hω
        apply sum_le_sum
        intro t ht
        grind [actionMean, ucb]
    _ = P.real F * (n * (u - l)) := by
        simp_rw [sum_const, card_range, nsmul_eq_mul, setIntegral_const, smul_eq_mul]
    _ ≀ ((n - 1) * Ξ΄) * (n * (u - l)) := by
        gcongr
        have : (1 : ℝ) ≀ n := by simp [Nat.one_le_iff_ne_zero, hn]
        apply ENNReal.toReal_le_of_le_ofReal (by nlinarith)
        exact h.prob_empMean_bestAction_sub_actionMean_le_le hσ2 hs hδ n
    _ = _ := by
      ring

integral_sum_range_ucb_action_sub_actionMean_action_leπŸ”—

LemmaBandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le

No docstring.

πŸ”—theorem
Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le.{u_1, u_2} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] {E : Ξ© β†’ 𝓔} {Q : MeasureTheory.Measure 𝓔} {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) ℝ} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο‰) Ο‰)) x βˆ‚P ≀ (u - l) * ↑K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * ↑K * ↑n) + (u - l) * ↑K * (↑n - 1) * ↑n * Ξ΄
Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le.{u_1, u_2} {K : β„•} {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] {E : Ξ© β†’ 𝓔} {Q : MeasureTheory.Measure 𝓔} {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) ℝ} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο‰) Ο‰)) x βˆ‚P ≀ (u - l) * ↑K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * ↑K * ↑n) + (u - l) * ↑K * (↑n - 1) * ↑n * Ξ΄

Code

lemma integral_sum_range_ucb_action_sub_actionMean_action_le {alg : Algorithm (Fin K) ℝ}
    (h : IsBayesAlgEnvSeq Q ΞΊ alg E A R P)
    (hlu : l ≀ u) (hm : βˆ€ e a, (ΞΊ (e, a))[id] ∈ (Set.Icc l u)) (hΟƒ2 : 0 < Οƒ2)
    (hs : βˆ€ e a, HasSubgaussianMGF (fun x ↦ x - (ΞΊ (e, a))[id]) βŸ¨Οƒ2, hΟƒ2.le⟩ (ΞΊ (e, a)))
    (hΞ΄ : 0 < Ξ΄) (n : β„•) :
    P[fun Ο‰ ↦ βˆ‘ t ∈ range n, (ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - actionMean ΞΊ E (A t Ο‰) Ο‰)] ≀
      (u - l) * K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n) + (u - l) * K * (n - 1) * n * Ξ΄
Type uses (4)
Body uses (9)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  by_cases hn : n = 0
  Β· simp [hn, hlu, mul_nonneg]
  let F := {Ο‰ | βˆƒ t < n, βˆƒ a, pullCount A a t Ο‰ β‰  0 ∧
    √(2 * Οƒ2 * Real.log (1 / Ξ΄) / pullCount A a t Ο‰) ≀ empMean A R a t Ο‰ - actionMean ΞΊ E a Ο‰}
  have := h.measurable_action
  have := h.measurable_param
  have := h.measurable_feedback
  have hF : MeasurableSet F := by measurability
  have : βˆ€ t, Integrable (fun Ο‰ ↦ actionMean ΞΊ E (A t Ο‰) Ο‰) P :=
    fun t ↦ IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp (by fun_prop) (by fun_prop) hm
  calc
    _ ≀ (∫ Ο‰ in F, βˆ‘ t ∈ range n, (u - l) βˆ‚P) +
          ∫ Ο‰ in Fᢜ, (u - l) * K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n) βˆ‚P := by
        rw [← integral_add_compl hF (by fun_prop)]
        apply add_le_add
        Β· apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
            (Integrable.integrableOn (by fun_prop)) hF
          intro ω hω
          apply sum_le_sum
          intro t ht
          grind [ucb, actionMean]
        Β· apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
            (Integrable.integrableOn (by fun_prop)) hF.compl
          intro ω hω
          rw [Set.mem_compl_iff, Set.mem_setOf_eq] at hω
          push Not at hω
          exact sum_ucb_sub_mean_le (fun a ↦ (ΞΊ (E Ο‰, a))[id]) (hm (E Ο‰)) hlu
            (fun t ht hpc ↦ hΟ‰ t ht (A t Ο‰) hpc)
    _ = P.real F * (n * (u - l)) +
          P.real Fᢜ * ((u - l) * K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n)) := by
        simp_rw [sum_const, card_range, nsmul_eq_mul, setIntegral_const, smul_eq_mul]
    _ ≀ (K * (n - 1) * Ξ΄) * (n * (u - l)) +
          1 * ((u - l) * K + 4 * √(2 * Οƒ2 * Real.log (1 / Ξ΄) * K * n)) := by
        have : 0 ≀ u - l := sub_nonneg.2 hlu
        gcongr
        Β· have : (0 : ℝ) ≀ n - 1 := by simp [Nat.one_le_iff_ne_zero, hn]
          apply ENNReal.toReal_le_of_le_ofReal (by positivity)
          exact h.prob_empMean_sub_actionMean_ge_le hσ2 hs hδ n
        Β· exact measureReal_le_one
    _ = _ := by
        ring

integral_ucb_action_eq_integral_ucb_bestActionπŸ”—

LemmaBandits.TS.integral_ucb_action_eq_integral_ucb_bestAction

No docstring.

πŸ”—theorem
Bandits.TS.integral_ucb_action_eq_integral_ucb_bestAction.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (A n Ο‰) n Ο‰) x βˆ‚P = ∫ (x : Ξ©), (fun Ο‰ => ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) n Ο‰) x βˆ‚P
Bandits.TS.integral_ucb_action_eq_integral_ucb_bestAction.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β„•) : ∫ (x : Ξ©), (fun Ο‰ => ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (A n Ο‰) n Ο‰) x βˆ‚P = ∫ (x : Ξ©), (fun Ο‰ => ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) n Ο‰) x βˆ‚P

Code

lemma integral_ucb_action_eq_integral_ucb_bestAction (hK : 0 < K)
    (h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β„•) :
    P[fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (A n Ο‰) n Ο‰] =
      P[fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) n Ο‰]
Type uses (4)
Body uses (10)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have := h.measurable_action
  have := h.measurable_param
  have := h.measurable_feedback
  by_cases hn : n = 0
  Β· simp [hn]
  obtain ⟨n, rfl⟩ := Nat.exists_eq_succ_of_ne_zero hn
  let uc (ha : (Iic n β†’ Fin K Γ— ℝ) Γ— Fin K) := ucb' n ha.1 l u Οƒ2 Ξ΄ ha.2
  calc
    _  = P[fun Ο‰ ↦ uc (history A R n Ο‰, A (n + 1) Ο‰)] := by
        simp_rw [uc, ucb_succ_eq_ucb']
    _ = ∫ ha, uc ha βˆ‚P.map (fun Ο‰ ↦ (history A R n Ο‰, A (n + 1) Ο‰)) := by
        rw [← integral_map (by fun_prop) (by fun_prop)]
    _ = ∫ ha, uc ha βˆ‚P.map (fun Ο‰ ↦ (history A R n Ο‰, bestAction ΞΊ E Ο‰)) := by
        rw [← compProd_map_condDistrib (by fun_prop), ← compProd_map_condDistrib (by fun_prop),
            Measure.compProd_congr (hasCondDistrib_action hK h n).condDistrib_eq]
    _ = P[fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) (n + 1) Ο‰] := by
        rw [integral_map (by fun_prop) (by fun_prop)]
        simp_rw [uc, ucb_succ_eq_ucb']

integral_regret_eq_addπŸ”—

LemmaBandits.TS.integral_regret_eq_add

No docstring.

πŸ”—theorem
Bandits.TS.integral_regret_eq_add.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (n : β„•) : ∫ (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βˆ‚P = ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) Ο‰ - ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) t Ο‰)) x βˆ‚P + ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο‰) Ο‰)) x βˆ‚P
Bandits.TS.integral_regret_eq_add.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 Ξ΄ : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (n : β„•) : ∫ (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βˆ‚P = ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) Ο‰ - ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο‰) t Ο‰)) x βˆ‚P + ∫ (x : Ξ©), (fun Ο‰ => βˆ‘ t ∈ Finset.range n, (ClippedUCB.ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο‰) Ο‰)) x βˆ‚P

Code

lemma integral_regret_eq_add (hK : 0 < K) (h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P)
    (hm : βˆ€ e a, (ΞΊ (e, a))[id] ∈ (Set.Icc l u)) (n : β„•) :
    P[IsBayesAlgEnvSeq.regret ΞΊ E A n] =
      P[fun Ο‰ ↦ βˆ‘ t ∈ range n,
        (actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ - ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰)] +
      P[fun Ο‰ ↦ βˆ‘ t ∈ range n,
        (ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ - actionMean ΞΊ E (A t Ο‰) Ο‰)]
Type uses (6)
Body uses (8)
Used by (1)

Actions: Source Β· Open Issue

Proof
by
  have hua (t : β„•) : Integrable (fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰) P :=
    integrable_uncurry_ucb_comp h.measurable_action h.measurable_feedback (h.measurable_action t)
      measurable_const
  have hub (t : β„•) : Integrable (fun Ο‰ ↦ ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰) P :=
    integrable_uncurry_ucb_comp h.measurable_action h.measurable_feedback
      (IsBayesAlgEnvSeq.measurable_bestAction h.measurable_param) measurable_const
  have haa (t : β„•) : Integrable (fun Ο‰ ↦ actionMean ΞΊ E (A t Ο‰) Ο‰) P :=
    IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp h.measurable_param
      (h.measurable_action t) hm
  have hab : Integrable (fun Ο‰ ↦ actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰) P :=
      IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp h.measurable_param
        (IsBayesAlgEnvSeq.measurable_bestAction h.measurable_param) hm
  calc
    _  = (βˆ‘ t ∈ range n, ∫ Ο‰, actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ βˆ‚P) -
            βˆ‘ t ∈ range n, ∫ Ο‰, actionMean ΞΊ E (A t Ο‰) Ο‰ βˆ‚P := by
        simp_rw [IsBayesAlgEnvSeq.regret_eq_sum_gap, IsBayesAlgEnvSeq.gap_eq_sub]
        rw [integral_finsetSum _ (by fun_prop), ← Finset.sum_sub_distrib]
        simp_rw [integral_sub hab (haa _)]
    _ = ((βˆ‘ t ∈ range n, ∫ Ο‰, actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ βˆ‚P) -
            βˆ‘ t ∈ range n, ∫ Ο‰, ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰ βˆ‚P) +
          ((βˆ‘ t ∈ range n, ∫ Ο‰, ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ βˆ‚P) -
            βˆ‘ t ∈ range n, ∫ Ο‰, actionMean ΞΊ E (A t Ο‰) Ο‰ βˆ‚P) := by
        simp [integral_ucb_action_eq_integral_ucb_bestAction hK h]
    _ = (βˆ‘ t ∈ range n, ∫ Ο‰, actionMean ΞΊ E (bestAction ΞΊ E Ο‰) Ο‰ -
              ucb A R l u Οƒ2 Ξ΄ (bestAction ΞΊ E Ο‰) t Ο‰ βˆ‚P) +
          βˆ‘ t ∈ range n, ∫ Ο‰, ucb A R l u Οƒ2 Ξ΄ (A t Ο‰) t Ο‰ -
            actionMean ΞΊ E (A t Ο‰) Ο‰ βˆ‚P := by
        rw [← Finset.sum_sub_distrib, ← Finset.sum_sub_distrib]
        simp_rw [← integral_sub hab (hub _), ← integral_sub (hua _) (haa _)]
    _ = _ := by
        rw [← integral_finsetSum _ (by fun_prop), ← integral_finsetSum _ (by fun_prop)]

integral_regret_leπŸ”—

TheoremBandits.TS.integral_regret_le

If Thompson sampling has the correct prior over environments and every environment has K actions, each of which has a corresponding reward between l and u that is sub-Gaussian with variance proxy Οƒ2 after its mean is subtracted, then the Bayesian regret at time n is at most (2 * K + 1) * (u - l) + 8 * √(Οƒ2 * K * n * Real.log n).

πŸ”—theorem
Bandits.TS.integral_regret_le.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (n : β„•) : ∫ (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βˆ‚P ≀ (2 * ↑K + 1) * (u - l) + 8 * √(Οƒ2 * ↑K * ↑n * Real.log ↑n)
Bandits.TS.integral_regret_le.{u_1, u_2} {K : β„•} [Nonempty (Fin K)] {l u Οƒ2 : ℝ} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {𝓔 : Type u_2} [MeasurableSpace 𝓔] [StandardBorelSpace 𝓔] [Nonempty 𝓔] {Q : MeasureTheory.Measure 𝓔} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (𝓔 Γ— Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β†’ 𝓔} {A : β„• β†’ Ξ© β†’ Fin K} {R : β„• β†’ Ξ© β†’ ℝ} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hlu : l ≀ u) (hm : βˆ€ (e : 𝓔) (a : Fin K), ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a) ∈ Set.Icc l u) (hΟƒ2 : 0 < Οƒ2) (hs : βˆ€ (e : 𝓔) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x βˆ‚ΞΊ (e, a)) βŸ¨Οƒ2, β‹―βŸ© (ΞΊ (e, a))) (n : β„•) : ∫ (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βˆ‚P ≀ (2 * ↑K + 1) * (u - l) + 8 * √(Οƒ2 * ↑K * ↑n * Real.log ↑n)

Code

theorem integral_regret_le (hK : 0 < K) (h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P)
    (hlu : l ≀ u) (hm : βˆ€ e a, (ΞΊ (e, a))[id] ∈ (Set.Icc l u)) (hΟƒ2 : 0 < Οƒ2)
    (hs : βˆ€ e a, HasSubgaussianMGF (fun x ↦ x - (ΞΊ (e, a))[id]) βŸ¨Οƒ2, hΟƒ2.le⟩ (ΞΊ (e, a))) (n : β„•) :
    P[IsBayesAlgEnvSeq.regret ΞΊ E A n]
      ≀ (2 * K + 1) * (u - l) + 8 * √(Οƒ2 * K * n * Real.log n)
Type uses (3)
Body uses (6)

Actions: Source Β· Open Issue

Proof
by
  by_cases hn : n = 0
  Β· simp [hn, IsBayesAlgEnvSeq.regret, Bandits.regret]
    nlinarith
  have hΞ΄ : (0 : ℝ) < 1 / n ^ 2 := by positivity
  calc P[IsBayesAlgEnvSeq.regret ΞΊ E A n]
      = _ :=
        integral_regret_eq_add hK h hm n
    _ ≀ _ :=
        add_le_add
          (integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le h hlu hm hσ2 hs hδ n)
          (integral_sum_range_ucb_action_sub_actionMean_action_le h hlu hm hσ2 hs hδ n)
    _ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n)
          + 4 * √((2 : ℝ) ^ 2 * (Οƒ2 * K * n * Real.log n)) := by
        field_simp
        rw [Real.log_pow]
        ring_nf
    _ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n) + 8 * √(Οƒ2 * K * n * Real.log n) := by
        rw [Real.sqrt_mul (by positivity), Real.sqrt_sq (by norm_num)]
        ring
    _ ≀ K * (u - l) + (K + 1) * (u - l) * 1 + 8 * √(Οƒ2 * K * n * Real.log n) := by -- loose
        have : 0 ≀ u - l := sub_nonneg.2 hlu
        gcongr
        rw [div_le_one (by positivity)]
        linarith
    _ = _ := by
        ring
  1. Bandits.ClippedUCB.ucb
  2. Bandits.ClippedUCB.ucb_zero
  3. Bandits.ClippedUCB.ucb_mem_Icc
  4. Bandits.ClippedUCB.measurable_ucb
  5. Bandits.ClippedUCB.measurable_uncurry_ucb_comp
  6. Bandits.ClippedUCB.integrable_uncurry_ucb_comp
  7. Bandits.ClippedUCB.ucb'
  8. Bandits.ClippedUCB.measurable_uncurry_ucb'
  9. Bandits.ClippedUCB.ucb_succ_eq_ucb'
  10. Bandits.ClippedUCB.sum_ucb_sub_mean_le
  11. Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le
  12. Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le
  13. Bandits.TS.integral_ucb_action_eq_integral_ucb_bestAction
  14. Bandits.TS.integral_regret_eq_add
  15. Bandits.TS.integral_regret_le