2.6.Β Online.Bandit.Algorithms.Regret.BayesRegretTS
Bayesian regret of Thompson sampling
This file provides a Bayesian regret upper bound (integral_regret_le) for Thompson sampling under
the assumption (among others) that it has the correct prior over environments.
The Bayesian regret upper bound relies on a clipped upper confidence bound whose definition and properties are also given in this file.
Main definitions
-
ucb A R l u Ο2 Ξ΄ a n: clipped upper confidence bound used in the regret analysis of Thompson sampling for a sequence of actionsA : β β Ξ© β Fin K, rewardsR : β β Ξ© β β, reward lower boundl : β, reward upper boundu : β, sub-Gaussian variance proxyΟ2 : β, confidence parameterΞ΄ : β, actiona : Fin K, and timen : β. -
ucb' n h l u Ο2 Ξ΄ a: clipped upper confidence bound for actiona : Fin Kat timen : βgiven the historyh : Iic n β Fin K Γ β(rather than the entire sequences of actions and rewards).
Main results
-
integral_regret_le: if Thompson sampling has the correct prior over environments and every environment hasKactions, each of which has a corresponding reward betweenlanduthat is sub-Gaussian with variance proxyΟ2after its mean is subtracted, then the Bayesian regret at timenis at most(2 * K + 1) * (u - l) + 8 * β(Ο2 * K * n * Real.log n).
Module LeanMachineLearning.Online.Bandit.Algorithms.Regret.BayesRegretTS contains 15 exposed declarations.
ucbπ
Bandits.ClippedUCB.ucbClipped upper confidence bound used in the regret analysis of Thompson sampling.
Bandits.ClippedUCB.ucb.{u_1} {K : β} {Ξ© : Type u_1} (A : β β Ξ© β Fin K) (R : β β Ξ© β β) (l u Ο2 Ξ΄ : β) (a : Fin K) (n : β) (Ο : Ξ©) : βBandits.ClippedUCB.ucb.{u_1} {K : β} {Ξ© : Type u_1} (A : β β Ξ© β Fin K) (R : β β Ξ© β β) (l u Ο2 Ξ΄ : β) (a : Fin K) (n : β) (Ο : Ξ©) : β
Code
noncomputable def ucb (A : β β Ξ© β Fin K) (R : β β Ξ© β β) (l u Ο2 Ξ΄ : β) (a : Fin K) (n : β) (Ο : Ξ©) : β := if pullCount A a n Ο = 0 then u else max l (min u (empMean A R a n Ο + β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A a n Ο))))
Actions: Source Β· Open Issue
ucb_zeroπ
Bandits.ClippedUCB.ucb_zeroNo docstring.
Bandits.ClippedUCB.ucb_zero.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {a : Fin K} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a 0 Ο = uBandits.ClippedUCB.ucb_zero.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {a : Fin K} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a 0 Ο = u
Code
lemma ucb_zero {a : Fin K} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a 0 Ο = uType uses (1)
Body uses (3)
Used by (1)
Actions: Source Β· Open Issue
Proof
by simp [ucb]
ucb_mem_Iccπ
Bandits.ClippedUCB.ucb_mem_IccNo docstring.
Bandits.ClippedUCB.ucb_mem_Icc.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} (h : l β€ u) {a : Fin K} {n : β} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a n Ο β Set.Icc l uBandits.ClippedUCB.ucb_mem_Icc.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} (h : l β€ u) {a : Fin K} {n : β} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a n Ο β Set.Icc l u
Code
lemma ucb_mem_Icc (h : l β€ u) {a : Fin K} {n : β} {Ο : Ξ©} :
ucb A R l u Ο2 Ξ΄ a n Ο β Set.Icc l uType uses (1)
Used by (1)
Actions: Source Β· Open Issue
Proof
by unfold ucb grind
measurable_ucbπ
Bandits.ClippedUCB.measurable_ucbNo docstring.
Bandits.ClippedUCB.measurable_ucb.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] {a : Fin K} {n : β} (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) : Measurable (ucb A R l u Ο2 Ξ΄ a n)Bandits.ClippedUCB.measurable_ucb.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] {a : Fin K} {n : β} (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) : Measurable (ucb A R l u Ο2 Ξ΄ a n)
Code
lemma measurable_ucb [MeasurableSpace Ξ©] {a : Fin K} {n : β} (hA : β t, Measurable (A t))
(hR : β t, Measurable (R t)) : Measurable (ucb A R l u Ο2 Ξ΄ a n)Type uses (1)
Body uses (4)
Used by (1)
Actions: Source Β· Open Issue
Proof
Measurable.ite (by measurability) (by fun_prop) (by fun_prop)
measurable_uncurry_ucb_compπ
Bandits.ClippedUCB.measurable_uncurry_ucb_compNo docstring.
Bandits.ClippedUCB.measurable_uncurry_ucb_comp.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β} (hg : Measurable g) : Measurable fun Ο => ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) ΟBandits.ClippedUCB.measurable_uncurry_ucb_comp.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β} (hg : Measurable g) : Measurable fun Ο => ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) Ο
Code
lemma measurable_uncurry_ucb_comp [MeasurableSpace Ξ©] (hA : β t, Measurable (A t))
(hR : β t, Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β}
(hg : Measurable g) : Measurable (fun Ο β¦ ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) Ο)Type uses (1)
Body uses (1)
Used by (1)
Actions: Source Β· Open Issue
Proof
by change Measurable ((fun aΟ β¦ ucb A R l u Ο2 Ξ΄ aΟ.1 (g aΟ.2) aΟ.2) β fun Ο β¦ (f Ο, Ο)) apply Measurable.comp _ (by fun_prop) apply measurable_from_prod_countable_right intro a change Measurable ((fun tΟ β¦ ucb A R l u Ο2 Ξ΄ a tΟ.1 tΟ.2) β fun Ο β¦ (g Ο, Ο)) apply Measurable.comp _ (by fun_prop) exact measurable_from_prod_countable_right (fun _ β¦ measurable_ucb hA hR)
integrable_uncurry_ucb_compπ
Bandits.ClippedUCB.integrable_uncurry_ucb_compNo docstring.
Bandits.ClippedUCB.integrable_uncurry_ucb_comp.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β} (hg : Measurable g) {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] : MeasureTheory.Integrable (fun Ο => ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) Ο) PBandits.ClippedUCB.integrable_uncurry_ucb_comp.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [MeasurableSpace Ξ©] (hA : β (t : β), Measurable (A t)) (hR : β (t : β), Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β} (hg : Measurable g) {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] : MeasureTheory.Integrable (fun Ο => ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) Ο) P
Code
lemma integrable_uncurry_ucb_comp [MeasurableSpace Ξ©] (hA : β t, Measurable (A t))
(hR : β t, Measurable (R t)) {f : Ξ© β Fin K} (hf : Measurable f) {g : Ξ© β β}
(hg : Measurable g) {P : Measure Ξ©} [IsFiniteMeasure P] :
Integrable (fun Ο β¦ ucb A R l u Ο2 Ξ΄ (f Ο) (g Ο) Ο) PType uses (1)
Body uses (3)
Used by (3)
Actions: Source Β· Open Issue
Proof
by refine β¨(measurable_uncurry_ucb_comp hA hR hf hg).aestronglyMeasurable, ?_β© apply HasFiniteIntegral.of_bounded (C := max |l| |u|) filter_upwards with Ο rw [Real.norm_eq_abs] unfold ucb grind
ucb'π
Bandits.ClippedUCB.ucb'Clipped upper confidence bound (history-based version).
Bandits.ClippedUCB.ucb' {K : β} (n : β) (h : β₯(Finset.Iic n) β Fin K Γ β) (l u Ο2 Ξ΄ : β) (a : Fin K) : βBandits.ClippedUCB.ucb' {K : β} (n : β) (h : β₯(Finset.Iic n) β Fin K Γ β) (l u Ο2 Ξ΄ : β) (a : Fin K) : β
Code
noncomputable def ucb' (n : β) (h : Iic n β Fin K Γ β) (l u Ο2 Ξ΄ : β) (a : Fin K) : β := if pullCount' n h a = 0 then u else max l (min u (empMean' n h a + β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount' n h a))))
Body uses (2)
Actions: Source Β· Open Issue
measurable_uncurry_ucb'π
Bandits.ClippedUCB.measurable_uncurry_ucb'No docstring.
Bandits.ClippedUCB.measurable_uncurry_ucb' {K : β} {l u Ο2 Ξ΄ : β} {n : β} : Measurable fun p => ucb' n (Prod.fst p) l u Ο2 Ξ΄ (Prod.snd p)Bandits.ClippedUCB.measurable_uncurry_ucb' {K : β} {l u Ο2 Ξ΄ : β} {n : β} : Measurable fun p => ucb' n (Prod.fst p) l u Ο2 Ξ΄ (Prod.snd p)
Code
lemma measurable_uncurry_ucb' {n : β} :
Measurable (fun p : (Iic n β Fin K Γ β) Γ Fin K β¦ ucb' n p.1 l u Ο2 Ξ΄ p.2)Type uses (1)
Body uses (5)
Used by (1)
Actions: Source Β· Open Issue
Proof
Measurable.ite (by measurability) (by fun_prop) (by fun_prop)
ucb_succ_eq_ucb'π
Bandits.ClippedUCB.ucb_succ_eq_ucb'No docstring.
Bandits.ClippedUCB.ucb_succ_eq_ucb'.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {a : Fin K} {n : β} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a (n + 1) Ο = ucb' n (Learning.history A R n Ο) l u Ο2 Ξ΄ aBandits.ClippedUCB.ucb_succ_eq_ucb'.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {a : Fin K} {n : β} {Ο : Ξ©} : ucb A R l u Ο2 Ξ΄ a (n + 1) Ο = ucb' n (Learning.history A R n Ο) l u Ο2 Ξ΄ a
Code
lemma ucb_succ_eq_ucb' {a : Fin K} {n : β} {Ο : Ξ©} :
ucb A R l u Ο2 Ξ΄ a (n + 1) Ο = ucb' n (history A R n Ο) l u Ο2 Ξ΄ aBody uses (6)
Used by (1)
Actions: Source Β· Open Issue
Proof
by
have hp : pullCount A a (n + 1) Ο = pullCount' n (history A R n Ο) a :=
pullCount_add_one_eq_pullCount'
have he : empMean A R a (n + 1) Ο = empMean' n (history A R n Ο) a :=
empMean_add_one_eq_empMean'
rw [ucb, ucb', hp, he]
sum_ucb_sub_mean_leπ
Bandits.ClippedUCB.sum_ucb_sub_mean_leNo docstring.
Bandits.ClippedUCB.sum_ucb_sub_mean_le.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {n : β} {Ο : Ξ©} (ΞΌ : Fin K β β) (hΞΌ : β (a : Fin K), ΞΌ a β Set.Icc l u) (hi : l β€ u) (hc : β s < n, Learning.pullCount A (A s Ο) s Ο β 0 β Learning.empMean A R (A s Ο) s Ο - ΞΌ (A s Ο) < β(2 * Ο2 * Real.log (1 / Ξ΄) / β(Learning.pullCount A (A s Ο) s Ο))) : β s β Finset.range n, (ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο - ΞΌ (A s Ο)) β€ (u - l) * βK + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * βK * βn)Bandits.ClippedUCB.sum_ucb_sub_mean_le.{u_1} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {n : β} {Ο : Ξ©} (ΞΌ : Fin K β β) (hΞΌ : β (a : Fin K), ΞΌ a β Set.Icc l u) (hi : l β€ u) (hc : β s < n, Learning.pullCount A (A s Ο) s Ο β 0 β Learning.empMean A R (A s Ο) s Ο - ΞΌ (A s Ο) < β(2 * Ο2 * Real.log (1 / Ξ΄) / β(Learning.pullCount A (A s Ο) s Ο))) : β s β Finset.range n, (ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο - ΞΌ (A s Ο)) β€ (u - l) * βK + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * βK * βn)
Code
lemma sum_ucb_sub_mean_le {n : β} {Ο : Ξ©} (ΞΌ : Fin K β β) (hΞΌ : β a, ΞΌ a β Set.Icc l u) (hi : l β€ u)
(hc : β s < n, pullCount A (A s Ο) s Ο β 0 β empMean A R (A s Ο) s Ο - ΞΌ (A s Ο)
< β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο) s Ο))) :
β s β range n, (ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο - ΞΌ (A s Ο))
β€ (u - l) * K + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n)Body uses (3)
Actions: Source Β· Open Issue
Proof
by
let Sβ := {s β range n | pullCount A (A s Ο) s Ο = 0}
let Sβ := {s β range n | pullCount A (A s Ο) s Ο β 0}
have hu : Sβ βͺ Sβ = range n := filter_union_filter_not_eq _ _
have hd : Disjoint Sβ Sβ := disjoint_filter_filter_not _ _ _
rw [β hu, sum_union hd]
gcongr
Β· calc β s β Sβ, (ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο - ΞΌ (A s Ο))
β€ β s β Sβ, (u - l) :=
have (s : β) : ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο β Set.Icc l u := ucb_mem_Icc hi
sum_le_sum (by grind)
_ = β s β range n, if pullCount A (A s Ο) s Ο = 0 then (u - l) else 0 := by
rw [sum_filter]
_ = β a, β j β range (pullCount A a n Ο), if j = 0 then (u - l) else 0 :=
sum_comp_pullCount (fun j => if j = 0 then (u - l) else 0) n Ο
_ β€ β a, (u - l) := by
gcongr
rw [sum_ite_eq']
grind
_ = (u - l) * K := by
rw [Fin.sum_const, nsmul_eq_mul, mul_comm]
Β· calc β s β Sβ, (ucb A R l u Ο2 Ξ΄ (A s Ο) s Ο - ΞΌ (A s Ο))
β€ β s β Sβ, 2 * β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο) s Ο)) := by
gcongr with s hs
unfold ucb
have : 0 β€ β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο) s Ο)) := by positivity
grind
_ β€ β s β range n, 2 * β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A (A s Ο) s Ο)) :=
sum_le_sum_of_subset_of_nonneg (filter_subset _ _) (fun _ _ _ => by positivity)
_ = 2 * β(2 * Ο2 * Real.log (1 / Ξ΄)) * β s β range n, (1 / β(pullCount A (A s Ο) s Ο)) := by
rw [mul_sum]
congr with s
rw [Real.sqrt_div' _ (by positivity)]
ring
_ = 2 * β(2 * Ο2 * Real.log (1 / Ξ΄)) *
β a, β j β range (pullCount A a n Ο), (1 / βj) := by
rw [sum_comp_pullCount (fun j => 1 / βj)]
_ β€ 2 * β(2 * Ο2 * Real.log (1 / Ξ΄)) * (2 * β a, β(pullCount A a n Ο)) := by -- loose
rw [mul_sum _ _ 2]
gcongr with a
by_cases ha : pullCount A a n Ο = 0
Β· simp [ha]
Β· have hi := sum_inv_sqrt_le (Nat.pos_of_ne_zero ha)
rw [sum_range_succ] at hi
have : 0 β€ 1 / β(pullCount A a n Ο) := by positivity
linarith
_ β€ 2 * β(2 * Ο2 * Real.log (1 / Ξ΄)) * (2 * β(K * β a, (pullCount A a n Ο))) := by
gcongr
have h := sum_sqrt_le Finset.univ (fun a => Nat.cast_nonneg (pullCount A a n Ο))
rw [Finset.card_fin] at h
exact_mod_cast h
_ = 2 * β(2 * Ο2 * Real.log (1 / Ξ΄)) * (2 * β(K * n)) := by
congr
exact sum_pullCount (Ο := Ο)
_ = 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n) := by
ring_nf
rw [β Real.sqrt_mul' _ (by positivity)]
ring_nf
integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_leπ
Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_leNo docstring.
Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le.{u_1, u_2} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] {E : Ξ© β π} {Q : MeasureTheory.Measure π} {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) β} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β) : β« (x : Ξ©), (fun Ο => β t β Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) Ο - ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) t Ο)) x βP β€ (u - l) * (βn - 1) * βn * Ξ΄Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le.{u_1, u_2} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] {E : Ξ© β π} {Q : MeasureTheory.Measure π} {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) β} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β) : β« (x : Ξ©), (fun Ο => β t β Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) Ο - ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) t Ο)) x βP β€ (u - l) * (βn - 1) * βn * Ξ΄
Code
lemma integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le {alg : Algorithm (Fin K) β}
(h : IsBayesAlgEnvSeq Q ΞΊ alg E A R P)
(hlu : l β€ u) (hm : β e a, (ΞΊ (e, a))[id] β (Set.Icc l u)) (hΟ2 : 0 < Ο2)
(hs : β e a, HasSubgaussianMGF (fun x β¦ x - (ΞΊ (e, a))[id]) β¨Ο2, hΟ2.leβ© (ΞΊ (e, a)))
(hΞ΄ : 0 < Ξ΄) (n : β) :
P[fun Ο β¦ β t β range n,
(actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο - ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο)] β€
(u - l) * (n - 1) * n * Ξ΄Type uses (5)
Body uses (9)
Used by (1)
Actions: Source Β· Open Issue
Proof
by
by_cases hn : n = 0
Β· simp [hn]
let F := {Ο | β t < n, pullCount A (bestAction ΞΊ E Ο) t Ο β 0 β§
empMean A R (bestAction ΞΊ E Ο) t Ο - actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο β€
-β(2 * Ο2 * Real.log (1 / Ξ΄) / (pullCount A (bestAction ΞΊ E Ο) t Ο))}
have := h.measurable_action
have := h.measurable_param
have := h.measurable_feedback
have hF : MeasurableSet F := by measurability
have : Integrable (fun Ο β¦ actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο) P :=
IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp (by fun_prop) (by fun_prop) hm
calc
_ β€ β« Ο in F, β t β range n,
(actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο - ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο) βP := by
rw [β integral_add_compl hF (by fun_prop)]
apply add_le_of_nonpos_right
apply setIntegral_nonpos hF.compl
intro Ο hΟ
apply sum_nonpos
intro t ht
rw [Set.mem_compl_iff, Set.mem_setOf_eq] at hΟ
push Not at hΟ
grind [hΟ t (mem_range.mp ht), ucb, actionMean]
_ β€ β« Ο in F, β t β range n, (u - l) βP := by
apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
(Integrable.integrableOn (by fun_prop)) hF
intro Ο hΟ
apply sum_le_sum
intro t ht
grind [actionMean, ucb]
_ = P.real F * (n * (u - l)) := by
simp_rw [sum_const, card_range, nsmul_eq_mul, setIntegral_const, smul_eq_mul]
_ β€ ((n - 1) * Ξ΄) * (n * (u - l)) := by
gcongr
have : (1 : β) β€ n := by simp [Nat.one_le_iff_ne_zero, hn]
apply ENNReal.toReal_le_of_le_ofReal (by nlinarith)
exact h.prob_empMean_bestAction_sub_actionMean_le_le hΟ2 hs hΞ΄ n
_ = _ := by
ring
integral_sum_range_ucb_action_sub_actionMean_action_leπ
Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_leNo docstring.
Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le.{u_1, u_2} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] {E : Ξ© β π} {Q : MeasureTheory.Measure π} {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) β} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β) : β« (x : Ξ©), (fun Ο => β t β Finset.range n, (ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο) Ο)) x βP β€ (u - l) * βK + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * βK * βn) + (u - l) * βK * (βn - 1) * βn * Ξ΄Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le.{u_1, u_2} {K : β} {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} [Nonempty (Fin K)] [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] {E : Ξ© β π} {Q : MeasureTheory.Measure π} {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm (Fin K) β} (h : Learning.IsBayesAlgEnvSeq Q ΞΊ alg E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (hΞ΄ : 0 < Ξ΄) (n : β) : β« (x : Ξ©), (fun Ο => β t β Finset.range n, (ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο) Ο)) x βP β€ (u - l) * βK + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * βK * βn) + (u - l) * βK * (βn - 1) * βn * Ξ΄
Code
lemma integral_sum_range_ucb_action_sub_actionMean_action_le {alg : Algorithm (Fin K) β}
(h : IsBayesAlgEnvSeq Q ΞΊ alg E A R P)
(hlu : l β€ u) (hm : β e a, (ΞΊ (e, a))[id] β (Set.Icc l u)) (hΟ2 : 0 < Ο2)
(hs : β e a, HasSubgaussianMGF (fun x β¦ x - (ΞΊ (e, a))[id]) β¨Ο2, hΟ2.leβ© (ΞΊ (e, a)))
(hΞ΄ : 0 < Ξ΄) (n : β) :
P[fun Ο β¦ β t β range n, (ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - actionMean ΞΊ E (A t Ο) Ο)] β€
(u - l) * K + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n) + (u - l) * K * (n - 1) * n * Ξ΄Type uses (4)
Body uses (9)
Used by (1)
Actions: Source Β· Open Issue
Proof
by
by_cases hn : n = 0
Β· simp [hn, hlu, mul_nonneg]
let F := {Ο | β t < n, β a, pullCount A a t Ο β 0 β§
β(2 * Ο2 * Real.log (1 / Ξ΄) / pullCount A a t Ο) β€ empMean A R a t Ο - actionMean ΞΊ E a Ο}
have := h.measurable_action
have := h.measurable_param
have := h.measurable_feedback
have hF : MeasurableSet F := by measurability
have : β t, Integrable (fun Ο β¦ actionMean ΞΊ E (A t Ο) Ο) P :=
fun t β¦ IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp (by fun_prop) (by fun_prop) hm
calc
_ β€ (β« Ο in F, β t β range n, (u - l) βP) +
β« Ο in FαΆ, (u - l) * K + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n) βP := by
rw [β integral_add_compl hF (by fun_prop)]
apply add_le_add
Β· apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
(Integrable.integrableOn (by fun_prop)) hF
intro Ο hΟ
apply sum_le_sum
intro t ht
grind [ucb, actionMean]
Β· apply setIntegral_mono_on (Integrable.integrableOn (by fun_prop))
(Integrable.integrableOn (by fun_prop)) hF.compl
intro Ο hΟ
rw [Set.mem_compl_iff, Set.mem_setOf_eq] at hΟ
push Not at hΟ
exact sum_ucb_sub_mean_le (fun a β¦ (ΞΊ (E Ο, a))[id]) (hm (E Ο)) hlu
(fun t ht hpc β¦ hΟ t ht (A t Ο) hpc)
_ = P.real F * (n * (u - l)) +
P.real FαΆ * ((u - l) * K + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n)) := by
simp_rw [sum_const, card_range, nsmul_eq_mul, setIntegral_const, smul_eq_mul]
_ β€ (K * (n - 1) * Ξ΄) * (n * (u - l)) +
1 * ((u - l) * K + 4 * β(2 * Ο2 * Real.log (1 / Ξ΄) * K * n)) := by
have : 0 β€ u - l := sub_nonneg.2 hlu
gcongr
Β· have : (0 : β) β€ n - 1 := by simp [Nat.one_le_iff_ne_zero, hn]
apply ENNReal.toReal_le_of_le_ofReal (by positivity)
exact h.prob_empMean_sub_actionMean_ge_le hΟ2 hs hΞ΄ n
Β· exact measureReal_le_one
_ = _ := by
ring
integral_ucb_action_eq_integral_ucb_bestActionπ
Bandits.TS.integral_ucb_action_eq_integral_ucb_bestActionNo docstring.
Bandits.TS.integral_ucb_action_eq_integral_ucb_bestAction.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β) : β« (x : Ξ©), (fun Ο => ClippedUCB.ucb A R l u Ο2 Ξ΄ (A n Ο) n Ο) x βP = β« (x : Ξ©), (fun Ο => ClippedUCB.ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) n Ο) x βPBandits.TS.integral_ucb_action_eq_integral_ucb_bestAction.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β) : β« (x : Ξ©), (fun Ο => ClippedUCB.ucb A R l u Ο2 Ξ΄ (A n Ο) n Ο) x βP = β« (x : Ξ©), (fun Ο => ClippedUCB.ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) n Ο) x βP
Code
lemma integral_ucb_action_eq_integral_ucb_bestAction (hK : 0 < K)
(h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (n : β) :
P[fun Ο β¦ ucb A R l u Ο2 Ξ΄ (A n Ο) n Ο] =
P[fun Ο β¦ ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) n Ο]Type uses (4)
Body uses (10)
Used by (1)
Actions: Source Β· Open Issue
Proof
by
have := h.measurable_action
have := h.measurable_param
have := h.measurable_feedback
by_cases hn : n = 0
Β· simp [hn]
obtain β¨n, rflβ© := Nat.exists_eq_succ_of_ne_zero hn
let uc (ha : (Iic n β Fin K Γ β) Γ Fin K) := ucb' n ha.1 l u Ο2 Ξ΄ ha.2
calc
_ = P[fun Ο β¦ uc (history A R n Ο, A (n + 1) Ο)] := by
simp_rw [uc, ucb_succ_eq_ucb']
_ = β« ha, uc ha βP.map (fun Ο β¦ (history A R n Ο, A (n + 1) Ο)) := by
rw [β integral_map (by fun_prop) (by fun_prop)]
_ = β« ha, uc ha βP.map (fun Ο β¦ (history A R n Ο, bestAction ΞΊ E Ο)) := by
rw [β compProd_map_condDistrib (by fun_prop), β compProd_map_condDistrib (by fun_prop),
Measure.compProd_congr (hasCondDistrib_action hK h n).condDistrib_eq]
_ = P[fun Ο β¦ ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) (n + 1) Ο] := by
rw [integral_map (by fun_prop) (by fun_prop)]
simp_rw [uc, ucb_succ_eq_ucb']
integral_regret_eq_addπ
Bandits.TS.integral_regret_eq_addNo docstring.
Bandits.TS.integral_regret_eq_add.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (n : β) : β« (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βP = β« (x : Ξ©), (fun Ο => β t β Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) Ο - ClippedUCB.ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) t Ο)) x βP + β« (x : Ξ©), (fun Ο => β t β Finset.range n, (ClippedUCB.ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο) Ο)) x βPBandits.TS.integral_regret_eq_add.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 Ξ΄ : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (n : β) : β« (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βP = β« (x : Ξ©), (fun Ο => β t β Finset.range n, (Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) Ο - ClippedUCB.ucb A R l u Ο2 Ξ΄ (Learning.IsBayesAlgEnvSeq.bestAction ΞΊ E Ο) t Ο)) x βP + β« (x : Ξ©), (fun Ο => β t β Finset.range n, (ClippedUCB.ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - Learning.IsBayesAlgEnvSeq.actionMean ΞΊ E (A t Ο) Ο)) x βP
Code
lemma integral_regret_eq_add (hK : 0 < K) (h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P)
(hm : β e a, (ΞΊ (e, a))[id] β (Set.Icc l u)) (n : β) :
P[IsBayesAlgEnvSeq.regret ΞΊ E A n] =
P[fun Ο β¦ β t β range n,
(actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο - ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο)] +
P[fun Ο β¦ β t β range n,
(ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο - actionMean ΞΊ E (A t Ο) Ο)]Type uses (6)
Body uses (8)
Used by (1)
Actions: Source Β· Open Issue
Proof
by
have hua (t : β) : Integrable (fun Ο β¦ ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο) P :=
integrable_uncurry_ucb_comp h.measurable_action h.measurable_feedback (h.measurable_action t)
measurable_const
have hub (t : β) : Integrable (fun Ο β¦ ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο) P :=
integrable_uncurry_ucb_comp h.measurable_action h.measurable_feedback
(IsBayesAlgEnvSeq.measurable_bestAction h.measurable_param) measurable_const
have haa (t : β) : Integrable (fun Ο β¦ actionMean ΞΊ E (A t Ο) Ο) P :=
IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp h.measurable_param
(h.measurable_action t) hm
have hab : Integrable (fun Ο β¦ actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο) P :=
IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp h.measurable_param
(IsBayesAlgEnvSeq.measurable_bestAction h.measurable_param) hm
calc
_ = (β t β range n, β« Ο, actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο βP) -
β t β range n, β« Ο, actionMean ΞΊ E (A t Ο) Ο βP := by
simp_rw [IsBayesAlgEnvSeq.regret_eq_sum_gap, IsBayesAlgEnvSeq.gap_eq_sub]
rw [integral_finsetSum _ (by fun_prop), β Finset.sum_sub_distrib]
simp_rw [integral_sub hab (haa _)]
_ = ((β t β range n, β« Ο, actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο βP) -
β t β range n, β« Ο, ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο βP) +
((β t β range n, β« Ο, ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο βP) -
β t β range n, β« Ο, actionMean ΞΊ E (A t Ο) Ο βP) := by
simp [integral_ucb_action_eq_integral_ucb_bestAction hK h]
_ = (β t β range n, β« Ο, actionMean ΞΊ E (bestAction ΞΊ E Ο) Ο -
ucb A R l u Ο2 Ξ΄ (bestAction ΞΊ E Ο) t Ο βP) +
β t β range n, β« Ο, ucb A R l u Ο2 Ξ΄ (A t Ο) t Ο -
actionMean ΞΊ E (A t Ο) Ο βP := by
rw [β Finset.sum_sub_distrib, β Finset.sum_sub_distrib]
simp_rw [β integral_sub hab (hub _), β integral_sub (hua _) (haa _)]
_ = _ := by
rw [β integral_finsetSum _ (by fun_prop), β integral_finsetSum _ (by fun_prop)]
integral_regret_leπ
Bandits.TS.integral_regret_le
If Thompson sampling has the correct prior over environments and every environment has K
actions, each of which has a corresponding reward between l and u that is sub-Gaussian with
variance proxy Ο2 after its mean is subtracted, then the Bayesian regret at time n is at most
(2 * K + 1) * (u - l) + 8 * β(Ο2 * K * n * Real.log n).
Bandits.TS.integral_regret_le.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (n : β) : β« (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βP β€ (2 * βK + 1) * (u - l) + 8 * β(Ο2 * βK * βn * Real.log βn)Bandits.TS.integral_regret_le.{u_1, u_2} {K : β} [Nonempty (Fin K)] {l u Ο2 : β} {Ξ© : Type u_1} [MeasurableSpace Ξ©] {π : Type u_2} [MeasurableSpace π] [StandardBorelSpace π] [Nonempty π] {Q : MeasureTheory.Measure π} [MeasureTheory.IsProbabilityMeasure Q] {ΞΊ : ProbabilityTheory.Kernel (π Γ Fin K) β} [ProbabilityTheory.IsMarkovKernel ΞΊ] {E : Ξ© β π} {A : β β Ξ© β Fin K} {R : β β Ξ© β β} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P) (hlu : l β€ u) (hm : β (e : π) (a : Fin K), β« (x : β), id x βΞΊ (e, a) β Set.Icc l u) (hΟ2 : 0 < Ο2) (hs : β (e : π) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - β« (x : β), id x βΞΊ (e, a)) β¨Ο2, β―β© (ΞΊ (e, a))) (n : β) : β« (x : Ξ©), Learning.IsBayesAlgEnvSeq.regret ΞΊ E A n x βP β€ (2 * βK + 1) * (u - l) + 8 * β(Ο2 * βK * βn * Real.log βn)
Code
theorem integral_regret_le (hK : 0 < K) (h : IsBayesAlgEnvSeq Q ΞΊ (tsAlgorithm hK Q ΞΊ) E A R P)
(hlu : l β€ u) (hm : β e a, (ΞΊ (e, a))[id] β (Set.Icc l u)) (hΟ2 : 0 < Ο2)
(hs : β e a, HasSubgaussianMGF (fun x β¦ x - (ΞΊ (e, a))[id]) β¨Ο2, hΟ2.leβ© (ΞΊ (e, a))) (n : β) :
P[IsBayesAlgEnvSeq.regret ΞΊ E A n]
β€ (2 * K + 1) * (u - l) + 8 * β(Ο2 * K * n * Real.log n)Type uses (3)
Body uses (6)
Actions: Source Β· Open Issue
Proof
by
by_cases hn : n = 0
Β· simp [hn, IsBayesAlgEnvSeq.regret, Bandits.regret]
nlinarith
have hΞ΄ : (0 : β) < 1 / n ^ 2 := by positivity
calc P[IsBayesAlgEnvSeq.regret ΞΊ E A n]
= _ :=
integral_regret_eq_add hK h hm n
_ β€ _ :=
add_le_add
(integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le h hlu hm hΟ2 hs hΞ΄ n)
(integral_sum_range_ucb_action_sub_actionMean_action_le h hlu hm hΟ2 hs hΞ΄ n)
_ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n)
+ 4 * β((2 : β) ^ 2 * (Ο2 * K * n * Real.log n)) := by
field_simp
rw [Real.log_pow]
ring_nf
_ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n) + 8 * β(Ο2 * K * n * Real.log n) := by
rw [Real.sqrt_mul (by positivity), Real.sqrt_sq (by norm_num)]
ring
_ β€ K * (u - l) + (K + 1) * (u - l) * 1 + 8 * β(Ο2 * K * n * Real.log n) := by -- loose
have : 0 β€ u - l := sub_nonneg.2 hlu
gcongr
rw [div_le_one (by positivity)]
linarith
_ = _ := by
ring-
Bandits.ClippedUCB.ucb -
Bandits.ClippedUCB.ucb_zero -
Bandits.ClippedUCB.ucb_mem_Icc -
Bandits.ClippedUCB.measurable_ucb -
Bandits.ClippedUCB.measurable_uncurry_ucb_comp -
Bandits.ClippedUCB.integrable_uncurry_ucb_comp -
Bandits.ClippedUCB.ucb' -
Bandits.ClippedUCB.measurable_uncurry_ucb' -
Bandits.ClippedUCB.ucb_succ_eq_ucb' -
Bandits.ClippedUCB.sum_ucb_sub_mean_le -
Bandits.ClippedUCB.integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le -
Bandits.ClippedUCB.integral_sum_range_ucb_action_sub_actionMean_action_le -
Bandits.TS.integral_ucb_action_eq_integral_ucb_bestAction -
Bandits.TS.integral_regret_eq_add -
Bandits.TS.integral_regret_le