Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le
This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.
prob_empMean_sub_actionMean_ge_le๐
Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_leNo docstring.
Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le.{u_1, u_2} {๐ : Type u_1} {ฮฉ : Type u_2} [MeasurableSpace ๐] [MeasurableSpace ฮฉ] {K : โ} [Nonempty (Fin K)] {Q : MeasureTheory.Measure ๐} {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} [ProbabilityTheory.IsMarkovKernel ฮบ] {alg : Algorithm (Fin K) โ} {E : ฮฉ โ ๐} {A : โ โ ฮฉ โ Fin K} {R : โ โ ฮฉ โ โ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯ2 : NNReal} (hฯ2 : 0 < ฯ2) (hs : โ (e : ๐) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โซ (x : โ), id x โฮบ (e, a)) ฯ2 (ฮบ (e, a))) {ฮด : โ} (hฮด : 0 < ฮด) (n : โ) : P {ฯ | โ t < n, โ a, pullCount A a t ฯ โ 0 โง โ(2 * โฯ2 * Real.log (1 / ฮด) / โ(pullCount A a t ฯ)) โค empMean A R a t ฯ - actionMean ฮบ E a ฯ} โค ENNReal.ofReal (โK * (โn - 1) * ฮด)Learning.IsBayesAlgEnvSeq.prob_empMean_sub_actionMean_ge_le.{u_1, u_2} {๐ : Type u_1} {ฮฉ : Type u_2} [MeasurableSpace ๐] [MeasurableSpace ฮฉ] {K : โ} [Nonempty (Fin K)] {Q : MeasureTheory.Measure ๐} {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} [ProbabilityTheory.IsMarkovKernel ฮบ] {alg : Algorithm (Fin K) โ} {E : ฮฉ โ ๐} {A : โ โ ฮฉ โ Fin K} {R : โ โ ฮฉ โ โ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯ2 : NNReal} (hฯ2 : 0 < ฯ2) (hs : โ (e : ๐) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โซ (x : โ), id x โฮบ (e, a)) ฯ2 (ฮบ (e, a))) {ฮด : โ} (hฮด : 0 < ฮด) (n : โ) : P {ฯ | โ t < n, โ a, pullCount A a t ฯ โ 0 โง โ(2 * โฯ2 * Real.log (1 / ฮด) / โ(pullCount A a t ฯ)) โค empMean A R a t ฯ - actionMean ฮบ E a ฯ} โค ENNReal.ofReal (โK * (โn - 1) * ฮด)
Code
lemma prob_empMean_sub_actionMean_ge_le (h : IsBayesAlgEnvSeq Q ฮบ alg E A R P) {ฯ2 : โโฅ0}
(hฯ2 : 0 < ฯ2) (hs : โ e a, HasSubgaussianMGF (fun x โฆ x - (ฮบ (e, a))[id]) ฯ2 (ฮบ (e, a)))
{ฮด : โ} (hฮด : 0 < ฮด) (n : โ) :
P {ฯ | โ t < n, โ a, pullCount A a t ฯ โ 0 โง
โ(2 * ฯ2 * Real.log (1 / ฮด) / pullCount A a t ฯ) โค empMean A R a t ฯ - actionMean ฮบ E a ฯ}
โค ENNReal.ofReal (K * (n - 1) * ฮด)Type uses (5)
Body uses (14)
Actions: Source ยท Open Issue
Proof
by
have := h.measurable_param
have := h.measurable_action
have := h.measurable_feedback
let S := {(e, ฯ) | โ a, โ t < n, pullCount IT.action a t ฯ โ 0 โง
โ(2 * pullCount IT.action a t ฯ * ฯ2 * Real.log (1 / ฮด)) โค
sumRewards IT.action IT.feedback a t ฯ - pullCount IT.action a t ฯ * actionMean ฮบ id a e}
calc
_ โค (P.map (fun ฯ โฆ (E ฯ, trajectory A R ฯ))) S := by
rw [Measure.map_apply (by fun_prop) (by measurability)]
apply measure_mono
intro ฯ โจt, ht, a, hpc, hleโฉ
rw [empMean] at hle
exact โจa, t, ht, hpc, sqrt_two_mul_le_sub hpc hleโฉ
_ = (P.map E โโ condDistrib (trajectory A R) E P) S := by
rw [โ compProd_map_condDistrib (by fun_prop)]
_ = โซโป e, condDistrib (trajectory A R) E P e (Prod.mk e โปยน' S) โ(P.map E) :=
Measure.compProd_apply (by measurability)
_ โค โซโป e, ENNReal.ofReal (Fintype.card (Fin K) * (n - 1) * ฮด) โ(P.map E) := by
apply lintegral_mono_ae
rw [h.hasLaw_env.map_eq]
filter_upwards [h.ae_IsAlgEnvSeq] with e he
exact Bandits.prob_sumRewards_sub_pullCount_mul_ge_le_of_Fintype hฯ2 (hs e) he hฮด
_ = ENNReal.ofReal (K * (n - 1) * ฮด) := by
simp [Measure.map_apply h.measurable_param]Dependency graph
Type dependencies (5)
Algorithm๐
Learning.AlgorithmA stochastic, sequential algorithm.
Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Algorithm (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Policy or sampling rule: distribution of the next action. -/ policy : (n : โ) โ Kernel (Iic n โ ๐ ร ๐จ) ๐ /-- The policy is a Markov kernel. -/ [h_policy : โ n, IsMarkovKernel (policy n)] /-- Distribution of the first action. -/ p0 : Measure ๐ /-- The first action distribution is a probability measure. -/ [hp0 : IsProbabilityMeasure p0]
Used by (216)
Actions: Source ยท Open Issue
IsBayesAlgEnvSeq๐
Learning.IsBayesAlgEnvSeq
IsBayesAlgEnvSeq Q ฮบ alg E A Y P states that there is a measure P : Measure ฮฉ such
that the parameter E : ฮฉ โ ๐ has law Q and that the sequences of actions A : โ โ ฮฉ โ ๐
and feedbacks Y : โ โ ฮฉ โ ๐จ are generated by the algorithm alg : Algorithm ๐ ๐จ interacting
with an underlying environment that depends on E and ฮบ (stationaryEnv (ฮบ.sectR (E ฯ))).
Learning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐) (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) (alg : Algorithm ๐ ๐จ) (E : ฮฉ โ ๐) (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : PropLearning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐) (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) (alg : Algorithm ๐ ๐จ) (E : ฮฉ โ ๐) (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop
Code
structure IsBayesAlgEnvSeq
(Q : Measure ๐) (ฮบ : Kernel (๐ ร ๐) ๐จ) (alg : Algorithm ๐ ๐จ)
(E : ฮฉ โ ๐) (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ)
(P : Measure ฮฉ) [IsFiniteMeasure P] : Prop where
measurable_param : Measurable E := by fun_prop
measurable_action n : Measurable (A n) := by fun_prop
measurable_feedback n : Measurable (Y n) := by fun_prop
hasLaw_env : HasLaw E Q P
hasCondDistrib_action_zero : HasCondDistrib (A 0) E (Kernel.const _ alg.p0) P
hasCondDistrib_feedback_zero : HasCondDistrib (Y 0) (fun ฯ โฆ (E ฯ, A 0 ฯ)) ฮบ P
hasCondDistrib_action n :
HasCondDistrib (A (n + 1)) (fun ฯ โฆ (E ฯ, history A Y n ฯ))
((alg.policy n).prodMkLeft _) P
hasCondDistrib_feedback n :
HasCondDistrib (Y (n + 1)) (fun ฯ โฆ (history A Y n ฯ, E ฯ, A (n + 1) ฯ))
(ฮบ.prodMkLeft _) PActions: Source ยท Open Issue
pullCount๐
Learning.pullCount
Number of times action a was chosen up to time t (excluding t).
Learning.pullCount.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (a : ๐) (t : โ) (ฯ : ฮฉ) : โLearning.pullCount.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ
Code
noncomputable def pullCount (A : โ โ ฮฉ โ ๐) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ := #(filter (fun s โฆ A s ฯ = a) (range t))
Actions: Source ยท Open Issue
empMean๐
Learning.empMean
Empirical mean reward obtained when pulling action a up to time t (exclusive).
Learning.empMean.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โLearning.empMean.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ
Code
noncomputable def empMean (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ := sumRewards A R' a t ฯ / pullCount A a t ฯ
Body uses (2)
Actions: Source ยท Open Issue
actionMean๐
Learning.IsBayesAlgEnvSeq.actionMean
A random variable that gives the mean feedback of action a.
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โLearning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โ
Code
noncomputable def actionMean (ฮบ : Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โ := (ฮบ (E ฯ, a))[id]
Actions: Source ยท Open Issue
All dependencies, transitively (2)
history๐
Learning.history
History of the algorithm-environment sequence up to time n.
Learning.history.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {ฮฉ : Type u_3} (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : โฅ(Finset.Iic n) โ ๐ ร ๐จLearning.history.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {ฮฉ : Type u_3} (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : โฅ(Finset.Iic n) โ ๐ ร ๐จ
Code
def history (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : Iic n โ ๐ ร ๐จ := fun i โฆ (A i ฯ, Y i ฯ)
Actions: Source ยท Open Issue
sumRewards๐
Learning.sumRewards
Sum of rewards obtained when pulling action a up to time t (exclusive).
Learning.sumRewards.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โLearning.sumRewards.{u_1, u_3} {๐ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐] (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ
Code
def sumRewards (A : โ โ ฮฉ โ ๐) (R' : โ โ ฮฉ โ โ) (a : ๐) (t : โ) (ฯ : ฮฉ) : โ := โ s โ range t, if A s ฯ = a then R' s ฯ else 0
Used by (44)
Actions: Source ยท Open Issue