LeanMachineLearning.Online.Bandit.BayesRegret

Bayesian regret

This file defines actionMean, bestAction, gap, and regret as random variables on a measurable space Ω. These definitions are useful when IsBayesAlgEnvSeq Q κ alg E A Y P holds.

Recall that IsBayesAlgEnvSeq Q κ alg E A Y P states that, under the measure P : Measure Ω, the parameter E : Ω → 𝓔 has law Q and the sequences of actions A : ℕ → Ω → 𝓐 and feedbacks Y : ℕ → Ω → 𝓨 are generated by the algorithm alg : Algorithm 𝓐 𝓨 interacting with an underlying environment that depends on E and κ, namely stationaryEnv (κ.sectR (E ω)).

Main definitions

actionMean κ E a: the mean feedback associated with action a : 𝓐 based on the parameter E, which defines the underlying stationary environment together with the kernel κ.
bestAction κ E: (one of) the action(s) with the highest associated mean feedback based on E.
gap κ E A n: the difference between the highest mean feedback associated with an action and the mean feedback associated with the action at time n based on E and the sequence of actions A.
regret κ E A n: the regret at time n based on E and the sequence of actions A. If IsBayesAlgEnvSeq Q κ alg E A Y P, then P[regret κ E A n] is the so-called Bayesian regret of algorithm alg under the prior Q.

noncomputable def Learning.IsBayesAlgEnvSeq.actionMean {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] (κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ) (E : Ω → 𝓔) (a : 𝓐) (ω : Ω) :

ℝ

A random variable that gives the mean feedback of action a.

Equations

Learning.IsBayesAlgEnvSeq.actionMean κ E a ω = ∫ (x : ℝ), id x ∂κ (E ω, a)

theorem Learning.IsBayesAlgEnvSeq.measurable_actionMean {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {a : 𝓐} (hE : Measurable E) :

Measurable (actionMean κ E a)

theorem Learning.IsBayesAlgEnvSeq.measurable_uncurry_actionMean_comp {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] [MeasurableSingletonClass 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} (hE : Measurable E) {f : Ω → 𝓐} (hf : Measurable f) :

Measurable fun (ω : Ω) => actionMean κ E (f ω) ω

theorem Learning.IsBayesAlgEnvSeq.integrable_uncurry_actionMean_comp {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] [MeasurableSingletonClass 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} (hE : Measurable E) {f : Ω → 𝓐} (hf : Measurable f) {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] {l u : ℝ} (hm : ∀ (e : 𝓔) (a : 𝓐), ∫ (x : ℝ), id x ∂κ (e, a) ∈ Set.Icc l u) :

MeasureTheory.Integrable (fun (ω : Ω) => actionMean κ E (f ω) ω) P

noncomputable def Learning.IsBayesAlgEnvSeq.bestAction {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [Nonempty 𝓐] [Fintype 𝓐] [Encodable 𝓐] [MeasurableSingletonClass 𝓐] (κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ) (E : Ω → 𝓔) (ω : Ω) :

𝓐

A random variable that gives (one of) the action(s) with the highest mean feedback.

Equations

Learning.IsBayesAlgEnvSeq.bestAction κ E ω = measurableArgmax (fun (ω' : Ω) (a : 𝓐) => Learning.IsBayesAlgEnvSeq.actionMean κ E a ω') ω

theorem Learning.IsBayesAlgEnvSeq.measurable_bestAction {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Nonempty 𝓐] [Fintype 𝓐] [Encodable 𝓐] [MeasurableSingletonClass 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} (hE : Measurable E) :

Measurable (bestAction κ E)

noncomputable def Learning.IsBayesAlgEnvSeq.gap {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] (κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ) (E : Ω → 𝓔) (A : ℕ → Ω → 𝓐) (n : ℕ) (ω : Ω) :

ℝ

A random variable that gives the gap at time n.

Equations

Learning.IsBayesAlgEnvSeq.gap κ E A n ω = Bandits.gap (κ.sectR (E ω)) (A n ω)

theorem Learning.IsBayesAlgEnvSeq.gap_nonneg_of_le {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {ω : Ω} {u : ℝ} (h : ∀ (e : 𝓔) (a : 𝓐), ∫ (x : ℝ), id x ∂κ (e, a) ≤ u) :

0 ≤ gap κ E A n ω

The gap is non-negative if the means are bounded above by u : ℝ (even if 𝓐 is not Finite).

theorem Learning.IsBayesAlgEnvSeq.gap_le_of_mem_Icc {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [Nonempty 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {ω : Ω} {l u : ℝ} (h : ∀ (e : 𝓔) (a : 𝓐), ∫ (x : ℝ), id x ∂κ (e, a) ∈ Set.Icc l u) :

gap κ E A n ω ≤ u - l
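
The two bounds above can be combined into a single two-sided estimate. The following is an untested sketch: it assumes the module imports under the name shown at the top of this page and that gap_nonneg_of_le and gap_le_of_mem_Icc have exactly the signatures listed here.

```lean
import LeanMachineLearning.Online.Bandit.BayesRegret

open MeasureTheory
open Learning.IsBayesAlgEnvSeq

-- Sketch: if every mean lies in [l, u], then the gap lies in [0, u - l].
example {𝓔 𝓐 Ω : Type*} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [Nonempty 𝓐]
    {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐}
    {n : ℕ} {ω : Ω} {l u : ℝ}
    (h : ∀ e a, ∫ x, id x ∂κ (e, a) ∈ Set.Icc l u) :
    0 ≤ gap κ E A n ω ∧ gap κ E A n ω ≤ u - l :=
  -- Lower bound: only the upper half of the hypothesis `h` is needed.
  -- Upper bound: uses both halves through `gap_le_of_mem_Icc`.
  ⟨gap_nonneg_of_le (u := u) fun e a => (h e a).2, gap_le_of_mem_Icc h⟩
```

Note that the lower bound does not need Nonempty 𝓐 or the lower bound l, which matches the weaker hypotheses of gap_nonneg_of_le.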

theorem Learning.IsBayesAlgEnvSeq.gap_eq_sub {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [Nonempty 𝓐] [Fintype 𝓐] [Encodable 𝓐] [MeasurableSingletonClass 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {ω : Ω} :

gap κ E A n ω = actionMean κ E (bestAction κ E ω) ω - actionMean κ E (A n ω) ω
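
As a small illustration of gap_eq_sub, the gap vanishes at any time step where the played action coincides with bestAction. This is an untested sketch that assumes the declarations on this page with the signatures shown; the hypothesis name hbest is introduced here purely for illustration.

```lean
import LeanMachineLearning.Online.Bandit.BayesRegret

open Learning.IsBayesAlgEnvSeq

-- Sketch: playing the best action at time `n` gives a zero gap.
example {𝓔 𝓐 Ω : Type*} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐]
    [Nonempty 𝓐] [Fintype 𝓐] [Encodable 𝓐] [MeasurableSingletonClass 𝓐]
    {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐}
    {n : ℕ} {ω : Ω}
    (hbest : A n ω = bestAction κ E ω) :  -- hypothetical assumption
    gap κ E A n ω = 0 := by
  -- Rewrite the gap as a difference of means, substitute the action,
  -- and cancel the two identical terms.
  rw [gap_eq_sub, hbest, sub_self]
```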

theorem Learning.IsBayesAlgEnvSeq.measurable_gap {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} (hE : Measurable E) (hA : ∀ (t : ℕ), Measurable (A t)) :

Measurable (gap κ E A n)

theorem Learning.IsBayesAlgEnvSeq.integrable_gap {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] [Nonempty 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] (hE : Measurable E) (hA : ∀ (t : ℕ), Measurable (A t)) {l u : ℝ} (h : ∀ (e : 𝓔) (a : 𝓐), ∫ (x : ℝ), id x ∂κ (e, a) ∈ Set.Icc l u) :

MeasureTheory.Integrable (gap κ E A n) P

noncomputable def Learning.IsBayesAlgEnvSeq.regret {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] (κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ) (E : Ω → 𝓔) (A : ℕ → Ω → 𝓐) (n : ℕ) (ω : Ω) :

ℝ

A random variable that gives the regret at time n.

Equations

Learning.IsBayesAlgEnvSeq.regret κ E A n ω = Bandits.regret (κ.sectR (E ω)) A n ω

theorem Learning.IsBayesAlgEnvSeq.regret_eq_sum_gap {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {ω : Ω} :

regret κ E A n ω = ∑ s ∈ Finset.range n, gap κ E A s ω

theorem Learning.IsBayesAlgEnvSeq.regret_eq_sum_gap' {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} :

regret κ E A n = fun (ω : Ω) => ∑ s ∈ Finset.range n, gap κ E A s ω
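
One immediate consequence of the sum-of-gaps formula is that the regret is non-negative whenever the means are bounded above. The sketch below is untested and assumes the signatures of regret_eq_sum_gap and gap_nonneg_of_le as listed on this page; Finset.sum_nonneg is the standard Mathlib lemma.

```lean
import LeanMachineLearning.Online.Bandit.BayesRegret

open MeasureTheory
open Learning.IsBayesAlgEnvSeq

-- Sketch: a sum of non-negative gaps is non-negative.
example {𝓔 𝓐 Ω : Type*} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐]
    {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐}
    {n : ℕ} {ω : Ω} {u : ℝ}
    (h : ∀ e a, ∫ x, id x ∂κ (e, a) ≤ u) :
    0 ≤ regret κ E A n ω := by
  -- Unfold the regret into the sum of gaps over the first `n` steps.
  rw [regret_eq_sum_gap]
  -- Each summand is non-negative by `gap_nonneg_of_le`.
  exact Finset.sum_nonneg fun s _ => gap_nonneg_of_le (u := u) h
```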

theorem Learning.IsBayesAlgEnvSeq.measurable_regret {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} (hE : Measurable E) (hA : ∀ (t : ℕ), Measurable (A t)) :

Measurable (regret κ E A n)

theorem Learning.IsBayesAlgEnvSeq.integrable_regret {𝓔 : Type u_1} {𝓐 : Type u_2} {Ω : Type u_4} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐] [MeasurableSpace Ω] [Countable 𝓐] [Nonempty 𝓐] {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐} {n : ℕ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P] (hE : Measurable E) (hA : ∀ (t : ℕ), Measurable (A t)) {l u : ℝ} (h : ∀ (e : 𝓔) (a : 𝓐), ∫ (x : ℝ), id x ∂κ (e, a) ∈ Set.Icc l u) :

MeasureTheory.Integrable (regret κ E A n) P
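
With integrability of each gap in hand, the expected regret (the Bayesian regret when IsBayesAlgEnvSeq Q κ alg E A Y P holds) splits into a sum of expected gaps. The following untested sketch assumes the signatures of regret_eq_sum_gap and integrable_gap as listed on this page, together with Mathlib's MeasureTheory.integral_finset_sum.

```lean
import LeanMachineLearning.Online.Bandit.BayesRegret

open MeasureTheory
open Learning.IsBayesAlgEnvSeq

-- Sketch: the expectation of the regret is the sum of the expected gaps.
example {𝓔 𝓐 Ω : Type*} [MeasurableSpace 𝓔] [MeasurableSpace 𝓐]
    [MeasurableSpace Ω] [Countable 𝓐] [Nonempty 𝓐]
    {κ : ProbabilityTheory.Kernel (𝓔 × 𝓐) ℝ} {E : Ω → 𝓔} {A : ℕ → Ω → 𝓐}
    {n : ℕ} {P : Measure Ω} [IsFiniteMeasure P]
    (hE : Measurable E) (hA : ∀ t, Measurable (A t)) {l u : ℝ}
    (h : ∀ e a, ∫ x, id x ∂κ (e, a) ∈ Set.Icc l u) :
    ∫ ω, regret κ E A n ω ∂P
      = ∑ s ∈ Finset.range n, ∫ ω, gap κ E A s ω ∂P := by
  -- Rewrite the integrand pointwise as a finite sum of gaps.
  simp_rw [regret_eq_sum_gap]
  -- Exchange the integral and the finite sum; each gap is integrable.
  exact integral_finset_sum _ fun s _ => integrable_gap hE hA h
```

The boundedness hypothesis h is used only to obtain integrability of each gap, so the same decomposition should go through under any other assumption that yields Integrable (gap κ E A s) P.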