2.1. Online.Bandit.Algorithms.ETC
The Explore-Then-Commit Algorithm
Module LeanMachineLearning.Online.Bandit.Algorithms.ETC contains 18 exposed declarations.
nextArm🔗
Bandits.ETC.nextArm
Arm pulled by the ETC algorithm at time n + 1.
For n < K * m - 1, this is arm (n + 1) % K.
For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase.
For n ≥ K * m, this is the same arm as at time n.
Bandits.ETC.nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) (h : ↥(Finset.Iic n) → Fin K × ℝ) : Fin KBandits.ETC.nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) (h : ↥(Finset.Iic n) → Fin K × ℝ) : Fin K
Code
noncomputable
def ETC.nextArm (hK : 0 < K) (m n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
if hn : n < K * m - 1 then RoundRobin.nextAction hK n
else
if hn_eq : n = K * m - 1 then argmax (empMean' n h)
else (h ⟨n, by simp⟩).1Body uses (3)
Used by (6)
Actions: Source · Open Issue
measurable_nextArm🔗
Bandits.ETC.measurable_nextArmThe next arm pulled by ETC is chosen in a measurable way.
Bandits.ETC.measurable_nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)Bandits.ETC.measurable_nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)
Code
lemma ETC.measurable_nextArm (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)
Type uses (1)
Body uses (6)
Actions: Source · Open Issue
Proof
by have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK unfold nextArm simp only [dite_eq_ite] refine Measurable.ite (by simp) (by fun_prop) ?_ refine Measurable.ite (by simp) ?_ (by fun_prop) fun_prop
etcAlgorithm🔗
Bandits.etcAlgorithm
The Explore-Then-Commit algorithm: deterministic algorithm that chooses the next arm according
to ETC.nextArm.
Bandits.etcAlgorithm {K : ℕ} (hK : 0 < K) (m : ℕ) : Learning.Algorithm (Fin K) ℝBandits.etcAlgorithm {K : ℕ} (hK : 0 < K) (m : ℕ) : Learning.Algorithm (Fin K) ℝ
Code
noncomputable def etcAlgorithm (hK : 0 < K) (m : ℕ) : Algorithm (Fin K) ℝ := detAlgorithm (ETC.nextArm hK m) (by fun_prop) ⟨0, hK⟩
Type uses (1)
Body uses (3)
Used by (15)
Actions: Source · Open Issue
isAlgEnvSeqUntil_roundRobinAlgorithm🔗
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm
Until round K * m - 1, the ETC algorithm behaves like the Round-Robin algorithm.
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : Learning.IsAlgEnvSeqUntil A R (Learning.roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K * m - 1)Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : Learning.IsAlgEnvSeqUntil A R (Learning.roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K * m - 1)
Code
lemma isAlgEnvSeqUntil_roundRobinAlgorithm
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1) where
measurable_actionType uses (5)
Body uses (9)
Used by (3)
Actions: Source · Open Issue
Proof
h.measurable_action
measurable_feedback := h.measurable_feedback
hasLaw_action_zero := h.hasLaw_action_zero
hasCondDistrib_feedback_zero := h.hasCondDistrib_feedback_zero
hasCondDistrib_action n hn := by
convert h.hasCondDistrib_action n using 1
simp only [roundRobinAlgorithm, detAlgorithm_policy, etcAlgorithm]
congr 1 with h
simp [ETC.nextArm, hn]
hasCondDistrib_feedback n _ := h.hasCondDistrib_feedback n
arm_zero🔗
Bandits.ETC.arm_zeroNo docstring.
Bandits.ETC.arm_zero.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : A 0 =ᵐ[P] fun x => ⟨0, hK⟩Bandits.ETC.arm_zero.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : A 0 =ᵐ[P] fun x => ⟨0, hK⟩
Code
lemma arm_zero (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
A 0 =ᵐ[P] fun _ ↦ ⟨0, hK⟩Type uses (3)
Actions: Source · Open Issue
Proof
RoundRobin.action_zero ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono zero_le)
arm_ae_eq_etcNextArm🔗
Bandits.ETC.arm_ae_eq_etcNextArmNo docstring.
Bandits.ETC.arm_ae_eq_etcNextArm.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (n : ℕ) : A (n + 1) =ᵐ[P] fun ω => nextArm hK m n (Learning.history A R n ω)Bandits.ETC.arm_ae_eq_etcNextArm.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (n : ℕ) : A (n + 1) =ᵐ[P] fun ω => nextArm hK m n (Learning.history A R n ω)
Code
lemma arm_ae_eq_etcNextArm (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (n : ℕ) :
A (n + 1) =ᵐ[P] fun ω ↦ nextArm hK m n (history A R n ω)Type uses (5)
Body uses (3)
Used by (2)
Actions: Source · Open Issue
Proof
by have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK exact h.action_detAlgorithm_ae_eq n
arm_of_lt🔗
Bandits.ETC.arm_of_lt
For n < K * m, the arm pulled at time n is the arm n % K.
Bandits.ETC.arm_of_lt.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hn : n < K * m) : A n =ᵐ[P] fun x => ⟨n % K, ⋯⟩Bandits.ETC.arm_of_lt.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hn : n < K * m) : A n =ᵐ[P] fun x => ⟨n % K, ⋯⟩
Code
lemma arm_of_lt (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
{n : ℕ} (hn : n < K * m) :
A n =ᵐ[P] fun _ ↦ ⟨n % K, Nat.mod_lt _ hK⟩Type uses (3)
Actions: Source · Open Issue
Proof
RoundRobin.action_ae_eq n ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono (by grind))
arm_mul🔗
Bandits.ETC.arm_mul
The arm pulled at time K * m is the arm with the highest empirical mean after the exploration
phase.
Bandits.ETC.arm_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hm : m ≠ 0) : A (K * m) =ᵐ[P] fun ω => argmax (Learning.empMean' (K * m - 1) (Learning.history A R (K * m - 1) ω))Bandits.ETC.arm_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hm : m ≠ 0) : A (K * m) =ᵐ[P] fun ω => argmax (Learning.empMean' (K * m - 1) (Learning.history A R (K * m - 1) ω))
Code
lemma arm_mul [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (hm : m ≠ 0) :
A (K * m) =ᵐ[P]
fun ω ↦ argmax (empMean' (K * m - 1) (history A R (K * m - 1) ω))Type uses (6)
Body uses (3)
Used by (1)
Actions: Source · Open Issue
Proof
by
have : K * m = (K * m - 1) + 1 := by
have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
grind
rw [this]
filter_upwards [arm_ae_eq_etcNextArm h (K * m - 1)] with ω hn_eq
rw [hn_eq, nextArm, dif_neg (by simp), dif_pos rfl]
exact this ▸ rfl
arm_add_one_of_ge🔗
Bandits.ETC.arm_add_one_of_ge
For n ≥ K * m, the arm pulled at time n + 1 is the same as the arm pulled at time n.
Bandits.ETC.arm_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A (n + 1) =ᵐ[P] fun ω => A n ωBandits.ETC.arm_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A (n + 1) =ᵐ[P] fun ω => A n ω
Code
lemma arm_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
{n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
A (n + 1) =ᵐ[P] fun ω ↦ A n ωType uses (3)
Body uses (6)
Used by (1)
Actions: Source · Open Issue
Proof
by
filter_upwards [arm_ae_eq_etcNextArm h n] with ω hn_eq
rw [hn_eq, nextArm, dif_neg (by grind), dif_neg]
· rfl
· have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
grind
arm_of_ge🔗
Bandits.ETC.arm_of_ge
For n ≥ K * m, the arm pulled at time n is the same as the arm pulled at time K * m.
Bandits.ETC.arm_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A n =ᵐ[P] A (K * m)Bandits.ETC.arm_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A n =ᵐ[P] A (K * m)
Code
lemma arm_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
{n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
A n =ᵐ[P] A (K * m)Type uses (3)
Body uses (1)
Used by (1)
Actions: Source · Open Issue
Proof
by have h_ae n : K * m ≤ n → A (n + 1) =ᵐ[P] fun ω ↦ A n ω := arm_add_one_of_ge h hm simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae filter_upwards [h_ae] with ω h_ae induction n, hn using Nat.le_induction with | base => rfl | succ n hmn h_ind => rw [h_ae n hmn, h_ind]
pullCount_mul🔗
Bandits.ETC.pullCount_mul
At time K * m, the number of pulls of each arm is equal to m.
Bandits.ETC.pullCount_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) : Learning.pullCount A a (K * m) =ᵐ[P] fun x => mBandits.ETC.pullCount_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) : Learning.pullCount A a (K * m) =ᵐ[P] fun x => m
Code
lemma pullCount_mul (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) :
pullCount A a (K * m) =ᵐ[P] fun _ ↦ mType uses (4)
Body uses (2)
Used by (3)
Actions: Source · Open Issue
Proof
RoundRobin.pullCount_mul m (isAlgEnvSeqUntil_roundRobinAlgorithm h) a
pullCount_add_one_of_ge🔗
Bandits.ETC.pullCount_add_one_of_geNo docstring.
Bandits.ETC.pullCount_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : Learning.pullCount A a (n + 1) =ᵐ[P] fun ω => Learning.pullCount A a n ω + Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ωBandits.ETC.pullCount_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : Learning.pullCount A a (n + 1) =ᵐ[P] fun ω => Learning.pullCount A a n ω + Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Code
lemma pullCount_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
pullCount A a (n + 1)
=ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ωType uses (4)
Body uses (2)
Used by (1)
Actions: Source · Open Issue
Proof
by simp_rw [Filter.EventuallyEq, pullCount_add_one] filter_upwards [arm_of_ge h hm hn] with ω h_arm congr 3
pullCount_of_ge🔗
Bandits.ETC.pullCount_of_ge
For n ≥ K * m, the number of pulls of each arm a at time n is equal to m plus
n - K * m if arm a is the best arm after the exploration phase.
Bandits.ETC.pullCount_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : Learning.pullCount A a n =ᵐ[P] fun ω => m + (n - K * m) * Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ωBandits.ETC.pullCount_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : Learning.pullCount A a n =ᵐ[P] fun ω => m + (n - K * m) * Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Code
lemma pullCount_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
pullCount A a n
=ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ωType uses (4)
Body uses (2)
Used by (1)
Actions: Source · Open Issue
Proof
by
have h_ae n : K * m ≤ n → pullCount A a (n + 1)
=ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω :=
pullCount_add_one_of_ge h a hm
simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae
have h_ae_Km : pullCount A a (K * m) =ᵐ[P] fun _ ↦ m := pullCount_mul h a
filter_upwards [h_ae_Km, h_ae] with ω h_Km h_ae
induction n, hn using Nat.le_induction with
| base => simp [h_Km]
| succ n hmn h_ind =>
rw [h_ae n hmn, h_ind, add_assoc, ← add_one_mul]
congr
grind
sumRewards_bestArm_le_of_arm_mul_eq🔗
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq
If at time K * m the algorithm chooses arm a, then the total reward obtained by pulling
arm a is at least the total reward obtained by pulling the best arm.
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) : ∀ᵐ (h : Ω) ∂P, A (K * m) h = a → Learning.sumRewards A R (bestArm ν) (K * m) h ≤ Learning.sumRewards A R a (K * m) hBandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) : ∀ᵐ (h : Ω) ∂P, A (K * m) h = a → Learning.sumRewards A R (bestArm ν) (K * m) h ≤ Learning.sumRewards A R a (K * m) h
Code
lemma sumRewards_bestArm_le_of_arm_mul_eq [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) :
∀ᵐ h ∂P, A (K * m) h = a → sumRewards A R (bestArm ν) (K * m) h ≤
sumRewards A R a (K * m) hType uses (5)
Body uses (10)
Used by (1)
Actions: Source · Open Issue
Proof
by
filter_upwards [arm_mul h hm, pullCount_mul h a, pullCount_mul h (bestArm ν)]
with h h_arm ha h_best h_eq
have h_max := isMaxOn_argmax
(empMean' (K * m - 1) (history A R (K * m - 1) h)) (bestArm ν)
rw [← h_arm, h_eq] at h_max
rw [sumRewards_eq_pullCount_mul_empMean, sumRewards_eq_pullCount_mul_empMean, ha, h_best]
· gcongr
have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
rwa [empMean_eq_empMean' this.ne', empMean_eq_empMean' this.ne']
· simp [ha, hm]
· simp [h_best, hm]
probReal_sumRewards_le_sumRewards_le🔗
Bandits.ETC.probReal_sumRewards_le_sumRewards_leNo docstring.
Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) : MeasureTheory.Measure.real P {ω | Learning.sumRewards A R (bestArm ν) (K * m) ω ≤ Learning.sumRewards A R a (K * m) ω} ≤ Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) : MeasureTheory.Measure.real P {ω | Learning.sumRewards A R (bestArm ν) (K * m) ω ≤ Learning.sumRewards A R a (K * m) ω} ≤ Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Code
lemma probReal_sumRewards_le_sumRewards_le [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K) :
P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} ≤
Real.exp (-↑m * gap ν a ^ 2 / (4 * σ2))Type uses (6)
Body uses (5)
Used by (1)
Actions: Source · Open Issue
Proof
by have h1 := Bandits.probReal_sumRewards_le_sumRewards_le h a (K * m) m m have h2 := probReal_sum_le_sum_streamMeasure hν a m refine le_trans (le_of_eq ?_) (h1.trans h2) simp_rw [measureReal_def] congr 1 refine measure_congr ?_ rw [Filter.eventuallyEq_set] filter_upwards [pullCount_mul h a, pullCount_mul h (bestArm ν)] with ω ha h_best simp [ha, h_best]
prob_arm_mul_eq_le🔗
Bandits.ETC.prob_arm_mul_eq_le
The probability that at time K * m the ETC algorithm chooses arm a is at most
exp(- m * Δ_a^2 / 4).
Bandits.ETC.prob_arm_mul_eq_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) (hm : m ≠ 0) : MeasureTheory.Measure.real P {ω | A (K * m) ω = a} ≤ Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))Bandits.ETC.prob_arm_mul_eq_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) (hm : m ≠ 0) : MeasureTheory.Measure.real P {ω | A (K * m) ω = a} ≤ Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Code
lemma prob_arm_mul_eq_le [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K)
(hm : m ≠ 0) :
P.real {ω | A (K * m) ω = a} ≤ Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))Type uses (4)
Body uses (4)
Used by (1)
Actions: Source · Open Issue
Proof
by
have h_pos : 0 < K * m := Nat.mul_pos hK hm.bot_lt
have h_le : P.real {ω | A (K * m) ω = a}
≤ P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} := by
simp_rw [measureReal_def]
gcongr 1
· simp
refine measure_mono_ae ?_
exact sumRewards_bestArm_le_of_arm_mul_eq h a hm
exact h_le.trans (probReal_sumRewards_le_sumRewards_le h hν a)
expectation_pullCount_le🔗
Bandits.ETC.expectation_pullCount_leBound on the expectation of the number of pulls of each arm by the ETC algorithm.
Bandits.ETC.expectation_pullCount_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : ∫ (x : Ω), (fun ω => ↑(Learning.pullCount A a n ω)) x ∂P ≤ ↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))Bandits.ETC.expectation_pullCount_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) : ∫ (x : Ω), (fun ω => ↑(Learning.pullCount A a n ω)) x ∂P ≤ ↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Code
lemma expectation_pullCount_le [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a))
(a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
P[fun ω ↦ (pullCount A a n ω : ℝ)]
≤ m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))Type uses (5)
Body uses (4)
Used by (1)
Actions: Source · Open Issue
Proof
by
have hA := h.measurable_action
have : (fun ω ↦ (pullCount A a n ω : ℝ))
=ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω := by
filter_upwards [pullCount_of_ge h a hm hn] with ω h
simp only [h, Set.indicator_apply, Set.mem_setOf_eq, mul_ite, mul_one, mul_zero, Nat.cast_add,
Nat.cast_ite, CharP.cast_eq_zero, add_right_inj]
norm_cast
rw [integral_congr_ae this, integral_add (integrable_const _), integral_const_mul]
swap
· refine Integrable.const_mul ?_ _
rw [integrable_indicator_iff]
· exact integrableOn_const
· exact (measurableSet_singleton _).preimage (by fun_prop)
simp only [integral_const, probReal_univ, smul_eq_mul, one_mul, neg_mul, add_le_add_iff_left]
gcongr
· norm_cast
simp
rw [integral_indicator_const, smul_eq_mul, mul_one]
· rw [← neg_mul]
exact prob_arm_mul_eq_le h hν a hm
· exact (measurableSet_singleton _).preimage (by fun_prop)
regret_le🔗
Bandits.ETC.regret_leRegret bound for the ETC algorithm.
Bandits.ETC.regret_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (hm : m ≠ 0) (n : ℕ) (hn : K * m ≤ n) : ∫ (x : Ω), regret ν A n x ∂P ≤ ∑ a, gap ν a * (↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2)))Bandits.ETC.regret_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hν : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (hm : m ≠ 0) (n : ℕ) (hn : K * m ≤ n) : ∫ (x : Ω), regret ν A n x ∂P ≤ ∑ a, gap ν a * (↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2)))
Code
theorem regret_le [Nonempty (Fin K)]
(h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
(hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (hm : m ≠ 0)
(n : ℕ) (hn : K * m ≤ n) :
P[regret ν A n] ≤
∑ a, gap ν a * (m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2)))Type uses (5)
Actions: Source · Open Issue
Proof
integral_regret_le_of_forall_integral_pullCount_le h
(fun a _ ↦ expectation_pullCount_le h hν a hm hn)-
Bandits.ETC.nextArm -
Bandits.ETC.measurable_nextArm -
Bandits.etcAlgorithm -
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm -
Bandits.ETC.arm_zero -
Bandits.ETC.arm_ae_eq_etcNextArm -
Bandits.ETC.arm_of_lt -
Bandits.ETC.arm_mul -
Bandits.ETC.arm_add_one_of_ge -
Bandits.ETC.arm_of_ge -
Bandits.ETC.pullCount_mul -
Bandits.ETC.pullCount_add_one_of_ge -
Bandits.ETC.pullCount_of_ge -
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq -
Bandits.ETC.probReal_sumRewards_le_sumRewards_le -
Bandits.ETC.prob_arm_mul_eq_le -
Bandits.ETC.expectation_pullCount_le -
Bandits.ETC.regret_le