LeanMachineLearning exposition

2.1. Online.Bandit.Algorithms.ETC🔗

The Explore-Then-Commit Algorithm

Module LeanMachineLearning.Online.Bandit.Algorithms.ETC contains 18 exposed declarations.

nextArm🔗

DefinitionBandits.ETC.nextArm

Arm pulled by the ETC algorithm at time n + 1. For n < K * m - 1, this is arm (n + 1) % K. For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase. For n ≥ K * m, this is the same arm as at time n.

🔗def
Bandits.ETC.nextArm {K : } (hK : 0 < K) (m n : ) (h : (Finset.Iic n) Fin K × ) : Fin K
Bandits.ETC.nextArm {K : } (hK : 0 < K) (m n : ) (h : (Finset.Iic n) Fin K × ) : Fin K

Code

noncomputable
def ETC.nextArm (hK : 0 < K) (m n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  if hn : n < K * m - 1 then RoundRobin.nextAction hK n
  else
    if hn_eq : n = K * m - 1 then argmax (empMean' n h)
    else (h ⟨n, by simp⟩).1
Body uses (3)
Used by (6)

Actions: Source · Open Issue

measurable_nextArm🔗

LemmaBandits.ETC.measurable_nextArm

The next arm pulled by ETC is chosen in a measurable way.

🔗theorem
Bandits.ETC.measurable_nextArm {K : } (hK : 0 < K) (m n : ) : Measurable (nextArm hK m n)
Bandits.ETC.measurable_nextArm {K : } (hK : 0 < K) (m n : ) : Measurable (nextArm hK m n)

Code

lemma ETC.measurable_nextArm (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)
Type uses (1)
Body uses (6)
Used by (3)

Actions: Source · Open Issue

Proof
by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  unfold nextArm
  simp only [dite_eq_ite]
  refine Measurable.ite (by simp) (by fun_prop) ?_
  refine Measurable.ite (by simp) ?_ (by fun_prop)
  fun_prop

etcAlgorithm🔗

DefinitionBandits.etcAlgorithm

The Explore-Then-Commit algorithm: deterministic algorithm that chooses the next arm according to ETC.nextArm.

🔗def
Bandits.etcAlgorithm {K : } (hK : 0 < K) (m : ) : Learning.Algorithm (Fin K)
Bandits.etcAlgorithm {K : } (hK : 0 < K) (m : ) : Learning.Algorithm (Fin K)

Code

noncomputable
def etcAlgorithm (hK : 0 < K) (m : ℕ) : Algorithm (Fin K) ℝ :=
  detAlgorithm (ETC.nextArm hK m) (by fun_prop) ⟨0, hK⟩
Type uses (1)
Body uses (3)
Used by (15)

Actions: Source · Open Issue

isAlgEnvSeqUntil_roundRobinAlgorithm🔗

LemmaBandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm

Until round K * m - 1, the ETC algorithm behaves like the Round-Robin algorithm.

🔗theorem
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : Learning.IsAlgEnvSeqUntil A R (Learning.roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K * m - 1)
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : Learning.IsAlgEnvSeqUntil A R (Learning.roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K * m - 1)

Code

lemma isAlgEnvSeqUntil_roundRobinAlgorithm
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
    IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1) where
  measurable_action
Type uses (5)
Body uses (9)
Used by (3)

Actions: Source · Open Issue

Proof
h.measurable_action
  measurable_feedback := h.measurable_feedback
  hasLaw_action_zero := h.hasLaw_action_zero
  hasCondDistrib_feedback_zero := h.hasCondDistrib_feedback_zero
  hasCondDistrib_action n hn := by
    convert h.hasCondDistrib_action n using 1
    simp only [roundRobinAlgorithm, detAlgorithm_policy, etcAlgorithm]
    congr 1 with h
    simp [ETC.nextArm, hn]
  hasCondDistrib_feedback n _ := h.hasCondDistrib_feedback n

arm_zero🔗

LemmaBandits.ETC.arm_zero

No docstring.

🔗theorem
Bandits.ETC.arm_zero.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : A 0 =ᵐ[P] fun x => 0, hK
Bandits.ETC.arm_zero.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) : A 0 =ᵐ[P] fun x => 0, hK

Code

lemma arm_zero (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
    A 0 =ᵐ[P] fun _ ↦ ⟨0, hK⟩
Type uses (3)
Body uses (4)

Actions: Source · Open Issue

Proof
RoundRobin.action_zero ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono zero_le)

arm_ae_eq_etcNextArm🔗

LemmaBandits.ETC.arm_ae_eq_etcNextArm

No docstring.

🔗theorem
Bandits.ETC.arm_ae_eq_etcNextArm.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (n : ) : A (n + 1) =ᵐ[P] fun ω => nextArm hK m n (Learning.history A R n ω)
Bandits.ETC.arm_ae_eq_etcNextArm.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (n : ) : A (n + 1) =ᵐ[P] fun ω => nextArm hK m n (Learning.history A R n ω)

Code

lemma arm_ae_eq_etcNextArm (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (n : ℕ) :
    A (n + 1) =ᵐ[P] fun ω ↦ nextArm hK m n (history A R n ω)
Type uses (5)
Body uses (3)
Used by (2)

Actions: Source · Open Issue

Proof
by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  exact h.action_detAlgorithm_ae_eq n

arm_of_lt🔗

LemmaBandits.ETC.arm_of_lt

For n < K * m, the arm pulled at time n is the arm n % K.

🔗theorem
Bandits.ETC.arm_of_lt.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hn : n < K * m) : A n =ᵐ[P] fun x => n % K,
Bandits.ETC.arm_of_lt.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hn : n < K * m) : A n =ᵐ[P] fun x => n % K,

Code

lemma arm_of_lt (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hn : n < K * m) :
    A n =ᵐ[P] fun _ ↦ ⟨n % K, Nat.mod_lt _ hK⟩
Type uses (3)
Body uses (4)

Actions: Source · Open Issue

Proof
RoundRobin.action_ae_eq n ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono (by grind))

arm_mul🔗

LemmaBandits.ETC.arm_mul

The arm pulled at time K * m is the arm with the highest empirical mean after the exploration phase.

🔗theorem
Bandits.ETC.arm_mul.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hm : m 0) : A (K * m) =ᵐ[P] fun ω => argmax (Learning.empMean' (K * m - 1) (Learning.history A R (K * m - 1) ω))
Bandits.ETC.arm_mul.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (hm : m 0) : A (K * m) =ᵐ[P] fun ω => argmax (Learning.empMean' (K * m - 1) (Learning.history A R (K * m - 1) ω))

Code

lemma arm_mul [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (hm : m ≠ 0) :
    A (K * m) =ᵐ[P]
      fun ω ↦ argmax (empMean' (K * m - 1) (history A R (K * m - 1) ω))
Type uses (6)
Body uses (3)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have : K * m = (K * m - 1) + 1 := by
    have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    grind
  rw [this]
  filter_upwards [arm_ae_eq_etcNextArm h (K * m - 1)] with ω hn_eq
  rw [hn_eq, nextArm, dif_neg (by simp), dif_pos rfl]
  exact this ▸ rfl

arm_add_one_of_ge🔗

LemmaBandits.ETC.arm_add_one_of_ge

For n ≥ K * m, the arm pulled at time n + 1 is the same as the arm pulled at time n.

🔗theorem
Bandits.ETC.arm_add_one_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hm : m 0) (hn : K * m n) : A (n + 1) =ᵐ[P] fun ω => A n ω
Bandits.ETC.arm_add_one_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hm : m 0) (hn : K * m n) : A (n + 1) =ᵐ[P] fun ω => A n ω

Code

lemma arm_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
    A (n + 1) =ᵐ[P] fun ω ↦ A n ω
Type uses (3)
Body uses (6)
Used by (1)

Actions: Source · Open Issue

Proof
by
  filter_upwards [arm_ae_eq_etcNextArm h n] with ω hn_eq
  rw [hn_eq, nextArm, dif_neg (by grind), dif_neg]
  · rfl
  · have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    grind

arm_of_ge🔗

LemmaBandits.ETC.arm_of_ge

For n ≥ K * m, the arm pulled at time n is the same as the arm pulled at time K * m.

🔗theorem
Bandits.ETC.arm_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hm : m 0) (hn : K * m n) : A n =ᵐ[P] A (K * m)
Bandits.ETC.arm_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) {n : } (hm : m 0) (hn : K * m n) : A n =ᵐ[P] A (K * m)

Code

lemma arm_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
    A n =ᵐ[P] A (K * m)
Type uses (3)
Body uses (1)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have h_ae n : K * m ≤ n → A (n + 1) =ᵐ[P] fun ω ↦ A n ω := arm_add_one_of_ge h hm
  simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae
  filter_upwards [h_ae] with ω h_ae
  induction n, hn using Nat.le_induction with
  | base => rfl
  | succ n hmn h_ind => rw [h_ae n hmn, h_ind]

pullCount_mul🔗

LemmaBandits.ETC.pullCount_mul

At time K * m, the number of pulls of each arm is equal to m.

🔗theorem
Bandits.ETC.pullCount_mul.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) : Learning.pullCount A a (K * m) =ᵐ[P] fun x => m
Bandits.ETC.pullCount_mul.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) : Learning.pullCount A a (K * m) =ᵐ[P] fun x => m

Code

lemma pullCount_mul (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) :
    pullCount A a (K * m) =ᵐ[P] fun _ ↦ m
Type uses (4)
Body uses (2)
Used by (3)

Actions: Source · Open Issue

Proof
RoundRobin.pullCount_mul m (isAlgEnvSeqUntil_roundRobinAlgorithm h) a

pullCount_add_one_of_ge🔗

LemmaBandits.ETC.pullCount_add_one_of_ge

No docstring.

🔗theorem
Bandits.ETC.pullCount_add_one_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : Learning.pullCount A a (n + 1) =ᵐ[P] fun ω => Learning.pullCount A a n ω + Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Bandits.ETC.pullCount_add_one_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : Learning.pullCount A a (n + 1) =ᵐ[P] fun ω => Learning.pullCount A a n ω + Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω

Code

lemma pullCount_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    pullCount A a (n + 1)
      =ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω
Type uses (4)
Body uses (2)
Used by (1)

Actions: Source · Open Issue

Proof
by
  simp_rw [Filter.EventuallyEq, pullCount_add_one]
  filter_upwards [arm_of_ge h hm hn] with ω h_arm
  congr 3

pullCount_of_ge🔗

LemmaBandits.ETC.pullCount_of_ge

For n ≥ K * m, the number of pulls of each arm a at time n is equal to m plus n - K * m if arm a is the best arm after the exploration phase.

🔗theorem
Bandits.ETC.pullCount_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : Learning.pullCount A a n =ᵐ[P] fun ω => m + (n - K * m) * Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Bandits.ETC.pullCount_of_ge.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : Learning.pullCount A a n =ᵐ[P] fun ω => m + (n - K * m) * Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω

Code

lemma pullCount_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    pullCount A a n
      =ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω
Type uses (4)
Body uses (2)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have h_ae n : K * m ≤ n → pullCount A a (n + 1)
      =ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω :=
    pullCount_add_one_of_ge h a hm
  simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae
  have h_ae_Km : pullCount A a (K * m) =ᵐ[P] fun _ ↦ m := pullCount_mul h a
  filter_upwards [h_ae_Km, h_ae] with ω h_Km h_ae
  induction n, hn using Nat.le_induction with
  | base => simp [h_Km]
  | succ n hmn h_ind =>
    rw [h_ae n hmn, h_ind, add_assoc, ← add_one_mul]
    congr
    grind

sumRewards_bestArm_le_of_arm_mul_eq🔗

LemmaBandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq

If at time K * m the algorithm chooses arm a, then the total reward obtained by pulling arm a is at least the total reward obtained by pulling the best arm.

🔗theorem
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) : ∀ᵐ (h : Ω) P, A (K * m) h = a Learning.sumRewards A R (bestArm ν) (K * m) h Learning.sumRewards A R a (K * m) h
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) (a : Fin K) (hm : m 0) : ∀ᵐ (h : Ω) P, A (K * m) h = a Learning.sumRewards A R (bestArm ν) (K * m) h Learning.sumRewards A R a (K * m) h

Code

lemma sumRewards_bestArm_le_of_arm_mul_eq [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) :
    ∀ᵐ h ∂P, A (K * m) h = a → sumRewards A R (bestArm ν) (K * m) h ≤
      sumRewards A R a (K * m) h
Type uses (5)
Body uses (10)
Used by (1)

Actions: Source · Open Issue

Proof
by
  filter_upwards [arm_mul h hm, pullCount_mul h a, pullCount_mul h (bestArm ν)]
    with h h_arm ha h_best h_eq
  have h_max := isMaxOn_argmax
    (empMean' (K * m - 1) (history A R (K * m - 1) h)) (bestArm ν)
  rw [← h_arm, h_eq] at h_max
  rw [sumRewards_eq_pullCount_mul_empMean, sumRewards_eq_pullCount_mul_empMean, ha, h_best]
  · gcongr
    have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    rwa [empMean_eq_empMean' this.ne', empMean_eq_empMean' this.ne']
  · simp [ha, hm]
  · simp [h_best, hm]

probReal_sumRewards_le_sumRewards_le🔗

LemmaBandits.ETC.probReal_sumRewards_le_sumRewards_le

No docstring.

🔗theorem
Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) : MeasureTheory.Measure.real P {ω | Learning.sumRewards A R (bestArm ν) (K * m) ω Learning.sumRewards A R a (K * m) ω} Real.exp (-m * gap ν a ^ 2 / (4 * σ2))
Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) : MeasureTheory.Measure.real P {ω | Learning.sumRewards A R (bestArm ν) (K * m) ω Learning.sumRewards A R a (K * m) ω} Real.exp (-m * gap ν a ^ 2 / (4 * σ2))

Code

lemma probReal_sumRewards_le_sumRewards_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K) :
    P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} ≤
      Real.exp (-↑m * gap ν a ^ 2 / (4 * σ2))
Type uses (6)
Body uses (5)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have h1 := Bandits.probReal_sumRewards_le_sumRewards_le h a (K * m) m m
  have h2 := probReal_sum_le_sum_streamMeasure hν a m
  refine le_trans (le_of_eq ?_) (h1.trans h2)
  simp_rw [measureReal_def]
  congr 1
  refine measure_congr ?_
  rw [Filter.eventuallyEq_set]
  filter_upwards [pullCount_mul h a, pullCount_mul h (bestArm ν)] with ω ha h_best
  simp [ha, h_best]

prob_arm_mul_eq_le🔗

LemmaBandits.ETC.prob_arm_mul_eq_le

The probability that at time K * m the ETC algorithm chooses arm a is at most exp(- m * Δ_a^2 / 4).

🔗theorem
Bandits.ETC.prob_arm_mul_eq_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) (hm : m 0) : MeasureTheory.Measure.real P {ω | A (K * m) ω = a} Real.exp (-m * gap ν a ^ 2 / (4 * σ2))
Bandits.ETC.prob_arm_mul_eq_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) (hm : m 0) : MeasureTheory.Measure.real P {ω | A (K * m) ω = a} Real.exp (-m * gap ν a ^ 2 / (4 * σ2))

Code

lemma prob_arm_mul_eq_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K)
    (hm : m ≠ 0) :
    P.real {ω | A (K * m) ω = a} ≤ Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))
Type uses (4)
Body uses (4)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have h_pos : 0 < K * m := Nat.mul_pos hK hm.bot_lt
  have h_le : P.real {ω | A (K * m) ω = a}
      ≤ P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} := by
    simp_rw [measureReal_def]
    gcongr 1
    · simp
    refine measure_mono_ae ?_
    exact sumRewards_bestArm_le_of_arm_mul_eq h a hm
  exact h_le.trans (probReal_sumRewards_le_sumRewards_le h hν a)

expectation_pullCount_le🔗

LemmaBandits.ETC.expectation_pullCount_le

Bound on the expectation of the number of pulls of each arm by the ETC algorithm.

🔗theorem
Bandits.ETC.expectation_pullCount_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : (x : Ω), (fun ω => (Learning.pullCount A a n ω)) x P m + (n - K * m) * Real.exp (-m * gap ν a ^ 2 / (4 * σ2))
Bandits.ETC.expectation_pullCount_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (a : Fin K) (hm : m 0) {n : } (hn : K * m n) : (x : Ω), (fun ω => (Learning.pullCount A a n ω)) x P m + (n - K * m) * Real.exp (-m * gap ν a ^ 2 / (4 * σ2))

Code

lemma expectation_pullCount_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a))
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    P[fun ω ↦ (pullCount A a n ω : ℝ)]
      ≤ m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))
Type uses (5)
Body uses (4)
Used by (1)

Actions: Source · Open Issue

Proof
by
  have hA := h.measurable_action
  have : (fun ω ↦ (pullCount A a n ω : ℝ))
      =ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω := by
    filter_upwards [pullCount_of_ge h a hm hn] with ω h
    simp only [h, Set.indicator_apply, Set.mem_setOf_eq, mul_ite, mul_one, mul_zero, Nat.cast_add,
      Nat.cast_ite, CharP.cast_eq_zero, add_right_inj]
    norm_cast
  rw [integral_congr_ae this, integral_add (integrable_const _), integral_const_mul]
  swap
  · refine Integrable.const_mul ?_ _
    rw [integrable_indicator_iff]
    · exact integrableOn_const
    · exact (measurableSet_singleton _).preimage (by fun_prop)
  simp only [integral_const, probReal_univ, smul_eq_mul, one_mul, neg_mul, add_le_add_iff_left]
  gcongr
  · norm_cast
    simp
  rw [integral_indicator_const, smul_eq_mul, mul_one]
  · rw [← neg_mul]
    exact prob_arm_mul_eq_le h hν a hm
  · exact (measurableSet_singleton _).preimage (by fun_prop)

regret_le🔗

TheoremBandits.ETC.regret_le

Regret bound for the ETC algorithm.

🔗theorem
Bandits.ETC.regret_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (hm : m 0) (n : ) (hn : K * m n) : (x : Ω), regret ν A n x P a, gap ν a * (m + (n - K * m) * Real.exp (-m * gap ν a ^ 2 / (4 * σ2)))
Bandits.ETC.regret_le.{u_1} {K : } {hK : 0 < K} {m : } {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : Ω Fin K} {R : Ω } {σ2 : NNReal} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (etcAlgorithm hK m) (Learning.stationaryEnv ν) P) ( : (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - (x : ), id x ν a) σ2 (ν a)) (hm : m 0) (n : ) (hn : K * m n) : (x : Ω), regret ν A n x P a, gap ν a * (m + (n - K * m) * Real.exp (-m * gap ν a ^ 2 / (4 * σ2)))

Code

theorem regret_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (hm : m ≠ 0)
    (n : ℕ) (hn : K * m ≤ n) :
    P[regret ν A n] ≤
      ∑ a, gap ν a * (m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2)))
Type uses (5)
Body uses (2)

Actions: Source · Open Issue

Proof
integral_regret_le_of_forall_integral_pullCount_le h
    (fun a _ ↦ expectation_pullCount_le h hν a hm hn)
  1. Bandits.ETC.nextArm
  2. Bandits.ETC.measurable_nextArm
  3. Bandits.etcAlgorithm
  4. Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm
  5. Bandits.ETC.arm_zero
  6. Bandits.ETC.arm_ae_eq_etcNextArm
  7. Bandits.ETC.arm_of_lt
  8. Bandits.ETC.arm_mul
  9. Bandits.ETC.arm_add_one_of_ge
  10. Bandits.ETC.arm_of_ge
  11. Bandits.ETC.pullCount_mul
  12. Bandits.ETC.pullCount_add_one_of_ge
  13. Bandits.ETC.pullCount_of_ge
  14. Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq
  15. Bandits.ETC.probReal_sumRewards_le_sumRewards_le
  16. Bandits.ETC.prob_arm_mul_eq_le
  17. Bandits.ETC.expectation_pullCount_le
  18. Bandits.ETC.regret_le