Online.Bandit.Algorithms.ETC

`nextArm`🔗

DefinitionBandits.ETC.nextArm

Details

Arm pulled by the ETC algorithm at time n + 1. For n < K * m - 1, this is arm (n + 1) % K. For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase. For n ≥ K * m, this is the same arm as at time n.

def

Bandits.ETC.nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ) : Fin K
Bandits.ETC.nextArm {K : ℕ} (hK : 0 < K)
  (m n : ℕ)
  (h : ↥(Finset.Iic n) → Fin K × ℝ) :
  Fin K

Code

noncomputable
def ETC.nextArm (hK : 0 < K) (m n : ℕ) (h : Iic n → Fin K × ℝ) : Fin K :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  if hn : n < K * m - 1 then RoundRobin.nextAction hK n
  else
    if hn_eq : n = K * m - 1 then argmax (empMean' n h)
    else (h ⟨n, by simp⟩).1

Body uses (3)

Used by (6)

Actions: Source · Open Issue

`measurable_nextArm`🔗

LemmaBandits.ETC.measurable_nextArm

Details

The next arm pulled by ETC is chosen in a measurable way.

theorem

Bandits.ETC.measurable_nextArm {K : ℕ} (hK : 0 < K) (m n : ℕ) :
  Measurable (nextArm hK m n)
Bandits.ETC.measurable_nextArm {K : ℕ}
  (hK : 0 < K) (m n : ℕ) :
  Measurable (nextArm hK m n)

Code

lemma ETC.measurable_nextArm (hK : 0 < K) (m n : ℕ) : Measurable (nextArm hK m n)

Type uses (1)

nextArm

Body uses (6)

Used by (3)

Actions: Source · Open Issue

Proof

by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  unfold nextArm
  simp only [dite_eq_ite]
  refine Measurable.ite (by simp) (by fun_prop) ?_
  refine Measurable.ite (by simp) ?_ (by fun_prop)
  fun_prop

`etcAlgorithm`🔗

DefinitionBandits.etcAlgorithm

Details

The Explore-Then-Commit algorithm: deterministic algorithm that chooses the next arm according to ETC.nextArm.

def

Bandits.etcAlgorithm {K : ℕ} (hK : 0 < K) (m : ℕ) :
  Learning.Algorithm (Fin K) ℝ
Bandits.etcAlgorithm {K : ℕ} (hK : 0 < K)
  (m : ℕ) : Learning.Algorithm (Fin K) ℝ

Code

noncomputable
def etcAlgorithm (hK : 0 < K) (m : ℕ) : Algorithm (Fin K) ℝ :=
  detAlgorithm (ETC.nextArm hK m) (by fun_prop) ⟨0, hK⟩

Type uses (1)

Algorithm

Body uses (3)

Used by (15)

Actions: Source · Open Issue

`isAlgEnvSeqUntil_roundRobinAlgorithm`🔗

LemmaBandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm

Details

Until round K * m - 1, the ETC algorithm behaves like the Round-Robin algorithm.

theorem

Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P) :
  Learning.IsAlgEnvSeqUntil A R (Learning.roundRobinAlgorithm hK)
    (Learning.stationaryEnv ν) P (K * m - 1)
Bandits.ETC.isAlgEnvSeqUntil_roundRobinAlgorithm.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P) :
  Learning.IsAlgEnvSeqUntil A R
    (Learning.roundRobinAlgorithm hK)
    (Learning.stationaryEnv ν) P
    (K * m - 1)

Code

lemma isAlgEnvSeqUntil_roundRobinAlgorithm
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
    IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1) where
  measurable_action

Type uses (5)

Body uses (9)

Used by (3)

Actions: Source · Open Issue

Proof

h.measurable_action
  measurable_feedback := h.measurable_feedback
  hasLaw_action_zero := h.hasLaw_action_zero
  hasCondDistrib_feedback_zero := h.hasCondDistrib_feedback_zero
  hasCondDistrib_action n hn := by
    convert h.hasCondDistrib_action n using 1
    simp only [roundRobinAlgorithm, detAlgorithm_policy, etcAlgorithm]
    congr 1 with h
    simp [ETC.nextArm, hn]
  hasCondDistrib_feedback n _ := h.hasCondDistrib_feedback n

`arm_zero`🔗

LemmaBandits.ETC.arm_zero

Details

No docstring.

theorem

Bandits.ETC.arm_zero.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P) :
  A 0 =ᵐ[P] fun x => ⟨0, hK⟩
Bandits.ETC.arm_zero.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P) :
  A 0 =ᵐ[P] fun x => ⟨0, hK⟩

Code

lemma arm_zero (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) :
    A 0 =ᵐ[P] fun _ ↦ ⟨0, hK⟩

Type uses (3)

Body uses (4)

Actions: Source · Open Issue

Proof

RoundRobin.action_zero ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono zero_le)

`arm_ae_eq_etcNextArm`🔗

LemmaBandits.ETC.arm_ae_eq_etcNextArm

Details

No docstring.

theorem

Bandits.ETC.arm_ae_eq_etcNextArm.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (n : ℕ) :
  A (n + 1) =ᵐ[P] fun ω => nextArm hK m n (Learning.history A R n ω)
Bandits.ETC.arm_ae_eq_etcNextArm.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (n : ℕ) :
  A (n + 1) =ᵐ[P] fun ω =>
    nextArm hK m n
      (Learning.history A R n ω)

Code

lemma arm_ae_eq_etcNextArm (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (n : ℕ) :
    A (n + 1) =ᵐ[P] fun ω ↦ nextArm hK m n (history A R n ω)

Type uses (5)

Body uses (3)

Used by (2)

Actions: Source · Open Issue

Proof

by
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  exact h.action_detAlgorithm_ae_eq n

`arm_of_lt`🔗

LemmaBandits.ETC.arm_of_lt

Details

For n < K * m, the arm pulled at time n is the arm n % K.

theorem

Bandits.ETC.arm_of_lt.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hn : n < K * m) : A n =ᵐ[P] fun x => ⟨n % K, ⋯⟩
Bandits.ETC.arm_of_lt.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hn : n < K * m) :
  A n =ᵐ[P] fun x => ⟨n % K, ⋯⟩

Code

lemma arm_of_lt (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hn : n < K * m) :
    A n =ᵐ[P] fun _ ↦ ⟨n % K, Nat.mod_lt _ hK⟩

Type uses (3)

Body uses (4)

Actions: Source · Open Issue

Proof

RoundRobin.action_ae_eq n ((isAlgEnvSeqUntil_roundRobinAlgorithm h).mono (by grind))

`arm_mul`🔗

LemmaBandits.ETC.arm_mul

Details

The arm pulled at time K * m is the arm with the highest empirical mean after the exploration phase.

theorem

Bandits.ETC.arm_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hm : m ≠ 0) :
  A (K * m) =ᵐ[P] fun ω =>
    argmax
      (Learning.empMean' (K * m - 1)
        (Learning.history A R (K * m - 1) ω))
Bandits.ETC.arm_mul.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hm : m ≠ 0) :
  A (K * m) =ᵐ[P] fun ω =>
    argmax
      (Learning.empMean' (K * m - 1)
        (Learning.history A R (K * m - 1)
          ω))

Code

lemma arm_mul [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (hm : m ≠ 0) :
    A (K * m) =ᵐ[P]
      fun ω ↦ argmax (empMean' (K * m - 1) (history A R (K * m - 1) ω))

Type uses (6)

Body uses (3)

Used by (1)

sumRewards_bestArm_le_of_arm_mul_eq

Actions: Source · Open Issue

Proof

by
  have : K * m = (K * m - 1) + 1 := by
    have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    grind
  rw [this]
  filter_upwards [arm_ae_eq_etcNextArm h (K * m - 1)] with ω hn_eq
  rw [hn_eq, nextArm, dif_neg (by simp), dif_pos rfl]
  exact this ▸ rfl

`arm_add_one_of_ge`🔗

LemmaBandits.ETC.arm_add_one_of_ge

Details

For n ≥ K * m, the arm pulled at time n + 1 is the same as the arm pulled at time n.

theorem

Bandits.ETC.arm_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A (n + 1) =ᵐ[P] fun ω => A n ω
Bandits.ETC.arm_add_one_of_ge.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
  A (n + 1) =ᵐ[P] fun ω => A n ω

Code

lemma arm_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
    A (n + 1) =ᵐ[P] fun ω ↦ A n ω

Type uses (3)

Body uses (6)

Used by (1)

arm_of_ge

Actions: Source · Open Issue

Proof

by
  filter_upwards [arm_ae_eq_etcNextArm h n] with ω hn_eq
  rw [hn_eq, nextArm, dif_neg (by grind), dif_neg]
  · rfl
  · have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    grind

`arm_of_ge`🔗

LemmaBandits.ETC.arm_of_ge

Details

For n ≥ K * m, the arm pulled at time n is the same as the arm pulled at time K * m.

theorem

Bandits.ETC.arm_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) : A n =ᵐ[P] A (K * m)
Bandits.ETC.arm_of_ge.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
  A n =ᵐ[P] A (K * m)

Code

lemma arm_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    {n : ℕ} (hm : m ≠ 0) (hn : K * m ≤ n) :
    A n =ᵐ[P] A (K * m)

Type uses (3)

Body uses (1)

arm_add_one_of_ge

Used by (1)

pullCount_add_one_of_ge

Actions: Source · Open Issue

Proof

by
  have h_ae n : K * m ≤ n → A (n + 1) =ᵐ[P] fun ω ↦ A n ω := arm_add_one_of_ge h hm
  simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae
  filter_upwards [h_ae] with ω h_ae
  induction n, hn using Nat.le_induction with
  | base => rfl
  | succ n hmn h_ind => rw [h_ae n hmn, h_ind]

`pullCount_mul`🔗

LemmaBandits.ETC.pullCount_mul

Details

At time K * m, the number of pulls of each arm is equal to m.

theorem

Bandits.ETC.pullCount_mul.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) : Learning.pullCount A a (K * m) =ᵐ[P] fun x => m
Bandits.ETC.pullCount_mul.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) :
  Learning.pullCount A a (K * m) =ᵐ[P]
    fun x => m

Code

lemma pullCount_mul (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) :
    pullCount A a (K * m) =ᵐ[P] fun _ ↦ m

Type uses (4)

Body uses (2)

Used by (3)

Actions: Source · Open Issue

Proof

RoundRobin.pullCount_mul m (isAlgEnvSeqUntil_roundRobinAlgorithm h) a

`pullCount_add_one_of_ge`🔗

LemmaBandits.ETC.pullCount_add_one_of_ge

Details

No docstring.

theorem

Bandits.ETC.pullCount_add_one_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
  Learning.pullCount A a (n + 1) =ᵐ[P] fun ω =>
    Learning.pullCount A a n ω +
      Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Bandits.ETC.pullCount_add_one_of_ge.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) {n : ℕ}
  (hn : K * m ≤ n) :
  Learning.pullCount A a (n + 1) =ᵐ[P]
    fun ω =>
    Learning.pullCount A a n ω +
      Set.indicator
        {ω' | A (K * m) ω' = a}
        (fun x => 1) ω

Code

lemma pullCount_add_one_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    pullCount A a (n + 1)
      =ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω

Type uses (4)

Body uses (2)

Used by (1)

pullCount_of_ge

Actions: Source · Open Issue

Proof

by
  simp_rw [Filter.EventuallyEq, pullCount_add_one]
  filter_upwards [arm_of_ge h hm hn] with ω h_arm
  congr 3

`pullCount_of_ge`🔗

LemmaBandits.ETC.pullCount_of_ge

Details

For n ≥ K * m, the number of pulls of each arm a at time n is equal to m plus n - K * m if arm a is the best arm after the exploration phase.

theorem

Bandits.ETC.pullCount_of_ge.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
  Learning.pullCount A a n =ᵐ[P] fun ω =>
    m +
      (n - K * m) * Set.indicator {ω' | A (K * m) ω' = a} (fun x => 1) ω
Bandits.ETC.pullCount_of_ge.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) {n : ℕ}
  (hn : K * m ≤ n) :
  Learning.pullCount A a n =ᵐ[P] fun ω =>
    m +
      (n - K * m) *
        Set.indicator
          {ω' | A (K * m) ω' = a}
          (fun x => 1) ω

Code

lemma pullCount_of_ge (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    pullCount A a n
      =ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω

Type uses (4)

Body uses (2)

Used by (1)

expectation_pullCount_le

Actions: Source · Open Issue

Proof

by
  have h_ae n : K * m ≤ n → pullCount A a (n + 1)
      =ᵐ[P] fun ω ↦ pullCount A a n ω + {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω :=
    pullCount_add_one_of_ge h a hm
  simp_rw [Filter.EventuallyEq, ← ae_all_iff] at h_ae
  have h_ae_Km : pullCount A a (K * m) =ᵐ[P] fun _ ↦ m := pullCount_mul h a
  filter_upwards [h_ae_Km, h_ae] with ω h_Km h_ae
  induction n, hn using Nat.le_induction with
  | base => simp [h_Km]
  | succ n hmn h_ind =>
    rw [h_ae n hmn, h_ind, add_assoc, ← add_one_mul]
    congr
    grind

`sumRewards_bestArm_le_of_arm_mul_eq`🔗

LemmaBandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq

Details

If at time K * m the algorithm chooses arm a, then the total reward obtained by pulling arm a is at least the total reward obtained by pulling the best arm.

theorem

Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) :
  ∀ᵐ (h : Ω) ∂P,
    A (K * m) h = a →
      Learning.sumRewards A R (bestArm ν) (K * m) h ≤
        Learning.sumRewards A R a (K * m) h
Bandits.ETC.sumRewards_bestArm_le_of_arm_mul_eq.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (a : Fin K) (hm : m ≠ 0) :
  ∀ᵐ (h : Ω) ∂P,
    A (K * m) h = a →
      Learning.sumRewards A R (bestArm ν)
          (K * m) h ≤
        Learning.sumRewards A R a (K * m)
          h

Code

lemma sumRewards_bestArm_le_of_arm_mul_eq [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P) (a : Fin K) (hm : m ≠ 0) :
    ∀ᵐ h ∂P, A (K * m) h = a → sumRewards A R (bestArm ν) (K * m) h ≤
      sumRewards A R a (K * m) h

Type uses (5)

Body uses (10)

Used by (1)

prob_arm_mul_eq_le

Actions: Source · Open Issue

Proof

by
  filter_upwards [arm_mul h hm, pullCount_mul h a, pullCount_mul h (bestArm ν)]
    with h h_arm ha h_best h_eq
  have h_max := isMaxOn_argmax
    (empMean' (K * m - 1) (history A R (K * m - 1) h)) (bestArm ν)
  rw [← h_arm, h_eq] at h_max
  rw [sumRewards_eq_pullCount_mul_empMean, sumRewards_eq_pullCount_mul_empMean, ha, h_best]
  · gcongr
    have : 0 < K * m := Nat.mul_pos hK hm.bot_lt
    rwa [empMean_eq_empMean' this.ne', empMean_eq_empMean' this.ne']
  · simp [ha, hm]
  · simp [h_best, hm]

`probReal_sumRewards_le_sumRewards_le`🔗

LemmaBandits.ETC.probReal_sumRewards_le_sumRewards_le

Details

No docstring.

theorem

Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a))
  (a : Fin K) :
  MeasureTheory.Measure.real P
      {ω |
        Learning.sumRewards A R (bestArm ν) (K * m) ω ≤
          Learning.sumRewards A R a (K * m) ω} ≤
    Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Bandits.ETC.probReal_sumRewards_le_sumRewards_le.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x =>
          x - ∫ (x : ℝ), id x ∂ν a)
        σ2 (ν a))
  (a : Fin K) :
  MeasureTheory.Measure.real P
      {ω |
        Learning.sumRewards A R
            (bestArm ν) (K * m) ω ≤
          Learning.sumRewards A R a
            (K * m) ω} ≤
    Real.exp
      (-↑m * gap ν a ^ 2 / (4 * ↑σ2))

Code

lemma probReal_sumRewards_le_sumRewards_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K) :
    P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} ≤
      Real.exp (-↑m * gap ν a ^ 2 / (4 * σ2))

Type uses (6)

Body uses (5)

Used by (1)

prob_arm_mul_eq_le

Actions: Source · Open Issue

Proof

by
  have h1 := Bandits.probReal_sumRewards_le_sumRewards_le h a (K * m) m m
  have h2 := probReal_sum_le_sum_streamMeasure hν a m
  refine le_trans (le_of_eq ?_) (h1.trans h2)
  simp_rw [measureReal_def]
  congr 1
  refine measure_congr ?_
  rw [Filter.eventuallyEq_set]
  filter_upwards [pullCount_mul h a, pullCount_mul h (bestArm ν)] with ω ha h_best
  simp [ha, h_best]

`prob_arm_mul_eq_le`🔗

LemmaBandits.ETC.prob_arm_mul_eq_le

Details

The probability that at time K * m the ETC algorithm chooses arm a is at most exp(- m * Δ_a^2 / 4).

theorem

Bandits.ETC.prob_arm_mul_eq_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a))
  (a : Fin K) (hm : m ≠ 0) :
  MeasureTheory.Measure.real P {ω | A (K * m) ω = a} ≤
    Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Bandits.ETC.prob_arm_mul_eq_le.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x =>
          x - ∫ (x : ℝ), id x ∂ν a)
        σ2 (ν a))
  (a : Fin K) (hm : m ≠ 0) :
  MeasureTheory.Measure.real P
      {ω | A (K * m) ω = a} ≤
    Real.exp
      (-↑m * gap ν a ^ 2 / (4 * ↑σ2))

Code

lemma prob_arm_mul_eq_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (a : Fin K)
    (hm : m ≠ 0) :
    P.real {ω | A (K * m) ω = a} ≤ Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))

Type uses (4)

Body uses (4)

Used by (1)

expectation_pullCount_le

Actions: Source · Open Issue

Proof

by
  have h_pos : 0 < K * m := Nat.mul_pos hK hm.bot_lt
  have h_le : P.real {ω | A (K * m) ω = a}
      ≤ P.real {ω | sumRewards A R (bestArm ν) (K * m) ω ≤ sumRewards A R a (K * m) ω} := by
    simp_rw [measureReal_def]
    gcongr 1
    · simp
    refine measure_mono_ae ?_
    exact sumRewards_bestArm_le_of_arm_mul_eq h a hm
  exact h_le.trans (probReal_sumRewards_le_sumRewards_le h hν a)

`expectation_pullCount_le`🔗

LemmaBandits.ETC.expectation_pullCount_le

Details

Bound on the expectation of the number of pulls of each arm by the ETC algorithm.

theorem

Bandits.ETC.expectation_pullCount_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a))
  (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
  ∫ (x : Ω), (fun ω => ↑(Learning.pullCount A a n ω)) x ∂P ≤
    ↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2))
Bandits.ETC.expectation_pullCount_le.{u_1}
  {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x =>
          x - ∫ (x : ℝ), id x ∂ν a)
        σ2 (ν a))
  (a : Fin K) (hm : m ≠ 0) {n : ℕ}
  (hn : K * m ≤ n) :
  ∫ (x : Ω),
      (fun ω =>
          ↑(Learning.pullCount A a n ω))
        x ∂P ≤
    ↑m +
      (↑n - ↑K * ↑m) *
        Real.exp
          (-↑m * gap ν a ^ 2 / (4 * ↑σ2))

Code

lemma expectation_pullCount_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a))
    (a : Fin K) (hm : m ≠ 0) {n : ℕ} (hn : K * m ≤ n) :
    P[fun ω ↦ (pullCount A a n ω : ℝ)]
      ≤ m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2))

Type uses (5)

Body uses (4)

Used by (1)

regret_le

Actions: Source · Open Issue

Proof

by
  have hA := h.measurable_action
  have : (fun ω ↦ (pullCount A a n ω : ℝ))
      =ᵐ[P] fun ω ↦ m + (n - K * m) * {ω' | A (K * m) ω' = a}.indicator (fun _ ↦ 1) ω := by
    filter_upwards [pullCount_of_ge h a hm hn] with ω h
    simp only [h, Set.indicator_apply, Set.mem_setOf_eq, mul_ite, mul_one, mul_zero, Nat.cast_add,
      Nat.cast_ite, CharP.cast_eq_zero, add_right_inj]
    norm_cast
  rw [integral_congr_ae this, integral_add (integrable_const _), integral_const_mul]
  swap
  · refine Integrable.const_mul ?_ _
    rw [integrable_indicator_iff]
    · exact integrableOn_const
    · exact (measurableSet_singleton _).preimage (by fun_prop)
  simp only [integral_const, probReal_univ, smul_eq_mul, one_mul, neg_mul, add_le_add_iff_left]
  gcongr
  · norm_cast
    simp
  rw [integral_indicator_const, smul_eq_mul, mul_one]
  · rw [← neg_mul]
    exact prob_arm_mul_eq_le h hν a hm
  · exact (measurableSet_singleton _).preimage (by fun_prop)

`regret_le`🔗

TheoremBandits.ETC.regret_le

Details

Regret bound for the ETC algorithm.

theorem

Bandits.ETC.regret_le.{u_1} {K : ℕ} {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1}
  {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K}
  {R : ℕ → Ω → ℝ} {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a))
  (hm : m ≠ 0) (n : ℕ) (hn : K * m ≤ n) :
  ∫ (x : Ω), regret ν A n x ∂P ≤
    ∑ a,
      gap ν a *
        (↑m + (↑n - ↑K * ↑m) * Real.exp (-↑m * gap ν a ^ 2 / (4 * ↑σ2)))
Bandits.ETC.regret_le.{u_1} {K : ℕ}
  {hK : 0 < K} {m : ℕ}
  {ν : ProbabilityTheory.Kernel (Fin K) ℝ}
  [ProbabilityTheory.IsMarkovKernel ν]
  {Ω : Type u_1} {mΩ : MeasurableSpace Ω}
  {P : MeasureTheory.Measure Ω}
  [MeasureTheory.IsProbabilityMeasure P]
  {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ}
  {σ2 : NNReal} [Nonempty (Fin K)]
  (h :
    Learning.IsAlgEnvSeq A R
      (etcAlgorithm hK m)
      (Learning.stationaryEnv ν) P)
  (hν :
    ∀ (a : Fin K),
      ProbabilityTheory.HasSubgaussianMGF
        (fun x =>
          x - ∫ (x : ℝ), id x ∂ν a)
        σ2 (ν a))
  (hm : m ≠ 0) (n : ℕ) (hn : K * m ≤ n) :
  ∫ (x : Ω), regret ν A n x ∂P ≤
    ∑ a,
      gap ν a *
        (↑m +
          (↑n - ↑K * ↑m) *
            Real.exp
              (-↑m * gap ν a ^ 2 /
                (4 * ↑σ2)))

Code

theorem regret_le [Nonempty (Fin K)]
    (h : IsAlgEnvSeq A R (etcAlgorithm hK m) (stationaryEnv ν) P)
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (hm : m ≠ 0)
    (n : ℕ) (hn : K * m ≤ n) :
    P[regret ν A n] ≤
      ∑ a, gap ν a * (m + (n - K * m) * Real.exp (- (m : ℝ) * gap ν a ^ 2 / (4 * σ2)))

Type uses (5)

Body uses (2)

Actions: Source · Open Issue

Proof

integral_regret_le_of_forall_integral_pullCount_le h
    (fun a _ ↦ expectation_pullCount_le h hν a hm hn)

LeanMachineLearning exposition

2.1. Online.Bandit.Algorithms.ETC🔗

`nextArm`🔗

`measurable_nextArm`🔗

`etcAlgorithm`🔗

`isAlgEnvSeqUntil_roundRobinAlgorithm`🔗

`arm_zero`🔗

`arm_ae_eq_etcNextArm`🔗

`arm_of_lt`🔗

`arm_mul`🔗

`arm_add_one_of_ge`🔗

`arm_of_ge`🔗

`pullCount_mul`🔗

`pullCount_add_one_of_ge`🔗

`pullCount_of_ge`🔗

`sumRewards_bestArm_le_of_arm_mul_eq`🔗

`probReal_sumRewards_le_sumRewards_le`🔗

`prob_arm_mul_eq_le`🔗

`expectation_pullCount_le`🔗

`regret_le`🔗

2.1. Online.Bandit.Algorithms.ETC🔗

nextArm🔗

measurable_nextArm🔗

etcAlgorithm🔗

isAlgEnvSeqUntil_roundRobinAlgorithm🔗

arm_zero🔗

arm_ae_eq_etcNextArm🔗

arm_of_lt🔗

arm_mul🔗

arm_add_one_of_ge🔗

arm_of_ge🔗

pullCount_mul🔗

pullCount_add_one_of_ge🔗

pullCount_of_ge🔗

sumRewards_bestArm_le_of_arm_mul_eq🔗

probReal_sumRewards_le_sumRewards_le🔗

prob_arm_mul_eq_le🔗

expectation_pullCount_le🔗

regret_le🔗

`nextArm`🔗

`measurable_nextArm`🔗

`etcAlgorithm`🔗

`isAlgEnvSeqUntil_roundRobinAlgorithm`🔗

`arm_zero`🔗

`arm_ae_eq_etcNextArm`🔗

`arm_of_lt`🔗

`arm_mul`🔗

`arm_add_one_of_ge`🔗

`arm_of_ge`🔗

`pullCount_mul`🔗

`pullCount_add_one_of_ge`🔗

`pullCount_of_ge`🔗

`sumRewards_bestArm_le_of_arm_mul_eq`🔗

`probReal_sumRewards_le_sumRewards_le`🔗

`prob_arm_mul_eq_le`🔗

`expectation_pullCount_le`🔗

`regret_le`🔗