Round-Robin algorithm #
That algorithm chooses each of finitely many actions in a round-robin fashion.
That is, if there are K actions numbered from 0 to K - 1, then at time n it chooses
he action n % K.
Main definitions #
roundRobinAlgorithm: the Round-Robin algorithm.
noncomputable def
Learning.roundRobinAlgorithm
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
(hK : 0 < K)
:
The Round-Robin algorithm: deterministic algorithm that chooses action n % K at time n.
Equations
- Learning.roundRobinAlgorithm hK = Learning.detAlgorithm (fun (n : ℕ) (x : ↥(Finset.Iic n) → Fin K × 𝓨) => Learning.RoundRobin.nextAction hK n) ⋯ ⟨0, hK⟩
Instances For
theorem
Learning.RoundRobin.action_zero
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P 0)
:
theorem
Learning.RoundRobin.action_ae_eq_roundRobinNextAction
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(n : ℕ)
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (n + 1))
:
A (n + 1) =ᵐ[P] fun (x : Ω) => nextAction hK n
theorem
Learning.RoundRobin.action_ae_eq
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(n : ℕ)
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P n)
:
The action chosen at time n is the action n % K.
theorem
Learning.RoundRobin.pullCount_mul
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(m : ℕ)
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1))
(a : Fin K)
:
At time K * m, the number of times each action is chosen is equal to m.
theorem
Learning.RoundRobin.pullCount_eq_one
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
:
theorem
Learning.RoundRobin.time_gt_of_pullCount_gt_one
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
:
theorem
Learning.RoundRobin.pullCount_pos_of_time_ge
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
:
theorem
Learning.RoundRobin.pullCount_pos_of_pullCount_gt_one
{𝓨 : Type u_1}
{m𝓨 : MeasurableSpace 𝓨}
{K : ℕ}
[StandardBorelSpace 𝓨]
[Nonempty 𝓨]
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) 𝓨}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_2}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{Y : ℕ → Ω → 𝓨}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A Y (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
: