Laws of stepsUntil and rewardByCount #
theorem
Bandits.hasLaw_Z
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
(a : π)
(m : β)
:
ProbabilityTheory.HasLaw (fun (Ο : Ξ© Γ (β β π β β)) => Ο.2 m a) (Ξ½ a) (P.prod (streamMeasure Ξ½))
Law of Y conditioned on the event s.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Law of Y conditioned on the event that X is in s.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Law of Y conditioned on the event that X equals x.
Equations
- One or more equations did not get rendered due to their size.
Instances For
theorem
Bandits.condDistrib_reward''
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(n : β)
:
theorem
Bandits.reward_cond_action
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(n : β)
(hΞΌa : (MeasureTheory.Measure.map (fun (Ο : Ξ© Γ (β β π β β)) => A n Ο.1) (P.prod (streamMeasure Ξ½))) {a} β 0)
:
theorem
Bandits.condIndepFun_reward_stepsUntil_action'
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(m n : β)
:
ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) β― (R n)
({Ο : Ξ© | Learning.stepsUntil A a m Ο = βn}.indicator fun (x : Ξ©) => 1) P
theorem
Bandits.condIndepFun_reward_stepsUntil_action
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(m n : β)
:
ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun (Ο : Ξ© Γ (β β π β β)) => A n Ο.1) mπ) β―
(fun (Ο : Ξ© Γ (β β π β β)) => R n Ο.1)
({Ο : Ξ© Γ (β β π β β) | Learning.stepsUntil A a m Ο.1 = βn}.indicator fun (x : Ξ© Γ (β β π β β)) => 1)
(P.prod (streamMeasure Ξ½))
theorem
Bandits.reward_cond_stepsUntil
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(m n : β)
(hm : m β 0)
(hΞΌn : (P.prod (streamMeasure Ξ½)) ((fun (Ο : Ξ© Γ (β β π β β)) => Learning.stepsUntil A a m Ο.1) β»ΒΉ' {βn}) β 0)
:
theorem
Bandits.condDistrib_rewardByCount_stepsUntil
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(m : β)
(hm : m β 0)
:
βπ[Learning.rewardByCount A R a m | fun (Ο : Ξ© Γ (β β π β β)) => Learning.stepsUntil A a m Ο.1; P.prod
(streamMeasure
Ξ½)] =α΅[MeasureTheory.Measure.map (fun (Ο : Ξ© Γ (β β π β β)) => Learning.stepsUntil A a m Ο.1)
(P.prod (streamMeasure Ξ½))] β(ProbabilityTheory.Kernel.const ββ (Ξ½ a))
The conditional distribution of the reward received at the m-th pull of action a
given the time at which number of pulls is m is the constant kernel with value Ξ½ a.
theorem
Bandits.hasLaw_rewardByCount
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(m : β)
(hm : m β 0)
:
ProbabilityTheory.HasLaw (Learning.rewardByCount A R a m) (Ξ½ a) (P.prod (streamMeasure Ξ½))
The reward received at the m-th pull of action a has law Ξ½ a.
theorem
Bandits.identDistrib_rewardByCount
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(n m : β)
(hn : n β 0)
(hm : m β 0)
:
ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (Learning.rewardByCount A R a m)
(P.prod (streamMeasure Ξ½)) (P.prod (streamMeasure Ξ½))
theorem
Bandits.identDistrib_rewardByCount_id
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(n : β)
(hn : n β 0)
:
ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) id (P.prod (streamMeasure Ξ½)) (Ξ½ a)
theorem
Bandits.identDistrib_rewardByCount_eval
{π : Type u_1}
{Ξ© : Type u_2}
{mπ : MeasurableSpace π}
{mΞ© : MeasurableSpace Ξ©}
[DecidableEq π]
[StandardBorelSpace π]
[Nonempty π]
{A : β β Ξ© β π}
{R : β β Ξ© β β}
{P : MeasureTheory.Measure Ξ©}
[MeasureTheory.IsProbabilityMeasure P]
{alg : Learning.Algorithm π β}
{Ξ½ : ProbabilityTheory.Kernel π β}
[ProbabilityTheory.IsMarkovKernel Ξ½]
[StandardBorelSpace Ξ©]
[Countable π]
(h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv Ξ½) P)
(a : π)
(n m : β)
(hn : n β 0)
:
ProbabilityTheory.IdentDistrib (Learning.rewardByCount A R a n) (fun (Ο : β β π β β) => Ο m a)
(P.prod (streamMeasure Ξ½)) (streamMeasure Ξ½)