Bandits.tsAlgorithm
This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.
tsAlgorithm๐
Bandits.tsAlgorithm
The Thompson sampling algorithm with actions in Fin K, where Q : Measure ๐ is a prior
distribution over parameters, and ฮบ : Kernel (๐ ร Fin K) โ is a Markov kernel that defines the
stationary environment stationaryEnv (ฮบ.sectR e) that corresponds to a parameter e : ๐.
At every time n, the Thompson sampling policy uses the posterior over the parameters given the
history up to time n to derive the probability of each action being optimal. The action for time
n is sampled according to these probabilities.
Bandits.tsAlgorithm.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Learning.Algorithm (Fin K) โBandits.tsAlgorithm.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Learning.Algorithm (Fin K) โ
Code
noncomputable
def tsAlgorithm (hK : 0 < K) (Q : Measure ๐) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐ ร Fin K) โ)
[IsMarkovKernel ฮบ] : Algorithm (Fin K) โ where
policy := TS.policy hK Q ฮบ
p0 := TS.initialPolicy hK Q ฮบType uses (1)
Body uses (4)
Used by (4)
Actions: Source ยท Open Issue
Dependency graph
Type dependencies (1)
Algorithm๐
Learning.AlgorithmA stochastic, sequential algorithm.
Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Algorithm (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Policy or sampling rule: distribution of the next action. -/ policy : (n : โ) โ Kernel (Iic n โ ๐ ร ๐จ) ๐ /-- The policy is a Markov kernel. -/ [h_policy : โ n, IsMarkovKernel (policy n)] /-- Distribution of the first action. -/ p0 : Measure ๐ /-- The first action distribution is a probability measure. -/ [hp0 : IsProbabilityMeasure p0]
Used by (216)
Actions: Source ยท Open Issue
All dependencies, transitively (29)
history๐
Learning.history
History of the algorithm-environment sequence up to time n.
Learning.history.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {ฮฉ : Type u_3} (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : โฅ(Finset.Iic n) โ ๐ ร ๐จLearning.history.{u_1, u_2, u_3} {๐ : Type u_1} {๐จ : Type u_2} {ฮฉ : Type u_3} (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : โฅ(Finset.Iic n) โ ๐ ร ๐จ
Code
def history (A : โ โ ฮฉ โ ๐) (Y : โ โ ฮฉ โ ๐จ) (n : โ) (ฯ : ฮฉ) : Iic n โ ๐ ร ๐จ := fun i โฆ (A i ฯ, Y i ฯ)
Actions: Source ยท Open Issue
action๐
Learning.IT.action
action n is the action pulled at time n. This is a random variable on the measurable space
โ โ ๐ ร ๐จ.
Learning.IT.action.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐Learning.IT.action.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐
Code
def action (n : โ) (h : โ โ ๐ ร ๐จ) : ๐ := (h n).1
Actions: Source ยท Open Issue
Environment๐
Learning.EnvironmentA stochastic environment.
Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Environment (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Distribution of the next observation as function of the past history. -/ feedback : (n : โ) โ Kernel ((Iic n โ ๐ ร ๐จ) ร ๐) ๐จ /-- The feedback kernels are Markov kernels. -/ [h_feedback : โ n, IsMarkovKernel (feedback n)] /-- Distribution of the first observation given the first action. -/ ฮฝ0 : Kernel ๐ ๐จ /-- The initial observation kernel is a Markov kernel. -/ [hp0 : IsMarkovKernel ฮฝ0]
Actions: Source ยท Open Issue
stepKernel๐
Learning.stepKernel
Kernel describing the distribution of the next action-feedback pair given the history
up to n.
Learning.stepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (๐ ร ๐จ)Learning.stepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (๐ ร ๐จ)
Code
noncomputable
def stepKernel (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) :
Kernel (Iic n โ ๐ ร ๐จ) (๐ ร ๐จ) :=
alg.policy n โโ env.feedback n
deriving IsMarkovKernelType uses (2)
Actions: Source ยท Open Issue
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy๐
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicyNo docstring.
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)
Code
instance (alg : Algorithm ๐ ๐จ) (n : โ) : IsMarkovKernel (alg.policy n)
Type uses (1)
Actions: Source ยท Open Issue
Proof
alg.h_policy n
instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback๐
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedbackNo docstring.
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)
Code
instance (env : Environment ๐ ๐จ) (n : โ) : IsMarkovKernel (env.feedback n)
Type uses (1)
Used by (5)
Actions: Source ยท Open Issue
Proof
env.h_feedback n
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel๐
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernelNo docstring.
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)
Code
deriving IsMarkovKernel
Type uses (3)
Body uses (2)
Used by (10)
Actions: Source ยท Open Issue
Proof
deriving IsMarkovKernel
trajMeasure๐
Learning.trajMeasureMeasure on the sequence of actions and observations generated by the algorithm/environment.
Learning.trajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐จ)Learning.trajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐จ)
Code
noncomputable
def trajMeasure (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) :
Measure (โ โ ๐ ร ๐จ) :=
Kernel.trajMeasure (alg.p0 โโ env.ฮฝ0) (stepKernel alg env)
deriving IsProbabilityMeasureType uses (2)
Used by (19)
Actions: Source ยท Open Issue
instIsProbabilityMeasureP0๐
Learning.instIsProbabilityMeasureP0No docstring.
Learning.instIsProbabilityMeasureP0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (Algorithm.p0 alg)Learning.instIsProbabilityMeasureP0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (Algorithm.p0 alg)
Code
instance (alg : Algorithm ๐ ๐จ) : IsProbabilityMeasure alg.p0
Type uses (1)
Used by (13)
Actions: Source ยท Open Issue
Proof
alg.hp0
prodLeft๐
Learning.Algorithm.prodLeft
An algorithm with observations in ๐ง ร ๐จ obtained from an algorithm with observations in ๐จ
by ignoring the ๐ง component of each observation.
Learning.Algorithm.prodLeft.{u_1, u_2, u_4} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (๐ง : Type u_4) [MeasurableSpace ๐ง] (alg : Algorithm ๐ ๐จ) : Algorithm ๐ (๐ง ร ๐จ)Learning.Algorithm.prodLeft.{u_1, u_2, u_4} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (๐ง : Type u_4) [MeasurableSpace ๐ง] (alg : Algorithm ๐ ๐จ) : Algorithm ๐ (๐ง ร ๐จ)
Code
def Algorithm.prodLeft (๐ง : Type*) [MeasurableSpace ๐ง] (alg : Algorithm ๐ ๐จ) :
Algorithm ๐ (๐ง ร ๐จ) where
policy n := (alg.policy n).comap (fun h i โฆ ((h i).1, (h i).2.2)) (by fun_prop)
p0 := alg.p0Type uses (1)
Used by (6)
Actions: Source ยท Open Issue
bayesStationaryEnv๐
Learning.bayesStationaryEnv
An environment with observations in ๐ ร ๐จ. The first element e of an observation is
sampled from Q once and remains constant. The second element of an observation is sampled from
ฮบ (e, a), where a is the corresponding action.
Learning.bayesStationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Environment ๐ (๐ ร ๐จ)Learning.bayesStationaryEnv.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Environment ๐ (๐ ร ๐จ)
Code
noncomputable
def bayesStationaryEnv (Q : Measure ๐) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐ ร ๐) ๐จ)
[IsMarkovKernel ฮบ] : Environment ๐ (๐ ร ๐จ) where
feedback n :=
let g : (Iic n โ ๐ ร ๐ ร ๐จ) ร ๐ โ ๐ ร ๐ := fun (h, a) => ((h โจ0, by simpโฉ).2.1, a)
(Kernel.deterministic (Prod.fst โ g) (by fun_prop)) รโ (ฮบ.comap g (by fun_prop))
ฮฝ0 := (Kernel.const _ Q) โโ ฮบ.swapLeftType uses (1)
Used by (4)
Actions: Source ยท Open Issue
bayesTrajMeasure๐
Learning.IT.bayesTrajMeasure
A measure P on a measurable space that carries random variables E, A, and Y such that
IsBayesAlgEnvSeq Q ฮบ alg E A Y P.
Learning.IT.bayesTrajMeasure.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐ ร ๐จ)Learning.IT.bayesTrajMeasure.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐ ร ๐จ)
Code
noncomputable
def bayesTrajMeasure (Q : Measure ๐) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐ ร ๐) ๐จ)
[IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) : Measure (โ โ ๐ ร ๐ ร ๐จ) :=
trajMeasure (alg.prodLeft ๐) (bayesStationaryEnv Q ฮบ)
deriving IsProbabilityMeasureType uses (1)
Body uses (3)
Used by (5)
Actions: Source ยท Open Issue
instIsMarkovKernelฮฝ0๐
Learning.instIsMarkovKernelฮฝ0No docstring.
Learning.instIsMarkovKernelฮฝ0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : ProbabilityTheory.IsMarkovKernel (Environment.ฮฝ0 env)Learning.instIsMarkovKernelฮฝ0.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : ProbabilityTheory.IsMarkovKernel (Environment.ฮฝ0 env)
Code
instance (env : Environment ๐ ๐จ) : IsMarkovKernel env.ฮฝ0
Type uses (1)
Used by (8)
Actions: Source ยท Open Issue
Proof
env.hp0
instIsProbabilityMeasureForallNatProdTrajMeasure๐
Learning.instIsProbabilityMeasureForallNatProdTrajMeasureNo docstring.
Learning.instIsProbabilityMeasureForallNatProdTrajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (trajMeasure alg env)Learning.instIsProbabilityMeasureForallNatProdTrajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (trajMeasure alg env)
Code
deriving IsProbabilityMeasure
Type uses (3)
Body uses (4)
Used by (8)
Actions: Source ยท Open Issue
Proof
deriving IsProbabilityMeasure
instIsProbabilityMeasureForallNatProdBayesTrajMeasure๐
Learning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasureNo docstring.
Learning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasure.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (bayesTrajMeasure Q ฮบ alg)Learning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasure.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) : MeasureTheory.IsProbabilityMeasure (bayesTrajMeasure Q ฮบ alg)
Code
deriving IsProbabilityMeasure
Type uses (2)
Used by (4)
Actions: Source ยท Open Issue
Proof
deriving IsProbabilityMeasure
bayesTrajMeasurePosterior๐
Learning.IT.bayesTrajMeasurePosterior
A kernel that represents the posterior over E given the history up to time n.
Learning.IT.bayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [StandardBorelSpace ๐] [Nonempty ๐] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ๐Learning.IT.bayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐ : Type u_1} {๐ : Type u_2} {๐จ : Type u_3} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [StandardBorelSpace ๐] [Nonempty ๐] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ๐
Code
noncomputable
def bayesTrajMeasurePosterior [StandardBorelSpace ๐] [Nonempty ๐]
(Q : Measure ๐) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐ ร ๐) ๐จ) [IsMarkovKernel ฮบ]
(alg : Algorithm ๐ ๐จ) (n : โ) : Kernel (Iic n โ ๐ ร ๐จ) ๐ :=
condDistrib (fun ฯ โฆ (ฯ 0).2.1) (history action (fun n ฯ โฆ (ฯ n).2.2) n)
(bayesTrajMeasure Q ฮบ alg)
deriving IsMarkovKernelType uses (1)
Used by (4)
Actions: Source ยท Open Issue
randomSampling๐
Learning.randomSamplingThe Random Sampling algorithm, which samples from a fixed probability measure at each iteration.
Learning.randomSampling.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮผ : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure ฮผ] : Algorithm ๐ ๐จLearning.randomSampling.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮผ : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure ฮผ] : Algorithm ๐ ๐จ
Code
noncomputable def randomSampling (ฮผ : Measure ๐) [IsProbabilityMeasure ฮผ] : Algorithm ๐ ๐จ where policy _ := Kernel.const _ ฮผ p0 := ฮผ
Type uses (1)
Actions: Source ยท Open Issue
uniformAlgorithm๐
Learning.uniformAlgorithmThe Uniform algorithm: actions are chosen uniformly at random.
Learning.uniformAlgorithm.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} [Finite ๐] [Nonempty ๐] : Algorithm ๐ ๐จLearning.uniformAlgorithm.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} [Finite ๐] [Nonempty ๐] : Algorithm ๐ ๐จ
Code
noncomputable def uniformAlgorithm [Finite ๐] [Nonempty ๐] : Algorithm ๐ ๐จ := randomSampling (uniformOn Set.univ)
Type uses (1)
Body uses (1)
Used by (4)
Actions: Source ยท Open Issue
max๐
Function.maxThe maximum value of a tuple.
Function.max.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : ฮฑFunction.max.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : ฮฑ
Code
abbrev max : ฮฑ := univ.sup' univ_nonempty f
Used by (8)
Actions: Source ยท Open Issue
exists_argmax๐
exists_argmaxNo docstring.
exists_argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : โ i, f i = Function.max fexists_argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : โ i, f i = Function.max f
Code
lemma exists_argmax : โ i, f i = f.max
Type uses (1)
Used by (3)
Actions: Source ยท Open Issue
Proof
by obtain โจi, -, hiโฉ := Finset.exists_mem_eq_sup' (by simp : Finset.univ.Nonempty) f exact โจi, hi.symmโฉ
argmax๐
argmaxThe index of the maximum value of a tuple.
argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : ฮนargmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ ฮฑ) : ฮน
Code
noncomputable def argmax := (exists_argmax f).choose
Body uses (2)
Used by (17)
Actions: Source ยท Open Issue
actionMean๐
Learning.IsBayesAlgEnvSeq.actionMean
A random variable that gives the mean feedback of action a.
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โLearning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โ
Code
noncomputable def actionMean (ฮบ : Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (a : ๐) (ฯ : ฮฉ) : โ := (ฮบ (E ฯ, a))[id]
Actions: Source ยท Open Issue
bestAction๐
Learning.IsBayesAlgEnvSeq.bestActionA random variable that gives the action with the highest mean feedback.
Learning.IsBayesAlgEnvSeq.bestAction.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [Nonempty ๐] [Fintype ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (ฯ : ฮฉ) : ๐Learning.IsBayesAlgEnvSeq.bestAction.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [Nonempty ๐] [Fintype ๐] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (ฯ : ฮฉ) : ๐
Code
noncomputable def bestAction [Nonempty ๐] [Fintype ๐] (ฮบ : Kernel (๐ ร ๐) โ) (E : ฮฉ โ ๐) (ฯ : ฮฉ) : ๐ := argmax (fun a โฆ actionMean ฮบ E a ฯ)
Body uses (2)
Used by (12)
Actions: Source ยท Open Issue
policy๐
Bandits.TS.policyThe Thompson sampling policy samples an action according to its probability of being optimal under the posterior over environments given the history so far. The posterior under a uniform algorithm is used to avoid a circular definition.
Bandits.TS.policy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) [ProbabilityTheory.IsMarkovKernel ฮบ] (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ Fin K ร โ) (Fin K)Bandits.TS.policy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) [ProbabilityTheory.IsMarkovKernel ฮบ] (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ Fin K ร โ) (Fin K)
Code
noncomputable
def TS.policy (hK : 0 < K) (Q : Measure ๐) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐ ร Fin K) โ)
[IsMarkovKernel ฮบ] (n : โ) : Kernel (Iic n โ (Fin K) ร โ) (Fin K) :=
have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
(IT.bayesTrajMeasurePosterior Q ฮบ uniformAlgorithm n).map (bestAction ฮบ id)Body uses (3)
Actions: Source ยท Open Issue
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior๐
Learning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosteriorNo docstring.
Learning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐ : Type u_3} {๐ : Type u_1} {๐จ : Type u_2} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [StandardBorelSpace ๐] [Nonempty ๐] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (bayesTrajMeasurePosterior Q ฮบ alg n)Learning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐ : Type u_3} {๐ : Type u_1} {๐จ : Type u_2} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ๐จ] [StandardBorelSpace ๐] [Nonempty ๐] (Q : MeasureTheory.Measure ๐) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) ๐จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (bayesTrajMeasurePosterior Q ฮบ alg n)
Code
deriving IsMarkovKernel
Type uses (2)
Actions: Source ยท Open Issue
Proof
deriving IsMarkovKernel
measurable_bestAction๐
Learning.IsBayesAlgEnvSeq.measurable_bestActionNo docstring.
Learning.IsBayesAlgEnvSeq.measurable_bestAction.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ฮฉ] [Nonempty ๐] [Fintype ๐] {ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ} {E : ฮฉ โ ๐} (hE : Measurable E) : Measurable (bestAction ฮบ E)Learning.IsBayesAlgEnvSeq.measurable_bestAction.{u_1, u_2, u_4} {๐ : Type u_1} {๐ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐] [MeasurableSpace ๐] [MeasurableSpace ฮฉ] [Nonempty ๐] [Fintype ๐] {ฮบ : ProbabilityTheory.Kernel (๐ ร ๐) โ} {E : ฮฉ โ ๐} (hE : Measurable E) : Measurable (bestAction ฮบ E)
Code
lemma measurable_bestAction [Nonempty ๐] [Fintype ๐] {ฮบ : Kernel (๐ ร ๐) โ} {E : ฮฉ โ ๐}
(hE : Measurable E) : Measurable (bestAction ฮบ E)Type uses (1)
Body uses (4)
Used by (7)
Actions: Source ยท Open Issue
Proof
by unfold bestAction fun_prop
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy๐
Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicyNo docstring.
Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] {hK : 0 < K} {Q : MeasureTheory.Measure ๐} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} [ProbabilityTheory.IsMarkovKernel ฮบ] {n : โ} : ProbabilityTheory.IsMarkovKernel (TS.policy hK Q ฮบ n)Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] [StandardBorelSpace ๐] [Nonempty ๐] {hK : 0 < K} {Q : MeasureTheory.Measure ๐} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} [ProbabilityTheory.IsMarkovKernel ฮบ] {n : โ} : ProbabilityTheory.IsMarkovKernel (TS.policy hK Q ฮบ n)
Code
instance {hK : 0 < K} {Q : Measure ๐} [IsProbabilityMeasure Q] {ฮบ : Kernel (๐ ร Fin K) โ}
[IsMarkovKernel ฮบ] {n : โ} : IsMarkovKernel (TS.policy hK Q ฮบ n)Type uses (1)
Body uses (5)
Used by (1)
Actions: Source ยท Open Issue
Proof
Kernel.IsMarkovKernel.map _ (by fun_prop)
initialPolicy๐
Bandits.TS.initialPolicyThe initial action is sampled according to its probability of being optimal under the prior over environments.
Bandits.TS.initialPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) : MeasureTheory.Measure (Fin K)Bandits.TS.initialPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] (hK : 0 < K) (Q : MeasureTheory.Measure ๐) (ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ) : MeasureTheory.Measure (Fin K)
Code
noncomputable def TS.initialPolicy (hK : 0 < K) (Q : Measure ๐) (ฮบ : Kernel (๐ ร Fin K) โ) : Measure (Fin K) := have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK Q.map (bestAction ฮบ id)
Body uses (1)
Used by (2)
Actions: Source ยท Open Issue
instIsProbabilityMeasureFinInitialPolicy๐
Bandits.instIsProbabilityMeasureFinInitialPolicyNo docstring.
Bandits.instIsProbabilityMeasureFinInitialPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] {hK : 0 < K} {Q : MeasureTheory.Measure ๐} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} : MeasureTheory.IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)Bandits.instIsProbabilityMeasureFinInitialPolicy.{u_1} {K : โ} {๐ : Type u_1} [MeasurableSpace ๐] {hK : 0 < K} {Q : MeasureTheory.Measure ๐} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐ ร Fin K) โ} : MeasureTheory.IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)
Code
instance {hK : 0 < K} {Q : Measure ๐} [IsProbabilityMeasure Q] {ฮบ : Kernel (๐ ร Fin K) โ} :
IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)Type uses (1)
Body uses (2)
Used by (1)
Actions: Source ยท Open Issue
Proof
Measure.isProbabilityMeasure_map (by fun_prop)