Probability space for algorithm-environment interactions
For any algorithm and environment, we construct a probability space on which we can define a sequence of random variables representing the actions and feedback generated by the interaction of the algorithm and the environment. The main ingredient of the construction is the Ionescu-Tulcea theorem.
Main statements
isAlgEnvSeq_unique: the law of the sequence of actions and observations generated by an algorithm-environment pair is unique, in the sense that it does not depend on the probability space used. If A₁, R₁ and A₂, R₂ are two algorithm-environment sequences generated by the same algorithm-environment pair on probability spaces (Ω, P) and (Ω', P'), then P.map (fun ω n ↦ (A₁ n ω, R₁ n ω)) = P'.map (fun ω n ↦ (A₂ n ω, R₂ n ω)).
Learning.trajMeasure alg env is the measure on the sequence of actions and observations generated by the algorithm-environment pair (alg, env).
Equations
- Learning.trajMeasure alg env = ProbabilityTheory.Kernel.trajMeasure (alg.p0.compProd env.ν0) (Learning.stepKernel alg env)
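To see what the Ionescu-Tulcea construction computes, consider the degenerate case where every kernel is deterministic (a Dirac kernel). The trajectory measure is then the Dirac mass at a single sequence, obtained by ordinary recursion: the first pair comes from the initial distribution (the analogue of alg.p0.compProd env.ν0), and each later pair comes from a step applied to the history so far (the analogue of stepKernel alg env). The core Lean 4 sketch below is a hypothetical illustration only: DetAlg, DetEnv, prefixes and traj are names introduced here, not part of the library, the split of each step into "policy, then feedback" is assumed, and histories are indexed by Fin (n + 1) by assumption.

```lean
-- Hypothetical deterministic stand-ins for an algorithm and an environment:
-- every kernel is replaced by a plain function (a Dirac kernel).
structure DetAlg (α β : Type) where
  p0 : α                                          -- first action
  policy : (n : Nat) → (Fin (n + 1) → α × β) → α  -- action at time n + 1

structure DetEnv (α β : Type) where
  ν0 : α → β                                      -- feedback to the first action
  feedback : (n : Nat) → (Fin (n + 1) → α × β) → α → β  -- feedback at time n + 1

-- `prefixes alg env n` is the history at times 0 through n. Step 0 plays the role
-- of `alg.p0.compProd env.ν0`; each later step plays the role of the step kernel.
def prefixes {α β : Type} (alg : DetAlg α β) (env : DetEnv α β) :
    (n : Nat) → Fin (n + 1) → α × β
  | 0 => fun _ => (alg.p0, env.ν0 alg.p0)
  | n + 1 =>
    let past := prefixes alg env n
    let a := alg.policy n past
    let y := env.feedback n past a
    -- keep the old history and append the new (action, feedback) pair at index n + 1
    fun i => if h : i.val < n + 1 then past ⟨i.val, h⟩ else (a, y)

-- The whole trajectory: the pair (action, feedback) at time n.
def traj {α β : Type} (alg : DetAlg α β) (env : DetEnv α β) (n : Nat) : α × β :=
  prefixes alg env n (Fin.last n)

-- Example: the algorithm plays its previous action plus one;
-- the environment returns the square of the action.
def counter : DetAlg Nat Nat :=
  { p0 := 0, policy := fun n past => (past (Fin.last n)).1 + 1 }

def squares : DetEnv Nat Nat :=
  { ν0 := fun a => a * a, feedback := fun _ _ a => a * a }

#eval traj counter squares 3  -- (3, 9)
```

In the probabilistic setting the recursion is replaced by composition of kernels, and Ionescu-Tulcea supplies the measure on the infinite product that this recursion builds pointwise.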
The law of the sequence of actions and observations generated by an algorithm-environment pair is unique: it does not depend on the probability space used.
action n is the action pulled at time n. This is a random variable on the measurable space ℕ → 𝓐 × 𝓨.
Equations
- Learning.IT.action n h = (h n).1
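Because action n is just the first coordinate of the n-th pair, a plain Lean 4 copy of the defining equation makes the indexing concrete. The names action and sampleTraj below are hypothetical standalone copies, with α and β standing in for 𝓐 and 𝓨; measurability plays no role in this sketch.

```lean
-- Hypothetical standalone mirror of `Learning.IT.action n h = (h n).1`;
-- α stands in for the action space 𝓐 and β for the feedback space 𝓨.
def action {α β : Type} (n : Nat) (h : Nat → α × β) : α := (h n).1

-- A sample trajectory: at time n the action is n and the feedback is 10 * n.
def sampleTraj (n : Nat) : Nat × Nat := (n, 10 * n)

#eval action 3 sampleTraj  -- 3
```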
feedback n is the feedback at time n. This is a random variable on the measurable space ℕ → 𝓐 × 𝓨.
Equations
- Learning.IT.feedback n h = (h n).2
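Together, the two projections recover the full step: the pair (action n h, feedback n h) is exactly h n. In Lean 4 this holds by rfl thanks to definitional eta for structures. The sketch restates both definitions as hypothetical standalone copies (α and β standing in for 𝓐 and 𝓨) so that it compiles on its own; action_feedback_eq is a name introduced here.

```lean
-- Hypothetical standalone mirrors of `Learning.IT.action` and `Learning.IT.feedback`.
def action {α β : Type} (n : Nat) (h : Nat → α × β) : α := (h n).1
def feedback {α β : Type} (n : Nat) (h : Nat → α × β) : β := (h n).2

-- The n-th step of the sequence is the pair (action, feedback) at time n.
theorem action_feedback_eq {α β : Type} (n : Nat) (h : Nat → α × β) :
    (action n h, feedback n h) = h n := rfl

#eval feedback 2 (fun n => (n, n + 5))  -- 7
```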
hist n is the history up to time n. This is a random variable on the measurable space ℕ → 𝓐 × 𝓨.
Equations
- Learning.IT.hist n h i = h ↑i
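hist n keeps the first steps of the sequence as a function on a finite index set. In this sketch Fin (n + 1) stands in for the index set {0, ..., n}; that choice is an assumption, and the library may use a different index type. hist and squaresTraj are hypothetical standalone names.

```lean
-- Hypothetical mirror of `Learning.IT.hist n h i = h ↑i`, with the index set
-- {0, ..., n} represented (by assumption) as `Fin (n + 1)`.
def hist {α β : Type} (n : Nat) (h : Nat → α × β) : Fin (n + 1) → α × β :=
  fun i => h i.val

-- Sample trajectory: action n with feedback n * n.
def squaresTraj (n : Nat) : Nat × Nat := (n, n * n)

-- The history at time 2 can only be queried at indices 0, 1 and 2.
#eval hist 2 squaresTraj ⟨2, by decide⟩  -- (2, 4)
```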
Filtration of the algorithm-environment sequence.
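If the filtration is generated by the histories hist n (an assumption; the entry does not spell out its definition), it is increasing because hist n is a function of hist (n + 1): whatever is known at time n is still known at time n + 1. The core Lean 4 sketch below checks this factorization by rfl. hist, truncate and truncate_hist_succ are hypothetical standalone definitions, and histories are indexed by Fin (n + 1) by assumption.

```lean
-- Hypothetical mirror of `Learning.IT.hist`, indexing {0, ..., n} by `Fin (n + 1)`.
def hist {α β : Type} (n : Nat) (h : Nat → α × β) : Fin (n + 1) → α × β :=
  fun i => h i.val

-- Keep the first n + 1 steps of a history with n + 2 steps.
def truncate {α β : Type} {n : Nat} (g : Fin (n + 1 + 1) → α × β) :
    Fin (n + 1) → α × β :=
  fun i => g ⟨i.val, by have := i.isLt; omega⟩

-- `hist n` factors through `hist (n + 1)`: any function of the history at time n
-- is also a function of the history at time n + 1, so the information only grows.
theorem truncate_hist_succ {α β : Type} (n : Nat) (h : Nat → α × β) :
    truncate (hist (n + 1) h) = hist n h := rfl

#eval truncate (hist (2 + 1) (fun n => (n, n * n))) ⟨2, by decide⟩  -- (2, 4)
```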
Filtration generated by the history at time n-1 together with the action at time n.