Algorithms and environments #
We define structures for stochastic, sequential algorithms and environments, and the notion of an algorithm-environment sequence, which is a sequence of actions and feedbacks generated by an algorithm interacting with an environment.
Main definitions #
Algorithm π π¨: a stochastic, sequential algorithm.Environment π π¨: a stochastic environment.IsAlgEnvSeq A π¨ alg env P: an algorithm-environment sequence. That is, a sequence of actionsAand feedbackYthat have the correct conditional distributions to be generated by an algorithmalginteracting with an environmentenv, defined on a probability space(Ξ©, P).IsAlgEnvSeqUntil A Y alg env P N:AandYform an algorithm-environment sequence until timeN.prod_left alg: anAlgorithm π (π§ Γ π¨)obtained from an algorithmalg : Algorithm π π¨by ignoring theπ§component of each observation.
A stochastic, sequential algorithm.
- policy (n : β) : ProbabilityTheory.Kernel (β₯(Finset.Iic n) β π Γ π¨) π
Policy or sampling rule: distribution of the next action.
- h_policy (n : β) : ProbabilityTheory.IsMarkovKernel (self.policy n)
The policy is a Markov kernel.
- p0 : MeasureTheory.Measure π
Distribution of the first action.
- hp0 : MeasureTheory.IsProbabilityMeasure self.p0
The first action distribution is a probability measure.
Instances For
An algorithm with observations in π§ Γ π¨ obtained from an algorithm with observations in π¨
by ignoring the π§ component of each observation.
Equations
- One or more equations did not get rendered due to their size.
Instances For
A stochastic environment.
- feedback (n : β) : ProbabilityTheory.Kernel ((β₯(Finset.Iic n) β π Γ π¨) Γ π) π¨
Distribution of the next observation as function of the past history.
- h_feedback (n : β) : ProbabilityTheory.IsMarkovKernel (self.feedback n)
The feedback kernels are Markov kernels.
- Ξ½0 : ProbabilityTheory.Kernel π π¨
Distribution of the first observation given the first action.
- hp0 : ProbabilityTheory.IsMarkovKernel self.Ξ½0
The initial observation kernel is a Markov kernel.
Instances For
Kernel describing the distribution of the next action-feedback pair given the history
up to n.
Equations
- Learning.stepKernel alg env n = (alg.policy n).compProd (env.feedback n)
Instances For
History of the algorithm-environment sequence up to time n.
Equations
- Learning.history A Y n Ο i = (A (βi) Ο, Y (βi) Ο)
Instances For
An algorithm-environment sequence: a sequence of actions and feedbacks generated by an algorithm interacting with an environment.
- measurable_action (n : β) : Measurable (A n)
The action sequence is measurable.
- measurable_feedback (n : β) : Measurable (Y n)
The feedback sequence is measurable.
- hasLaw_action_zero : ProbabilityTheory.HasLaw (fun (Ο : Ξ©) => A 0 Ο) alg.p0 P
The first action has the correct law.
- hasCondDistrib_feedback_zero : ProbabilityTheory.HasCondDistrib (Y 0) (A 0) env.Ξ½0 P
The first feedback has the correct conditional distribution.
- hasCondDistrib_action (n : β) : ProbabilityTheory.HasCondDistrib (A (n + 1)) (history A Y n) (alg.policy n) P
The next action has the correct conditional distribution given the history.
- hasCondDistrib_feedback (n : β) : ProbabilityTheory.HasCondDistrib (Y (n + 1)) (fun (Ο : Ξ©) => (history A Y n Ο, A (n + 1) Ο)) (env.feedback n) P
The next feedback has the correct conditional distribution given the history and next action.
Instances For
An algorithm-environment sequence: a sequence of actions and feedbacks generated by an algorithm interacting with an environment.
- measurable_action (n : β) : Measurable (A n)
The action sequence is measurable.
- measurable_feedback (n : β) : Measurable (Y n)
The feedback sequence is measurable.
- hasLaw_action_zero : ProbabilityTheory.HasLaw (fun (Ο : Ξ©) => A 0 Ο) alg.p0 P
The first action has the correct law.
- hasCondDistrib_feedback_zero : ProbabilityTheory.HasCondDistrib (Y 0) (A 0) env.Ξ½0 P
The first feedback has the correct conditional distribution.
- hasCondDistrib_action (n : β) (hn : n < N) : ProbabilityTheory.HasCondDistrib (A (n + 1)) (history A Y n) (alg.policy n) P
The next action has the correct conditional distribution given the history.
- hasCondDistrib_feedback (n : β) (hn : n < N) : ProbabilityTheory.HasCondDistrib (Y (n + 1)) (fun (Ο : Ξ©) => (history A Y n Ο, A (n + 1) Ο)) (env.feedback n) P
The next feedback has the correct conditional distribution given the history and next action.
Instances For
Filtration generated by the history up to time n.
Equations
- Learning.IsAlgEnvSeq.filtration hA hY = { seq := fun (i : β) => MeasurableSpace.comap (Learning.history A Y i) inferInstance, mono' := β―, le' := β― }
Instances For
Filtration generated by the history at time n-1 together with the action at time n.
Equations
- One or more equations did not get rendered due to their size.