Deterministic algorithms and environments #
A deterministic algorithm chooses its action in a deterministic way. That is, that action is given by a measurable function of the history instead of a general Markov kernel. Similarly, a deterministic environment gives feedback in a deterministic way.
Main definitions #
We introduce two typeclasses IsDeterministicAlg and IsDeterministicEnv to express that
an algorithm or an environment is deterministic. We also give definitions for the initial action
and the next action of a deterministic algorithm, and for the feedback functions of a deterministic
environment. Finally, we give a construction of a deterministic algorithm and environment from
measurable functions.
IsDeterministicAlg alg: a typeclass expressing that the algorithmalgis deterministic.IsDeterministicEnv env: a typeclass expressing that the environmentenvis deterministic.actionZero alg: the initial action of a deterministic algorithmalg.nextAction alg n: the function that gives the next action of a deterministic algorithmalgat stepn, as a function of the history.feedbackFunZero env: the function that gives the initial feedback of a deterministic environmentenv.feedbackFun env n: the function that gives the feedback of a deterministic environmentenvat stepn, as a function of the history and the current action.detAlgorithm nextA h_next action0: a deterministic algorithm that chooses its action according to the measurable functionnextA(with proof of measurabilityh_next), with initial actionaction0.
An algorithm is deterministic if its initial action and subsequent actions are determined by measurable functions (and not possibly random kernels).
- exists_action0 : โ (action0 : ๐), alg.p0 = MeasureTheory.Measure.dirac action0
- exists_nextAction (n : โ) : โ (nextAction : (โฅ(Finset.Iic n) โ ๐ ร ๐จ) โ ๐) (h_meas : Measurable nextAction), alg.policy n = ProbabilityTheory.Kernel.deterministic nextAction h_meas
Instances
The initial action of a deterministic algorithm.
Equations
- Learning.actionZero alg = โฏ.choose
Instances For
The next action of a deterministic algorithm after step n.
Equations
- Learning.nextAction alg n = โฏ.choose
Instances For
An environment is deterministic if its initial feedbacks are determined by measurable functions (and not possibly random kernels).
- exists_f0 : โ (f0 : ๐ โ ๐จ) (hf0 : Measurable f0), env.ฮฝ0 = ProbabilityTheory.Kernel.deterministic f0 hf0
- exists_f (n : โ) : โ (f : (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ร ๐ โ ๐จ) (hf : Measurable f), env.feedback n = ProbabilityTheory.Kernel.deterministic f hf
Instances
The initial feedback function of a deterministic environment.
Equations
- Learning.feedbackFunZero env = โฏ.choose
Instances For
The feedback function of a deterministic environment at step n.
Equations
- Learning.feedbackFun env n = โฏ.choose
Instances For
A deterministic algorithm, which chooses the action given by the function nextAction.
Equations
- Learning.detAlgorithm nextA h_next action0 = { policy := fun (n : โ) => ProbabilityTheory.Kernel.deterministic (nextA n) โฏ, h_policy := โฏ, p0 := MeasureTheory.Measure.dirac action0, hp0 := โฏ }
Instances For
A deterministic environment, where the feedback is given by evaluating fixed measurable functions.
Equations
- One or more equations did not get rendered due to their size.