Oblivious and stationary environments #
An oblivious environment is an environment in which the distribution of the next feedback depends only on the last action (and not on the past history). If the kernel that gives the distribution of the next feedback given the last action is the same at every time step, then we say that the environment is stationary.
Main definitions #
We define a Prop-valued typeclass IsObliviousEnv to express that an environment is oblivious,
and we define two constructors for oblivious environments.
Typeclass and related definitions:
IsObliviousEnv env: the environmentenvis oblivious.feedbackCondAction env n: the kernel representing the conditional distribution of the feedback given the action at timenin an oblivious environmentenv.
Constructors for oblivious environments:
obliviousEnv ν: an oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner, and is given by a sequence of Markov kernelsν : ℕ → Kernel 𝓐 𝓨.stationaryEnv ν: a stationary environment, in which the distribution of the next feedback depends only on the last action (and not on the past history), and is given by a Markov kernelν : Kernel 𝓐 𝓨.
An environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.
- exists_eq_prodMkLeft : ∃ (ν : ℕ → ProbabilityTheory.Kernel 𝓐 𝓨), (∀ (n : ℕ), ProbabilityTheory.IsMarkovKernel (ν n)) ∧ env.ν0 = ν 0 ∧ ∀ (n : ℕ), env.feedback n = ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1))
Instances
The kernel representing the conditional distribution of the feedback given the action
at time n in an oblivious environment.
Equations
- Learning.feedbackCondAction env n = ⋯.choose n
Instances For
The feedback at time n + 1 is conditionally independent of the history up to time n
given the action at time n + 1.
An oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.
Equations
- Learning.obliviousEnv ν = { feedback := fun (n : ℕ) => ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic n) → 𝓐 × 𝓨) (ν (n + 1)), h_feedback := ⋯, ν0 := ν 0, hp0 := ⋯ }
Instances For
A stationary environment, in which the distribution of the next feedback depends only on the last action.
Equations
- Learning.stationaryEnv ν = Learning.obliviousEnv fun (x : ℕ) => ν
Instances For
The conditional distribution of the feedback at time n given the action at time n is ν.
The conditional distribution of the feedback at time n given the action at time n is ν.
The feedback at time n + 1 is conditionally independent of the history up to time n
given the action at time n + 1.