Learning.instIsObliviousEnvEvalEnv
This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.
instIsObliviousEnvEvalEnv๐
Learning.instIsObliviousEnvEvalEnvNo docstring.
Learning.instIsObliviousEnvEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} : IsObliviousEnv (evalEnv f hf)Learning.instIsObliviousEnvEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} : IsObliviousEnv (evalEnv f hf)
Code
instance : IsObliviousEnv (evalEnv f hf)
Type uses (2)
Body uses (2)
Actions: Source ยท Open Issue
Proof
by unfold evalEnv; infer_instance
Dependency graph
Type dependencies (2)
IsObliviousEnv๐
Learning.IsObliviousEnvAn environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.
Learning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : PropLearning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : Prop
Code
class IsObliviousEnv (env : Environment ๐ ๐จ) : Prop where
exists_eq_prodMkLeft : โ ฮฝ : โ โ Kernel ๐ ๐จ, (โ n, IsMarkovKernel (ฮฝ n)) โง
(env.ฮฝ0 = ฮฝ 0) โง (โ n, env.feedback n = (ฮฝ (n + 1)).prodMkLeft _)Type uses (1)
Used by (13)
Actions: Source ยท Open Issue
evalEnv๐
Learning.evalEnv
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Learning.evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (f : ๐ โ ๐จ) (hf : Measurable f) : Environment ๐ ๐จLearning.evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (f : ๐ โ ๐จ) (hf : Measurable f) : Environment ๐ ๐จ
Code
noncomputable def evalEnv (f : ๐ โ ๐จ) (hf : Measurable f) := onlineEvalEnv (fun _ โฆ f) (fun _ โฆ hf)
Type uses (1)
Body uses (1)
Used by (9)
Actions: Source ยท Open Issue
All dependencies, transitively (4)
Environment๐
Learning.EnvironmentA stochastic environment.
Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Environment (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Distribution of the next observation as function of the past history. -/ feedback : (n : โ) โ Kernel ((Iic n โ ๐ ร ๐จ) ร ๐) ๐จ /-- The feedback kernels are Markov kernels. -/ [h_feedback : โ n, IsMarkovKernel (feedback n)] /-- Distribution of the first observation given the first action. -/ ฮฝ0 : Kernel ๐ ๐จ /-- The initial observation kernel is a Markov kernel. -/ [hp0 : IsMarkovKernel ฮฝ0]
Actions: Source ยท Open Issue
obliviousEnv๐
Learning.obliviousEnvAn oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.
Learning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จLearning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ
Code
def obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ where feedback n := (ฮฝ (n + 1)).prodMkLeft _ ฮฝ0 := ฮฝ 0
Type uses (1)
Used by (10)
Actions: Source ยท Open Issue
onlineEvalEnv๐
Learning.onlineEvalEnv
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Learning.onlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (g : โ โ ๐ โ ๐จ) (hg : โ (n : โ), Measurable (g n)) : Environment ๐ ๐จLearning.onlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (g : โ โ ๐ โ ๐จ) (hg : โ (n : โ), Measurable (g n)) : Environment ๐ ๐จ
Code
noncomputable def onlineEvalEnv (g : โ โ ๐ โ ๐จ) (hg : โ n, Measurable (g n)) := obliviousEnv (fun n โฆ Kernel.deterministic (g n) (hg n))
Type uses (1)
Body uses (1)
Used by (11)
Actions: Source ยท Open Issue
instIsObliviousEnvOnlineEvalEnv๐
Learning.instIsObliviousEnvOnlineEvalEnvNo docstring.
Learning.instIsObliviousEnvOnlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {g : โ โ ๐ โ ๐จ} {hg : โ (n : โ), Measurable (g n)} : IsObliviousEnv (onlineEvalEnv g hg)Learning.instIsObliviousEnvOnlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {g : โ โ ๐ โ ๐จ} {hg : โ (n : โ), Measurable (g n)} : IsObliviousEnv (onlineEvalEnv g hg)
Code
instance : IsObliviousEnv (onlineEvalEnv g hg)
Type uses (2)
Body uses (1)
Used by (3)
Actions: Source ยท Open Issue
Proof
โจโจfun n โฆ Kernel.deterministic (g n) (hg n), fun _ โฆ inferInstance, rfl, fun _ โฆ rflโฉโฉ