Learning.feedbackCondAction_evalEnv
This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.
feedbackCondAction_evalEnv๐
Learning.feedbackCondAction_evalEnvNo docstring.
Learning.feedbackCondAction_evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} (n : โ) : feedbackCondAction (evalEnv f hf) n = ProbabilityTheory.Kernel.deterministic f hfLearning.feedbackCondAction_evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} (n : โ) : feedbackCondAction (evalEnv f hf) n = ProbabilityTheory.Kernel.deterministic f hf
Code
lemma feedbackCondAction_evalEnv (n : โ) :
feedbackCondAction (evalEnv f hf) n = Kernel.deterministic f hfType uses (3)
Body uses (1)
Used by (1)
Actions: Source ยท Open Issue
Proof
by simp [evalEnv]
Dependency graph
Type dependencies (3)
feedbackCondAction๐
Learning.feedbackCondAction
The kernel representing the conditional distribution of the feedback given the action
at time n in an oblivious environment.
Learning.feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : ProbabilityTheory.Kernel ๐ ๐จLearning.feedbackCondAction.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : ProbabilityTheory.Kernel ๐ ๐จ
Code
noncomputable def feedbackCondAction (env : Environment ๐ ๐จ) [h_obl : IsObliviousEnv env] (n : โ) : Kernel ๐ ๐จ := h_obl.exists_eq_prodMkLeft.choose n
Type uses (2)
Used by (12)
Actions: Source ยท Open Issue
evalEnv๐
Learning.evalEnv
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Learning.evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (f : ๐ โ ๐จ) (hf : Measurable f) : Environment ๐ ๐จLearning.evalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (f : ๐ โ ๐จ) (hf : Measurable f) : Environment ๐ ๐จ
Code
noncomputable def evalEnv (f : ๐ โ ๐จ) (hf : Measurable f) := onlineEvalEnv (fun _ โฆ f) (fun _ โฆ hf)
Type uses (1)
Body uses (1)
Used by (9)
Actions: Source ยท Open Issue
instIsObliviousEnvEvalEnv๐
Learning.instIsObliviousEnvEvalEnvNo docstring.
Learning.instIsObliviousEnvEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} : IsObliviousEnv (evalEnv f hf)Learning.instIsObliviousEnvEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {f : ๐ โ ๐จ} {hf : Measurable f} : IsObliviousEnv (evalEnv f hf)
Code
instance : IsObliviousEnv (evalEnv f hf)
Type uses (2)
Body uses (2)
Actions: Source ยท Open Issue
Proof
by unfold evalEnv; infer_instance
All dependencies, transitively (5)
Environment๐
Learning.EnvironmentA stochastic environment.
Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Environment (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Distribution of the next observation as function of the past history. -/ feedback : (n : โ) โ Kernel ((Iic n โ ๐ ร ๐จ) ร ๐) ๐จ /-- The feedback kernels are Markov kernels. -/ [h_feedback : โ n, IsMarkovKernel (feedback n)] /-- Distribution of the first observation given the first action. -/ ฮฝ0 : Kernel ๐ ๐จ /-- The initial observation kernel is a Markov kernel. -/ [hp0 : IsMarkovKernel ฮฝ0]
Actions: Source ยท Open Issue
IsObliviousEnv๐
Learning.IsObliviousEnvAn environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.
Learning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : PropLearning.IsObliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) : Prop
Code
class IsObliviousEnv (env : Environment ๐ ๐จ) : Prop where
exists_eq_prodMkLeft : โ ฮฝ : โ โ Kernel ๐ ๐จ, (โ n, IsMarkovKernel (ฮฝ n)) โง
(env.ฮฝ0 = ฮฝ 0) โง (โ n, env.feedback n = (ฮฝ (n + 1)).prodMkLeft _)Type uses (1)
Used by (13)
Actions: Source ยท Open Issue
obliviousEnv๐
Learning.obliviousEnvAn oblivious environment, in which the distribution of the next feedback depends only on the last action, but in a possibly time-dependent manner.
Learning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จLearning.obliviousEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (ฮฝ : โ โ ProbabilityTheory.Kernel ๐ ๐จ) [โ (n : โ), ProbabilityTheory.IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ
Code
def obliviousEnv (ฮฝ : โ โ Kernel ๐ ๐จ) [โ n, IsMarkovKernel (ฮฝ n)] : Environment ๐ ๐จ where feedback n := (ฮฝ (n + 1)).prodMkLeft _ ฮฝ0 := ฮฝ 0
Type uses (1)
Used by (10)
Actions: Source ยท Open Issue
onlineEvalEnv๐
Learning.onlineEvalEnv
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Learning.onlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (g : โ โ ๐ โ ๐จ) (hg : โ (n : โ), Measurable (g n)) : Environment ๐ ๐จLearning.onlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (g : โ โ ๐ โ ๐จ) (hg : โ (n : โ), Measurable (g n)) : Environment ๐ ๐จ
Code
noncomputable def onlineEvalEnv (g : โ โ ๐ โ ๐จ) (hg : โ n, Measurable (g n)) := obliviousEnv (fun n โฆ Kernel.deterministic (g n) (hg n))
Type uses (1)
Body uses (1)
Used by (11)
Actions: Source ยท Open Issue
instIsObliviousEnvOnlineEvalEnv๐
Learning.instIsObliviousEnvOnlineEvalEnvNo docstring.
Learning.instIsObliviousEnvOnlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {g : โ โ ๐ โ ๐จ} {hg : โ (n : โ), Measurable (g n)} : IsObliviousEnv (onlineEvalEnv g hg)Learning.instIsObliviousEnvOnlineEvalEnv.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} {g : โ โ ๐ โ ๐จ} {hg : โ (n : โ), Measurable (g n)} : IsObliviousEnv (onlineEvalEnv g hg)
Code
instance : IsObliviousEnv (onlineEvalEnv g hg)
Type uses (2)
Body uses (1)
Used by (3)
Actions: Source ยท Open Issue
Proof
โจโจfun n โฆ Kernel.deterministic (g n) (hg n), fun _ โฆ inferInstance, rfl, fun _ โฆ rflโฉโฉ