Learning.IT.hasCondDistrib_feedback_zero
This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.
hasCondDistrib_feedback_zero๐
Learning.IT.hasCondDistrib_feedback_zeroNo docstring.
Learning.IT.hasCondDistrib_feedback_zero.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : ProbabilityTheory.HasCondDistrib (feedback 0) (action 0) (Environment.ฮฝ0 env) (trajMeasure alg env)Learning.IT.hasCondDistrib_feedback_zero.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : ProbabilityTheory.HasCondDistrib (feedback 0) (action 0) (Environment.ฮฝ0 env) (trajMeasure alg env)
Code
lemma hasCondDistrib_feedback_zero (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) :
HasCondDistrib (feedback 0) (action 0) env.ฮฝ0 (trajMeasure alg env)Type uses (5)
Used by (2)
Actions: Source ยท Open Issue
Proof
by have h_step := (hasLaw_step_zero alg env).map_eq have h_action := (hasLaw_action_zero alg env).map_eq exact โจby fun_prop, by rwa [h_action]โฉ
Dependency graph
Type dependencies (5)
Algorithm๐
Learning.AlgorithmA stochastic, sequential algorithm.
Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Algorithm.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Algorithm (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Policy or sampling rule: distribution of the next action. -/ policy : (n : โ) โ Kernel (Iic n โ ๐ ร ๐จ) ๐ /-- The policy is a Markov kernel. -/ [h_policy : โ n, IsMarkovKernel (policy n)] /-- Distribution of the first action. -/ p0 : Measure ๐ /-- The first action distribution is a probability measure. -/ [hp0 : IsProbabilityMeasure p0]
Used by (216)
Actions: Source ยท Open Issue
Environment๐
Learning.EnvironmentA stochastic environment.
Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)Learning.Environment.{u_4, u_5} (๐ : Type u_4) (๐จ : Type u_5) [MeasurableSpace ๐] [MeasurableSpace ๐จ] : Type (max u_4 u_5)
Code
structure Environment (๐ ๐จ : Type*) [MeasurableSpace ๐] [MeasurableSpace ๐จ] where /-- Distribution of the next observation as function of the past history. -/ feedback : (n : โ) โ Kernel ((Iic n โ ๐ ร ๐จ) ร ๐) ๐จ /-- The feedback kernels are Markov kernels. -/ [h_feedback : โ n, IsMarkovKernel (feedback n)] /-- Distribution of the first observation given the first action. -/ ฮฝ0 : Kernel ๐ ๐จ /-- The initial observation kernel is a Markov kernel. -/ [hp0 : IsMarkovKernel ฮฝ0]
Actions: Source ยท Open Issue
feedback๐
Learning.IT.feedback
feedback n is the feedback at time n. This is a random variable on the measurable space
โ โ ๐ ร ๐จ.
Learning.IT.feedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐จLearning.IT.feedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐จ
Code
def feedback (n : โ) (h : โ โ ๐ ร ๐จ) : ๐จ := (h n).2
Used by (16)
Actions: Source ยท Open Issue
action๐
Learning.IT.action
action n is the action pulled at time n. This is a random variable on the measurable space
โ โ ๐ ร ๐จ.
Learning.IT.action.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐Learning.IT.action.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} (n : โ) (h : โ โ ๐ ร ๐จ) : ๐
Code
def action (n : โ) (h : โ โ ๐ ร ๐จ) : ๐ := (h n).1
Actions: Source ยท Open Issue
trajMeasure๐
Learning.trajMeasureMeasure on the sequence of actions and observations generated by the algorithm/environment.
Learning.trajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐จ)Learning.trajMeasure.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) : MeasureTheory.Measure (โ โ ๐ ร ๐จ)
Code
noncomputable
def trajMeasure (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) :
Measure (โ โ ๐ ร ๐จ) :=
Kernel.trajMeasure (alg.p0 โโ env.ฮฝ0) (stepKernel alg env)
deriving IsProbabilityMeasureType uses (2)
Used by (19)
Actions: Source ยท Open Issue
All dependencies, transitively (4)
stepKernel๐
Learning.stepKernel
Kernel describing the distribution of the next action-feedback pair given the history
up to n.
Learning.stepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (๐ ร ๐จ)Learning.stepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.Kernel (โฅ(Finset.Iic n) โ ๐ ร ๐จ) (๐ ร ๐จ)
Code
noncomputable
def stepKernel (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) :
Kernel (Iic n โ ๐ ร ๐จ) (๐ ร ๐จ) :=
alg.policy n โโ env.feedback n
deriving IsMarkovKernelType uses (2)
Actions: Source ยท Open Issue
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy๐
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicyNo docstring.
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)
Code
instance (alg : Algorithm ๐ ๐จ) (n : โ) : IsMarkovKernel (alg.policy n)
Type uses (1)
Actions: Source ยท Open Issue
Proof
alg.h_policy n
instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback๐
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedbackNo docstring.
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)
Code
instance (env : Environment ๐ ๐จ) (n : โ) : IsMarkovKernel (env.feedback n)
Type uses (1)
Used by (5)
Actions: Source ยท Open Issue
Proof
env.h_feedback n
instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel๐
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernelNo docstring.
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐ : Type u_1} {๐จ : Type u_2} {m๐ : MeasurableSpace ๐} {m๐จ : MeasurableSpace ๐จ} (alg : Algorithm ๐ ๐จ) (env : Environment ๐ ๐จ) (n : โ) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)
Code
deriving IsMarkovKernel
Type uses (3)
Body uses (2)
Used by (10)
Actions: Source ยท Open Issue
Proof
deriving IsMarkovKernel