LeanMachineLearning exposition

Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

condIndepFun_feedback_history_action_action๐Ÿ”—

LemmaLearning.IsObliviousEnv.condIndepFun_feedback_history_action_action

No docstring.

๐Ÿ”—theorem
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P
Learning.IsObliviousEnv.condIndepFun_feedback_history_action_action.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {ฮฉ : Type u_3} {mฮฉ : MeasurableSpace ฮฉ} {alg : Algorithm ๐“ ๐“จ} {env : Environment ๐“ ๐“จ} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsFiniteMeasure P] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {Y : โ„• โ†’ ฮฉ โ†’ ๐“จ} [StandardBorelSpace ๐“] [Nonempty ๐“] [StandardBorelSpace ๐“จ] [Nonempty ๐“จ] [StandardBorelSpace ฮฉ] [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) : ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) โ‹ฏ (Y (n + 1)) (fun ฯ‰ => (history A Y n ฯ‰, A (n + 1) ฯ‰)) P

Code

lemma condIndepFun_feedback_history_action_action [StandardBorelSpace ฮฉ]
    [IsObliviousEnv env] (h : IsAlgEnvSeq A Y alg env P) (n : โ„•) :
    Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action (n + 1); P]
      (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, A (n + 1) ฯ‰))
Type uses (5)
Body uses (3)
Used by (2)

Actions: Source ยท Open Issue

Proof
by
  have h_indep : Y (n + 1) โŸ‚แตข[A (n + 1), h.measurable_action (n + 1); P] history A Y n :=
    condIndepFun_feedback_history_action h n
  have hA := h.measurable_action
  have hY := h.measurable_feedback
  exact h_indep.prod_right (by fun_prop) (by fun_prop) (by fun_prop)

Dependency graph

Type dependencies (5)

Algorithm๐Ÿ”—

StructureLearning.Algorithm

A stochastic, sequential algorithm.

๐Ÿ”—structure
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)

Code

structure Algorithm (๐“ ๐“จ : Type*) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] where
  /-- Policy or sampling rule: distribution of the next action. -/
  policy : (n : โ„•) โ†’ Kernel (Iic n โ†’ ๐“ ร— ๐“จ) ๐“
  /-- The policy is a Markov kernel. -/
  [h_policy : โˆ€ n, IsMarkovKernel (policy n)]
  /-- Distribution of the first action. -/
  p0 : Measure ๐“
  /-- The first action distribution is a probability measure. -/
  [hp0 : IsProbabilityMeasure p0]
Used by (216)

Actions: Source ยท Open Issue

Environment๐Ÿ”—

StructureLearning.Environment

A stochastic environment.

๐Ÿ”—structure
Learning.Environment.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)
Learning.Environment.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)

Code

structure Environment (๐“ ๐“จ : Type*) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] where
  /-- Distribution of the next observation as function of the past history. -/
  feedback : (n : โ„•) โ†’ Kernel ((Iic n โ†’ ๐“ ร— ๐“จ) ร— ๐“) ๐“จ
  /-- The feedback kernels are Markov kernels. -/
  [h_feedback : โˆ€ n, IsMarkovKernel (feedback n)]
  /-- Distribution of the first observation given the first action. -/
  ฮฝ0 : Kernel ๐“ ๐“จ
  /-- The initial observation kernel is a Markov kernel. -/
  [hp0 : IsMarkovKernel ฮฝ0]
Used by (128)

Actions: Source ยท Open Issue

IsObliviousEnv๐Ÿ”—

Type ClassLearning.IsObliviousEnv

An environment is oblivious if the distribution of the next feedback depends only on the last action and not on the past history.

๐Ÿ”—type class
Learning.IsObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : Prop
Learning.IsObliviousEnv.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : Prop

Code

class IsObliviousEnv (env : Environment ๐“ ๐“จ) : Prop where
  exists_eq_prodMkLeft : โˆƒ ฮฝ : โ„• โ†’ Kernel ๐“ ๐“จ, (โˆ€ n, IsMarkovKernel (ฮฝ n)) โˆง
    (env.ฮฝ0 = ฮฝ 0) โˆง (โˆ€ n, env.feedback n = (ฮฝ (n + 1)).prodMkLeft _)
Type uses (1)
Used by (13)

Actions: Source ยท Open Issue

IsAlgEnvSeq๐Ÿ”—

StructureLearning.IsAlgEnvSeq

An algorithm-environment sequence: a sequence of actions and feedbacks generated by an algorithm interacting with an environment.

๐Ÿ”—structure
Learning.IsAlgEnvSeq.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {mฮฉ : MeasurableSpace ฮฉ} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop
Learning.IsAlgEnvSeq.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} {mฮฉ : MeasurableSpace ฮฉ} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop

Code

structure IsAlgEnvSeq
    (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ)
    (P : Measure ฮฉ) [IsFiniteMeasure P] : Prop where
  /-- The action sequence is measurable. -/
  measurable_action n : Measurable (A n) := by fun_prop
  /-- The feedback sequence is measurable. -/
  measurable_feedback n : Measurable (Y n) := by fun_prop
  /-- The first action has the correct law. -/
  hasLaw_action_zero : HasLaw (fun ฯ‰ โ†ฆ (A 0 ฯ‰)) alg.p0 P
  /-- The first feedback has the correct conditional distribution. -/
  hasCondDistrib_feedback_zero : HasCondDistrib (Y 0) (A 0) env.ฮฝ0 P
  /-- The next action has the correct conditional distribution given the history. -/
  hasCondDistrib_action n :
    HasCondDistrib (A (n + 1)) (history A Y n) (alg.policy n) P
  /-- The next feedback has the correct conditional distribution given the history and
  next action. -/
  hasCondDistrib_feedback n :
    HasCondDistrib (Y (n + 1)) (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, A (n + 1) ฯ‰))
      (env.feedback n) P
Type uses (3)
Used by (111)

Actions: Source ยท Open Issue

history๐Ÿ”—

DefinitionLearning.history

History of the algorithm-environment sequence up to time n.

๐Ÿ”—def
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ

Code

def history (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : Iic n โ†’ ๐“ ร— ๐“จ :=
  fun i โ†ฆ (A i ฯ‰, Y i ฯ‰)
Used by (72)

Actions: Source ยท Open Issue