3.9.Β SequentialLearning.AlgorithmDensity
Algorithm density
We define a density function that allows obtaining the law of the history under one algorithm from the law of the history under another algorithm when they are interacting with the same environment. This also requires one algorithm to be absolutely continuous with respect to another, a concept that we also introduce here.
Main definitions
-
AbsolutelyContinuous alg algβ:algis absolutely continuous with respect toalgβ(also denotedalg βͺβ algβ) when, in every situation, a set of actions with probability zero underalgβalso has probability zero underalg. Intuitively,algnever acts in a way thatalgβwould never act. -
density alg algβ n: a density function that allows obtaining the law of the history at timenunderalgfrom the law of the history at timenunderalgβwhen they are interacting with the same environment andalg βͺβ algβ.
Main results
-
absolutelyContinuous_map_hist: the law of the history at timenunderalgis absolutely continuous with respect to the law of the history at timenunderalgβwhen they are interacting with the same environment andalg βͺβ algβ. -
hasLaw_history_withDensity: the law of the history at timenunderalgis the law of the history at timenunderalgβwith densityalg.density algβ nwhen they are interacting with the same environment andalg βͺβ algβ.
Module LeanMachineLearning.SequentialLearning.AlgorithmDensity contains 6 exposed declarations.
AbsolutelyContinuousπ
Learning.Algorithm.AbsolutelyContinuous
For every time and history, the distribution over actions according to alg is absolutely
continuous with respect to the distribution over actions according to algβ.
Learning.Algorithm.AbsolutelyContinuous.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] (alg algβ : Algorithm π π¨) : PropLearning.Algorithm.AbsolutelyContinuous.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] (alg algβ : Algorithm π π¨) : Prop
Code
structure AbsolutelyContinuous (alg algβ : Algorithm π π¨) : Prop where p0 : alg.p0 βͺ algβ.p0 policy n h : alg.policy n h βͺ algβ.policy n h
Type uses (1)
Used by (7)
Actions: Source Β· Open Issue
term_βͺβ_π
Learning.Algorithm.Β«term_βͺβ_Β»
For every time and history, the distribution over actions according to alg is absolutely
continuous with respect to the distribution over actions according to algβ.
Learning.Algorithm.Β«term_βͺβ_Β» : Lean.TrailingParserDescrLearning.Algorithm.Β«term_βͺβ_Β» : Lean.TrailingParserDescr
Code
scoped notation:50 alg " βͺβ " algβ => AbsolutelyContinuous alg algβ
Body uses (1)
Actions: Source Β· Open Issue
densityπ
Learning.Algorithm.density
If the algorithm alg is absolutely continuous with respect to the algorithm algβ and they
are both interacting with the same environment, then the law of the history at time n under alg
is the law of the history at time n under algβ with density alg.density algβ n.
Learning.Algorithm.density.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) (n : β) : (β₯(Finset.Iic n) β π Γ π¨) β ENNRealLearning.Algorithm.density.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) (n : β) : (β₯(Finset.Iic n) β π Γ π¨) β ENNReal
Code
noncomputable
def density [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) :
(n : β) β (Iic n β π Γ π¨) β ββ₯0β
| 0, h => (alg.p0.rnDeriv algβ.p0 (h β¨0, by simpβ©).1)
| n + 1, h =>
let p := MeasurableEquiv.IicSuccProd (fun _ β¦ π Γ π¨) n h
alg.density algβ n p.1 * (alg.policy n).rnDeriv (algβ.policy n) p.1 p.2.1Type uses (1)
Body uses (1)
Used by (5)
Actions: Source Β· Open Issue
measurable_densityπ
Learning.Algorithm.measurable_densityNo docstring.
Learning.Algorithm.measurable_density.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) (n : β) : Measurable (density alg algβ n)Learning.Algorithm.measurable_density.{u_1, u_2} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) (n : β) : Measurable (density alg algβ n)
Code
lemma measurable_density [MeasurableSpace.CountablyGenerated π] (alg algβ : Algorithm π π¨) (n : β) :
Measurable (alg.density algβ n)Body uses (1)
Used by (4)
Actions: Source Β· Open Issue
Proof
by induction n with | zero => simp_rw [density]; fun_prop | succ n ih => simp_rw [density]; fun_prop
absolutelyContinuous_map_historyπ
Learning.IsAlgEnvSeq.absolutelyContinuous_map_historyNo docstring.
Learning.IsAlgEnvSeq.absolutelyContinuous_map_history.{u_1, u_2, u_3, u_4} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] {Ξ© : Type u_3} [MeasurableSpace Ξ©] {alg : Algorithm π π¨} {env : Environment π π¨} {A : β β Ξ© β π} {Y : β β Ξ© β π¨} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] {Ξ©β : Type u_4} [MeasurableSpace Ξ©β] {algβ : Algorithm π π¨} {Aβ : β β Ξ©β β π} {Yβ : β β Ξ©β β π¨} {Pβ : MeasureTheory.Measure Ξ©β} [MeasureTheory.IsProbabilityMeasure Pβ] (h : IsAlgEnvSeq A Y alg env P) (hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : Algorithm.AbsolutelyContinuous alg algβ) (n : β) : MeasureTheory.Measure.AbsolutelyContinuous (MeasureTheory.Measure.map (history A Y n) P) (MeasureTheory.Measure.map (history Aβ Yβ n) Pβ)Learning.IsAlgEnvSeq.absolutelyContinuous_map_history.{u_1, u_2, u_3, u_4} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] {Ξ© : Type u_3} [MeasurableSpace Ξ©] {alg : Algorithm π π¨} {env : Environment π π¨} {A : β β Ξ© β π} {Y : β β Ξ© β π¨} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] {Ξ©β : Type u_4} [MeasurableSpace Ξ©β] {algβ : Algorithm π π¨} {Aβ : β β Ξ©β β π} {Yβ : β β Ξ©β β π¨} {Pβ : MeasureTheory.Measure Ξ©β} [MeasureTheory.IsProbabilityMeasure Pβ] (h : IsAlgEnvSeq A Y alg env P) (hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : Algorithm.AbsolutelyContinuous alg algβ) (n : β) : MeasureTheory.Measure.AbsolutelyContinuous (MeasureTheory.Measure.map (history A Y n) P) (MeasureTheory.Measure.map (history Aβ Yβ n) Pβ)
Code
lemma absolutelyContinuous_map_history (h : IsAlgEnvSeq A Y alg env P)
(hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : alg βͺβ algβ) (n : β) :
P.map (history A Y n) βͺ Pβ.map (history Aβ Yβ n)Type uses (5)
Body uses (13)
Actions: Source Β· Open Issue
Proof
by
induction n with
| zero =>
rw [h.hasLaw_history_zero.map_eq, hβ.hasLaw_history_zero.map_eq]
apply Measure.AbsolutelyContinuous.map _ (by fun_prop)
rw [h.hasLaw_step_zero.map_eq, hβ.hasLaw_step_zero.map_eq]
exact Measure.AbsolutelyContinuous.compProd_left hc.p0 _
| succ n ih =>
simp_rw [history_succ]
rw [β Measure.map_map (by fun_prop), β Measure.map_map (by fun_prop)]
rotate_left
Β· exact (hβ.measurable_history n).prodMk (hβ.measurable_step (n + 1))
Β· exact (h.measurable_history n).prodMk (h.measurable_step (n + 1))
apply Measure.AbsolutelyContinuous.map _ (by fun_prop)
rw [(h.hasCondDistrib_step n).map_eq, (hβ.hasCondDistrib_step n).map_eq]
apply Measure.AbsolutelyContinuous.compProd ih
filter_upwards with h' using Measure.AbsolutelyContinuous.compProd_left_apply (hc.policy n h') _
hasLaw_history_withDensityπ
Learning.IsAlgEnvSeq.hasLaw_history_withDensityNo docstring.
Learning.IsAlgEnvSeq.hasLaw_history_withDensity.{u_1, u_2, u_3, u_4} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] {Ξ© : Type u_3} [MeasurableSpace Ξ©] {alg : Algorithm π π¨} {env : Environment π π¨} {A : β β Ξ© β π} {Y : β β Ξ© β π¨} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] {Ξ©β : Type u_4} [MeasurableSpace Ξ©β] {algβ : Algorithm π π¨} {Aβ : β β Ξ©β β π} {Yβ : β β Ξ©β β π¨} {Pβ : MeasureTheory.Measure Ξ©β} [MeasureTheory.IsProbabilityMeasure Pβ] [MeasurableSpace.CountablyGenerated π] (h : IsAlgEnvSeq A Y alg env P) (hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : Algorithm.AbsolutelyContinuous alg algβ) (n : β) : ProbabilityTheory.HasLaw (history A Y n) (MeasureTheory.Measure.withDensity (MeasureTheory.Measure.map (history Aβ Yβ n) Pβ) (Algorithm.density alg algβ n)) PLearning.IsAlgEnvSeq.hasLaw_history_withDensity.{u_1, u_2, u_3, u_4} {π : Type u_1} {π¨ : Type u_2} [MeasurableSpace π] [MeasurableSpace π¨] {Ξ© : Type u_3} [MeasurableSpace Ξ©] {alg : Algorithm π π¨} {env : Environment π π¨} {A : β β Ξ© β π} {Y : β β Ξ© β π¨} {P : MeasureTheory.Measure Ξ©} [MeasureTheory.IsFiniteMeasure P] {Ξ©β : Type u_4} [MeasurableSpace Ξ©β] {algβ : Algorithm π π¨} {Aβ : β β Ξ©β β π} {Yβ : β β Ξ©β β π¨} {Pβ : MeasureTheory.Measure Ξ©β} [MeasureTheory.IsProbabilityMeasure Pβ] [MeasurableSpace.CountablyGenerated π] (h : IsAlgEnvSeq A Y alg env P) (hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : Algorithm.AbsolutelyContinuous alg algβ) (n : β) : ProbabilityTheory.HasLaw (history A Y n) (MeasureTheory.Measure.withDensity (MeasureTheory.Measure.map (history Aβ Yβ n) Pβ) (Algorithm.density alg algβ n)) P
Code
lemma hasLaw_history_withDensity (h : IsAlgEnvSeq A Y alg env P)
(hβ : IsAlgEnvSeq Aβ Yβ algβ env Pβ) (hc : alg βͺβ algβ) (n : β) : HasLaw (history A Y n)
((Pβ.map (history Aβ Yβ n)).withDensity (alg.density algβ n)) P where
aemeasurableType uses (6)
Body uses (20)
Actions: Source Β· Open Issue
Proof
(h.measurable_history n).aemeasurable
map_eq := by
induction n with
| zero =>
rw [h.hasLaw_history_zero.map_eq, hβ.hasLaw_history_zero.map_eq, h.hasLaw_step_zero.map_eq,
hβ.hasLaw_step_zero.map_eq]
rw [β Measure.withDensity_rnDeriv_eq _ _ hc.p0,
Measure.compProd_withDensity_left (by fun_prop)]
exact map_equiv_withDensity (by fun_prop)
| succ n ih =>
let Ο h' (ar : π Γ π¨) := Kernel.rnDeriv (alg.policy n) (algβ.policy n) h' ar.1
have hs : stepKernel alg env n = (stepKernel algβ env n).withDensity Ο := by
rw [stepKernel, β Kernel.withDensity_rnDeriv_eq' (hc.policy n)]
exact Kernel.compProd_withDensity_left (Kernel.measurable_rnDeriv _ _)
have : IsMarkovKernel ((stepKernel algβ env n).withDensity Ο) := by
rw [β hs]
infer_instance
simp_rw [history_succ]
rw [β Measure.map_map (by fun_prop), β Measure.map_map (by fun_prop)]
rotate_left
Β· exact (hβ.measurable_history n).prodMk (hβ.measurable_step (n + 1))
Β· exact (h.measurable_history n).prodMk (h.measurable_step (n + 1))
rw [(h.hasCondDistrib_step n).map_eq, (hβ.hasCondDistrib_step n).map_eq, ih, hs,
Measure.compProd_withDensity_withDensity (by fun_prop) (by fun_prop)]
exact map_equiv_withDensity (by fun_prop)