Function evaluation environments #
We define two environments, onlineEvalEnv and evalEnv, where the feedback is given by evaluating
a measurable function at the chosen action. The first one allows the function to change at every
time step, while the second one uses a fixed function at every time step.
Main definitions #
onlineEvalEnv g hg: A stationary environment where the feedback at timenis given by a deterministic kernel that evaluates the measurable functiong nat the chosen action.evalEnv f hf: A stationary environment where the feedback is given by a deterministic kernel that evaluates a fixed measurable functionfat the chosen action.
They both satisfy the typeclasses IsObliviousEnv and IsDeterministicEnv.
Main statements #
forall_feedback_onlineEvalEnv_ae_eq_eval_action: For almost allฯ, the feedback at timenis equal tog nevaluated at the action taken at timen.forall_feedback_evalEnv_ae_eq_eval_action: For almost allฯ, the feedback at timenis equal tofevaluated at the action taken at timen.
noncomputable def
Learning.onlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
(g : โ โ ๐ โ ๐จ)
(hg : โ (n : โ), Measurable (g n))
:
Environment ๐ ๐จ
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Equations
- Learning.onlineEvalEnv g hg = Learning.obliviousEnv fun (n : โ) => ProbabilityTheory.Kernel.deterministic (g n) โฏ
Instances For
instance
Learning.instIsObliviousEnvOnlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
:
IsObliviousEnv (onlineEvalEnv g hg)
instance
Learning.instIsDeterministicEnvOnlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
:
IsDeterministicEnv (onlineEvalEnv g hg)
@[simp]
theorem
Learning.feedbackCondAction_onlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
(n : โ)
:
@[simp]
theorem
Learning.feedbackFunZero_onlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
[MeasurableSpace.SeparatesPoints ๐จ]
:
@[simp]
theorem
Learning.feedbackFun_onlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
[MeasurableSpace.SeparatesPoints ๐จ]
(n : โ)
:
feedbackFun (onlineEvalEnv g hg) n = fun (p : (โฅ(Finset.Iic n) โ ๐ ร ๐จ) ร ๐) => g (n + 1) p.2
theorem
Learning.hascondDistrib_feedback_onlineEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P)
(n : โ)
:
ProbabilityTheory.HasCondDistrib (Y n) (A n) (ProbabilityTheory.Kernel.deterministic (g n) โฏ) P
theorem
Learning.feedback_onlineEvalEnv_ae_eq_eval_action
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P)
(n : โ)
:
theorem
Learning.forall_feedback_onlineEvalEnv_ae_eq_eval_action
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{g : โ โ ๐ โ ๐จ}
{hg : โ (n : โ), Measurable (g n)}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (onlineEvalEnv g hg) P)
:
noncomputable def
Learning.evalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
(f : ๐ โ ๐จ)
(hf : Measurable f)
:
Environment ๐ ๐จ
The evaluation environment where the feedback is given by evaluating a fixed measurable function
f at the chosen action.
Equations
- Learning.evalEnv f hf = Learning.onlineEvalEnv (fun (x : โ) => f) โฏ
Instances For
instance
Learning.instIsObliviousEnvEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
:
IsObliviousEnv (evalEnv f hf)
instance
Learning.instIsDeterministicEnvEvalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
:
IsDeterministicEnv (evalEnv f hf)
@[simp]
theorem
Learning.feedbackCondAction_evalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
(n : โ)
:
@[simp]
theorem
Learning.feedbackFunZero_evalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
[MeasurableSpace.SeparatesPoints ๐จ]
:
@[simp]
theorem
Learning.feedbackFun_evalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
[MeasurableSpace.SeparatesPoints ๐จ]
(n : โ)
:
theorem
Learning.hascondDistrib_feedback_evalEnv
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (evalEnv f hf) P)
(n : โ)
:
ProbabilityTheory.HasCondDistrib (Y n) (A n) (ProbabilityTheory.Kernel.deterministic f hf) P
theorem
Learning.feedback_evalEnv_ae_eq_eval_action
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (evalEnv f hf) P)
(n : โ)
:
theorem
Learning.forall_feedback_evalEnv_ae_eq_eval_action
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
(h : IsAlgEnvSeq A Y alg (evalEnv f hf) P)
:
theorem
Learning.feedback_evalEnv_ae_eq_eval_action_comp
{๐ : Type u_1}
{๐จ : Type u_2}
{m๐ : MeasurableSpace ๐}
{m๐จ : MeasurableSpace ๐จ}
[StandardBorelSpace ๐]
[Nonempty ๐]
[StandardBorelSpace ๐จ]
[Nonempty ๐จ]
{ฮฉ : Type u_3}
{mฮฉ : MeasurableSpace ฮฉ}
{alg : Algorithm ๐ ๐จ}
{f : ๐ โ ๐จ}
{hf : Measurable f}
{P : MeasureTheory.Measure ฮฉ}
[MeasureTheory.IsProbabilityMeasure P]
{A : โ โ ฮฉ โ ๐}
{Y : โ โ ฮฉ โ ๐จ}
{ฮฒ : Type u_4}
(h : IsAlgEnvSeq A Y alg (evalEnv f hf) P)
{n : โ}
(g : (โฅ(Finset.Iic n) โ ๐จ) โ ฮฒ)
:
โแต (ฯ : ฮฉ) โP, (g fun (i : โฅ(Finset.Iic n)) => Y (โi) ฯ) = g fun (i : โฅ(Finset.Iic n)) => f (A (โi) ฯ)