LeanMachineLearning exposition

Bandits.TS.integral_regret_le๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

integral_regret_le๐Ÿ”—

TheoremBandits.TS.integral_regret_le

If Thompson sampling has the correct prior over environments and every environment has K actions, each of which has a corresponding reward between l and u that is sub-Gaussian with variance proxy ฯƒ2 after its mean is subtracted, then the Bayesian regret at time n is at most (2 * K + 1) * (u - l) + 8 * โˆš(ฯƒ2 * K * n * Real.log n).

๐Ÿ”—theorem
Bandits.TS.integral_regret_le.{u_1, u_2} {K : โ„•} [Nonempty (Fin K)] {l u ฯƒ2 : โ„} {ฮฉ : Type u_1} [MeasurableSpace ฮฉ] {๐“” : Type u_2} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {E : ฮฉ โ†’ ๐“”} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ฮบ (tsAlgorithm hK Q ฮบ) E A R P) (hlu : l โ‰ค u) (hm : โˆ€ (e : ๐“”) (a : Fin K), โˆซ (x : โ„), id x โˆ‚ฮบ (e, a) โˆˆ Set.Icc l u) (hฯƒ2 : 0 < ฯƒ2) (hs : โˆ€ (e : ๐“”) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โˆซ (x : โ„), id x โˆ‚ฮบ (e, a)) โŸจฯƒ2, โ‹ฏโŸฉ (ฮบ (e, a))) (n : โ„•) : โˆซ (x : ฮฉ), Learning.IsBayesAlgEnvSeq.regret ฮบ E A n x โˆ‚P โ‰ค (2 * โ†‘K + 1) * (u - l) + 8 * โˆš(ฯƒ2 * โ†‘K * โ†‘n * Real.log โ†‘n)
Bandits.TS.integral_regret_le.{u_1, u_2} {K : โ„•} [Nonempty (Fin K)] {l u ฯƒ2 : โ„} {ฮฉ : Type u_1} [MeasurableSpace ฮฉ] {๐“” : Type u_2} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {E : ฮฉ โ†’ ๐“”} {A : โ„• โ†’ ฮฉ โ†’ Fin K} {R : โ„• โ†’ ฮฉ โ†’ โ„} {P : MeasureTheory.Measure ฮฉ} [MeasureTheory.IsProbabilityMeasure P] (hK : 0 < K) (h : Learning.IsBayesAlgEnvSeq Q ฮบ (tsAlgorithm hK Q ฮบ) E A R P) (hlu : l โ‰ค u) (hm : โˆ€ (e : ๐“”) (a : Fin K), โˆซ (x : โ„), id x โˆ‚ฮบ (e, a) โˆˆ Set.Icc l u) (hฯƒ2 : 0 < ฯƒ2) (hs : โˆ€ (e : ๐“”) (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun x => x - โˆซ (x : โ„), id x โˆ‚ฮบ (e, a)) โŸจฯƒ2, โ‹ฏโŸฉ (ฮบ (e, a))) (n : โ„•) : โˆซ (x : ฮฉ), Learning.IsBayesAlgEnvSeq.regret ฮบ E A n x โˆ‚P โ‰ค (2 * โ†‘K + 1) * (u - l) + 8 * โˆš(ฯƒ2 * โ†‘K * โ†‘n * Real.log โ†‘n)

Code

theorem integral_regret_le (hK : 0 < K) (h : IsBayesAlgEnvSeq Q ฮบ (tsAlgorithm hK Q ฮบ) E A R P)
    (hlu : l โ‰ค u) (hm : โˆ€ e a, (ฮบ (e, a))[id] โˆˆ (Set.Icc l u)) (hฯƒ2 : 0 < ฯƒ2)
    (hs : โˆ€ e a, HasSubgaussianMGF (fun x โ†ฆ x - (ฮบ (e, a))[id]) โŸจฯƒ2, hฯƒ2.leโŸฉ (ฮบ (e, a))) (n : โ„•) :
    P[IsBayesAlgEnvSeq.regret ฮบ E A n]
      โ‰ค (2 * K + 1) * (u - l) + 8 * โˆš(ฯƒ2 * K * n * Real.log n)
Type uses (3)
Body uses (6)

Actions: Source ยท Open Issue

Proof
by
  by_cases hn : n = 0
  ยท simp [hn, IsBayesAlgEnvSeq.regret, Bandits.regret]
    nlinarith
  have hฮด : (0 : โ„) < 1 / n ^ 2 := by positivity
  calc P[IsBayesAlgEnvSeq.regret ฮบ E A n]
      = _ :=
        integral_regret_eq_add hK h hm n
    _ โ‰ค _ :=
        add_le_add
          (integral_sum_range_actionMean_bestAction_sub_ucb_bestAction_le h hlu hm hฯƒ2 hs hฮด n)
          (integral_sum_range_ucb_action_sub_actionMean_action_le h hlu hm hฯƒ2 hs hฮด n)
    _ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n)
          + 4 * โˆš((2 : โ„) ^ 2 * (ฯƒ2 * K * n * Real.log n)) := by
        field_simp
        rw [Real.log_pow]
        ring_nf
    _ = K * (u - l) + (K + 1) * (u - l) * ((n - 1) / n) + 8 * โˆš(ฯƒ2 * K * n * Real.log n) := by
        rw [Real.sqrt_mul (by positivity), Real.sqrt_sq (by norm_num)]
        ring
    _ โ‰ค K * (u - l) + (K + 1) * (u - l) * 1 + 8 * โˆš(ฯƒ2 * K * n * Real.log n) := by -- loose
        have : 0 โ‰ค u - l := sub_nonneg.2 hlu
        gcongr
        rw [div_le_one (by positivity)]
        linarith
    _ = _ := by
        ring

Dependency graph

Type dependencies (3)

IsBayesAlgEnvSeq๐Ÿ”—

StructureLearning.IsBayesAlgEnvSeq

IsBayesAlgEnvSeq Q ฮบ alg E A Y P states that there is a measure P : Measure ฮฉ such that the parameter E : ฮฉ โ†’ ๐“” has law Q and that the sequences of actions A : โ„• โ†’ ฮฉ โ†’ ๐“ and feedbacks Y : โ„• โ†’ ฮฉ โ†’ ๐“จ are generated by the algorithm alg : Algorithm ๐“ ๐“จ interacting with an underlying environment that depends on E and ฮบ (stationaryEnv (ฮบ.sectR (E ฯ‰))).

๐Ÿ”—structure
Learning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop
Learning.IsBayesAlgEnvSeq.{u_1, u_2, u_3, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [MeasurableSpace ฮฉ] (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (P : MeasureTheory.Measure ฮฉ) [MeasureTheory.IsFiniteMeasure P] : Prop

Code

structure IsBayesAlgEnvSeq
    (Q : Measure ๐“”) (ฮบ : Kernel (๐“” ร— ๐“) ๐“จ) (alg : Algorithm ๐“ ๐“จ)
    (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ)
    (P : Measure ฮฉ) [IsFiniteMeasure P] : Prop where
  measurable_param : Measurable E := by fun_prop
  measurable_action n : Measurable (A n) := by fun_prop
  measurable_feedback n : Measurable (Y n) := by fun_prop
  hasLaw_env : HasLaw E Q P
  hasCondDistrib_action_zero : HasCondDistrib (A 0) E (Kernel.const _ alg.p0) P
  hasCondDistrib_feedback_zero : HasCondDistrib (Y 0) (fun ฯ‰ โ†ฆ (E ฯ‰, A 0 ฯ‰)) ฮบ P
  hasCondDistrib_action n :
    HasCondDistrib (A (n + 1)) (fun ฯ‰ โ†ฆ (E ฯ‰, history A Y n ฯ‰))
      ((alg.policy n).prodMkLeft _) P
  hasCondDistrib_feedback n :
    HasCondDistrib (Y (n + 1)) (fun ฯ‰ โ†ฆ (history A Y n ฯ‰, E ฯ‰, A (n + 1) ฯ‰))
      (ฮบ.prodMkLeft _) P
Type uses (2)
Used by (22)

Actions: Source ยท Open Issue

tsAlgorithm๐Ÿ”—

DefinitionBandits.tsAlgorithm

The Thompson sampling algorithm with actions in Fin K, where Q : Measure ๐“” is a prior distribution over parameters, and ฮบ : Kernel (๐“” ร— Fin K) โ„ is a Markov kernel that defines the stationary environment stationaryEnv (ฮบ.sectR e) that corresponds to a parameter e : ๐“”.

At every time n, the Thompson sampling policy uses the posterior over the parameters given the history up to time n to derive the probability of each action being optimal. The action for time n is sampled according to these probabilities.

๐Ÿ”—def
Bandits.tsAlgorithm.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) [ProbabilityTheory.IsMarkovKernel ฮบ] : Learning.Algorithm (Fin K) โ„
Bandits.tsAlgorithm.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) [ProbabilityTheory.IsMarkovKernel ฮบ] : Learning.Algorithm (Fin K) โ„

Code

noncomputable
def tsAlgorithm (hK : 0 < K) (Q : Measure ๐“”) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐“” ร— Fin K) โ„)
    [IsMarkovKernel ฮบ] : Algorithm (Fin K) โ„ where
  policy := TS.policy hK Q ฮบ
  p0 := TS.initialPolicy hK Q ฮบ
Type uses (1)
Body uses (4)
Used by (4)

Actions: Source ยท Open Issue

regret๐Ÿ”—

DefinitionLearning.IsBayesAlgEnvSeq.regret

A random variable that gives the regret at time n.

๐Ÿ”—def
Learning.IsBayesAlgEnvSeq.regret.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (n : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.IsBayesAlgEnvSeq.regret.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (n : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def regret (ฮบ : Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (n : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  Bandits.regret (ฮบ.sectR (E ฯ‰)) A n ฯ‰
Body uses (1)
Used by (6)

Actions: Source ยท Open Issue

All dependencies, transitively (31)

Algorithm๐Ÿ”—

StructureLearning.Algorithm

A stochastic, sequential algorithm.

๐Ÿ”—structure
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)
Learning.Algorithm.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)

Code

structure Algorithm (๐“ ๐“จ : Type*) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] where
  /-- Policy or sampling rule: distribution of the next action. -/
  policy : (n : โ„•) โ†’ Kernel (Iic n โ†’ ๐“ ร— ๐“จ) ๐“
  /-- The policy is a Markov kernel. -/
  [h_policy : โˆ€ n, IsMarkovKernel (policy n)]
  /-- Distribution of the first action. -/
  p0 : Measure ๐“
  /-- The first action distribution is a probability measure. -/
  [hp0 : IsProbabilityMeasure p0]
Used by (216)

Actions: Source ยท Open Issue

history๐Ÿ”—

DefinitionLearning.history

History of the algorithm-environment sequence up to time n.

๐Ÿ”—def
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ
Learning.history.{u_1, u_2, u_3} {๐“ : Type u_1} {๐“จ : Type u_2} {ฮฉ : Type u_3} (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ

Code

def history (A : โ„• โ†’ ฮฉ โ†’ ๐“) (Y : โ„• โ†’ ฮฉ โ†’ ๐“จ) (n : โ„•) (ฯ‰ : ฮฉ) : Iic n โ†’ ๐“ ร— ๐“จ :=
  fun i โ†ฆ (A i ฯ‰, Y i ฯ‰)
Used by (72)

Actions: Source ยท Open Issue

action๐Ÿ”—

DefinitionLearning.IT.action

action n is the action pulled at time n. This is a random variable on the measurable space โ„• โ†’ ๐“ ร— ๐“จ.

๐Ÿ”—def
Learning.IT.action.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} (n : โ„•) (h : โ„• โ†’ ๐“ ร— ๐“จ) : ๐“
Learning.IT.action.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} (n : โ„•) (h : โ„• โ†’ ๐“ ร— ๐“จ) : ๐“

Code

def action (n : โ„•) (h : โ„• โ†’ ๐“ ร— ๐“จ) : ๐“ := (h n).1
Used by (31)

Actions: Source ยท Open Issue

Environment๐Ÿ”—

StructureLearning.Environment

A stochastic environment.

๐Ÿ”—structure
Learning.Environment.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)
Learning.Environment.{u_4, u_5} (๐“ : Type u_4) (๐“จ : Type u_5) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] : Type (max u_4 u_5)

Code

structure Environment (๐“ ๐“จ : Type*) [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] where
  /-- Distribution of the next observation as function of the past history. -/
  feedback : (n : โ„•) โ†’ Kernel ((Iic n โ†’ ๐“ ร— ๐“จ) ร— ๐“) ๐“จ
  /-- The feedback kernels are Markov kernels. -/
  [h_feedback : โˆ€ n, IsMarkovKernel (feedback n)]
  /-- Distribution of the first observation given the first action. -/
  ฮฝ0 : Kernel ๐“ ๐“จ
  /-- The initial observation kernel is a Markov kernel. -/
  [hp0 : IsMarkovKernel ฮฝ0]
Used by (128)

Actions: Source ยท Open Issue

stepKernel๐Ÿ”—

DefinitionLearning.stepKernel

Kernel describing the distribution of the next action-feedback pair given the history up to n.

๐Ÿ”—def
Learning.stepKernel.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (๐“ ร— ๐“จ)
Learning.stepKernel.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) (๐“ ร— ๐“จ)

Code

noncomputable
def stepKernel (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (n : โ„•) :
    Kernel (Iic n โ†’ ๐“ ร— ๐“จ) (๐“ ร— ๐“จ) :=
  alg.policy n โŠ—โ‚– env.feedback n
deriving IsMarkovKernel
Type uses (2)
Used by (17)

Actions: Source ยท Open Issue

instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy๐Ÿ”—

InstanceLearning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy

No docstring.

๐Ÿ”—theorem
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdPolicy.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (Algorithm.policy alg n)

Code

instance (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : IsMarkovKernel (alg.policy n)
Type uses (1)
Used by (14)

Actions: Source ยท Open Issue

Proof
alg.h_policy n

instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback๐Ÿ”—

InstanceLearning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback

No docstring.

๐Ÿ”—theorem
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)
Learning.instIsMarkovKernelProdForallSubtypeNatMemFinsetIicFeedback.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (Environment.feedback env n)

Code

instance (env : Environment ๐“ ๐“จ) (n : โ„•) : IsMarkovKernel (env.feedback n)
Type uses (1)
Used by (5)

Actions: Source ยท Open Issue

Proof
env.h_feedback n

instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel๐Ÿ”—

InstanceLearning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel

No docstring.

๐Ÿ”—theorem
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)
Learning.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdStepKernel.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (stepKernel alg env n)

Code

deriving IsMarkovKernel
Type uses (3)
Body uses (2)
Used by (10)

Actions: Source ยท Open Issue

Proof
deriving IsMarkovKernel

trajMeasure๐Ÿ”—

DefinitionLearning.trajMeasure

Measure on the sequence of actions and observations generated by the algorithm/environment.

๐Ÿ”—def
Learning.trajMeasure.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) : MeasureTheory.Measure (โ„• โ†’ ๐“ ร— ๐“จ)
Learning.trajMeasure.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) : MeasureTheory.Measure (โ„• โ†’ ๐“ ร— ๐“จ)

Code

noncomputable
def trajMeasure (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) :
    Measure (โ„• โ†’ ๐“ ร— ๐“จ) :=
  Kernel.trajMeasure (alg.p0 โŠ—โ‚˜ env.ฮฝ0) (stepKernel alg env)
deriving IsProbabilityMeasure
Type uses (2)
Body uses (2)
Used by (19)

Actions: Source ยท Open Issue

instIsProbabilityMeasureP0๐Ÿ”—

InstanceLearning.instIsProbabilityMeasureP0

No docstring.

๐Ÿ”—theorem
Learning.instIsProbabilityMeasureP0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (Algorithm.p0 alg)
Learning.instIsProbabilityMeasureP0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (Algorithm.p0 alg)

Code

instance (alg : Algorithm ๐“ ๐“จ) : IsProbabilityMeasure alg.p0
Type uses (1)
Used by (13)

Actions: Source ยท Open Issue

Proof
alg.hp0

prodLeft๐Ÿ”—

DefinitionLearning.Algorithm.prodLeft

An algorithm with observations in ๐“ง ร— ๐“จ obtained from an algorithm with observations in ๐“จ by ignoring the ๐“ง component of each observation.

๐Ÿ”—def
Learning.Algorithm.prodLeft.{u_1, u_2, u_4} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (๐“ง : Type u_4) [MeasurableSpace ๐“ง] (alg : Algorithm ๐“ ๐“จ) : Algorithm ๐“ (๐“ง ร— ๐“จ)
Learning.Algorithm.prodLeft.{u_1, u_2, u_4} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (๐“ง : Type u_4) [MeasurableSpace ๐“ง] (alg : Algorithm ๐“ ๐“จ) : Algorithm ๐“ (๐“ง ร— ๐“จ)

Code

def Algorithm.prodLeft (๐“ง : Type*) [MeasurableSpace ๐“ง] (alg : Algorithm ๐“ ๐“จ) :
    Algorithm ๐“ (๐“ง ร— ๐“จ) where
  policy n := (alg.policy n).comap (fun h i โ†ฆ ((h i).1, (h i).2.2)) (by fun_prop)
  p0 := alg.p0
Type uses (1)
Body uses (2)
Used by (6)

Actions: Source ยท Open Issue

bayesStationaryEnv๐Ÿ”—

DefinitionLearning.bayesStationaryEnv

An environment with observations in ๐“” ร— ๐“จ. The first element e of an observation is sampled from Q once and remains constant. The second element of an observation is sampled from ฮบ (e, a), where a is the corresponding action.

๐Ÿ”—def
Learning.bayesStationaryEnv.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Environment ๐“ (๐“” ร— ๐“จ)
Learning.bayesStationaryEnv.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] : Environment ๐“ (๐“” ร— ๐“จ)

Code

noncomputable
def bayesStationaryEnv (Q : Measure ๐“”) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐“” ร— ๐“) ๐“จ)
    [IsMarkovKernel ฮบ] : Environment ๐“ (๐“” ร— ๐“จ) where
  feedback n :=
    let g : (Iic n โ†’ ๐“ ร— ๐“” ร— ๐“จ) ร— ๐“ โ†’ ๐“” ร— ๐“ := fun (h, a) => ((h โŸจ0, by simpโŸฉ).2.1, a)
    (Kernel.deterministic (Prod.fst โˆ˜ g) (by fun_prop)) ร—โ‚– (ฮบ.comap g (by fun_prop))
  ฮฝ0 := (Kernel.const _ Q) โŠ—โ‚– ฮบ.swapLeft
Type uses (1)
Used by (4)

Actions: Source ยท Open Issue

bayesTrajMeasure๐Ÿ”—

DefinitionLearning.IT.bayesTrajMeasure

A measure P on a measurable space that carries random variables E, A, and Y such that IsBayesAlgEnvSeq Q ฮบ alg E A Y P.

๐Ÿ”—def
Learning.IT.bayesTrajMeasure.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.Measure (โ„• โ†’ ๐“ ร— ๐“” ร— ๐“จ)
Learning.IT.bayesTrajMeasure.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.Measure (โ„• โ†’ ๐“ ร— ๐“” ร— ๐“จ)

Code

noncomputable
def bayesTrajMeasure (Q : Measure ๐“”) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐“” ร— ๐“) ๐“จ)
    [IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) : Measure (โ„• โ†’ ๐“ ร— ๐“” ร— ๐“จ) :=
  trajMeasure (alg.prodLeft ๐“”) (bayesStationaryEnv Q ฮบ)
deriving IsProbabilityMeasure
Type uses (1)
Body uses (3)
Used by (5)

Actions: Source ยท Open Issue

instIsMarkovKernelฮฝ0๐Ÿ”—

InstanceLearning.instIsMarkovKernelฮฝ0

No docstring.

๐Ÿ”—theorem
Learning.instIsMarkovKernelฮฝ0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : ProbabilityTheory.IsMarkovKernel (Environment.ฮฝ0 env)
Learning.instIsMarkovKernelฮฝ0.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (env : Environment ๐“ ๐“จ) : ProbabilityTheory.IsMarkovKernel (Environment.ฮฝ0 env)

Code

instance (env : Environment ๐“ ๐“จ) : IsMarkovKernel env.ฮฝ0
Type uses (1)
Used by (8)

Actions: Source ยท Open Issue

Proof
env.hp0

instIsProbabilityMeasureForallNatProdTrajMeasure๐Ÿ”—

InstanceLearning.instIsProbabilityMeasureForallNatProdTrajMeasure

No docstring.

๐Ÿ”—theorem
Learning.instIsProbabilityMeasureForallNatProdTrajMeasure.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (trajMeasure alg env)
Learning.instIsProbabilityMeasureForallNatProdTrajMeasure.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (alg : Algorithm ๐“ ๐“จ) (env : Environment ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (trajMeasure alg env)

Code

deriving IsProbabilityMeasure
Type uses (3)
Body uses (4)
Used by (8)

Actions: Source ยท Open Issue

Proof
deriving IsProbabilityMeasure

instIsProbabilityMeasureForallNatProdBayesTrajMeasure๐Ÿ”—

InstanceLearning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasure

No docstring.

๐Ÿ”—theorem
Learning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasure.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (bayesTrajMeasure Q ฮบ alg)
Learning.IT.instIsProbabilityMeasureForallNatProdBayesTrajMeasure.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) : MeasureTheory.IsProbabilityMeasure (bayesTrajMeasure Q ฮบ alg)

Code

deriving IsProbabilityMeasure
Type uses (2)
Body uses (3)
Used by (4)

Actions: Source ยท Open Issue

Proof
deriving IsProbabilityMeasure

bayesTrajMeasurePosterior๐Ÿ”—

DefinitionLearning.IT.bayesTrajMeasurePosterior

A kernel that represents the posterior over E given the history up to time n.

๐Ÿ”—def
Learning.IT.bayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ๐“”
Learning.IT.bayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐“” : Type u_1} {๐“ : Type u_2} {๐“จ : Type u_3} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ ๐“ ร— ๐“จ) ๐“”

Code

noncomputable
def bayesTrajMeasurePosterior [StandardBorelSpace ๐“”] [Nonempty ๐“”]
    (Q : Measure ๐“”) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐“” ร— ๐“) ๐“จ) [IsMarkovKernel ฮบ]
    (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : Kernel (Iic n โ†’ ๐“ ร— ๐“จ) ๐“” :=
  condDistrib (fun ฯ‰ โ†ฆ (ฯ‰ 0).2.1) (history action (fun n ฯ‰ โ†ฆ (ฯ‰ n).2.2) n)
    (bayesTrajMeasure Q ฮบ alg)
deriving IsMarkovKernel
Type uses (1)
Body uses (4)
Used by (4)

Actions: Source ยท Open Issue

randomSampling๐Ÿ”—

DefinitionLearning.randomSampling

The Random Sampling algorithm, which samples from a fixed probability measure at each iteration.

๐Ÿ”—def
Learning.randomSampling.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮผ : MeasureTheory.Measure ๐“) [MeasureTheory.IsProbabilityMeasure ฮผ] : Algorithm ๐“ ๐“จ
Learning.randomSampling.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} (ฮผ : MeasureTheory.Measure ๐“) [MeasureTheory.IsProbabilityMeasure ฮผ] : Algorithm ๐“ ๐“จ

Code

noncomputable def randomSampling (ฮผ : Measure ๐“) [IsProbabilityMeasure ฮผ] : Algorithm ๐“ ๐“จ where
  policy _ := Kernel.const _ ฮผ
  p0 := ฮผ
Type uses (1)
Used by (5)

Actions: Source ยท Open Issue

uniformAlgorithm๐Ÿ”—

DefinitionLearning.uniformAlgorithm

The Uniform algorithm: actions are chosen uniformly at random.

๐Ÿ”—def
Learning.uniformAlgorithm.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [Finite ๐“] [Nonempty ๐“] : Algorithm ๐“ ๐“จ
Learning.uniformAlgorithm.{u_1, u_2} {๐“ : Type u_1} {๐“จ : Type u_2} {m๐“ : MeasurableSpace ๐“} {m๐“จ : MeasurableSpace ๐“จ} [Finite ๐“] [Nonempty ๐“] : Algorithm ๐“ ๐“จ

Code

noncomputable
def uniformAlgorithm [Finite ๐“] [Nonempty ๐“] : Algorithm ๐“ ๐“จ := randomSampling (uniformOn Set.univ)
Type uses (1)
Body uses (1)
Used by (4)

Actions: Source ยท Open Issue

max๐Ÿ”—

DefinitionFunction.max

The maximum value of a tuple.

๐Ÿ”—def
Function.max.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : ฮฑ
Function.max.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : ฮฑ

Code

abbrev max : ฮฑ := univ.sup' univ_nonempty f
Used by (8)

Actions: Source ยท Open Issue

exists_argmax๐Ÿ”—

Lemmaexists_argmax

No docstring.

๐Ÿ”—theorem
exists_argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : โˆƒ i, f i = Function.max f
exists_argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : โˆƒ i, f i = Function.max f

Code

lemma exists_argmax : โˆƒ i, f i = f.max
Type uses (1)
Used by (3)

Actions: Source ยท Open Issue

Proof
by
  obtain โŸจi, -, hiโŸฉ := Finset.exists_mem_eq_sup' (by simp : Finset.univ.Nonempty) f
  exact โŸจi, hi.symmโŸฉ

argmax๐Ÿ”—

Definitionargmax

The index of the maximum value of a tuple.

๐Ÿ”—def
argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : ฮน
argmax.{u_1, u_2} {ฮน : Type u_1} {ฮฑ : Type u_2} [LinearOrder ฮฑ] [Fintype ฮน] [Nonempty ฮน] (f : ฮน โ†’ ฮฑ) : ฮน

Code

noncomputable def argmax := (exists_argmax f).choose
Body uses (2)
Used by (17)

Actions: Source ยท Open Issue

actionMean๐Ÿ”—

DefinitionLearning.IsBayesAlgEnvSeq.actionMean

A random variable that gives the mean feedback of action a.

๐Ÿ”—def
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„
Learning.IsBayesAlgEnvSeq.actionMean.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def actionMean (ฮบ : Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (a : ๐“) (ฯ‰ : ฮฉ) : โ„ := (ฮบ (E ฯ‰, a))[id]
Used by (12)

Actions: Source ยท Open Issue

bestAction๐Ÿ”—

DefinitionLearning.IsBayesAlgEnvSeq.bestAction

A random variable that gives the action with the highest mean feedback.

๐Ÿ”—def
Learning.IsBayesAlgEnvSeq.bestAction.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [Nonempty ๐“] [Fintype ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (ฯ‰ : ฮฉ) : ๐“
Learning.IsBayesAlgEnvSeq.bestAction.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [Nonempty ๐“] [Fintype ๐“] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (ฯ‰ : ฮฉ) : ๐“

Code

noncomputable
def bestAction [Nonempty ๐“] [Fintype ๐“] (ฮบ : Kernel (๐“” ร— ๐“) โ„) (E : ฮฉ โ†’ ๐“”) (ฯ‰ : ฮฉ) : ๐“ :=
  argmax (fun a โ†ฆ actionMean ฮบ E a ฯ‰)
Body uses (2)
Used by (12)

Actions: Source ยท Open Issue

policy๐Ÿ”—

DefinitionBandits.TS.policy

The Thompson sampling policy samples an action according to its probability of being optimal under the posterior over environments given the history so far. The posterior under a uniform algorithm is used to avoid a circular definition.

๐Ÿ”—def
Bandits.TS.policy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) [ProbabilityTheory.IsMarkovKernel ฮบ] (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ Fin K ร— โ„) (Fin K)
Bandits.TS.policy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) [ProbabilityTheory.IsMarkovKernel ฮบ] (n : โ„•) : ProbabilityTheory.Kernel (โ†ฅ(Finset.Iic n) โ†’ Fin K ร— โ„) (Fin K)

Code

noncomputable
def TS.policy (hK : 0 < K) (Q : Measure ๐“”) [IsProbabilityMeasure Q] (ฮบ : Kernel (๐“” ร— Fin K) โ„)
    [IsMarkovKernel ฮบ] (n : โ„•) : Kernel (Iic n โ†’ (Fin K) ร— โ„) (Fin K) :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  (IT.bayesTrajMeasurePosterior Q ฮบ uniformAlgorithm n).map (bestAction ฮบ id)
Body uses (3)
Used by (2)

Actions: Source ยท Open Issue

instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior๐Ÿ”—

InstanceLearning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior

No docstring.

๐Ÿ”—theorem
Learning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐“” : Type u_3} {๐“ : Type u_1} {๐“จ : Type u_2} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (bayesTrajMeasurePosterior Q ฮบ alg n)
Learning.IT.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdBayesTrajMeasurePosterior.{u_1, u_2, u_3} {๐“” : Type u_3} {๐“ : Type u_1} {๐“จ : Type u_2} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ๐“จ] [StandardBorelSpace ๐“”] [Nonempty ๐“”] (Q : MeasureTheory.Measure ๐“”) [MeasureTheory.IsProbabilityMeasure Q] (ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) ๐“จ) [ProbabilityTheory.IsMarkovKernel ฮบ] (alg : Algorithm ๐“ ๐“จ) (n : โ„•) : ProbabilityTheory.IsMarkovKernel (bayesTrajMeasurePosterior Q ฮบ alg n)

Code

deriving IsMarkovKernel
Type uses (2)
Body uses (4)
Used by (1)

Actions: Source ยท Open Issue

Proof
deriving IsMarkovKernel

measurable_bestAction๐Ÿ”—

LemmaLearning.IsBayesAlgEnvSeq.measurable_bestAction

No docstring.

๐Ÿ”—theorem
Learning.IsBayesAlgEnvSeq.measurable_bestAction.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ฮฉ] [Nonempty ๐“] [Fintype ๐“] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„} {E : ฮฉ โ†’ ๐“”} (hE : Measurable E) : Measurable (bestAction ฮบ E)
Learning.IsBayesAlgEnvSeq.measurable_bestAction.{u_1, u_2, u_4} {๐“” : Type u_1} {๐“ : Type u_2} {ฮฉ : Type u_4} [MeasurableSpace ๐“”] [MeasurableSpace ๐“] [MeasurableSpace ฮฉ] [Nonempty ๐“] [Fintype ๐“] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— ๐“) โ„} {E : ฮฉ โ†’ ๐“”} (hE : Measurable E) : Measurable (bestAction ฮบ E)

Code

lemma measurable_bestAction [Nonempty ๐“] [Fintype ๐“] {ฮบ : Kernel (๐“” ร— ๐“) โ„} {E : ฮฉ โ†’ ๐“”}
    (hE : Measurable E) : Measurable (bestAction ฮบ E)
Type uses (1)
Body uses (4)
Used by (7)

Actions: Source ยท Open Issue

Proof
by
  unfold bestAction
  fun_prop

instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy๐Ÿ”—

InstanceBandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy

No docstring.

๐Ÿ”—theorem
Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] {hK : 0 < K} {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {n : โ„•} : ProbabilityTheory.IsMarkovKernel (TS.policy hK Q ฮบ n)
Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdFinRealPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] [StandardBorelSpace ๐“”] [Nonempty ๐“”] {hK : 0 < K} {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} [ProbabilityTheory.IsMarkovKernel ฮบ] {n : โ„•} : ProbabilityTheory.IsMarkovKernel (TS.policy hK Q ฮบ n)

Code

instance {hK : 0 < K} {Q : Measure ๐“”} [IsProbabilityMeasure Q] {ฮบ : Kernel (๐“” ร— Fin K) โ„}
    [IsMarkovKernel ฮบ] {n : โ„•} : IsMarkovKernel (TS.policy hK Q ฮบ n)
Type uses (1)
Body uses (5)
Used by (1)

Actions: Source ยท Open Issue

Proof
Kernel.IsMarkovKernel.map _ (by fun_prop)

initialPolicy๐Ÿ”—

DefinitionBandits.TS.initialPolicy

The initial action is sampled according to its probability of being optimal under the prior over environments.

๐Ÿ”—def
Bandits.TS.initialPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) : MeasureTheory.Measure (Fin K)
Bandits.TS.initialPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] (hK : 0 < K) (Q : MeasureTheory.Measure ๐“”) (ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„) : MeasureTheory.Measure (Fin K)

Code

noncomputable
def TS.initialPolicy (hK : 0 < K) (Q : Measure ๐“”) (ฮบ : Kernel (๐“” ร— Fin K) โ„) : Measure (Fin K) :=
  have : Nonempty (Fin K) := Fin.pos_iff_nonempty.mp hK
  Q.map (bestAction ฮบ id)
Body uses (1)
Used by (2)

Actions: Source ยท Open Issue

instIsProbabilityMeasureFinInitialPolicy๐Ÿ”—

InstanceBandits.instIsProbabilityMeasureFinInitialPolicy

No docstring.

๐Ÿ”—theorem
Bandits.instIsProbabilityMeasureFinInitialPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] {hK : 0 < K} {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} : MeasureTheory.IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)
Bandits.instIsProbabilityMeasureFinInitialPolicy.{u_1} {K : โ„•} {๐“” : Type u_1} [MeasurableSpace ๐“”] {hK : 0 < K} {Q : MeasureTheory.Measure ๐“”} [MeasureTheory.IsProbabilityMeasure Q] {ฮบ : ProbabilityTheory.Kernel (๐“” ร— Fin K) โ„} : MeasureTheory.IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)

Code

instance {hK : 0 < K} {Q : Measure ๐“”} [IsProbabilityMeasure Q] {ฮบ : Kernel (๐“” ร— Fin K) โ„} :
    IsProbabilityMeasure (TS.initialPolicy hK Q ฮบ)
Type uses (1)
Body uses (2)
Used by (1)

Actions: Source ยท Open Issue

Proof
Measure.isProbabilityMeasure_map (by fun_prop)

regret๐Ÿ”—

DefinitionBandits.regret

Regret of a sequence of pulls k : โ„• โ†’ ๐“ at time t for the reward kernel ฮฝ ; Kernel ๐“ โ„.

๐Ÿ”—def
Bandits.regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Bandits.regret.{u_1, u_2} {๐“ : Type u_1} {ฮฉ : Type u_2} {m๐“ : MeasurableSpace ๐“} (ฮฝ : ProbabilityTheory.Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

noncomputable
def regret (ฮฝ : Kernel ๐“ โ„) (A : โ„• โ†’ ฮฉ โ†’ ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  t * (โจ† a, (ฮฝ a)[id]) - โˆ‘ s โˆˆ range t, (ฮฝ (A s ฯ‰))[id]
Used by (11)

Actions: Source ยท Open Issue