LeanMachineLearning exposition

Bandits.prob_avg_add_sqrt_log_le

This page shows the declaration's own card first, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards follow only the dependencies of its statement: its proof is replaced by sorry, so what it proves does not depend on how it is proved. For every other declaration, both the type and the body or value are followed, since their content is part of what later declarations build on.


prob_avg_add_sqrt_log_le

Lemma Bandits.prob_avg_add_sqrt_log_le

No docstring.

theorem
Bandits.prob_avg_add_sqrt_log_le.{u_1} {𝓐 : Type u_1} {m𝓐 : MeasurableSpace 𝓐} {ν : ProbabilityTheory.Kernel 𝓐 ℝ} [ProbabilityTheory.IsMarkovKernel ν] {σ2 : NNReal} {c : ℝ} (hν : ∀ (a : 𝓐), ProbabilityTheory.HasSubgaussianMGF (fun x => x - ∫ (x : ℝ), id x ∂ν a) σ2 (ν a)) (hσ2 : σ2 ≠ 0) (hc : 0 ≤ c) (a : 𝓐) (n k : ℕ) (hk : k ≠ 0) : (streamMeasure ν) {ω | (∑ m ∈ Finset.range k, ω m a) / ↑k + √(2 * c * ↑σ2 * Real.log (↑n + 1) / ↑k) ≤ ∫ (x : ℝ), id x ∂ν a} ≤ 1 / (↑n + 1) ^ c

Code

lemma prob_avg_add_sqrt_log_le {σ2 : ℝ≥0} {c : ℝ}
    (hν : ∀ a, HasSubgaussianMGF (fun x ↦ x - (ν a)[id]) σ2 (ν a)) (hσ2 : σ2 ≠ 0)
    (hc : 0 ≤ c) (a : 𝓐) (n k : ℕ) (hk : k ≠ 0) :
    streamMeasure ν {ω | (∑ m ∈ range k, ω m a) / k + √(2 * c * σ2 * log (n + 1) / k) ≤ (ν a)[id]} ≤
      1 / (n + 1) ^ c
Type uses (1)
Body uses (1)
Used by (1)


Proof
by
  have h_log_nonneg : 0 ≤ log (n + 1) := log_nonneg (by simp)
  calc
    streamMeasure ν {ω | (∑ m ∈ range k, ω m a) / k + √(2 * c * σ2 * log (n + 1) / k) ≤ (ν a)[id]}
  _ = streamMeasure ν
      {ω | (∑ s ∈ range k, (ω s a - (ν a)[id])) / k ≤ - √(2 * c * σ2 * log (n + 1) / k)} := by
    congr with ω
    field_simp
    rw [Finset.sum_sub_distrib]
    simp
    grind
  _ = streamMeasure ν
      {ω | (∑ s ∈ range k, (ω s a - (ν a)[id])) ≤ - √(2 * c * k * σ2 * log (n + 1))} := by
    congr with ω
    field_simp
    congr! 2
    rw [sqrt_div (by positivity), ← mul_div_assoc, mul_comm, mul_div_assoc, div_sqrt,
      mul_assoc (k : ℝ), mul_assoc (k : ℝ), mul_assoc (k : ℝ),
      sqrt_mul (x := (k : ℝ)) (by positivity), mul_comm]
  _ ≤ 1 / (n + 1) ^ c := prob_sum_le_sqrt_log hν hσ2 hc a k hk
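
In conventional notation, the calc chain above can be read as the following sketch. The symbols \mu_a (the mean of the reward distribution of action a), X_m (the m-th reward sample of action a, written ω m a in the Lean code) and \mathbb{P} (the stream measure) are introduced here for illustration only and do not appear in the source. The Lean variable σ2 is the sub-Gaussian variance proxy and is written \sigma^2 below.

```latex
% Hypothetical notation, not from the source:
%   \mu_a = \int x \, d\nu(a)      mean reward of action a
%   X_m   = \omega\, m\, a          m-th reward sample of action a
%   \mathbb{P} = streamMeasure \nu
\begin{align*}
&\mathbb{P}\Big(\tfrac{1}{k}\sum_{m<k} X_m
    + \sqrt{\tfrac{2 c \sigma^2 \log(n+1)}{k}} \le \mu_a\Big) \\
&\quad= \mathbb{P}\Big(\tfrac{1}{k}\sum_{m<k} (X_m - \mu_a)
    \le -\sqrt{\tfrac{2 c \sigma^2 \log(n+1)}{k}}\Big)
    && \text{(subtract } \mu_a \text{ and rearrange)} \\
&\quad= \mathbb{P}\Big(\sum_{m<k} (X_m - \mu_a)
    \le -\sqrt{2 c k \sigma^2 \log(n+1)}\Big)
    && \text{(multiply by } k > 0\text{, using } k\sqrt{x/k} = \sqrt{k x}\text{)} \\
&\quad\le \frac{1}{(n+1)^c}
    && \text{(by prob\_sum\_le\_sqrt\_log)}
\end{align*}
```

The statement of prob_sum_le_sqrt_log is not shown on this page. Assuming it is the usual sub-Gaussian tail bound \exp(-t^2 / (2 k \sigma^2)) for a sum of k independent centered samples, taking t = \sqrt{2 c k \sigma^2 \log(n+1)} gives the exponent -c \log(n+1), which is consistent with the final bound (n+1)^{-c}.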

Dependency graph

Type dependencies (1)

streamMeasure

Definition Bandits.streamMeasure

Measure of an infinite stream of rewards from each action.

def
Bandits.streamMeasure.{u_1, u_2} {𝓐 : Type u_1} {R : Type u_2} {m𝓐 : MeasurableSpace 𝓐} {mR : MeasurableSpace R} (ν : ProbabilityTheory.Kernel 𝓐 R) : MeasureTheory.Measure (ℕ → 𝓐 → R)

Code

noncomputable
def streamMeasure (ν : Kernel 𝓐 R) : Measure (ℕ → 𝓐 → R) :=
  Measure.infinitePi fun _ ↦ Measure.infinitePi ν
Used by (56)
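
As a reading aid, the definition can be sketched as a nested product measure. The notation below is illustrative and not taken from the source, and it assumes each ν(a) is a probability measure (as under the IsMarkovKernel hypothesis used in the lemma above), which is the setting where the infinite product is meaningful.

```latex
% Sketch only: \mathcal{A} stands for the action type, R for the reward type.
\[
  \mathrm{streamMeasure}(\nu)
  \;=\; \bigotimes_{m \in \mathbb{N}} \; \bigotimes_{a \in \mathcal{A}} \nu(a)
  \qquad \text{a measure on } \mathbb{N} \to \mathcal{A} \to R .
\]
```

Under this reading, a sample ω assigns to every round m and every action a a reward ω m a, and distinct coordinates are independent, with ω m a distributed according to ν(a).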
