LeanMachineLearning exposition

Learning.sumRewards_eq_of_pullCount_eq🔗

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

sumRewards_eq_of_pullCount_eq🔗

LemmaLearning.sumRewards_eq_of_pullCount_eq

No docstring.

🔗theorem
Learning.sumRewards_eq_of_pullCount_eq.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] {A : Ω 𝓐} {a : 𝓐} {ω : Ω} {R' : Ω } {s t : } (h_eq : pullCount A a s ω = pullCount A a t ω) : sumRewards A R' a s ω = sumRewards A R' a t ω
Learning.sumRewards_eq_of_pullCount_eq.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] {A : Ω 𝓐} {a : 𝓐} {ω : Ω} {R' : Ω } {s t : } (h_eq : pullCount A a s ω = pullCount A a t ω) : sumRewards A R' a s ω = sumRewards A R' a t ω

Code

lemma sumRewards_eq_of_pullCount_eq {R' : ℕ → Ω → ℝ} {s t : ℕ}
    (h_eq : pullCount A a s ω = pullCount A a t ω) :
    sumRewards A R' a s ω = sumRewards A R' a t ω
Type uses (2)
Body uses (3)

Actions: Source · Open Issue

Proof
by
  wlog hst : s ≤ t
  · have hts : t ≤ s := by lia
    exact (this h_eq.symm hts).symm
  induction t, hst using Nat.le_induction with
  | base => rfl
  | succ t hst' ih =>
    have h_mono' : pullCount A a t ω ≤ pullCount A a (t + 1) ω := pullCount_mono a (Nat.le_succ t) ω
    have h_eq_t : pullCount A a s ω = pullCount A a t ω :=
      le_antisymm (pullCount_mono a hst' ω) (h_eq ▸ h_mono')
    have hne : A t ω ≠ a := by
      intro ha
      have h1 := ha ▸ pullCount_action_eq_pullCount_add_one (A := A) t ω
      lia
    rw [sumRewards_add_one, if_neg hne, add_zero, ih h_eq_t]

Dependency graph

Type dependencies (2)

pullCount🔗

DefinitionLearning.pullCount

Number of times action a was chosen up to time t (excluding t).

🔗def
Learning.pullCount.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] (A : Ω 𝓐) (a : 𝓐) (t : ) (ω : Ω) :
Learning.pullCount.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] (A : Ω 𝓐) (a : 𝓐) (t : ) (ω : Ω) :

Code

noncomputable
def pullCount (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ :=
  #(filter (fun s ↦ A s ω = a) (range t))
Used by (146)

Actions: Source · Open Issue

sumRewards🔗

DefinitionLearning.sumRewards

Sum of rewards obtained when pulling action a up to time t (exclusive).

🔗def
Learning.sumRewards.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] (A : Ω 𝓐) (R' : Ω ) (a : 𝓐) (t : ) (ω : Ω) :
Learning.sumRewards.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3} [DecidableEq 𝓐] (A : Ω 𝓐) (R' : Ω ) (a : 𝓐) (t : ) (ω : Ω) :

Code

def sumRewards (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ) (ω : Ω) : ℝ :=
  ∑ s ∈ range t, if A s ω = a then R' s ω else 0
Used by (44)

Actions: Source · Open Issue