`Learning.sumRewards_eq_of_pullCount_eq`🔗

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

`sumRewards_eq_of_pullCount_eq`🔗

LemmaLearning.sumRewards_eq_of_pullCount_eq

Details

No docstring.

theorem

Learning.sumRewards_eq_of_pullCount_eq.{u_1, u_3} {𝓐 : Type u_1}
  {Ω : Type u_3} [DecidableEq 𝓐] {A : ℕ → Ω → 𝓐} {a : 𝓐} {ω : Ω}
  {R' : ℕ → Ω → ℝ} {s t : ℕ}
  (h_eq : pullCount A a s ω = pullCount A a t ω) :
  sumRewards A R' a s ω = sumRewards A R' a t ω
Learning.sumRewards_eq_of_pullCount_eq.{u_1,
    u_3}
  {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] {A : ℕ → Ω → 𝓐} {a : 𝓐}
  {ω : Ω} {R' : ℕ → Ω → ℝ} {s t : ℕ}
  (h_eq :
    pullCount A a s ω =
      pullCount A a t ω) :
  sumRewards A R' a s ω =
    sumRewards A R' a t ω

Code

lemma sumRewards_eq_of_pullCount_eq {R' : ℕ → Ω → ℝ} {s t : ℕ}
    (h_eq : pullCount A a s ω = pullCount A a t ω) :
    sumRewards A R' a s ω = sumRewards A R' a t ω

Type uses (2)

Body uses (3)

Actions: Source · Open Issue

Proof

by
  wlog hst : s ≤ t
  · have hts : t ≤ s := by lia
    exact (this h_eq.symm hts).symm
  induction t, hst using Nat.le_induction with
  | base => rfl
  | succ t hst' ih =>
    have h_mono' : pullCount A a t ω ≤ pullCount A a (t + 1) ω := pullCount_mono a (Nat.le_succ t) ω
    have h_eq_t : pullCount A a s ω = pullCount A a t ω :=
      le_antisymm (pullCount_mono a hst' ω) (h_eq ▸ h_mono')
    have hne : A t ω ≠ a := by
      intro ha
      have h1 := ha ▸ pullCount_action_eq_pullCount_add_one (A := A) t ω
      lia
    rw [sumRewards_add_one, if_neg hne, add_zero, ih h_eq_t]

Dependency graph

Type dependencies (2)

`pullCount`🔗

DefinitionLearning.pullCount

Details

Number of times action a was chosen up to time t (excluding t).

def

Learning.pullCount.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ
Learning.pullCount.{u_1, u_3}
  {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (a : 𝓐)
  (t : ℕ) (ω : Ω) : ℕ

Code

noncomputable
def pullCount (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ :=
  #(filter (fun s ↦ A s ω = a) (range t))

Used by (146)

Actions: Source · Open Issue

`sumRewards`🔗

DefinitionLearning.sumRewards

Details

Sum of rewards obtained when pulling action a up to time t (exclusive).

def

Learning.sumRewards.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ)
  (ω : Ω) : ℝ
Learning.sumRewards.{u_1, u_3}
  {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐)
  (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ)
  (ω : Ω) : ℝ

Code

def sumRewards (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ) (ω : Ω) : ℝ :=
  ∑ s ∈ range t, if A s ω = a then R' s ω else 0

Used by (44)

Actions: Source · Open Issue

Learning.sumRewards_eq_of_pullCount_eq🔗

sumRewards_eq_of_pullCount_eq🔗

pullCount🔗

sumRewards🔗

`Learning.sumRewards_eq_of_pullCount_eq`🔗

`sumRewards_eq_of_pullCount_eq`🔗

`pullCount`🔗

`sumRewards`🔗