`Learning.sumRewards_sub_pullCount_mul_eq_sum`

This page shows the declaration's own card first, then its dependency graph, then one card per dependency (type dependencies first, followed by the rest of the transitive closure). For a theorem, the graph and the dependency cards follow only the dependencies of its statement: the proof is replaced by `sorry`, because what a theorem proves does not depend on how it is proved. For every other kind of declaration, both the type and the body (or value) are followed, since their content is part of what later declarations build on.


`sumRewards_sub_pullCount_mul_eq_sum`

Lemma Learning.sumRewards_sub_pullCount_mul_eq_sum

Details

No docstring.

theorem

Learning.sumRewards_sub_pullCount_mul_eq_sum.{u_1, u_3} {𝓐 : Type u_1}
  {Ω : Type u_3} [DecidableEq 𝓐] {A : ℕ → Ω → 𝓐} {a : 𝓐} {n : ℕ} {ω : Ω}
  {R' : ℕ → Ω → ℝ} (c : 𝓐 → ℝ) :
  sumRewards A R' a (n + 1) ω - ↑(pullCount A a (n + 1) ω) * c a =
    ∑ i ∈ Finset.range (n + 1), if A i ω = a then R' i ω - c a else 0

Code

lemma sumRewards_sub_pullCount_mul_eq_sum {R' : ℕ → Ω → ℝ} (c : 𝓐 → ℝ) :
    sumRewards A R' a (n + 1) ω - pullCount A a (n + 1) ω * c a =
      ∑ i ∈ range (n + 1), (if A i ω = a then R' i ω - c a else 0)
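
The identity is easier to see once both definitions are read as indicator sums. The block below is a sketch of that reading, not the proof used on this page; the shorthand S_a, N_a and the indicator symbol are introduced here for illustration and do not appear in the source.

```latex
% Assumed shorthand: S_a(t) = sumRewards A R' a t ω,  N_a(t) = pullCount A a t ω
\begin{align*}
S_a(n+1) - N_a(n+1)\, c(a)
  &= \sum_{i=0}^{n} \mathbb{1}\{A_i(\omega) = a\}\, R'_i(\omega)
     \;-\; c(a) \sum_{i=0}^{n} \mathbb{1}\{A_i(\omega) = a\} \\
  &= \sum_{i=0}^{n} \mathbb{1}\{A_i(\omega) = a\}\,\bigl(R'_i(\omega) - c(a)\bigr).
\end{align*}
```

As a check on hypothetical data, take n = 2, actions (a, b, a), rewards (1.0, 5.0, 2.5) and c a = 1: the left side is 3.5 - 2 * 1 = 1.5, and the right side is (1.0 - 1) + 0 + (2.5 - 1) = 1.5.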

Type uses (2)

Body uses (4)


Proof

by
  induction n with
  | zero =>
    simp_rw [sumRewards_add_one, pullCount_add_one]
    simp only [sumRewards_zero, Pi.zero_apply, zero_add, pullCount_zero, Nat.cast_ite, Nat.cast_one,
      CharP.cast_eq_zero, ite_mul, one_mul, zero_mul, range_one, sum_singleton]
    grind
  | succ n hn =>
    simp_rw [sumRewards_add_one (t := n + 1), pullCount_add_one (t := n + 1)]
    split_ifs with ha
    · conv_rhs => rw [sum_range_succ]
      simp only [Nat.cast_add, Nat.cast_one, ha, ↓reduceIte, add_mul, one_mul]
      grind
    · simp only [add_zero, hn]
      conv_rhs => rw [sum_range_succ]
      simp [ha]
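
The induction appears to rest on one-step recursions for both quantities. The names `sumRewards_add_one` and `pullCount_add_one` come from the proof above, but their exact statements are not shown on this page, so the forms below are an assumption inferred from the names and from the two definitions.

```latex
% Assumed shorthand: S_a(t) = sumRewards A R' a t ω,  N_a(t) = pullCount A a t ω
% Assumed one-step recursions (inferred, not quoted from the source):
\begin{align*}
S_a(t+1) &= S_a(t) + \mathbb{1}\{A_t(\omega) = a\}\, R'_t(\omega), \\
N_a(t+1) &= N_a(t) + \mathbb{1}\{A_t(\omega) = a\}.
\end{align*}
% Successor step, case A_{n+1}(ω) = a:
%   the left side gains R'_{n+1}(ω) - c(a), and so does the right side (its last summand).
% Successor step, case A_{n+1}(ω) ≠ a:
%   both sides are unchanged, so the induction hypothesis closes the goal.
```

This matches the shape of the tactic proof: `split_ifs with ha` separates the two cases, and `sum_range_succ` peels the last summand off the right-hand side.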

Dependency graph

Type dependencies (2)

`sumRewards`

Definition Learning.sumRewards

Details

Sum of rewards obtained when pulling action a up to time t (exclusive).

def

Learning.sumRewards.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ)
  (ω : Ω) : ℝ

Code

def sumRewards (A : ℕ → Ω → 𝓐) (R' : ℕ → Ω → ℝ) (a : 𝓐) (t : ℕ) (ω : Ω) : ℝ :=
  ∑ s ∈ range t, if A s ω = a then R' s ω else 0
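
Written as a formula, the definition sums the reward over exactly those steps s < t at which action a was chosen. The shorthand S_a and the numbers in the example are hypothetical, chosen only to illustrate the exclusive upper bound.

```latex
% Assumed shorthand: S_a(t) = sumRewards A R' a t ω
\[
S_a(t) \;=\; \sum_{s=0}^{t-1} \mathbb{1}\{A_s(\omega) = a\}\, R'_s(\omega)
\]
% Hypothetical example: t = 3, actions (A_0, A_1, A_2) = (a, b, a), rewards (1.0, 5.0, 2.5)
%   S_a(3) = 1.0 + 0 + 2.5 = 3.5
%   S_a(0) = 0   (empty sum: step t itself is never included)
```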

Used by (44)


`pullCount`

Definition Learning.pullCount

Details

Number of times action a was chosen up to time t (excluding t).

def

Learning.pullCount.{u_1, u_3} {𝓐 : Type u_1} {Ω : Type u_3}
  [DecidableEq 𝓐] (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ

Code

noncomputable
def pullCount (A : ℕ → Ω → 𝓐) (a : 𝓐) (t : ℕ) (ω : Ω) : ℕ :=
  #(filter (fun s ↦ A s ω = a) (range t))
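
The cardinality of the filtered range can equivalently be read as a sum of indicators, which is the form that lines up term by term with `sumRewards`. The shorthand N_a and the example data are hypothetical.

```latex
% Assumed shorthand: N_a(t) = pullCount A a t ω
\[
N_a(t) \;=\; \bigl|\{\, s < t : A_s(\omega) = a \,\}\bigr|
       \;=\; \sum_{s=0}^{t-1} \mathbb{1}\{A_s(\omega) = a\}
\]
% Hypothetical example: t = 3, actions (A_0, A_1, A_2) = (a, b, a)
%   N_a(3) = 2,  N_b(3) = 1,  N_a(0) = 0
```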

Used by (146)

