LeanMachineLearning exposition

Learning.sumRewards_sub_pullCount_mul_eq_sum๐Ÿ”—

This page has the declaration's own card below, then its dependency graph, then a card for each dependency (type dependencies first, then the rest of the transitive closure). For a theorem, the graph and the dependency cards only follow its statement's dependencies (its proof is replaced by sorry, so what it proves doesn't depend on how); for everything else, both the type and the body/value are followed, since their content is part of what later declarations build on.

Minimal Lean file

sumRewards_sub_pullCount_mul_eq_sum๐Ÿ”—

LemmaLearning.sumRewards_sub_pullCount_mul_eq_sum

No docstring.

๐Ÿ”—theorem
Learning.sumRewards_sub_pullCount_mul_eq_sum.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {a : ๐“} {n : โ„•} {ฯ‰ : ฮฉ} {R' : โ„• โ†’ ฮฉ โ†’ โ„} (c : ๐“ โ†’ โ„) : sumRewards A R' a (n + 1) ฯ‰ - โ†‘(pullCount A a (n + 1) ฯ‰) * c a = โˆ‘ i โˆˆ Finset.range (n + 1), if A i ฯ‰ = a then R' i ฯ‰ - c a else 0
Learning.sumRewards_sub_pullCount_mul_eq_sum.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] {A : โ„• โ†’ ฮฉ โ†’ ๐“} {a : ๐“} {n : โ„•} {ฯ‰ : ฮฉ} {R' : โ„• โ†’ ฮฉ โ†’ โ„} (c : ๐“ โ†’ โ„) : sumRewards A R' a (n + 1) ฯ‰ - โ†‘(pullCount A a (n + 1) ฯ‰) * c a = โˆ‘ i โˆˆ Finset.range (n + 1), if A i ฯ‰ = a then R' i ฯ‰ - c a else 0

Code

lemma sumRewards_sub_pullCount_mul_eq_sum {R' : โ„• โ†’ ฮฉ โ†’ โ„} (c : ๐“ โ†’ โ„) :
    sumRewards A R' a (n + 1) ฯ‰ - pullCount A a (n + 1) ฯ‰ * c a =
      โˆ‘ i โˆˆ range (n + 1), (if A i ฯ‰ = a then R' i ฯ‰ - c a else 0)
Type uses (2)
Body uses (4)

Actions: Source ยท Open Issue

Proof
by
  induction n with
  | zero =>
    simp_rw [sumRewards_add_one, pullCount_add_one]
    simp only [sumRewards_zero, Pi.zero_apply, zero_add, pullCount_zero, Nat.cast_ite, Nat.cast_one,
      CharP.cast_eq_zero, ite_mul, one_mul, zero_mul, range_one, sum_singleton]
    grind
  | succ n hn =>
    simp_rw [sumRewards_add_one (t := n + 1), pullCount_add_one (t := n + 1)]
    split_ifs with ha
    ยท conv_rhs => rw [sum_range_succ]
      simp only [Nat.cast_add, Nat.cast_one, ha, โ†“reduceIte, add_mul, one_mul]
      grind
    ยท simp only [add_zero, hn]
      conv_rhs => rw [sum_range_succ]
      simp [ha]

Dependency graph

Type dependencies (2)

sumRewards๐Ÿ”—

DefinitionLearning.sumRewards

Sum of rewards obtained when pulling action a up to time t (exclusive).

๐Ÿ”—def
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„
Learning.sumRewards.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„

Code

def sumRewards (A : โ„• โ†’ ฮฉ โ†’ ๐“) (R' : โ„• โ†’ ฮฉ โ†’ โ„) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„ :=
  โˆ‘ s โˆˆ range t, if A s ฯ‰ = a then R' s ฯ‰ else 0
Used by (44)

Actions: Source ยท Open Issue

pullCount๐Ÿ”—

DefinitionLearning.pullCount

Number of times action a was chosen up to time t (excluding t).

๐Ÿ”—def
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•
Learning.pullCount.{u_1, u_3} {๐“ : Type u_1} {ฮฉ : Type u_3} [DecidableEq ๐“] (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„•

Code

noncomputable
def pullCount (A : โ„• โ†’ ฮฉ โ†’ ๐“) (a : ๐“) (t : โ„•) (ฯ‰ : ฮฉ) : โ„• :=
  #(filter (fun s โ†ฆ A s ฯ‰ = a) (range t))
Used by (146)

Actions: Source ยท Open Issue