Bayesian regret
This file defines actionMean, bestAction, gap, and regret as random variables on a
measurable space Ω. These definitions are useful when IsBayesAlgEnvSeq Q κ alg E A Y P holds.
Recall that IsBayesAlgEnvSeq Q κ alg E A Y P states that there is a measure P : Measure Ω such
that the parameter E : Ω → 𝓔 has law Q and that the sequences of actions A : ℕ → Ω → 𝓐 and
feedbacks Y : ℕ → Ω → 𝓨 are generated by the algorithm alg : Algorithm 𝓐 𝓨 interacting with an
underlying environment that depends on E and κ, namely stationaryEnv (κ.sectR (E ω)).
Main definitions
- actionMean κ E a: the mean feedback associated with action a : 𝓐 based on the parameter E, which defines the underlying stationary environment together with the kernel κ.
- bestAction κ E: (one of) the action(s) with the highest associated mean feedback based on E.
- gap κ E A n: the difference between the highest mean feedback associated with an action and the mean feedback associated with the action at time n, based on E and the sequence of actions A.
- regret κ E A n: the regret at time n based on E and the sequence of actions A. If IsBayesAlgEnvSeq Q κ alg E A Y P, then P[regret κ E A n] is the so-called Bayesian regret of algorithm alg under the prior Q.
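In conventional notation, the four quantities can be sketched roughly as follows. The symbols $\mu_a$, $a^\star$, $\Delta_n$ and $R_n$ are introduced here for illustration only and do not appear in the file. The integral form of the mean assumes real-valued feedback and that κ takes the pair (action, parameter) as input, and the sum form of the regret assumes the usual bandit convention; neither is spelled out in the summary above, so the Lean definitions remain the reference.

```latex
\begin{align*}
\mu_a(E) &= \int y \,\kappa\big((a, E), \mathrm{d}y\big) && \text{(actionMean)} \\
a^\star(E) &\in \operatorname*{arg\,max}_{a \in \mathcal{A}} \mu_a(E) && \text{(bestAction)} \\
\Delta_n &= \sup_{a \in \mathcal{A}} \mu_a(E) - \mu_{A_n}(E) && \text{(gap)} \\
R_n &= \sum_{s < n} \Delta_s, \qquad \text{Bayesian regret} = P[R_n] && \text{(regret)}
\end{align*}
```

Under this reading, the Bayesian regret averages over both the prior draw of E and the randomness of the interaction, since P is the joint measure on Ω.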
A random variable that gives the mean feedback of action a.
A random variable that gives the action with the highest mean feedback.
Equations
- Learning.IsBayesAlgEnvSeq.bestAction κ E ω = measurableArgmax (fun (ω' : Ω) (a : 𝓐) => Learning.IsBayesAlgEnvSeq.actionMean κ E a ω') ω
A random variable that gives the gap at time n.
Equations
- Learning.IsBayesAlgEnvSeq.gap κ E A n ω = Bandits.gap (κ.sectR (E ω)) (A n ω)
The gap is non-negative if the means are bounded by u : ℝ (even if 𝓐 is not Finite).
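The role of the boundedness assumption can be seen in a short derivation. This is a sketch that assumes the gap is defined through a real-valued supremum over actions and that "bounded by u" means bounded above by u; $\mu_a(E)$ is illustrative notation for the mean feedback of action $a$, not a name from the file.

```latex
\mu_a(E) \le u \ \text{ for all } a \in \mathcal{A}
\;\Longrightarrow\;
\mu_{A_n}(E) \le \sup_{a \in \mathcal{A}} \mu_a(E) \le u
\;\Longrightarrow\;
\sup_{a \in \mathcal{A}} \mu_a(E) - \mu_{A_n}(E) \ge 0 .
```

The bound matters because Mathlib sets the supremum of a set of reals that is not bounded above to 0. Without an upper bound on the means, the supremum term could therefore collapse to 0 and the gap would be negative for any action with a positive mean. When 𝓐 is finite, the means are automatically bounded above, which is why the statement singles out the non-finite case.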
A random variable that gives the regret at time n.
Equations
- Learning.IsBayesAlgEnvSeq.regret κ E A n ω = Bandits.regret (κ.sectR (E ω)) A n ω