Thompson Sampling
This file defines the Thompson sampling algorithm. This algorithm samples an action according to its probability of being optimal under the posterior over environments given the history so far.
Main definitions
tsAlgorithm hK Q κ: a Thompson sampling algorithm with actions in Fin K given hK : 0 < K, a prior distribution over parameters Q : Measure 𝓔, and a Markov kernel κ : Kernel (𝓔 × Fin K) ℝ. This kernel defines how a parameter e : 𝓔 gives rise to a stationary environment: stationaryEnv (κ.sectR e) : Environment (Fin K) ℝ.
Main results
hasCondDistrib_action: if Thompson sampling has the correct prior over environments, then the conditional distribution of the next action given the history so far is equal to the conditional distribution of the best action given the history so far.
The Thompson sampling policy samples an action according to its probability of being optimal under the posterior over environments given the history so far. The posterior under a uniform algorithm is used to avoid a circular definition.
The initial action is sampled according to its probability of being optimal under the prior over environments.
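In symbols, the initial action distribution could be written as below. This is a sketch in hypothetical notation rather than the library's definition: it assumes the parameter space is written \mathcal{E}, and that the best action a^\ast(e) for a parameter e is an action with the largest mean reward under the kernel, with some fixed tie-breaking rule.

```latex
p_0(\{a\}) = Q\bigl(\{\, e \in \mathcal{E} : a^\ast(e) = a \,\}\bigr),
\qquad a \in \mathrm{Fin}\,K .
```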
The Thompson sampling algorithm with actions in Fin K, where Q : Measure 𝓔 is a prior
distribution over parameters, and κ : Kernel (𝓔 × Fin K) ℝ is a Markov kernel that defines the
stationary environment stationaryEnv (κ.sectR e) that corresponds to a parameter e : 𝓔.
At every time n, the Thompson sampling policy uses the posterior over the parameters given the
history up to time n to derive the probability of each action being optimal. The action for time
n is sampled according to these probabilities.
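The description above can be illustrated with a small numerical sketch in Python. It is not the Lean definition. It assumes a hypothetical finite parameter space (PARAMS), Bernoulli rewards, a prior PRIOR in the role of Q, and a best action defined as the arm with the highest mean, ties going to the lowest index. All names in the block are made up for illustration.

```python
import random

# Hypothetical finite parameter space: each parameter e is a tuple of
# Bernoulli reward means, one per action. PRIOR plays the role of Q.
PARAMS = [(0.2, 0.8), (0.7, 0.3), (0.5, 0.5)]
PRIOR = [0.5, 0.3, 0.2]
K = 2  # number of actions


def best_action(e):
    """Arm with the highest mean under e (assumed tie-break: lowest index)."""
    return max(range(len(e)), key=lambda a: e[a])


def likelihood(e, history):
    """Probability of the observed rewards under e, given the actions taken."""
    p = 1.0
    for action, reward in history:
        p *= e[action] if reward == 1 else 1.0 - e[action]
    return p


def posterior(history):
    """Posterior over PARAMS given a history of (action, reward) pairs.

    The probabilities with which the actions were chosen do not appear:
    they do not depend on e given the history, so they cancel on
    normalization.
    """
    weights = [q * likelihood(e, history) for q, e in zip(PRIOR, PARAMS)]
    total = sum(weights)
    return [w / total for w in weights]


def optimal_action_probs(history):
    """Posterior probability of each action being the best one."""
    probs = [0.0] * K
    for w, e in zip(posterior(history), PARAMS):
        probs[best_action(e)] += w
    return probs


def thompson_action(history, rng):
    """Sample the next action according to its probability of being optimal."""
    return rng.choices(range(K), weights=optimal_action_probs(history))[0]


# Usage: a few rounds against a fixed (hypothetical) true parameter.
rng = random.Random(0)
true_param = PARAMS[0]
history = []
for _ in range(5):
    a = thompson_action(history, rng)
    r = 1 if rng.random() < true_param[a] else 0
    history.append((a, r))
```

With an empty history, optimal_action_probs reduces to the prior probability of each action being optimal, which corresponds to the initial action distribution.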
Equations
- Bandits.tsAlgorithm hK Q κ = { policy := Bandits.TS.policy hK Q κ, h_policy := ⋯, p0 := Bandits.TS.initialPolicy hK Q κ, hp0 := ⋯ }
If Thompson sampling has the correct prior over environments, then the conditional distribution of the next action given the history so far is equal to the conditional distribution of the best action given the history so far.
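Written as a formula, the statement reads roughly as follows. The notation is hypothetical and not taken from the library: H_n stands for the history up to time n, A_{n+1} for the next action, and A^\ast for the best action under the random parameter drawn from the prior.

```latex
\mathbb{P}\bigl(A_{n+1} \in \cdot \mid H_n\bigr)
  = \mathbb{P}\bigl(A^\ast \in \cdot \mid H_n\bigr)
  \quad \text{almost surely.}
```

The exact indexing of the history and the precise definition of the best action are assumptions in this sketch; the Lean statement of hasCondDistrib_action is the authoritative form.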