Short-term Caching

keras_gym.caching.MonteCarloCache    Helper class for caching full episodes (Monte Carlo returns).
keras_gym.caching.NStepCache         A convenient helper class for n-step bootstrapping.
class keras_gym.caching.MonteCarloCache(env, gamma)

    add(self, s, a, r, done)

        Add a transition to the experience cache.

        Parameters:
            s : state observation
                A single state observation.
            a : action
                A single action.
            r : float
                A single observed reward.
            done : bool
                Whether the episode has finished.
    flush(self)

        Flush all transitions from the cache.

        Returns:
            S, A, G : tuple of arrays
                The returned tuple represents a batch of preprocessed transitions.
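The returns G that such a Monte Carlo cache hands back are the discounted sums of the rewards of a finished episode. The discounting itself can be sketched as below; this is a minimal NumPy illustration of the arithmetic (the function name is hypothetical), not keras_gym's actual implementation:

```python
import numpy as np

def monte_carlo_returns(rewards, gamma):
    """Discounted returns G_t for one finished episode, computed
    backwards via the recursion G_t = r_t + gamma * G_{t+1}."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

# e.g. rewards [1, 1, 1] with gamma = 0.5 give G = [1.75, 1.5, 1.0]
```

Working backwards keeps the computation linear in the episode length, which is why the cache only flushes once done=True has been observed.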
class keras_gym.caching.NStepCache(env, n, gamma)

    A convenient helper class for n-step bootstrapping.

    Parameters:
        env : gym environment
            The main gym environment. This is needed to determine num_actions.
        n : positive int
            The number of steps over which to bootstrap.
        gamma : float between 0 and 1
            The amount by which to discount future rewards.
    add(self, s, a, r, done)

        Add a transition to the experience cache.

        Parameters:
            s : state observation
                A single state observation.
            a : action
                A single action.
            r : float
                A single observed reward.
            done : bool
                Whether the episode has finished.
    flush(self)

        Flush all transitions from the cache.

        Returns:
            S, A, Rn, In, S_next, A_next : tuple of arrays
                The returned tuple represents a batch of preprocessed transitions.
                These are typically used for bootstrapped updates, e.g. minimizing
                the bootstrapped MSE:

                \[\left( R^{(n)}_t + I^{(n)}_t\,Q(S_{t+n},A_{t+n}) - Q(S_t,A_t) \right)^2\]
    pop(self)

        Pop a single transition from the cache.

        Returns:
            S, A, Rn, In, S_next, A_next : tuple of arrays, batch_size=1
                The returned tuple represents a batch of preprocessed transitions
                (with batch_size=1). These are typically used for bootstrapped
                updates, e.g. minimizing the bootstrapped MSE:

                \[\left( R^{(n)}_t + I^{(n)}_t\,Q(S_{t+n},A_{t+n}) - Q(S_t,A_t) \right)^2\]
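Putting the pieces together, the buffering behaviour behind add/pop can be approximated with a small deque-based sketch. Everything below (the class name SimpleNStepCache, the ready helper, the plain-tuple return values) is hypothetical and only illustrates the n-step logic; the real NStepCache additionally preprocesses observations and returns batched arrays:

```python
from collections import deque

class SimpleNStepCache:
    """Hypothetical sketch of n-step caching, not keras_gym's actual code."""

    def __init__(self, n, gamma):
        self.n = n
        self.gamma = gamma
        self.deque = deque()  # entries: (s, a, r, done)

    def add(self, s, a, r, done):
        self.deque.append((s, a, r, done))

    def ready(self):
        # a transition can be popped once we can look n steps ahead,
        # or as soon as the episode has finished
        return len(self.deque) > self.n or (len(self.deque) > 0 and self.deque[-1][3])

    def pop(self):
        s, a, r, done = self.deque.popleft()
        # collect up to n rewards starting at this step, stopping at a terminal
        window = [(r, done)] + [(e[2], e[3]) for e in list(self.deque)[: self.n - 1]]
        Rn, terminal = 0.0, False
        for k, (rk, dk) in enumerate(window):
            Rn += self.gamma ** k * rk
            if dk:
                terminal = True
                break
        if terminal or len(self.deque) < self.n:
            # episode ended within the window: nothing to bootstrap from, In = 0
            return s, a, Rn, 0.0, None, None
        s_next, a_next, _, _ = self.deque[self.n - 1]
        return s, a, Rn, self.gamma ** self.n, s_next, a_next
```

For example, with n=2 and gamma=0.5, after adding the transitions (s=0, r=1), (s=1, r=1) and a terminal (s=2, r=1), the first pop yields Rn = 1 + 0.5 = 1.5 with In = 0.25 and S_next = 2, while the later pops return In = 0 because the episode ends inside their windows.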