Loss Functions¶
This is a collection of custom keras-compatible loss functions that are used throughout this package.
Note
These functions generally require the TensorFlow backend.
Value Losses¶
These loss functions can be applied to learning a value function. Most of them are already provided by keras; the value-function losses included here are minor adaptations of the available keras losses.
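As an illustration of what such a "minor adaptation" typically looks like, here is a minimal numpy sketch of a mean-squared-error loss extended with optional per-sample weights. The function name and signature are illustrative, not the package's actual API:

```python
import numpy as np

def weighted_mse(y_true, y_pred, sample_weight=None):
    """Mean-squared error with optional per-sample weights.

    Illustrative sketch of a keras-style value loss; not the
    package's actual implementation.
    """
    err = np.square(y_true - y_pred)   # per-sample squared error
    if sample_weight is not None:
        err = err * sample_weight      # e.g. importance weights
    return err.mean()                  # scalar loss
```

In a real keras loss the same computation would operate on Tensors instead of numpy arrays, but the structure is the same.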
Policy Losses¶
Policy losses are implemented somewhat differently from value losses because of their non-standard structure. A policy loss is implemented as a method on updateable policy objects (see below). If you need a custom policy loss, you can override this policy_loss_with_metrics() method.
BaseUpdateablePolicy.policy_loss_with_metrics(Adv, A=None)[source]¶
This method constructs the policy loss as a scalar-valued Tensor, together with a dictionary of metrics (also scalars).
This method may be overridden to construct a custom policy loss and/or to change the accompanying metrics.
Parameters:
- Adv : 1d Tensor, shape: [batch_size]
  A batch of advantages.
- A : nd Tensor, shape: [batch_size, …]
  A batch of actions taken under the behavior policy. For some choices of policy loss, e.g. update_strategy='sac', this input is ignored.
Returns:
- loss, metrics : (Tensor, dict of Tensors)
  The policy loss along with some metrics, which is a dict of type {name <str>: metric <Tensor>}. The loss and each of the metrics (dict values) are scalar Tensors, i.e. Tensors with ndim=0. The loss is passed to a keras Model using train_model.add_loss(loss). Similarly, each metric in the metric dict is passed to the model using train_model.add_metric(metric, name=name, aggregation='mean').
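To make the (loss, metrics) contract concrete, here is a minimal numpy sketch of what an overridden policy_loss_with_metrics could compute for a discrete vanilla policy-gradient loss. The log_pi input and all names are assumptions for illustration only; the actual method operates on Tensors inside the policy object:

```python
import numpy as np

def policy_loss_with_metrics(log_pi, Adv, A):
    """Illustrative vanilla policy-gradient loss (not the package's
    actual implementation).

    log_pi : array, shape [batch_size, num_actions]
        Log-probabilities of each action under the current policy
        (an assumption of this sketch; the real method reads these
        from the policy's own graph).
    Adv : array, shape [batch_size]
        A batch of advantages.
    A : array, shape [batch_size]
        A batch of (discrete) actions taken under the behavior policy.
    """
    batch = np.arange(len(A))
    logp = log_pi[batch, A]              # log pi(a|s) of taken actions
    loss = -np.mean(logp * Adv)          # scalar, ndim == 0
    # example metric: the policy's entropy, averaged over the batch
    entropy = -np.mean(np.sum(np.exp(log_pi) * log_pi, axis=1))
    metrics = {'policy/entropy': entropy}
    return loss, metrics
```

The returned scalars would then be attached to the training model via train_model.add_loss(loss) and train_model.add_metric(metric, name=name, aggregation='mean'), as described above.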