flax.optim.Adadelta¶
- class flax.optim.Adadelta(learning_rate=None, rho=0.9, eps=1e-06, weight_decay=0.0)[source]¶
Adadelta optimizer.
Reference: [ADADELTA: An Adaptive Learning Rate Method](https://arxiv.org/abs/1212.5701) (Zeiler 2012)
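For context, the parameter update follows the recurrences from the Zeiler paper; the learning-rate scaling and the weight-decay term below mirror the constructor arguments and are stated as assumptions about this implementation rather than text from it:

```latex
% Adadelta update for a parameter x_t with gradient g_t (Zeiler 2012).
% The weight-decay term (lambda) and learning-rate factor (eta) are assumptions
% matching the constructor arguments; the core recurrences follow the paper.
\begin{aligned}
g_t &\leftarrow g_t + \lambda\, x_t \\
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta x_t &= -\,\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, \Delta x_t^2 \\
x_{t+1} &= x_t + \eta\, \Delta x_t
\end{aligned}
```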
- Parameters
learning_rate (Optional[float]) – the step size used to update the parameters.
rho (float) – coefficient used for computing a running average.
eps (float) – term added to the denominator to improve numerical stability.
weight_decay (float) – the weight decay parameter for l2 regularization.
- __init__(learning_rate=None, rho=0.9, eps=1e-06, weight_decay=0.0)[source]¶
Constructor for the Adadelta optimizer.
- Parameters
learning_rate (Optional[float]) – the step size used to update the parameters.
rho (float) – coefficient used for computing a running average.
eps (float) – term added to the denominator to improve numerical stability.
weight_decay (float) – the weight decay parameter for l2 regularization.
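A minimal usage sketch with a hypothetical parameter pytree, assuming the legacy flax.optim interface in which an optimizer definition is bound to its target via create:

```python
import jax.numpy as jnp
from flax import optim

# Hypothetical parameter pytree; any nested structure of arrays works.
params = {'kernel': jnp.ones((3, 2)), 'bias': jnp.zeros((2,))}

# Build the optimizer definition with the hyperparameters documented above,
# then bind it to the target pytree, which also initializes per-parameter state.
optimizer_def = optim.Adadelta(learning_rate=1.0, rho=0.9, eps=1e-6, weight_decay=0.0)
optimizer = optimizer_def.create(params)

print(optimizer.target['bias'])  # current parameter values live on optimizer.target
```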
Methods
- __init__([learning_rate, rho, eps, weight_decay]) – Constructor for the Adadelta optimizer.
- apply_gradient(hyper_params, params, state, ...) – Applies a gradient for a set of parameters.
- apply_param_gradient(step, hyper_params, ...) – Apply per-parameter gradients.
- create(target[, focus]) – Creates a new optimizer for the given target.
- init_param_state(param) – Initialize parameter state.
- init_state(params)
- restore_state(opt_target, opt_state, state_dict) – Restore the optimizer target and state from the state dict.
- state_dict(target, state)
- update_hyper_params(**hyper_param_overrides) – Updates the hyper parameters with a set of overrides.
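The methods above compose as follows in a typical training step. This is a sketch with a hypothetical loss function and data, assuming the legacy flax.optim interface and that keyword arguments to apply_gradient act as per-call hyperparameter overrides (routed through update_hyper_params):

```python
import jax
import jax.numpy as jnp
from flax import optim

def loss_fn(params, x, y):
    # Hypothetical linear model and squared-error loss for illustration.
    pred = x @ params['kernel'] + params['bias']
    return jnp.mean((pred - y) ** 2)

params = {'kernel': jnp.zeros((3, 1)), 'bias': jnp.zeros((1,))}
optimizer = optim.Adadelta(learning_rate=1.0).create(params)

x = jnp.ones((8, 3))
y = jnp.ones((8, 1))

# Differentiate the loss with respect to the optimizer target (the parameters).
grads = jax.grad(loss_fn)(optimizer.target, x, y)

# apply_gradient returns a new optimizer with updated target and state;
# the learning_rate keyword overrides the stored hyperparameter for this step.
optimizer = optimizer.apply_gradient(grads, learning_rate=0.5)
```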