flax.optim.Adadelta

class flax.optim.Adadelta(learning_rate=None, rho=0.9, eps=1e-06, weight_decay=0.0)[source]

Adadelta optimizer.

Reference: [ADADELTA: An Adaptive Learning Rate Method](https://arxiv.org/abs/1212.5701) (Zeiler 2012)

__init__(learning_rate=None, rho=0.9, eps=1e-06, weight_decay=0.0)[source]

Constructor for the Adadelta optimizer.

Parameters
  • learning_rate (Optional[float]) – the step size used to update the parameters.

  • rho (float) – coefficient used for computing running averages of the squared gradients and squared updates (see the update rule sketched after this list).

  • eps (float) – term added to the denominator to improve numerical stability.

  • weight_decay (float) – the weight decay parameter for L2 regularization.
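
For reference, a sketch of the standard Adadelta update from Zeiler (2012), which rho and eps parameterize; exactly how learning_rate and weight_decay enter Flax's final parameter step is not shown here.

```latex
% Standard Adadelta update (Zeiler, 2012).
%   E[g^2]        : running average of squared gradients (decay rho)
%   E[\Delta x^2] : running average of squared updates   (decay rho)
%   \epsilon      : added for numerical stability
\[
\begin{aligned}
E[g^2]_t        &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\Delta x_t      &= -\,\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1 - \rho)\, (\Delta x_t)^2 \\
x_{t+1}         &= x_t + \Delta x_t
\end{aligned}
\]
```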

Methods

| Method | Summary |
| --- | --- |
| `__init__([learning_rate, rho, eps, weight_decay])` | Constructor for the Adadelta optimizer. |
| `apply_gradient(hyper_params, params, state, ...)` | Applies a gradient for a set of parameters. |
| `apply_param_gradient(step, hyper_params, ...)` | Apply per-parameter gradients. |
| `create(target[, focus])` | Creates a new optimizer for the given target. |
| `init_param_state(param)` | Initialize parameter state. |
| `init_state(params)` | |
| `restore_state(opt_target, opt_state, state_dict)` | Restore the optimizer target and state from the state dict. |
| `state_dict(target, state)` | |
| `update_hyper_params(**hyper_param_overrides)` | Updates the hyper parameters with a set of overrides. |
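
A minimal usage sketch of the legacy flax.optim workflow (since replaced by Optax in newer Flax releases), illustrating `create` and the `apply_gradient` exposed on the resulting `flax.optim.Optimizer`; the model, data, and hyperparameter values are illustrative only.

```python
import jax
import jax.numpy as jnp
import flax

# Toy linear model: the optimizer target is just a pytree of parameters.
params = {'w': jnp.ones((3,)), 'b': jnp.zeros(())}

def loss_fn(params, x, y):
    pred = jnp.dot(x, params['w']) + params['b']
    return jnp.mean((pred - y) ** 2)

# Build the optimizer definition, then wrap the parameters with `create`.
optimizer_def = flax.optim.Adadelta(learning_rate=1.0, rho=0.9, eps=1e-6)
optimizer = optimizer_def.create(params)

x = jnp.ones((4, 3))
y = jnp.zeros((4,))

# One training step: differentiate w.r.t. the current target, then apply the gradient.
grads = jax.grad(loss_fn)(optimizer.target, x, y)
optimizer = optimizer.apply_gradient(grads)  # returns a new Optimizer with updated target and state
```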