Activation functions

flax.nnx.celu(x, alpha=1.0)

Continuously-differentiable exponential linear unit activation.

Computes the element-wise function:

\[\begin{split}\mathrm{celu}(x) = \begin{cases} x, & x > 0\\ \alpha \left(\exp(\frac{x}{\alpha}) - 1\right), & x \le 0 \end{cases}\end{split}\]

For more information, see Continuously Differentiable Exponential Linear Units.

Parameters:

x – input array
alpha – array or scalar (default: 1.0)

Returns:

An array.
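
As a minimal sketch, the piecewise formula above can be checked against the library call. The input values and alpha=2.0 are arbitrary illustration choices, and the sketch assumes flax.nnx.celu behaves like jax.nn.celu, which is the namespace the existing examples on this page call.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-2.0, -0.5, 0.0, 1.5])
alpha = 2.0  # arbitrary value chosen for illustration

# Piecewise definition written out directly from the formula.
manual = jnp.where(x > 0, x, alpha * (jnp.exp(x / alpha) - 1.0))

# Library call (assumed equivalent to flax.nnx.celu).
out = jax.nn.celu(x, alpha=alpha)

print(jnp.allclose(out, manual, atol=1e-6))
```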

flax.nnx.elu(x, alpha=1.0)

Exponential linear unit activation function.

Computes the element-wise function:

\[\begin{split}\mathrm{elu}(x) = \begin{cases} x, & x > 0\\ \alpha \left(\exp(x) - 1\right), & x \le 0 \end{cases}\end{split}\]

Parameters:

x – input array
alpha – scalar or array of alpha values (default: 1.0)

Returns:

An array.

See also

selu()

flax.nnx.gelu(x, approximate=True)

Gaussian error linear unit activation function.

If approximate=False, computes the element-wise function:

\[\mathrm{gelu}(x) = \frac{x}{2} \left(\mathrm{erfc} \left( \frac{-x}{\sqrt{2}} \right) \right)\]

If approximate=True, uses the approximate formulation of GELU:

\[\mathrm{gelu}(x) = \frac{x}{2} \left(1 + \mathrm{tanh} \left( \sqrt{\frac{2}{\pi}} \left(x + 0.044715 x^3 \right) \right) \right)\]

For more information, see Gaussian Error Linear Units (GELUs), section 2.

Parameters:

x – input array
approximate – whether to use the approximate or exact formulation.
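
The two formulations can be compared side by side. The sketch below writes both formulas out by hand and measures the gap between them on a small grid. The grid and the names manual_exact, manual_approx and max_gap are illustrative only, and jax.nn.gelu is assumed to match flax.nnx.gelu.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import erfc

x = jnp.linspace(-3.0, 3.0, 13)

exact = jax.nn.gelu(x, approximate=False)
approx = jax.nn.gelu(x, approximate=True)

# Exact form: x/2 * erfc(-x / sqrt(2)).
manual_exact = x / 2 * erfc(-x / jnp.sqrt(2.0))

# Tanh-based approximation.
manual_approx = x / 2 * (1 + jnp.tanh(jnp.sqrt(2 / jnp.pi) * (x + 0.044715 * x**3)))

# Largest absolute difference between the two formulations on this grid.
max_gap = jnp.max(jnp.abs(exact - approx))
print(max_gap)
```

On this grid the gap is small, but the two settings are not bit-identical, so it is worth fixing approximate explicitly when reproducing results across frameworks.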

flax.nnx.glu(x, axis=-1)

Gated linear unit activation function.

Computes the function:

\[\mathrm{glu}(x) = x\left[\ldots, 0:\frac{n}{2}, \ldots\right] \cdot \mathrm{sigmoid} \left( x\left[\ldots, \frac{n}{2}:n, \ldots\right] \right)\]

where the array is split into two along axis. The size of the axis dimension must be divisible by two.

Parameters:

x – input array
axis – the axis along which the split should be computed (default: -1)

Returns:

An array.

See also

sigmoid()
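
A short sketch of the split-and-gate behaviour, assuming jax.nn.glu matches flax.nnx.glu. The (2, 4) input is an arbitrary example. Note that the split axis is halved in the output.

```python
import jax
import jax.numpy as jnp

x = jnp.arange(8.0).reshape(2, 4)

# Split the last axis into two halves: values and gates.
a, b = jnp.split(x, 2, axis=-1)
manual = a * jax.nn.sigmoid(b)

out = jax.nn.glu(x, axis=-1)
print(out.shape)  # (2, 2)
```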

flax.nnx.hard_sigmoid(x)

Hard Sigmoid activation function.

Computes the element-wise function:

\[\mathrm{hard\_sigmoid}(x) = \frac{\mathrm{relu6}(x + 3)}{6}\]

Parameters:

x – input array

Returns:

An array.

See also

relu6()
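
The definition in terms of relu6 can be verified directly. This is a sketch with arbitrary sample points, assuming jax.nn.hard_sigmoid matches flax.nnx.hard_sigmoid. The output saturates at 0 for x <= -3 and at 1 for x >= 3.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-4.0, -3.0, 0.0, 3.0, 4.0])

# relu6(x + 3) / 6, written out from the formula.
manual = jax.nn.relu6(x + 3.0) / 6.0

out = jax.nn.hard_sigmoid(x)
print(out)  # [0.  0.  0.5 1.  1. ]
```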

flax.nnx.hard_silu(x)

Hard SiLU (swish) activation function.

Computes the element-wise function:

\[\mathrm{hard\_silu}(x) = x \cdot \mathrm{hard\_sigmoid}(x)\]

Both hard_silu() and hard_swish() are aliases for the same function.

Parameters:

x – input array

Returns:

An array.

See also

hard_sigmoid()
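
A minimal check that hard_silu is x times hard_sigmoid(x), and that the hard_swish alias gives the same values. The sample inputs are arbitrary, and the jax.nn functions are assumed to match their flax.nnx counterparts.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-4.0, -1.0, 0.0, 1.0, 4.0])

manual = x * jax.nn.hard_sigmoid(x)

out_silu = jax.nn.hard_silu(x)
out_swish = jax.nn.hard_swish(x)  # alias of hard_silu

print(jnp.allclose(out_silu, out_swish))
```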

flax.nnx.hard_swish(x)

Hard SiLU (swish) activation function.

Computes the element-wise function:

\[\mathrm{hard\_silu}(x) = x \cdot \mathrm{hard\_sigmoid}(x)\]

Both hard_silu() and hard_swish() are aliases for the same function.

Parameters:

x – input array

Returns:

An array.

See also

hard_sigmoid()

flax.nnx.hard_tanh(x)

Hard \(\mathrm{tanh}\) activation function.

Computes the element-wise function:

\[\begin{split}\mathrm{hard\_tanh}(x) = \begin{cases} -1, & x < -1\\ x, & -1 \le x \le 1\\ 1, & 1 < x \end{cases}\end{split}\]

Parameters:

x – input array

Returns:

An array.
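
The piecewise definition amounts to clipping the input to [-1, 1]. The sketch below illustrates this with arbitrary sample values, assuming jax.nn.hard_tanh matches flax.nnx.hard_tanh.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])

out = jax.nn.hard_tanh(x)

# Equivalent formulation: clip to the interval [-1, 1].
clipped = jnp.clip(x, -1.0, 1.0)

print(out)  # [-1.  -1.  -0.5  0.5  1.   1. ]
```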

flax.nnx.leaky_relu(x, negative_slope=0.01)

Leaky rectified linear unit activation function.

Computes the element-wise function:

\[\begin{split}\mathrm{leaky\_relu}(x) = \begin{cases} x, & x \ge 0\\ \alpha x, & x < 0 \end{cases}\end{split}\]

where \(\alpha\) = negative_slope.

Parameters:

x – input array
negative_slope – array or scalar specifying the negative slope (default: 0.01)

Returns:

An array.

See also

relu()
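
A short sketch with a non-default slope, assuming jax.nn.leaky_relu matches flax.nnx.leaky_relu. The slope of 0.1 and the inputs are arbitrary. Negative inputs are scaled by negative_slope, and non-negative inputs pass through unchanged.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-10.0, -1.0, 0.0, 2.0])
slope = 0.1  # illustrative value; the default is 0.01

manual = jnp.where(x >= 0, x, slope * x)

out = jax.nn.leaky_relu(x, negative_slope=slope)
print(out)  # [-1.  -0.1  0.   2. ]
```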

flax.nnx.log_sigmoid(x)

Log-sigmoid activation function.

Computes the element-wise function:

\[\mathrm{log\_sigmoid}(x) = \log(\mathrm{sigmoid}(x)) = -\log(1 + e^{-x})\]

Parameters:

x – input array

Returns:

An array.

See also

sigmoid()
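
The formula implies the identity log_sigmoid(x) = -softplus(-x). Because sigmoid(x) can underflow for very negative x, calling log_sigmoid is generally safer than composing log and sigmoid by hand. The sketch below uses arbitrary extreme inputs to illustrate this and assumes the jax.nn functions match the flax.nnx ones.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-100.0, 0.0, 100.0])

out = jax.nn.log_sigmoid(x)

# Identity that follows from -log(1 + exp(-x)).
via_softplus = -jax.nn.softplus(-x)

print(out)
```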

flax.nnx.log_softmax(x, axis=-1, where=None)

Log-Softmax function.

Computes the logarithm of the softmax function, which rescales elements to the range \([-\infty, 0)\).

\[\mathrm{log\_softmax}(x)_i = \log \left( \frac{\exp(x_i)}{\sum_j \exp(x_j)} \right)\]

Parameters:

x – input array
axis – the axis or axes along which the log_softmax should be computed. Either an integer, tuple of integers, or None (all axes).
where – Elements to include in the log_softmax. The output for any masked-out element is minus infinity.

Returns:

An array.

Note

If any input values are +inf, the result will be all NaN: this reflects the fact that inf / inf is not well-defined in the context of floating-point math.

See also

softmax()
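
As a sketch of the definition, log_softmax equals the input minus its log-sum-exp along the chosen axis, and exponentiating the result gives probabilities that sum to one. The logits below are arbitrary. The second row uses large values to show that the result stays finite. jax.nn.log_softmax is assumed to match flax.nnx.log_softmax.

```python
import jax
import jax.numpy as jnp

logits = jnp.array([[1.0, 2.0, 3.0],
                    [1000.0, 1000.0, 1000.0]])

out = jax.nn.log_softmax(logits, axis=-1)

# Equivalent formulation: x - logsumexp(x) along the same axis.
manual = logits - jax.nn.logsumexp(logits, axis=-1, keepdims=True)

# Exponentiating recovers probabilities that sum to 1 per row.
row_sums = jnp.exp(out).sum(axis=-1)
print(row_sums)
```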

flax.nnx.logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False, where=None)

Log-sum-exp reduction.

JAX implementation of scipy.special.logsumexp().

\[\operatorname{logsumexp} a = \log \sum_i b_i \exp a_i\]

where the \(i\) indices range over one or more dimensions to be reduced.

Parameters:

a – the input array
axis – int or sequence of ints, default=None. Axis or axes along which the sum is computed. If None, the sum is computed along all the axes.
b – scaling factors for the exponentials. Must be broadcastable to the shape of a.
keepdims – If True, the axes that are reduced are left in the output as dimensions of size 1.
return_sign – If True, the output will be a (result, sign) pair, where sign is the sign of the sums and result contains the logarithms of their absolute values. If False only result is returned and it will contain NaN values if the sums are negative.
where – Elements to include in the reduction.

Returns:

Either an array result or a pair of arrays (result, sign), depending on the value of the return_sign argument.

See also

jax.nn.logmeanexp()
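
Two brief illustrations, written as a sketch with arbitrary inputs and assuming jax.nn.logsumexp matches flax.nnx.logsumexp. The first shows why the reduction is preferred over a hand-written log(sum(exp(a))), which overflows in float32 for large inputs. The second shows b together with return_sign when the weighted sum is negative.

```python
import jax
import jax.numpy as jnp

# 1) Large inputs: the naive expression overflows, logsumexp does not.
a = jnp.array([1000.0, 1000.0])
naive = jnp.log(jnp.sum(jnp.exp(a)))   # exp(1000) overflows in float32
stable = jax.nn.logsumexp(a)           # 1000 + log(2)

# 2) Signed weights: 1*exp(0) + (-3)*exp(0) = -2.
vals = jnp.array([0.0, 0.0])
b = jnp.array([1.0, -3.0])
result, sign = jax.nn.logsumexp(vals, b=b, return_sign=True)

print(stable, result, sign)
```

Here result holds log|−2| = log 2 and sign holds -1, so the signed sum can be recovered as sign * exp(result).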

flax.nnx.one_hot(x, num_classes, *, dtype=None, axis=-1, out_sharding=None)

One-hot encodes the given indices.

Each index in the input x is encoded as a vector of zeros of length num_classes with the element at index set to one:

>>> jax.nn.one_hot(jnp.array([0, 1, 2]), 3)
Array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)

Indices outside the range \([0, \text{num\_classes})\) will be encoded as zeros:

>>> jax.nn.one_hot(jnp.array([-1, 3]), 3)
Array([[0., 0., 0.],
       [0., 0., 0.]], dtype=float32)

Parameters:

x – A tensor of indices.
num_classes – Number of classes in the one-hot dimension.
dtype – optional, a float dtype for the returned values (default jnp.float_).
axis – the axis or axes along which the function should be computed.
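
The dtype and axis parameters can be illustrated with a small sketch. The labels and the choice of float16 are arbitrary, and jax.nn.one_hot is used as in the examples above. With axis=0 the class dimension is placed first instead of last.

```python
import jax
import jax.numpy as jnp

labels = jnp.array([0, 2])

# Default layout: class dimension last, shape (2, 3).
rows = jax.nn.one_hot(labels, 3, dtype=jnp.float16)

# Class dimension first, shape (3, 2).
cols = jax.nn.one_hot(labels, 3, dtype=jnp.float16, axis=0)

print(rows.shape, cols.shape)  # (2, 3) (3, 2)
```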

flax.nnx.relu(x)

Rectified linear unit activation function.

Computes the element-wise function:

\[\mathrm{relu}(x) = \max(x, 0)\]

except under differentiation, we take:

\[\nabla \mathrm{relu}(0) = 0\]

For more information see Numerical influence of ReLU’(0) on backpropagation.

Parameters:

x – input array

Returns:

An array.

Examples

>>> jax.nn.relu(jax.numpy.array([-2., -1., -0.5, 0, 0.5, 1., 2.]))
Array([0. , 0. , 0. , 0. , 0.5, 1. , 2. ], dtype=float32)

See also

relu6()
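
The gradient convention at zero can be observed with jax.grad. This is a minimal sketch that evaluates the derivative at three arbitrary points. The names g_neg, g_zero and g_pos are illustrative.

```python
import jax

grad_relu = jax.grad(jax.nn.relu)

g_neg = grad_relu(-1.0)   # slope on the negative side
g_zero = grad_relu(0.0)   # convention: gradient at 0 is taken to be 0
g_pos = grad_relu(1.0)    # slope on the positive side

print(g_neg, g_zero, g_pos)  # 0.0 0.0 1.0
```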

flax.nnx.selu(x)

Scaled exponential linear unit activation.

Computes the element-wise function:

\[\begin{split}\mathrm{selu}(x) = \lambda \begin{cases} x, & x > 0\\ \alpha e^x - \alpha, & x \le 0 \end{cases}\end{split}\]

where \(\lambda = 1.0507009873554804934193349852946\) and \(\alpha = 1.6732632423543772848170429916717\).

For more information, see Self-Normalizing Neural Networks.

Parameters:

x – input array

Returns:

An array.

See also

elu()
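
With the constants above, selu is a scaled elu: selu(x) = λ · elu(x, α). The sketch below checks both the piecewise formula and this relationship on arbitrary inputs, assuming the jax.nn functions match the flax.nnx ones.

```python
import jax
import jax.numpy as jnp

lam = 1.0507009873554804934193349852946
alpha = 1.6732632423543772848170429916717

x = jnp.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# Piecewise formula written out directly.
manual = lam * jnp.where(x > 0, x, alpha * jnp.exp(x) - alpha)

# Same function expressed through elu.
via_elu = lam * jax.nn.elu(x, alpha=alpha)

out = jax.nn.selu(x)
print(jnp.allclose(out, manual, atol=1e-6))
```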

flax.nnx.sigmoid(x)

Sigmoid activation function.

Computes the element-wise function:

\[\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]

Parameters:

x – input array

Returns:

An array.

See also

log_sigmoid()

flax.nnx.identity(x)

Identity activation function.

Returns the argument unmodified.

Parameters:

x – input array

Returns:

The argument x unmodified.

Examples

>>> jax.nn.identity(jax.numpy.array([-2., -1., -0.5, 0, 0.5, 1., 2.]))
Array([-2. , -1. , -0.5, 0. , 0.5, 1. , 2. ], dtype=float32)

flax.nnx.silu(x)

SiLU (aka swish) activation function.

Computes the element-wise function:

\[\mathrm{silu}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}}\]

swish() and silu() are both aliases for the same function.

Parameters:

x – input array

Returns:

An array.

See also

sigmoid()
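
A minimal sketch confirming that silu is x · sigmoid(x) and that swish returns the same values. The inputs are arbitrary, and the jax.nn functions are assumed to match the flax.nnx ones.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-2.0, -0.5, 0.0, 0.5, 2.0])

manual = x * jax.nn.sigmoid(x)

out_silu = jax.nn.silu(x)
out_swish = jax.nn.swish(x)  # alias of silu

print(jnp.allclose(out_silu, out_swish))
```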

flax.nnx.soft_sign(x)

Soft-sign activation function.

Computes the element-wise function:

\[\mathrm{soft\_sign}(x) = \frac{x}{|x| + 1}\]

Parameters:

x – input array

flax.nnx.softmax(x, axis=-1, where=None)

Softmax function.

Computes the function which rescales elements to the range \([0, 1]\) such that the elements along axis sum to \(1\).

\[\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]

Parameters:

x – input array
axis – the axis or axes along which the softmax should be computed. The softmax output summed across these dimensions should sum to \(1\). Either an integer, tuple of integers, or None (all axes).
where – Elements to include in the softmax. The output for any masked-out element is zero.

Returns:

An array.

Note

If any input values are +inf, the result will be all NaN: this reflects the fact that inf / inf is not well-defined in the context of floating-point math.

See also

log_softmax()
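
The sketch below shows the normalisation along axis and the effect of where. The logits and the mask are arbitrary illustration values, and jax.nn.softmax is assumed to match flax.nnx.softmax. Masked-out positions receive zero and the remaining entries are renormalised over the included elements only.

```python
import jax
import jax.numpy as jnp

logits = jnp.array([[1.0, 2.0, 3.0],
                    [2.0, 2.0, 2.0]])

probs = jax.nn.softmax(logits, axis=-1)

# Exclude the last element of the first row from the softmax.
mask = jnp.array([[True, True, False],
                  [True, True, True]])
masked = jax.nn.softmax(logits, axis=-1, where=mask)

print(probs.sum(axis=-1), masked.sum(axis=-1))
```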

flax.nnx.softplus(x)

Softplus activation function.

Computes the element-wise function:

\[\mathrm{softplus}(x) = \log(1 + e^x)\]

Parameters:

x – input array
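
A sketch comparing the library call with the formula written naively, using arbitrary extreme inputs and assuming jax.nn.softplus matches flax.nnx.softplus. For large x the naive expression overflows in float32, while softplus returns approximately x.

```python
import jax
import jax.numpy as jnp

x = jnp.array([-100.0, 0.0, 100.0])

out = jax.nn.softplus(x)

# Naive transcription of log(1 + e^x); exp(100) overflows in float32.
naive = jnp.log(1.0 + jnp.exp(x))

print(out, naive)
```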

flax.nnx.standardize(x, axis=-1, mean=None, variance=None, epsilon=1e-05, where=None, *, algorithm='fast')

Standardizes input to zero mean and unit variance.

The standardization is given by:

\[x_{std} = \frac{x - \langle x\rangle}{\sqrt{\langle(x - \langle x\rangle)^2\rangle + \epsilon}}\]

where \(\langle x\rangle\) indicates the mean of \(x\), and \(\epsilon\) is a small correction factor introduced to avoid division by zero.

Parameters:

x – input array to be standardized.
axis – integer, tuple of integers, or None (all axes), representing the axes along which to standardize. Defaults to the last axis (-1).
mean – optionally specify the mean used for standardization. If not specified, then x.mean(axis, where=where) will be used.
variance – optionally specify the variance used for standardization. If not specified, then x.var(axis, where=where) will be used.
epsilon – correction factor added to variance to avoid division by zero; defaults to 1E-5.
where – optional boolean mask specifying which elements to use when computing the mean and variance.
algorithm – variance computation algorithm. "fast" uses mean(x^2) - mean(x)^2 which may be faster but can suffer from catastrophic cancellation and produce different results in eager vs JIT contexts. "stable" uses the two-pass formula mean((x - mean(x))^2) which is numerically stable. Default is "fast" for backward compatibility.

Returns:

An array of the same shape as x containing the standardized input.
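
As a minimal sketch, the formula can be reproduced by hand and compared with the library call. The input is arbitrary, the default epsilon is written out as eps, and jax.nn.standardize is assumed to match flax.nnx.standardize. The algorithm argument is left at its default.

```python
import jax
import jax.numpy as jnp

x = jnp.array([[1.0, 2.0, 3.0, 4.0],
               [10.0, 20.0, 30.0, 40.0]])

out = jax.nn.standardize(x, axis=-1)

# Manual version of (x - mean) / sqrt(var + epsilon) along the last axis.
eps = 1e-5
mean = x.mean(axis=-1, keepdims=True)
var = x.var(axis=-1, keepdims=True)
manual = (x - mean) / jnp.sqrt(var + eps)

print(out.mean(axis=-1))
```

Each row is standardized independently, so both rows end up with approximately the same values despite their different scales.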

flax.nnx.swish(x)

SiLU (aka swish) activation function.

Computes the element-wise function:

\[\mathrm{silu}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}}\]

swish() and silu() are both aliases for the same function.

Parameters:

x – input array

Returns:

An array.

See also

sigmoid()

flax.nnx.tanh(x, /)

Calculate element-wise hyperbolic tangent of input.

JAX implementation of numpy.tanh.

The hyperbolic tangent is defined by:

\[\mathrm{tanh}(x) = \frac{\mathrm{sinh}(x)}{\mathrm{cosh}(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

Parameters:

x – input array or scalar.

Returns:

An array containing the hyperbolic tangent of each element of x, promoting to inexact dtype.

Note

jnp.tanh is equivalent to computing -1j * jnp.tan(1j * x).

See also

jax.numpy.sinh(): Computes the element-wise hyperbolic sine of the input.
jax.numpy.cosh(): Computes the element-wise hyperbolic cosine of the input.
jax.numpy.arctanh(): Computes the element-wise inverse of hyperbolic tangent of the input.

Examples

>>> x = jnp.array([[-1, 0, 1],
...                [3, -2, 5]])
>>> with jnp.printoptions(precision=3, suppress=True):
...   jnp.tanh(x)
Array([[-0.762,  0.   ,  0.762],
       [ 0.995, -0.964,  1.   ]], dtype=float32)
>>> with jnp.printoptions(precision=3, suppress=True):
...   -1j * jnp.tan(1j * x)
Array([[-0.762+0.j,  0.   -0.j,  0.762-0.j],
       [ 0.995-0.j, -0.964+0.j,  1.   -0.j]],      dtype=complex64, weak_type=True)

For complex-valued input:

>>> with jnp.printoptions(precision=3, suppress=True):
...   jnp.tanh(2-5j)
Array(1.031+0.021j, dtype=complex64, weak_type=True)
>>> with jnp.printoptions(precision=3, suppress=True):
...   -1j * jnp.tan(1j * (2-5j))
Array(1.031+0.021j, dtype=complex64, weak_type=True)
