flax.linen package

Linen is the Flax Module system. Read more about our design goals in the Linen README.

Module

class flax.linen.Module(**kwargs)[source]

Base class for all neural network modules. Layers and models should subclass this class.

All Flax Modules are Python 3.7 dataclasses. Since dataclasses take over __init__, you should instead override setup(), which is automatically called to initialize the module.

Modules can contain submodules, and in this way can be nested in a tree structure. Submodules can be assigned as regular attributes inside the setup() method.

You can define arbitrary “forward pass” methods on your Module subclass. While no methods are special-cased, __call__ is a popular choice because it allows you to use module instances as if they are functions:

from typing import Tuple

from flax import linen as nn

class Module(nn.Module):
  features: Tuple[int, ...] = (16, 4)

  def setup(self):
    # Submodules are created and named in setup().
    self.dense1 = nn.Dense(self.features[0])
    self.dense2 = nn.Dense(self.features[1])

  def __call__(self, x):
    return self.dense2(nn.relu(self.dense1(x)))

Optionally, for more concise module implementations where submodule definitions are co-located with their usage, you can use the compact() wrapper.

__setattr__(name, val)[source]

Sets an attribute on this Module.

We overload setattr solely to support pythonic naming via assignment of submodules in the special setup() function:

self.submodule_name = MyModule(...)

We also support lists and other general pytrees, e.g.:

self.submodules = [MyModule0(..), MyModule1(..), ...]
Parameters
  • name (str) – Attribute to set.

  • val (Any) – Value of the attribute.

apply(variables, *args, rngs=None, method=None, mutable=False, capture_intermediates=False, **kwargs)[source]

Applies a module method to variables and returns output and modified variables.

Note that method should be set if one would like to call apply on a different class method than __call__. For instance, suppose a Transformer module has a method called encode; then the following calls apply on that method:

model = Transformer()
encoded = model.apply({'params': params}, x, method=Transformer.encode)

If a function instance is provided, the unbound function is used. For instance, the example below is equivalent to the one above:

encoded = model.apply({'params': params}, x, method=model.encode)

Note method can also be a function that is not defined in Transformer. In that case, the function should have at least one argument representing an instance of the Module class:

def other_fn(instance, ...):
  instance.some_module_attr(...)
  ...

model.apply({'params': params}, x, method=other_fn)
Parameters
  • variables (Mapping[str, Mapping[str, Any]]) – A dictionary containing variables keyed by variable collections. See flax.core.variables for more details about variables.

  • rngs (Optional[Dict[str, Any]]) – a dict of PRNGKeys to initialize the PRNG sequences. The “params” PRNG sequence is used to initialize parameters.

  • method (Optional[Callable[[...], Any]]) – A function to call apply on. This is generally a function in the module. If provided, applies this method. If not provided, applies the __call__ method of the module.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections.

  • capture_intermediates (Union[bool, Callable[[flax.linen.module.Module, str], bool]]) – If True, captures intermediate return values of all Modules inside the “intermediates” collection. By default only the return values of all __call__ methods are stored. A function can be passed to change the filter behavior. The filter function takes the Module instance and method name and returns a bool indicating whether the output of that method invocation should be stored.

Returns

If mutable is False, returns output. If any collections are mutable, returns (output, vars), where vars is a dict of the modified collections.

Return type

Union[Any, Tuple[Any, flax.core.frozen_dict.FrozenDict[str, Mapping[str, Any]]]]

bind(variables, *args, rngs=None, mutable=False)[source]

Creates an interactive Module instance by binding variables and RNGs.

bind provides an “interactive” instance of a Module directly without transforming a function with apply. This is particularly useful for debugging and interactive use cases like notebooks, where a function would limit the ability to split up code into different cells.

Once the variables (and optionally RNGs) are bound to a Module, it becomes a stateful object. Note that idiomatic JAX is functional, and therefore an interactive instance does not mix well with vanilla JAX APIs. bind() should only be used for interactive experimentation; in all other cases we strongly encourage using apply() instead.

Example:

import jax
import jax.numpy as jnp
import flax.linen as nn

class AutoEncoder(nn.Module):
  def setup(self):
    self.encoder = nn.Dense(3)
    self.decoder = nn.Dense(5)

  def __call__(self, x):
    return self.decoder(self.encoder(x))

x = jnp.ones((16, 9))
ae = AutoEncoder()
variables = ae.init(jax.random.PRNGKey(0), x)
model = ae.bind(variables)
z = model.encoder(x)
x_reconstructed = model.decoder(z)
Parameters
  • variables (Mapping[str, Mapping[str, Any]]) – A dictionary containing variables keyed by variable collections. See flax.core.variables for more details about variables.

  • rngs (Optional[Dict[str, Any]]) – A dict of PRNGKeys to initialize the PRNG sequences.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections.

Returns

A copy of this instance with bound variables and RNGs.

init(rngs, *args, method=None, mutable=DenyList(deny='intermediates'), **kwargs)[source]

Initializes a module method with variables and returns modified variables.

Jitting init initializes a model lazily using only the shapes of the provided arguments, and avoids computing the forward pass with actual values. Example:

jit_init = jax.jit(SomeModule(...).init)  # init is called on a Module instance
jit_init(rng, jnp.ones(input_shape, jnp.float32))
Parameters
  • rngs (Union[Any, Dict[str, Any]]) – The rngs for the variable collections.

  • method (Optional[Callable[[...], Any]]) – An optional method. If provided, applies this method. If not provided, applies the __call__ method.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections. By default all collections except “intermediates” are mutable.

Returns

The initialized variable dict.

Return type

flax.core.frozen_dict.FrozenDict[str, Mapping[str, Any]]

init_with_output(rngs, *args, method=None, mutable=DenyList(deny='intermediates'), **kwargs)[source]

Initializes a module method with variables and returns output and modified variables.

Parameters
  • rngs (Union[Any, Dict[str, Any]]) – The rngs for the variable collections.

  • method (Optional[Callable[[...], Any]]) – An optional method. If provided, applies this method. If not provided, applies the __call__ method.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections. By default all collections except “intermediates” are mutable.

Returns

(output, vars), where vars is a dict of the modified collections.

Return type

Tuple[Any, flax.core.frozen_dict.FrozenDict[str, Mapping[str, Any]]]

make_rng(name)[source]

Returns a new RNG key from a given RNG sequence for this Module.

The new RNG key is split from the previous one. Thus, every call to make_rng returns a new RNG key, while still guaranteeing full reproducibility.

TODO: Link to Flax RNG design note.

Parameters

name (str) – The RNG sequence name.

Returns

The newly generated RNG key.

Return type

Any

param(name, init_fn, *init_args)[source]

Declares and returns a parameter in this Module.

Parameters are read-only variables in the collection named “params”. See flax.core.variables for more details on variables.

The first argument of init_fn is assumed to be a PRNG key, which is provided automatically and does not have to be passed using init_args:

mean = self.param('mean', lecun_normal(), (2, 2))

In the example above, the function lecun_normal expects two arguments: key and shape, but only shape has to be provided explicitly; key is set automatically using the PRNG for params that is passed when initializing the module using init().

Parameters
  • name (str) – The parameter name.

  • init_fn (Callable[[...], flax.linen.module.T]) – The function that will be called to compute the initial value of this variable. This function will only be called the first time this parameter is used in this module.

  • *init_args – The arguments to pass to init_fn.

Returns

The value of the initialized parameter.

Return type

flax.linen.module.T

setup()[source]

Initializes a Module lazily (similar to a lazy __init__).

setup is called once lazily on a module instance when a module is bound, immediately before any other methods like __call__ are invoked, or before a setup-defined attribute on self is accessed.

This can happen in three cases:

  1. Immediately when invoking apply(), init() or init_with_output().

  2. Once the module is given a name by being assigned to an attribute of another module inside the other module’s setup method (see __setattr__()):

    class MyModule(nn.Module):
      def setup(self):
        submodule = nn.Conv(...)

        # Accessing `submodule` attributes does not yet work here.

        # The following line invokes `self.__setattr__`, which gives
        # `submodule` the name "conv1".
        self.conv1 = submodule

        # Accessing `submodule` attributes or methods is now safe and
        # causes setup() to be called once.
    
  3. Once a module is constructed inside a method wrapped with compact(), immediately before another method is called or a setup-defined attribute is accessed.

sow(col, name, value, reduce_fn=<function <lambda>>, init_fn=<function <lambda>>)[source]

Stores a value in a collection.

Collections can be used to collect intermediate values without the overhead of explicitly passing a container through each Module call.

If the target collection is not mutable, sow behaves like a no-op and returns False.

Example:

import jax
import jax.numpy as jnp
import flax.linen as nn

class Foo(nn.Module):
  @nn.compact
  def __call__(self, x):
    h = nn.Dense(4)(x)
    self.sow('intermediates', 'h', h)
    return nn.Dense(2)(h)

x = jnp.ones((16, 9))
model = Foo()
variables = model.init(jax.random.PRNGKey(0), x)
y, state = model.apply(variables, x, mutable=['intermediates'])
print(state['intermediates'])  # {'h': (...,)}

By default the values are stored in a tuple and each stored value is appended at the end. This way all intermediates can be tracked when the same module is called multiple times. Alternatively, a custom init/reduce function can be passed:

class Foo2(nn.Module):
  @nn.compact
  def __call__(self, x):
    init_fn = lambda: 0
    reduce_fn = lambda a, b: a + b
    self.sow('intermediates', 'h', x,
             init_fn=init_fn, reduce_fn=reduce_fn)
    self.sow('intermediates', 'h', x * 2,
             init_fn=init_fn, reduce_fn=reduce_fn)
    return x

model = Foo2()
variables = model.init(jax.random.PRNGKey(0), x)
y, state = model.apply(variables, jnp.ones((1, 1)), mutable=['intermediates'])
print(state['intermediates'])  # ==> {'h': [[3.]]}
Parameters
  • col (str) – The name of the variable collection.

  • name (str) – The name of the variable.

  • value (flax.linen.module.T) – The value of the variable.

  • reduce_fn (Callable[[flax.linen.module.K, flax.linen.module.T], flax.linen.module.K]) – The function used to combine the existing value with the new value. The default is to append the value to a tuple.

  • init_fn (Callable[[], flax.linen.module.K]) – For the first value stored reduce_fn will be passed the result of init_fn together with the value to be stored. The default is an empty tuple.

Returns

True if the value has been stored successfully, False otherwise.

Return type

bool

variable(col, name, init_fn, *init_args)[source]

Declares and returns a variable in this Module.

See flax.core.variables for more information. See also param() for a shorthand way to define read-only variables in the “params” collection.

Contrary to param(), all arguments to init_fn must be passed explicitly:

key = self.make_rng('stats')
mean = self.variable('stats', 'mean', lecun_normal(), key, (2, 2))

In the example above, the function lecun_normal expects two arguments: key and shape, and both have to be passed on. The PRNG for stats has to be provided explicitly when calling init() and apply().

Parameters
  • col (str) – The variable collection name.

  • name (str) – The variable name.

  • init_fn – The function that will be called to compute the initial value of this variable. This function will only be called the first time this variable is used in this module.

  • *init_args – The arguments to pass to init_fn.

Returns

A flax.core.variables.Variable that can be read or set via its “.value” attribute. Throws an error if the variable exists already.

Return type

flax.core.scope.Variable

property variables: Mapping[str, Mapping[str, Any]]

Returns the variables in this module.

Init/Apply

flax.linen.apply(fn, module, mutable=False, capture_intermediates=False)[source]

Creates an apply function to call fn with a bound module.

Unlike Module.apply this function returns a new function with the signature (variables, *args, rngs=None, **kwargs) -> T where T is the return type of fn. If mutable is not False the return type is a tuple where the second item is a FrozenDict with the mutated variables.

The apply function that is returned can be directly composed with JAX transformations like jax.jit:

def f(foo, x):
  z = foo.encode(x)
  y = foo.decode(z)
  # ...
  return y

foo = Foo()
f_jitted = jax.jit(nn.apply(f, foo))
f_jitted(variables, x)
Parameters
  • fn (Callable[[...], Any]) – The function that should be applied. The first argument passed will be a module instance with variables and RNGs bound to it.

  • module (flax.linen.module.Module) – The Module that will be used to bind variables and RNGs to. The Module passed as the first argument to fn will be a clone of module.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections.

  • capture_intermediates (Union[bool, Callable[[flax.linen.module.Module, str], bool]]) – If True, captures intermediate return values of all Modules inside the “intermediates” collection. By default only the return values of all __call__ methods are stored. A function can be passed to change the filter behavior. The filter function takes the Module instance and method name and returns a bool indicating whether the output of that method invocation should be stored.

Returns

The apply function wrapping fn.

Return type

Callable[[…], Any]

flax.linen.init(fn, module, mutable=DenyList(deny='intermediates'))[source]

Creates an init function to call fn with a bound module.

Unlike Module.init this function returns a new function with the signature (rngs, *args, **kwargs) -> variables. The rngs can be a dict of PRNGKeys or a single PRNGKey, which is equivalent to passing a dict with one PRNGKey with the name “params”.

The init function that is returned can be directly composed with JAX transformations like jax.jit:

def f(foo, x):
  z = foo.encode(x)
  y = foo.decode(z)
  # ...
  return y

foo = Foo()
f_jitted = jax.jit(nn.init(f, foo))
variables = f_jitted(rng, x)
Parameters
  • fn (Callable[[...], Any]) – The function that should be applied. The first argument passed will be a module instance with variables and RNGs bound to it.

  • module (flax.linen.module.Module) – The Module that will be used to bind variables and RNGs to. The Module passed as the first argument to fn will be a clone of module.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections. By default all collections except “intermediates” are mutable.

Returns

The init function wrapping fn.

Return type

Callable[[…], flax.core.frozen_dict.FrozenDict[str, Mapping[str, Any]]]

flax.linen.init_with_output(fn, module, mutable=DenyList(deny='intermediates'))[source]

Creates an init function to call fn with a bound module that also returns the function outputs.

Unlike Module.init_with_output this function returns a new function with the signature (rngs, *args, **kwargs) -> (T, variables) where T is the return type of fn. The rngs can be a dict of PRNGKeys or a single PRNGKey, which is equivalent to passing a dict with one PRNGKey with the name “params”.

The init function that is returned can be directly composed with JAX transformations like jax.jit:

def f(foo, x):
  z = foo.encode(x)
  y = foo.decode(z)
  # ...
  return y

foo = Foo()
f_jitted = jax.jit(nn.init_with_output(f, foo))
y, variables = f_jitted(rng, x)
Parameters
  • fn (Callable[[...], Any]) – The function that should be applied. The first argument passed will be a module instance with variables and RNGs bound to it.

  • module (flax.linen.module.Module) – The Module that will be used to bind variables and RNGs to. The Module passed as the first argument to fn will be a clone of module.

  • mutable (Union[bool, str, Container[str], flax.core.scope.DenyList]) – Can be bool, str, or list. Specifies which collections should be treated as mutable: bool: all/no collections are mutable. str: The name of a single mutable collection. list: A list of names of mutable collections. By default all collections except “intermediates” are mutable.

Returns

The init function wrapping fn.

Return type

Callable[[…], Tuple[Any, flax.core.frozen_dict.FrozenDict[str, Mapping[str, Any]]]]

Variables

A variable dict is a normal Python dictionary, which is a container for one or more “variable collections”, each of which is a nested dictionary whose leaves are jax.numpy arrays.

The different variable collections share the same nested tree structure.

For example, consider the following variable dictionary:

{
  "params": {
    "Conv1": { "weight": ..., "bias": ... },
    "BatchNorm1": { "scale": ..., "mean": ... },
    "Conv2": {...}
  },
  "batch_stats": {
    "BatchNorm1": { "moving_mean": ..., "moving_average": ...}
  }
}

In this case, the "BatchNorm1" key lives in both the "params" and "batch_stats" collections. This reflects the fact that the submodule named "BatchNorm1" has both trainable parameters (the "params" collection), as well as other non-trainable variables (the "batch_stats" collection).

TODO: Make “variable dict” design note, and link to it from here.

class flax.core.variables.Variable(scope, collection, name)[source]

A Variable object allows mutable access to a variable in a VariableDict.

Variables are identified by a collection (e.g., “batch_stats”) and a name (e.g., “moving_mean”). The value property gives access to the variable’s content and can be assigned to for mutation.

Compact methods

flax.linen.compact(fun)[source]

Marks the given module method allowing inlined submodules.

Methods wrapped in @compact can define submodules directly within the method.

For instance:

@nn.compact
def __call__(self, x, features):
  x = nn.Dense(features)(x)
  ...

At most one method in each Module may be wrapped with @compact.

Parameters

fun (flax.linen.module._CallableT) – The Module method to mark as compact.

Returns

The given function fun marked as compact.

Return type

flax.linen.module._CallableT

Transformations

JAX transformations on Modules.

JAX functional transformations operate on pure functions. Flax extends these transformations to also operate on Modules, which have stateful variables and PRNG sequences. We refer to these extended versions as “lifted transformations”.

A lifted transformation can be applied to a Module class or a function that takes a Module instance as its first argument.

vmap(target, variable_axes, split_rngs[, …])

A lifted version of jax.vmap.

scan(target[, variable_axes, …])

A lifted version of jax.lax.scan.

jit(target[, variables, rngs, …])

Lifted version of jax.jit.

remat(target[, variables, rngs, concrete, …])

Lifted version of jax.checkpoint.

Linear modules

Dense(features[, use_bias, dtype, …])

A linear transformation applied over the last dimension of the input.

DenseGeneral(features[, axis, batch_dims, …])

A linear transformation with flexible axes.

Conv(features, kernel_size[, strides, …])

Convolution Module wrapping lax.conv_general_dilated.

ConvTranspose(features, kernel_size[, …])

Convolution Module wrapping lax.conv_transpose.

Embed(num_embeddings, features[, dtype, …])

Embedding Module.

Normalization

BatchNorm([use_running_average, axis, …])

BatchNorm Module.

LayerNorm([epsilon, dtype, use_bias, …])

Layer normalization (https://arxiv.org/abs/1607.06450).

GroupNorm([num_groups, group_size, epsilon, …])

Group normalization (https://arxiv.org/abs/1803.08494).

Pooling

max_pool(inputs, window_shape[, strides, …])

Pools the input by taking the maximum of a window slice.

avg_pool(inputs, window_shape[, strides, …])

Pools the input by taking the average over a window.

Activation functions

celu(x[, alpha])

Continuously-differentiable exponential linear unit activation.

elu(x[, alpha])

Exponential linear unit activation function.

gelu(x[, approximate])

Gaussian error linear unit activation function.

glu(x[, axis])

Gated linear unit activation function.

log_sigmoid(x)

Log-sigmoid activation function.

log_softmax(x[, axis])

Log-Softmax function.

relu

Rectified linear unit activation function.

sigmoid(x)

Sigmoid activation function.

soft_sign(x)

Soft-sign activation function.

softmax(x[, axis])

Softmax function.

softplus(x)

Softplus activation function.

swish(x)

SiLU activation function.

Attention primitives

dot_product_attention(query, key, value[, …])

Computes dot-product attention given query, key, and value.

SelfAttention(num_heads[, dtype, …])

Self-attention special case of multi-head dot-product attention.

Stochastic

Dropout(rate[, broadcast_dims, …])

Create a dropout layer.

RNN primitives

LSTMCell([gate_fn, activation_fn, …])

LSTM cell.

OptimizedLSTMCell([gate_fn, activation_fn, …])

More efficient LSTM Cell that concatenates state components before matmul.

GRUCell([gate_fn, activation_fn, …])

GRU cell.