Advanced Guides
Advanced
- Scale up on multiple devices
- Performance considerations
- Use Flax NNX and Linen together
- Model surgery
- Extracting intermediate values
- A Flax Optimization Cookbook
  - Exponential Moving Average
  - Low Rank Adaptation
  - LBFGS
  - Per-Parameter Learning Rates
  - Gradient Accumulation
  - Sharding Optimization State Differently from Parameters