Optimizations

Author

Marie-Hélène Burle

Data padding

Pad the last batch, which is usually smaller than the others, to the fixed batch size so that every batch has the same shape and the jitted function does not get recompiled. A sketch of this follows below.
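As a minimal sketch of what this can look like in JAX, the smaller last batch could be padded to the fixed batch size before being passed to a jitted function; the helper `pad_batch`, the dummy loss, and the batch size below are illustrative assumptions, not part of the course code.

```python
import numpy as np
import jax
import jax.numpy as jnp

BATCH_SIZE = 128  # fixed batch size used for all batches (illustrative value)

def pad_batch(x, y, batch_size=BATCH_SIZE):
    """Pad a (possibly smaller) last batch up to batch_size.

    Returns the padded arrays and a boolean mask marking the real samples,
    so padded entries can be excluded from the loss and metrics.
    """
    n = x.shape[0]
    pad = batch_size - n
    mask = np.concatenate([np.ones(n, dtype=bool), np.zeros(pad, dtype=bool)])
    if pad > 0:
        x = np.concatenate([x, np.zeros((pad,) + x.shape[1:], dtype=x.dtype)])
        y = np.concatenate([y, np.zeros((pad,) + y.shape[1:], dtype=y.dtype)])
    return x, y, mask

@jax.jit
def mean_loss(params, x, y, mask):
    # Dummy masked mean squared error: padded rows contribute nothing.
    preds = x @ params
    per_sample = jnp.sum((preds - y) ** 2, axis=-1)
    return jnp.sum(per_sample * mask) / jnp.sum(mask)

# Every padded batch now has shape (BATCH_SIZE, ...), so mean_loss is
# traced and compiled only once, even for the smaller last batch.
params = jnp.ones((4, 1))
x_last, y_last, mask = pad_batch(np.ones((100, 4)), np.ones((100, 1)))
loss = mean_loss(params, x_last, y_last, mask)
```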

Learning rate scheduling
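As a sketch, assuming the Optax library is used for optimization, a schedule can be passed to the optimizer wherever a constant learning rate would go; the schedule type and values below are illustrative.

```python
import optax

# Cosine decay with linear warmup; all values are illustrative.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-3,
    warmup_steps=500,
    decay_steps=10_000,
    end_value=1e-5,
)

# The schedule is passed where a constant learning rate would normally go.
optimizer = optax.adamw(learning_rate=schedule)
```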

Using multiple accelerators

Run computations in parallel across multiple GPUs/TPUs.
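One possible sketch of data parallelism uses jax.pmap to split a batch along its leading axis across the local devices; the array shapes and dummy loss below are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

@jax.pmap
def parallel_loss(params, x, y):
    # Each device computes the loss on its own shard of the batch.
    preds = x @ params
    return jnp.mean((preds - y) ** 2)

# Replicate the parameters on every device and shard the batch:
# the leading axis of each argument must equal the number of devices.
params = jnp.ones((8, 1))
params_repl = jnp.stack([params] * n_devices)
x = jnp.ones((n_devices, 32, 8))
y = jnp.ones((n_devices, 32, 1))

losses = parallel_loss(params_repl, x, y)  # one loss value per device
```

Newer JAX versions favour explicit sharding with jax.sharding and jax.jit, but pmap remains a compact way to express simple data parallelism across local devices.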