Optimizations
Data padding
Prevent recompilation for the last batch, which is usually smaller and therefore has a different shape.
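A minimal sketch of this idea in JAX (the function names, batch size, and data below are illustrative assumptions): a jitted function is compiled once per input shape, so padding the final, smaller batch back to the common batch size lets it reuse the cached compilation.

```python
import jax
import jax.numpy as jnp
import numpy as np

BATCH_SIZE = 64  # assumed fixed batch size

@jax.jit
def process(batch):
    # Placeholder computation; compiled once per input shape.
    return jnp.tanh(batch) * 2.0

def pad_batch(batch, batch_size=BATCH_SIZE):
    """Zero-pad a smaller last batch up to the fixed batch size."""
    n = batch.shape[0]
    if n == batch_size:
        return batch, n
    pad = [(0, batch_size - n)] + [(0, 0)] * (batch.ndim - 1)
    return jnp.pad(batch, pad), n

# 10 full batches of 64 plus a final batch of 20:
data = np.random.rand(660, 32).astype(np.float32)
for start in range(0, len(data), BATCH_SIZE):
    padded, n_real = pad_batch(jnp.asarray(data[start:start + BATCH_SIZE]))
    out = process(padded)[:n_real]  # drop the results for the padding rows
```

Without the padding, the final batch of 20 would trigger a second compilation of `process`; with it, every call reuses the single compiled version.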
Learning rate scheduling
Vary the learning rate over the course of training instead of keeping it fixed.
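A minimal sketch with Optax, a common choice for this in the JAX ecosystem (all schedule values below are illustrative assumptions):

```python
import optax

# Hypothetical schedule: linear warmup followed by cosine decay.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,     # learning rate at the start of warmup
    peak_value=1e-3,    # learning rate at the end of warmup
    warmup_steps=100,
    decay_steps=1000,   # total steps, including warmup
    end_value=1e-5,
)

# A schedule plugs directly into an optimizer in place of a fixed value.
optimizer = optax.adam(learning_rate=schedule)
```

A schedule is just a function from step number to learning rate: here `schedule(0)` returns 0.0 and `schedule(100)` returns the peak value of 1e-3.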
Using multiple accelerators
Run computations in parallel on multiple GPUs/TPUs.
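A minimal sketch of data parallelism with `jax.pmap` (the computation itself is a placeholder assumption): `pmap` compiles the function with XLA and runs one copy per device, each on its own slice of the input's leading axis.

```python
import jax
import jax.numpy as jnp

n_devices = jax.device_count()  # number of visible GPUs/TPU cores

@jax.pmap
def step(x):
    # Placeholder per-device computation.
    return jnp.sin(x) ** 2

# The leading axis of the input must match the number of devices:
data = jnp.arange(n_devices * 4, dtype=jnp.float32).reshape(n_devices, 4)
result = step(data)   # each device processes one row in parallel
print(result.shape)   # (n_devices, 4)
```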