Marie-Hélène Burle
November 25, 2025
Deep learning experiments come with a lot of components:
While developing an efficient model, you will train various architectures, tuned with various hyperparameters, on various datasets
*hp = hyperparameter
How did we get performance19 again? 🤯
The solution to this complexity is to use an experiment tracking tool such as MLflow and, optionally, a data versioning tool such as DVC

The foundation of many proprietary no-code/low-code tuning platforms, which just add a layer on top so that users interact through text rather than code
e.g. Microsoft Fabric, FLAML
MLflow projects do not (yet) support uv
Some functionality for deployment and production in large companies is missing (but this is irrelevant for research, and no FOSS option offers it)
Run: single execution of a model training event
Model signature: a formal description of a model’s input and output data structure, data types, and names of columns or features
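As an illustration of a signature (a minimal sketch using scikit-learn as a stand-in model, not code from the course), you can infer it from sample inputs and outputs and attach it when logging the model:

```python
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small placeholder model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Infer the signature from example inputs and the corresponding predictions
signature = infer_signature(X, model.predict(X))

# Log the model together with its signature inside a run
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model", signature=signature)
```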
With uv
Create a uv project:
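For example (the project name below is an arbitrary placeholder):

```bash
uv init mlflow-demo
cd mlflow-demo
```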
Install MLflow:
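For example, as a project dependency:

```bash
uv add mlflow
```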
Track models at checkpoints
Compare with different datasets
Visualize with tracking UI
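A minimal sketch of those three steps (the experiment name, parameters, and fake loss values below are placeholders, not course material):

```python
import mlflow

# Group related runs under a named experiment
mlflow.set_experiment("demo-experiment")

# Compare runs trained on different datasets
for dataset in ["dataset_A", "dataset_B"]:
    with mlflow.start_run(run_name=dataset):
        mlflow.log_param("dataset", dataset)
        mlflow.log_param("lr", 0.01)

        # Track metrics at each checkpoint (epoch)
        for epoch in range(5):
            fake_loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
            mlflow.log_metric("loss", fake_loss, step=epoch)
```

Then launch the tracking UI (here via uv) and explore the runs:

```bash
uv run mlflow ui
```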

For team development
Here we use PostgreSQL, which works well for managing a database in a client-server setup
(Requires installing the psycopg2 package)
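For example:

```bash
uv add psycopg2    # or psycopg2-binary for the pre-built wheel
```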
The workflow looks like this:
[Diagram: experiments, child runs, tags]
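A typical server launch with a PostgreSQL backend store looks something like this (the connection string, ports, and artifact location are placeholders for your own setup, not values from the course):

```bash
mlflow server \
    --backend-store-uri postgresql://user:password@dbhost:5432/mlflowdb \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
```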
Open http://<host>:<port> in your browser

Example: for a local server on port 5000, open http://127.0.0.1:5000
Find the optimal set of hyperparameters that maximizes a model’s predictive accuracy and performance
➔ Find the right balance between high bias (underfitting) and high variance (overfitting) to improve the model’s ability to generalize and perform well on new, unseen data
Hyperparameter optimization used to be done manually, following a systematic grid pattern. This was extremely inefficient
Nowadays, there are many frameworks that do it automatically, faster, and better
Example:
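One possible sketch, using Optuna as an illustrative framework and logging each trial to MLflow (the choice of library, the search space, and the toy objective below are assumptions, not from the course):

```python
import mlflow
import optuna

mlflow.set_experiment("hp-optimization-demo")

def objective(trial):
    # Suggest hyperparameters from a search space (toy example)
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)

    # Placeholder "validation loss"; a real objective would train and evaluate a model
    val_loss = (lr - 0.01) ** 2 + 0.1 * n_layers

    # Log each trial as its own nested MLflow run
    with mlflow.start_run(nested=True):
        mlflow.log_params({"lr": lr, "n_layers": n_layers})
        mlflow.log_metric("val_loss", val_loss)

    return val_loss

# Parent run for the whole study; best trial is recorded at the end
with mlflow.start_run(run_name="optuna-study"):
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=20)
    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_val_loss", study.best_value)
```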