10 MLflow Features to 10 Million Downloads

Written by Jim Hibbard | Nov 7, 2022 8:08:29 PM

We’re thrilled to announce that MLflow has passed 10,000,000 monthly downloads!

This milestone, which few open source projects achieve, was accomplished through contributions from the MLflow community’s many open-source developers and users. As we celebrate the community’s success, we think it’s worth taking a moment to reflect on how MLflow reached this level of adoption in the emerging MLOps ecosystem. I’d like to highlight ten design principles and features that contributed to MLflow passing 10,000,000 monthly downloads by making users successful with a wide variety of model development and MLOps initiatives.

1. Open Interface Design Philosophy

MLflow is based on an open interface design philosophy, aimed at making it easy to connect arbitrary ML code and tools as part of larger workflows. By relying on simple command-line interfaces and REST APIs, MLflow remains easy to mix into existing workflows and supports a pattern of progressive adoption across its four main components: Tracking, Projects, Models, and Registry. The modular design and focus on APIs prevent lock-in and make it easy to extend the framework, run it alongside other tooling, and create new integrations as required.
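
To make the "simple REST APIs" point concrete, here is a minimal sketch that builds a metric-logging request using nothing but the Python standard library. It assumes a tracking server running locally on the default port (for example, one started with `mlflow server`), and the run ID `abc123` is a placeholder rather than a real run. The request is only constructed here, not sent.

```python
import json
import time
import urllib.request

# Assumption: a tracking server is reachable locally on the default port.
TRACKING_URL = "http://127.0.0.1:5000"


def build_log_metric_request(
    run_id: str, key: str, value: float, step: int = 0
) -> urllib.request.Request:
    """Build (but do not send) a POST request that logs one metric value."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=f"{TRACKING_URL}/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "abc123" is a placeholder run ID used only for illustration.
request = build_log_metric_request("abc123", "rmse", 0.78, step=1)
print(request.get_method(), request.full_url)
# To send it to a running server: urllib.request.urlopen(request)
```

Because the interface is plain HTTP and JSON, the same call can be made from any language or tool that can issue a POST request.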

2. Tracking for providing experiment and data management:

MLflow Tracking is often the first component that new users encounter when they adopt MLflow as part of their MLOps strategy. It provides APIs for logging model parameters, code versions, metrics, output files, and any other useful artifacts produced by running machine learning code. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.
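
As a rough illustration of the Python API, the helper below logs one run's parameters, metrics, and an optional output file. The function name and its arguments are my own for this sketch. With no tracking URI configured, MLflow writes runs to a local `mlruns` directory.

```python
from typing import Dict, Optional


def log_training_run(
    params: Dict[str, object],
    metrics: Dict[str, float],
    artifact_path: Optional[str] = None,
) -> str:
    """Log one run's parameters, metrics, and (optionally) a file; return the run ID."""
    # Imported inside the function so that loading this sketch does not
    # require MLflow to be installed.
    import mlflow

    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)  # any local output file
    return run.info.run_id


# Example call (the parameter and metric values are made up):
# run_id = log_training_run({"alpha": 0.5, "l1_ratio": 0.1}, {"rmse": 0.78})
```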

3. Autologging for saving metrics, parameters, and models without explicit log statements:

Built on top of the MLflow Tracking API, autologging integrates with some of the most popular ML libraries to record metrics, parameters, and library-specific training information according to best practices, without any explicit logging calls.
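
A small sketch of what this looks like in practice, assuming scikit-learn is installed alongside MLflow. The toy data and the function name are invented for illustration. Note that the training code contains no `log_*` calls at all.

```python
def fit_with_autologging() -> str:
    """Fit a small scikit-learn model with autologging on; return the run ID."""
    # Imported inside the function so that loading this sketch does not
    # require MLflow or scikit-learn to be installed.
    import mlflow
    from sklearn.linear_model import LinearRegression

    # Turn on autologging for the supported libraries that are installed.
    mlflow.autolog()

    # Toy data: y = 2x + 1
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 3.0, 5.0, 7.0]

    with mlflow.start_run() as run:
        LinearRegression().fit(X, y)  # recorded by autologging
    return run.info.run_id
```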

4. Web UI for easy inspection of individual runs and sharing insights:

MLflow’s built-in UI provides a huge boost in productivity to teams that need to quickly share and compare model training results. The artifact viewer can display visualizations, documents, and summary tables associated with individual runs to provide a deeper context and background information to collaborators.

5. Models abstract packaging and calling ML models across a wide variety of ML libraries:

An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools and deployment scenarios. From real-time serving through a REST API to batch inference on Apache Spark, the MLflow Models API is designed to package models in a way that’s easy to integrate with a variety of inference runtimes.
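
For the real-time serving case, here is a hedged sketch of a scoring request built with the standard library. It assumes a model is already being served locally (for example with `mlflow models serve -m <model_uri> -p 5001`) and uses the `dataframe_split` JSON body accepted by the MLflow 2.0 scoring server; 1.x servers expect a different body. The column names and values are invented. The request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Sequence

# Assumption: a model server is listening locally on port 5001.
SCORING_URL = "http://127.0.0.1:5001/invocations"


def build_scoring_request(
    columns: Sequence[str], rows: Sequence[Sequence[float]]
) -> urllib.request.Request:
    """Build (but do not send) a scoring request in pandas "split" orientation."""
    payload = {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }
    return urllib.request.Request(
        url=SCORING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names and values.
request = build_scoring_request(
    ["sepal_length", "sepal_width"], [[5.1, 3.5], [6.2, 2.9]]
)
print(request.data.decode("utf-8"))
# To score against a running server: urllib.request.urlopen(request)
```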

6. Registry as a central hub for sharing and working on models:

MLflow’s Model Registry begins to shine as the number of experiments, runs, and models to track increases. It’s another modular component of the MLflow framework and it solves the problem of how to collaboratively manage the full lifecycle of an MLflow Model. With the Model Registry, you can provide model lineage information (which MLflow experiment and run produced the model), versioning, and annotations to smooth the transition of a model in and out of production. The Model Registry acts as a hub for sharing and working on models across your team and is a great place to integrate with CI/CD tooling.
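
The lineage, versioning, annotation, and stage-transition steps described above can be sketched roughly as follows. The helper names are my own, and the sketch assumes a tracking server with the Model Registry available and a model already logged under the run's `model` artifact path.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build the `runs:/` URI that points at a model logged under a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_and_stage(run_id: str, name: str, stage: str = "Staging"):
    """Register the run's model as a new version, annotate it, and move it to a stage."""
    # Imported inside the function so that loading this sketch does not
    # require MLflow to be installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    # Registering from a runs:/ URI keeps the link back to the producing run.
    version = mlflow.register_model(model_uri_for_run(run_id), name)
    client.update_model_version(
        name=name,
        version=version.version,
        description=f"Registered from run {run_id}",
    )
    client.transition_model_version_stage(
        name=name, version=version.version, stage=stage
    )
    return version


# "abc123" is a placeholder run ID used only for illustration.
print(model_uri_for_run("abc123"))
```

A CI/CD job could call something like `register_and_stage` after its validation checks pass, which is one way the Registry fits into automated promotion workflows.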

7. Projects as a packaging format for data science code that needs portability:

MLflow Projects provide a format for packaging data science code in a way that promotes reusability and reproducibility across diverse compute environments. By providing a set of conventions and an open API to expose information on a workflow’s environment requirements and calling conventions, data scientists (and automation tooling) can run packaged workflows without needing to know the underlying languages, libraries, or environment specifications that an MLflow Project depends on.

8. Plugins for customizing the behavior of MLflow’s Python client in powerful ways:

MLflow plugins are Python packages that can be installed using PyPI or conda to add or enhance existing features. They can often be the right solution for adding a needed feature that is too specific for a broader audience. Plugins are able to override the default behavior of every major component of MLflow. Reference implementations for each plugin type provide a natural starting point for authoring new features, and popular plugins are featured as community projects in MLflow’s documentation.
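
Plugins are discovered through Python entry points, so much of the wiring lives in a package's setup metadata. The sketch below shows what that metadata might look like for a hypothetical package named `my_mlflow_plugin`. The entry-point group names are the ones MLflow looks up, while every module path, class name, scheme, and target name is invented for illustration.

```python
# Entry-point groups that MLflow scans for plugins, mapped to hypothetical
# implementations inside a package called `my_mlflow_plugin`.
PLUGIN_ENTRY_POINTS = {
    # Tracking store used for tracking URIs such as "my-scheme://..."
    "mlflow.tracking_store": [
        "my-scheme=my_mlflow_plugin.store:MyTrackingStore",
    ],
    # Artifact repository used for artifact URIs with the same scheme
    "mlflow.artifact_repository": [
        "my-scheme=my_mlflow_plugin.artifacts:MyArtifactRepository",
    ],
    # Deployment target exposed to the deployments API and CLI
    "mlflow.deployments": [
        "my-target=my_mlflow_plugin.deploy",
    ],
}


def setup_kwargs() -> dict:
    """Keyword arguments a plugin's setup.py could pass to setuptools.setup()."""
    return {
        "name": "my-mlflow-plugin",
        "version": "0.1.0",
        "packages": ["my_mlflow_plugin"],
        "install_requires": ["mlflow"],
        "entry_points": PLUGIN_ENTRY_POINTS,
    }


# In a real setup.py:
#     from setuptools import setup
#     setup(**setup_kwargs())
print(sorted(PLUGIN_ENTRY_POINTS))
```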

9. MLflow Pipelines for data scientists and ML engineers standardizing workflows:

MLflow Pipelines provide opinionated templates for solving data science problems end-to-end, simplifying the development and productionization of machine learning products. Each step of a pipeline represents a modeling or MLOps procedure, such as fitting an estimator or scoring new data, and provides the necessary boilerplate code needed for that step. Data scientists are then free to focus on developing high-quality models while ML engineers integrate them into production applications.

10. MLflow’s deployment API exposes functionality for deploying MLflow models to production serving tools:

MLflow’s deployment APIs support numerous deployment scenarios across a variety of use cases, from wearables to large-scale inference on clusters. This can be especially useful when you’re required to handle multiple deployment scenarios for a single model.

If reading about these ten features has you excited about the possibilities of the largest open source project for MLOps and wanting to be a part of MLflow’s continued success, there’s never been a better time to contribute. By claiming an open issue you can work alongside a maintainer to enhance one of the most impactful MLOps toolkits available.

Looking forward to our community’s next milestone, MLflow is nearing its 2.0 release on November 15th. For anyone who can’t wait until then, there’s an MLflow 2.0 release candidate with the new features and UI improvements that you can try out today!
