Avoid INTERNAL_SERVER_ERROR in MLFlow UI caused by timeouts

MLFlow can be very slow, especially if you are using the default storage method (plain folders and files in the file system) rather than a database backend. With more than just a few runs in an experiment, the web interface becomes painfully slow; load times of several minutes can easily happen once an experiment contains 100 or more runs.

The MLFlow UI internally uses gunicorn as its web server. Raising gunicorn’s timeout can resolve the problem of seeing INTERNAL_SERVER_ERROR after the page has been loading for a minute or two. You can set a new timeout like this:

GUNICORN_CMD_ARGS="--timeout 600" mlflow ui -h 127.0.0.1 -p 1234

This sets the timeout to 10 minutes (600 seconds), which should be enough for most cases. Depending on the number of runs you have, however, you might need to set it even higher. Of course this is very annoying, and if you access the UI often, it can really block your work.

A better solution is probably to use a database as the storage backend (e.g. SQLite). The root problem that makes the UI so slow is that MLFlow has to iterate through the experiment folder, enter each run folder, descend into each of the metrics, params, artifacts, etc. folders, and finally open a text file for every single item stored there. I’ll publish a comparison between the two methods in the coming days.
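
As a rough sketch (the file name mlflow.db and the logged names are just placeholders), switching to a SQLite backend only requires pointing MLFlow at a database URI:

import mlflow

# Store runs in a local SQLite database instead of plain files under ./mlruns.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.01)
    mlflow.log_metric("loss", 0.42)

# The UI must then be started against the same database:
#   mlflow ui --backend-store-uri sqlite:///mlflow.db -h 127.0.0.1 -p 1234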

MLFlow + Optuna: Parallel hyper-parameter optimization and logging

Optuna is a Python library that makes it easy to optimize the hyper-parameters of machine learning models. MLFlow is a tool for keeping track of experiments. In this post I want to show how to use them together: Optuna finds the optimal hyper-parameters, and MLFlow keeps track of each hyper-parameter candidate (each Optuna trial).

I will create one MLFlow run for the overall Optuna study and one nested run for each trial. The trials will run in parallel. The default MLFlow fluent interface does not work properly when multiple threads run in parallel; you will see errors like this:

mlflow.exceptions.MlflowException: Changing param values is not allowed. Param with key='x' was already logged with value='4.826018001260979' for run ID='664a3b7001b04fcdb132c351238a8cf4'. Attempted logging new value '4.799057323848487'.

This error is shown if you use the “standard mlflow approach”:
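
A minimal sketch of what that setup looks like (the quadratic objective is just a stand-in for a real training function):

import mlflow
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    # The fluent interface tracks the active run in shared state, so
    # parallel trials can end up logging their params into the same run.
    with mlflow.start_run(nested=True):
        mlflow.log_param("x", x)
        value = (x - 2) ** 2
        mlflow.log_metric("value", value)
    return value

with mlflow.start_run(run_name="optuna_study"):
    study = optuna.create_study()
    # n_jobs > 1 runs trials in parallel threads and triggers the error above
    study.optimize(objective, n_trials=20, n_jobs=4)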

Read More

Why are precision, recall and F1 score equal when using micro averaging in a multi-class problem?

In a recent project I was wondering why I got exactly the same value for precision, recall, and the F1 score when using scikit-learn’s metrics. The project involved a simple classification problem where each input is mapped to exactly \(1\) of \(n\) classes. I was using micro averaging for the metric functions, which means the following according to sklearn’s documentation:

Calculate metrics globally by counting the total true positives, false negatives and false positives.

According to the documentation, this behaviour is correct:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.

After thinking about it for a bit, I figured out why this is the case. In this article, I will explain the reasons.
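
In short, the argument goes like this (the sums run over all \(n\) classes): micro averaging computes \(P = \frac{\sum_i TP_i}{\sum_i TP_i + \sum_i FP_i}\) and \(R = \frac{\sum_i TP_i}{\sum_i TP_i + \sum_i FN_i}\). In a single-label multi-class problem, every misclassified sample counts exactly once as a false positive (for the predicted class) and exactly once as a false negative (for the true class), so \(\sum_i FP_i = \sum_i FN_i\). The two denominators are therefore equal, which gives \(P = R\), and the F1 score, as the harmonic mean of two equal values, equals them too.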

Read More

NLP: Approaches for Sentence Embeddings (Overview)

In 2013, Mikolov et al. published ‘Distributed Representations of Words and Phrases and their Compositionality’, a paper about a new approach for representing words as dense vectors. This was an improvement over the alternative of representing words as one-hot vectors, because these dense vector embeddings encode some of the meaning of the words they represent. In other words, words with similar meaning are close to each other in the vector space of the embedding. For example, “blue” would be close to “red” but far from “cat”. A commonly used name for their approach is word2vec.
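
As a quick illustration (assuming the gensim library and its downloadable pre-trained vectors, which are not part of the original paper):

import gensim.downloader as api

# Download and load pre-trained word2vec embeddings (large download on first use).
wv = api.load("word2vec-google-news-300")

# Cosine similarity between embeddings: related words score higher.
print(wv.similarity("blue", "red"))  # comparatively high
print(wv.similarity("blue", "cat"))  # comparatively low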

Read More