---
title: Pickle 5 Madness with MLFlow and Python 3.6/3.7
author: James
type: post
resources:
  - name: feature
    src: images/feature.jpg
date: 2021-01-14T11:42:28+00:00
url: /2021/01/14/pickle-5-madness-with-mlflow/
description: "Solving 'unsupported pickle protocol: 5' when trying to load mlflow models"

tags:
  - machine-learning
  - python
  - ai
  - devops
  - mlops
  - work
  - open source

---

{{<figure src="images/feature.jpg" caption="A jar of pickles by <a href='https://www.pexels.com/photo/crop-unrecognizable-person-with-jar-of-pickled-zucchini-3952045/'>Ksenia Charnaya</a>">}}

I recently came across an infuriating problem: an [MLflow Python model](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html) that I had trained on one system using Python `3.6` would not load on another system running an identical version of Python.

The exact problem was that when I ran `mlflow models serve -m <url/to/model/in/bucket>`, the service would crash, saying that the model could not be deserialized because of `ValueError: unsupported pickle protocol: 5`.

A quick bit of searching shows that this error happens when something is pickled with protocol 5, which was introduced in Python 3.8, and is then loaded by a system running an earlier version of Python 3 (3.6 or 3.7), which only supports pickle protocols up to version 4. Note that the standard library in Python 3.8 still defaults to protocol 4; protocol 5 is only used when it is requested explicitly or when a library asks for the highest protocol available.

Under the covers, MLflow uses [cloudpickle](https://github.com/cloudpipe/cloudpickle), a library that provides extended pickle support, including the ability to pickle lambda functions and functions or classes defined interactively in the `__main__` module of your program or in a Jupyter notebook. By default, `cloudpickle` uses the highest pickle protocol available in your Python implementation (it checks the [pickle.HIGHEST_PROTOCOL](https://docs.python.org/3/library/pickle.html#pickle.HIGHEST_PROTOCOL) constant). This makes sense for most use cases, where you want to serialize objects and pass them around within the same Python setup. As a rule of thumb, more recent protocols perform better and are more efficient.
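
If you want to confirm which protocol a pickle was actually written with, something like the sketch below may help. It relies on the fact that pickles written with protocol 2 or later start with a `PROTO` opcode (byte `0x80`) followed by the protocol number. The helper name `pickle_protocol` is made up for this post and is not part of MLflow or cloudpickle.

```python
import pickle
from typing import Optional


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol number declared at the start of a pickle, if any."""
    # Protocol 2+ pickles begin with the PROTO opcode (0x80) and a version byte.
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    # Protocols 0 and 1 carry no header, so we cannot tell them apart here.
    return None


# Serialize a small object with the highest protocol this interpreter supports,
# which is the same choice cloudpickle makes by default.
payload = pickle.dumps({"weights": [0.1, 0.2]}, protocol=pickle.HIGHEST_PROTOCOL)

print("interpreter supports up to protocol", pickle.HIGHEST_PROTOCOL)
print("payload was written with protocol", pickle_protocol(payload))
```

To inspect a saved model, you could read the first two bytes of the pickle file in the model directory (often `model.pkl` for scikit-learn models, though treat that file name as an assumption) and pass them to the same helper.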

However, this is a mystery, because I'm running Python `3.6.12` on both systems, and that version does not support protocol 5. So how is it that cloudpickle is using this protocol to write the models? I still haven't worked this out, and if anyone knows, please get in touch because it is driving me mad!

Luckily for us, although the use of v5 is puzzling, there is a solution. The [pickle5](https://pypi.org/project/pickle5/) library backports protocol 5 support to Python 3.6 and 3.7. Furthermore, [cloudpickle will automatically detect and load this library if it is available](https://github.com/dask/distributed/pull/3849). Therefore, all we need to do is install `pickle5` in our MLflow serving environment to make this issue go away.
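
For anyone curious what "detect and load" looks like in practice, the usual pattern is an optional import with a fallback. The snippet below is my own minimal sketch of that idea, not a copy of cloudpickle's source, and the names `USING_BACKPORT` and `load_pickled_bytes` are invented for illustration.

```python
try:
    # Third-party backport of protocol 5 for Python 3.6 and 3.7.
    import pickle5 as pickle
    USING_BACKPORT = True
except ImportError:
    # Either the backport is not installed, or we are on Python 3.8+
    # where the standard library already understands protocol 5.
    import pickle
    USING_BACKPORT = False


def load_pickled_bytes(data: bytes):
    """Deserialize bytes with whichever pickle implementation was imported."""
    return pickle.loads(data)


# Round-trip a small object to show that the chosen module works.
restored = load_pickled_bytes(pickle.dumps({"answer": 42}))
print("using pickle5 backport:", USING_BACKPORT)
print("restored object:", restored)
```

On Python 3.6 or 3.7 without `pickle5` installed, the fallback branch is taken and loading a protocol 5 pickle will still raise the `ValueError` described above.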

The easiest way to make sure `pickle5` is available to your server is to add it to your Conda environment when you save your model to MLflow:
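
```python
import mlflow.pyfunc
import mlflow.sklearn

# Placeholder: substitute any scikit-learn estimator and your own training data
model = SomeScikitLearnModel()
model.fit(X, y)

# Start from MLflow's default environment and add pickle5 so that the
# serving environment can read protocol 5 pickles
conda_env = mlflow.pyfunc.get_default_conda_env()
conda_env['dependencies'].append({'pip': [
    'pickle5',
    'scikit-learn==0.23.2',
    # ... some other dependencies
]})

mlflow.sklearn.log_model(model, "model", conda_env=conda_env)
```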

Note: I already checked, and `pickle5` is not installed in the first environment. However, the Conda base version of Python on that system is `3.8.3`, so I think there must be some weird leakage of the Conda paths going on when I train my model.
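
One way to test a path-leakage theory like this is to print, from inside the training process itself, which interpreter is running and what it can do. The sketch below uses only the standard library; the function name `describe_environment` is hypothetical.

```python
import importlib.util
import pickle
import sys


def describe_environment() -> dict:
    """Collect the interpreter details that matter for pickle compatibility."""
    return {
        # Path of the Python binary actually running this code
        "executable": sys.executable,
        # Version string such as "3.6.12"
        "version": sys.version.split()[0],
        # 4 on Python 3.6/3.7, 5 on Python 3.8 and later
        "highest_pickle_protocol": pickle.HIGHEST_PROTOCOL,
        # True if the pickle5 backport can be imported from this environment
        "pickle5_installed": importlib.util.find_spec("pickle5") is not None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the executable path or version printed during training differs from what `python --version` reports in your shell, that would point to the training job picking up a different interpreter than expected.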