diff --git a/.drone.yml b/.drone.yml
index 81bb142..6cb7dc5 100644
--- a/.drone.yml
+++ b/.drone.yml
@@ -4,6 +4,9 @@ name: update_website
 steps:
   - name: hugo_build
     image: alombarte/hugo
+    when:
+      branch:
+        - main
     commands:
       - git submodule init
       - git submodule update
@@ -11,6 +14,9 @@ steps:
       - hugo
   - name: hugo_publish
     image: alpine:3.12.3
+    when:
+      branch:
+        - main
     environment:
       FTP_USERNAME:
         from_secret: FTP_USERNAME
diff --git a/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/images/feature.jpg b/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/images/feature.jpg
new file mode 100644
index 0000000..23e6493
Binary files /dev/null and b/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/images/feature.jpg differ
diff --git a/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/index.md b/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/index.md
new file mode 100644
index 0000000..6ebe642
--- /dev/null
+++ b/brainsteam/content/posts/2021/01/2021-01-14-mlflow-pickle5-madness/index.md
@@ -0,0 +1,55 @@
+---
+title: Pickle 5 Madness with MLFlow and Python 3.6/3.7
+author: James
+type: post
+resources:
+  - name: feature
+    src: images/feature.jpg
+date: 2021-01-14T11:42:28+00:00
+url: /2021/01/14/pickle-5-madness-with-mlflow/
+description: "Solving 'unsupported pickle protocol: 5' when trying to load mlflow models"
+categories:
+  - Work
+  - Open Source
+tags:
+  - machine-learning
+  - python
+  - ai
+  - devops
+  - mlops
+
+---
+
+{{
}}
+
+I recently came across an infuriating problem where an [MLFlow python model](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html) I had trained on one system using Python `3.6` would not load on another system with an identical version of Python.
+
+The exact problem was that when I ran `mlflow models serve -m ` the service would crash, saying that the model could not be deserialized because of `ValueError: unsupported pickle protocol: 5`.
+
+A quick bit of searching shows that this error happens when something is pickled with protocol 5, which was introduced in Python 3.8, and then loaded by a system running an earlier version of Python 3 (3.6 or 3.7), which only supports pickle protocols up to v4.
+
+Under the covers, mlflow uses [cloudpickle](https://github.com/cloudpipe/cloudpickle), a library that provides extended pickle support, including the ability to pickle lambda functions and functions/classes defined interactively in the `__main__` module of your program or in a Jupyter notebook. By default, `cloudpickle` uses the highest pickle protocol available in your Python implementation (by checking the [pickle.HIGHEST_PROTOCOL](https://docs.python.org/3/library/pickle.html#pickle.HIGHEST_PROTOCOL) constant). This makes sense for most use cases, where you want to serialize objects and pass them around within the same Python setup: as a rule of thumb, more recent protocols are more efficient.
+
+However, this is a mystery because I'm running Python `3.6.12` on both systems, which does not support protocol 5, so how is it that cloudpickle is using this version to write the models? I still haven't worked this out, and if anyone knows please get in touch because it is driving me mad!
+
+Luckily for us, although the use of v5 is puzzling, there is a solution. The [pickle5](https://pypi.org/project/pickle5/) library backports pickle protocol 5 support to Python 3.6 and 3.7. 
Furthermore, [cloudpickle will automatically detect and load this library if it is available](https://github.com/dask/distributed/pull/3849). Therefore, all we need to do is install `pickle5` in our MLFlow serving environment to make this issue go away.
+
+The easiest way to make sure pickle5 is available to your server is by adding it to your conda env when you save your model to MLFlow:
+
+```python
+import mlflow
+model = SomeScikitLearnModel()
+model.fit(X, y)
+
+conda_env = mlflow.pyfunc.get_default_conda_env()
+conda_env['dependencies'].append({'pip': [
+    'pickle5',  # backport of pickle protocol 5 for Python 3.6/3.7
+    'scikit-learn==0.23.2',
+    #... some other dependencies
+    ]})
+
+mlflow.sklearn.log_model(model, "model", conda_env=conda_env)
+```
+
+Note: I already checked and `pickle5` is not installed in the first environment, but the Conda base version of Python on that system is `3.8.3`, so I think there must be some weird leakage of the conda paths going on when I train my model.
+
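To narrow down which environment is actually writing protocol 5, it can help to check what each interpreter supports and what a given pickle declares. The sketch below is not part of the post: `declared_protocol` is a hypothetical helper, and it relies only on the fact that pickles written with protocol 2 or later begin with the byte `0x80` followed by the protocol number.

```python
import pickle
import sys


def declared_protocol(data: bytes):
    """Return the protocol number declared in a pickle's header.

    Pickles written with protocol 2 or later start with the PROTO opcode
    (byte 0x80) followed by one byte holding the protocol number.
    Protocols 0 and 1 have no such header, so None is returned for them.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


# What the interpreter running this script can read and write.
print("python", sys.version_info[:3], "highest protocol", pickle.HIGHEST_PROTOCOL)

# Demo on in-memory bytes. For a saved model you would instead read the
# first two bytes of the pickle file (its location depends on your setup).
payload = pickle.dumps({"a": 1}, protocol=4)
print("declared protocol:", declared_protocol(payload))
```

Running this inside both the training and the serving environment should show whether the interpreter that really executes the training code is the one you expect.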