Brainsteam

The irregular mental expulsions of a PhD student and CTO of Filament. My views are my own and do not represent my employers in any way.

15 Jan 2019

Spacy Link or “How not to keep downloading the same files over and over”

If you’re a frequent user of spacy and virtualenv, you might well be all too familiar with the following:

python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10

If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.

Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments are using up? I recently found that about 40GB of disk space on my laptop was being used by spacy models I’d downloaded and forgotten about.
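If you want to check your own machine, here is a rough sketch of how you might total up the space taken by model packages across your virtualenvs. It assumes the models were pip-installed as ordinary packages sitting directly inside a site-packages directory and that their names start with en_core_web_; the function names and the prefix list are my own invention for illustration, not part of spacy.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinked files."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def model_disk_usage(search_root, prefixes=("en_core_web_",)):
    """Map each model package directory under search_root to its size in bytes.

    Assumption: models live directly inside a directory called site-packages.
    Metadata directories (*.dist-info, *.egg-info) and symlinked directories
    are ignored, since neither holds a real copy of the model data.
    """
    usage = {}
    for root, dirs, _files in os.walk(search_root):
        if os.path.basename(root) != "site-packages":
            continue
        for name in dirs:
            full_path = os.path.join(root, name)
            is_model = name.startswith(prefixes) and not name.endswith("-info")
            if is_model and not os.path.islink(full_path):
                usage[full_path] = dir_size_bytes(full_path)
    return usage


# Example (hypothetical location for your virtualenvs):
# for path, size in model_disk_usage(os.path.expanduser("~/.virtualenvs")).items():
#     print(f"{size / 1e6:10.1f} MB  {path}")
```

Pass a different tuple of prefixes if you use models for other languages.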

Fear not – spacy link offers you salvation from this wasteful use of disk space.

Spacy link essentially allows you to link your virtualenv copy of spacy to a copy of the model you already downloaded. Say you installed your desired spacy model to your global python3 installation, somewhere like /usr/lib/python3/site-packages/spacy/data.

Spacy link will let you link your existing model into a virtualenv to save redownloading (and using extra disk space). From your virtualenv you can do:

python -m spacy link /usr/lib/python3/site-packages/spacy/data/<name_of_model> <name_of_link>

For example, if we wanted to make en_core_web_lg the default English model in our virtualenv, we could do:

python -m spacy link /usr/lib/python3/site-packages/spacy/data/en_core_web_lg en

Presto! Now when we do spacy.load('en') inside our virtualenv we get the large model!
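If you’re curious why this saves disk space: as far as I can tell, the link is just a symlink created inside the virtualenv’s spacy/data directory, so the model data itself is never copied. The sketch below illustrates that idea with the standard library only. It is not spacy’s actual code, and link_model, model_dir and data_dir are names I made up for the example.

```python
import os


def link_model(model_dir, data_dir, link_name):
    """Create data_dir/link_name as a symlink pointing at model_dir.

    Returns the path of the new link. Raises if the model directory is
    missing or if something already exists at the link location.
    """
    model_dir = os.path.abspath(model_dir)
    if not os.path.isdir(model_dir):
        raise FileNotFoundError("no model directory at " + model_dir)
    os.makedirs(data_dir, exist_ok=True)
    link_path = os.path.join(data_dir, link_name)
    if os.path.lexists(link_path):
        raise FileExistsError("link already exists: " + link_path)
    # Only a pointer is written to disk; the model files stay where they are.
    os.symlink(model_dir, link_path, target_is_directory=True)
    return link_path
```

Because only a pointer is created, every virtualenv linked this way shares the single downloaded copy. Note that creating symlinks on Windows may need extra privileges.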
