Title: Spacy Link or “How not to keep downloading the same files over and over”
Author: James
Type: post
Date: 2019-01-15T18:14:16+00:00
URL: /2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/
Medium post: https://medium.com/@jamesravey/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over-11a44e1c247f
Categories and tags: Work, PhD, nlp, python

If you're a frequent user of spaCy and virtualenv, you might well be all too familiar with the following:

python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10

If you're lucky and have a decent internet connection, then great. If not, it's time to make a cup of tea.

Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up? I recently found that about 40GB of disk space on my laptop was being used by spaCy models I'd downloaded and forgotten about.
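If you want to run the same check on your own machine, here is a minimal sketch using only the standard library. The `~/.virtualenvs` location, the `en_core_web` prefix, and the function names `dir_size` and `find_model_dirs` are my assumptions for illustration; point it at wherever your environments actually live.

```python
import os
from pathlib import Path


def dir_size(path):
    """Total size in bytes of regular files under path (symlinked files are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def find_model_dirs(search_root, prefix="en_core_web"):
    """Yield directories under search_root whose name starts with prefix."""
    for root, dirs, _files in os.walk(search_root):
        for name in list(dirs):
            # Skip packaging metadata such as *.dist-info or *.egg-info folders.
            if name.startswith(prefix) and not name.endswith("-info"):
                yield Path(root) / name
                dirs.remove(name)  # do not descend into a directory we already matched


if __name__ == "__main__":
    envs = Path.home() / ".virtualenvs"  # hypothetical location, adjust as needed
    for model_dir in find_model_dirs(envs):
        print(f"{dir_size(model_dir) / 1e6:10.1f} MB  {model_dir}")
```

Because symlinked files are skipped, a model that is only linked into an environment does not get counted a second time.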

Fear not: spacy link offers you salvation from this wasteful use of disk space.

Spacy link essentially allows you to link your virtualenv copy of spaCy to a copy of the model you have already downloaded. Say you installed your desired spaCy model to your global python3 installation, somewhere like **/usr/lib/python3/site-packages/spacy/data**.

Spacy link will let you link your existing model into a virtualenv to save redownloading (and using extra disk space). From your virtualenv you can do:

python -m spacy link /usr/lib/python3/site-packages/spacy/data/<name_of_model> <name_of_link>
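To make the idea concrete, here is a rough sketch of what a link of this kind amounts to: a symlink inside the virtualenv's spaCy data directory that points at the existing model. This is my own illustration and not spaCy's actual implementation; the function name `link_model` and its `force` parameter are hypothetical, and I am assuming a spaCy 2.x style layout where shortcut names live in the package's data folder.

```python
from pathlib import Path


def link_model(origin, data_dir, link_name, force=False):
    """Create data_dir/link_name as a symlink pointing at origin."""
    origin = Path(origin)
    if not origin.exists():
        raise FileNotFoundError(f"No model found at {origin}")

    link_path = Path(data_dir) / link_name
    if link_path.is_symlink():
        if not force:
            raise FileExistsError(f"{link_path} is already linked")
        link_path.unlink()  # removes only the link, never the model it points at
    elif link_path.exists():
        # A real file or directory is in the way; refuse to touch it.
        raise FileExistsError(f"{link_path} exists and is not a link")

    link_path.symlink_to(origin, target_is_directory=True)
    return link_path
```

The model files stay in one place on disk, and every environment that links to them only pays for the symlink.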

For example, if we wanted to make en_core_web_lg the default English model in our virtualenv, we could do:

python -m spacy link /usr/lib/python3/site-packages/spacy/data/en_core_web_lg en

Presto! Now when we do spacy.load('en') inside our virtualenv we get the large model!
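If you want to double check where a shortcut name points before loading an 850MB model, something like the following sketch may help. It assumes a spaCy 2.x layout where links are stored as symlinks in the package's data directory, and the helper names `list_links` and `spacy_data_dir` are mine, not part of spaCy.

```python
import importlib.util
from pathlib import Path


def list_links(data_dir):
    """Map each symlink name in data_dir to the path it resolves to."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return {}
    return {
        entry.name: entry.resolve()
        for entry in sorted(data_dir.iterdir())
        if entry.is_symlink()
    }


def spacy_data_dir():
    """Locate <spacy package>/data without importing spaCy; None if not installed."""
    spec = importlib.util.find_spec("spacy")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "data"


if __name__ == "__main__":
    data_dir = spacy_data_dir()
    if data_dir is None:
        print("spaCy is not installed in this environment")
    else:
        for name, target in list_links(data_dir).items():
            print(f"{name} -> {target}")
```

Run from inside the virtualenv, this prints one line per link, so the en shortcut should show up next to the path of the large model.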