---
author: James
date: 2019-01-15 18:14:16+00:00
medium_post:
- O:11:"Medium_Post":11:{s:16:"author_image_url";s:69:"https://cdn-images-1.medium.com/fit/c/200/200/0*naYvMn9xdbL5qlkJ.jpeg";s:10:"author_url";s:30:"https://medium.com/@jamesravey";s:11:"byline_name";N;s:12:"byline_email";N;s:10:"cross_link";s:2:"no";s:2:"id";s:12:"11a44e1c247f";s:21:"follower_notification";s:3:"yes";s:7:"license";s:19:"all-rights-reserved";s:14:"publication_id";s:2:"-1";s:6:"status";s:6:"public";s:3:"url";s:114:"https://medium.com/@jamesravey/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over-11a44e1c247f";}
post_meta:
- date
preview: /social/35fd16f7227ec8827d906ed4b78035ebe224683b067549ad8321cc0ce164e25c.png
tags:
- nlp
- python
- work
- phd
title: Spacy Link or “How not to keep downloading the same files over and over”
type: posts
url: /2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/
---
If you’re a frequent user of spacy and virtualenv, you might well be all too familiar with the following:
<blockquote class="wp-block-quote">
<p>
python -m spacy download en_core_web_lg<br /> Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0<br /> Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)<br /> 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10
</p>
</blockquote>
If you’re lucky and have a decent internet connection, then great; if not, it’s time to make a cup of tea.
Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using up? I recently found that about 40GB of disk space on my laptop was being used by spacy models I’d downloaded and forgotten about.
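If you want to check how much space forgotten models are taking up on your own machine, here is a minimal Python sketch. It is not part of spaCy. It walks a directory tree and totals the size of every directory whose name starts with `en_core_web`. The `~/.virtualenvs` root and the `en_core_web` prefix are assumptions, so point it wherever your virtualenvs and models actually live.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = Path(dirpath) / name
            # Skip symlinked files so linked data isn't counted twice.
            if not fp.is_symlink():
                total += fp.stat().st_size
    return total


def find_model_dirs(root: Path, prefix: str = "en_core_web") -> dict:
    """Map each (non-symlinked) directory under `root` named `prefix*` to its size in bytes."""
    sizes = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in list(dirnames):
            full = Path(dirpath) / name
            if name.startswith(prefix) and not full.is_symlink():
                sizes[full] = dir_size(full)
                # Don't descend any further: this subtree is already counted.
                dirnames.remove(name)
    return sizes


if __name__ == "__main__":
    # Assumed location of your virtualenvs; adjust to suit your setup.
    root = Path.home() / ".virtualenvs"
    for path, size in sorted(find_model_dirs(root).items(), key=lambda kv: -kv[1]):
        print(f"{size / 1024**3:6.2f} GB  {path}")
```

Symlinked directories are skipped on purpose, so a model you have linked (rather than copied) into a virtualenv isn't counted twice.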
Fear not: spacy link offers you salvation from this wasteful use of disk space.
Spacy link essentially allows you to link your virtualenv’s copy of spacy to a copy of the model you have already downloaded. Say you installed your desired spacy model into your global python3 installation, somewhere like `/usr/lib/python3/site-packages/spacy/data`.
Spacy link will let you link your existing model into a virtualenv, so you don’t have to download it again (or use up extra disk space). From your virtualenv you can run:
`python -m spacy link /usr/lib/python3/site-packages/spacy/data/<name_of_model> <link_name>`
For example, if we wanted to make `en_core_web_lg` the default English model in our virtualenv, we could run:
`python -m spacy link /usr/lib/python3/site-packages/spacy/data/en_core_web_lg en`
Presto! Now when we call `spacy.load('en')` inside our virtualenv, we get the large model!
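Why doesn’t this use any extra disk space? As I understand the spaCy 2.x documentation, `spacy link` doesn’t copy the model at all. Instead, it creates a symlink inside spaCy’s `data` directory that points at the existing model. The `link_model` helper below is hypothetical. It is a simplified sketch of that idea using `pathlib`, not spaCy’s actual implementation.

```python
from pathlib import Path


def link_model(model_path: Path, data_dir: Path, link_name: str) -> Path:
    """Hypothetical, simplified stand-in for `spacy link` (spaCy 2.x):
    create `data_dir/link_name` as a symlink to an existing model directory."""
    model_path = Path(model_path).resolve()
    if not model_path.is_dir():
        raise FileNotFoundError(f"No model directory at {model_path}")
    link_path = Path(data_dir) / link_name
    # Refuse to overwrite an existing link or directory.
    if link_path.is_symlink() or link_path.exists():
        raise FileExistsError(f"{link_path} already exists")
    # A symlink is just a pointer: no model files are copied,
    # so the virtualenv uses no extra disk space for the model.
    link_path.symlink_to(model_path, target_is_directory=True)
    return link_path
```

Because the link only points at the original directory, anything that loads a model through the link name reads the original files. Note that creating symlinks on Windows may require administrator rights.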