<!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Spacy Link or “How not to keep downloading the same files over and over” - Brainsteam</title><meta name="viewport" content="width=device-width, initial-scale=1"> <meta itemprop="name" content="Spacy Link or “How not to keep downloading the same files over and over”"> <meta itemprop="description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following: python -m spacy download en_core_web_lg Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0 Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB) 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea. Even if your internet connection is good. 
Did you ever stop to look at how much disk space your python virtual environments were using up?"><meta itemprop="datePublished" content="2019-01-15T18:14:16+00:00" /> <meta itemprop="dateModified" content="2019-01-15T18:14:16+00:00" /> <meta itemprop="wordCount" content="235"> <meta itemprop="keywords" content="nlp,python," /><meta property="og:title" content="Spacy Link or “How not to keep downloading the same files over and over”" /> <meta property="og:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following: python -m spacy download en_core_web_lg Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0 Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB) 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea. Even if your internet connection is good. Did you ever stop to look at how much disk space your python virtual environments were using up?" 
/> <meta property="og:type" content="article" /> <meta property="og:url" content="https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2019-01-15T18:14:16+00:00" /> <meta property="article:modified_time" content="2019-01-15T18:14:16+00:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="Spacy Link or “How not to keep downloading the same files over and over”"/> <meta name="twitter:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following: python -m spacy download en_core_web_lg Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0 Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB) 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea. Even if your internet connection is good. 
Did you ever stop to look at how much disk space your python virtual environments were using up?"/> <link href='https://fonts.googleapis.com/css?family=Playfair+Display:700' rel='stylesheet' type='text/css'> <link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/normalize.css" /> <link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/main.css" /> <link id="dark-scheme" rel="stylesheet" type="text/css" href="https://brainsteam.co.uk/css/dark.css" /> <script src="https://brainsteam.co.uk/js/feather.min.js"></script> <script src="https://brainsteam.co.uk/js/main.js"></script> </head> <body> <div class="container wrapper"> <div class="header"> <div class="avatar"> <a href="https://brainsteam.co.uk/"> <img src="/images/avatar.png" alt="Brainsteam" /> </a> </div> <h1 class="site-title"><a href="https://brainsteam.co.uk/">Brainsteam</a></h1> <div class="site-description"><p>The irregular mental expulsions of a PhD student and CTO of Filament, my views are my own and do not represent my employers in any way.</p><nav class="nav social"> <ul class="flat"><li><a href="https://twitter.com/jamesravey/" title="Twitter" rel="me"><i data-feather="twitter"></i></a></li><li><a href="https://github.com/ravenscroftj" title="Github" rel="me"><i data-feather="github"></i></a></li><li><a href="/index.xml" title="RSS" rel="me"><i data-feather="rss"></i></a></li></ul> </nav></div> <nav class="nav"> <ul class="flat"> <li> <a href="/">Home</a> </li> <li> <a href="/tags">Tags</a> </li> <li> <a href="https://jamesravey.me">About Me</a> </li> </ul> </nav> </div> <div class="post"> <div class="post-header"> <div class="meta"> <div class="date"> <span class="day">15</span> <span class="rest">Jan 2019</span> </div> </div> <div class="matter"> <h1 class="title">Spacy Link or “How not to keep downloading the same files over and over”</h1> </div> </div> <div class="markdown"> <p>If you’re a frequent user of spacy and virtualenv you 
might well be all too familiar with the following:</p> <blockquote class="wp-block-quote"> <p> python -m spacy download en_core_web_lg<br /> Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0<br /> Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)<br /> 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 </p> </blockquote> <p>If you’re lucky and you have a decent internet connection then great; if not, it’s time to make a cup of tea.</p> <p>Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using up? I recently found that about 40GB of disk space on my laptop was being used by spacy models I’d downloaded and forgotten about.</p> <p>Fear not – spacy link offers you salvation from this wasteful use of disk space.</p> <p>Spacy link essentially allows you to link your virtualenv copy of spacy to a copy of the model you already downloaded. Say you installed your desired spacy model to your global python3 installation – somewhere like <strong><em>/usr/lib/python3/site-packages/spacy/data</em></strong>.</p> <p>Spacy link will let you link your existing model into a virtualenv to save redownloading (and using extra disk space). From your virtualenv you can do:</p> <p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/&lt;name_of_model&gt; &lt;name_of_model&gt;</em></strong></p> <p>For example, if we wanted to make <strong>en_core_web_lg</strong> the default English model in our virtualenv, we could do:</p> <p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/en_core_web_lg en</em></strong></p> <p>Presto! 
Now when we do <strong>spacy.load('en')</strong> inside our virtualenv, we get the large model!</p> </div> <div class="tags"> <ul class="flat"> <li><a href="/tags/nlp">nlp</a></li> <li><a href="/tags/python">python</a></li> </ul> </div><div id="disqus_thread"></div> <script type="text/javascript"> (function () { if (window.location.hostname == "localhost") return; var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; var disqus_shortname = 'brainsteam'; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments.</noscript> <a href="http://disqus.com/" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a> </div> </div> <div class="footer wrapper"> <nav class="nav"> <div>2021 © James Ravenscroft 2020 | <a href="https://github.com/knadh/hugo-ink">Ink</a> theme on <a href="https://gohugo.io">Hugo</a></div> </nav> </div> <script type="application/javascript"> var doNotTrack = false; if (!doNotTrack) { window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga('create', 'UA-186263385-1', 'auto'); ga('send', 'pageview'); } </script> <script async src='https://www.google-analytics.com/analytics.js'></script> <script>feather.replace()</script> </body> </html>