<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Spacy Link or “How not to keep downloading the same files over and over” - Brainsteam</title><meta name="viewport" content="width=device-width, initial-scale=1">
<meta itemprop="name" content="Spacy Link or “How not to keep downloading the same files over and over”">
<meta itemprop="description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.
Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up?"><meta itemprop="datePublished" content="2019-01-15T18:14:16+00:00" />
<meta itemprop="dateModified" content="2019-01-15T18:14:16+00:00" />
<meta itemprop="wordCount" content="235">
<meta itemprop="keywords" content="nlp,python," /><meta property="og:title" content="Spacy Link or “How not to keep downloading the same files over and over”" />
<meta property="og:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.
Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up?" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/" /><meta property="article:section" content="posts" />
<meta property="article:published_time" content="2019-01-15T18:14:16+00:00" />
<meta property="article:modified_time" content="2019-01-15T18:14:16+00:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Spacy Link or “How not to keep downloading the same files over and over”"/>
<meta name="twitter:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.
Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up?"/>
<link href='https://fonts.googleapis.com/css?family=Playfair+Display:700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/normalize.css" />
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/main.css" />
<link id="dark-scheme" rel="stylesheet" type="text/css" href="https://brainsteam.co.uk/css/dark.css" />
<script src="https://brainsteam.co.uk/js/feather.min.js"></script>
<script src="https://brainsteam.co.uk/js/main.js"></script>
</head>
<body>
<div class="container wrapper">
<div class="header">
<div class="avatar">
<a href="https://brainsteam.co.uk/">
<img src="/images/avatar.png" alt="Brainsteam" />
</a>
</div>
<h1 class="site-title"><a href="https://brainsteam.co.uk/">Brainsteam</a></h1>
<div class="site-description"><p>The irregular mental expulsions of a PhD student and CTO of Filament. My views are my own and do not represent my employers in any way.</p><nav class="nav social">
<ul class="flat"><li><a href="https://twitter.com/jamesravey/" title="Twitter" rel="me"><i data-feather="twitter"></i></a></li><li><a href="https://github.com/ravenscroftj" title="Github" rel="me"><i data-feather="github"></i></a></li><li><a href="/index.xml" title="RSS" rel="me"><i data-feather="rss"></i></a></li></ul>
</nav></div>
<nav class="nav">
<ul class="flat">
<li>
<a href="/">Home</a>
</li>
<li>
<a href="/tags">Tags</a>
</li>
<li>
<a href="https://jamesravey.me">About Me</a>
</li>
</ul>
</nav>
</div>
<div class="post">
<div class="post-header">
<div class="meta">
<div class="date">
<span class="day">15</span>
<span class="rest">Jan 2019</span>
</div>
</div>
<div class="matter">
<h1 class="title">Spacy Link or “How not to keep downloading the same files over and over”</h1>
</div>
</div>
<div class="markdown">
<p>If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:</p>
<blockquote class="wp-block-quote">
<p>
python -m spacy download en_core_web_lg<br /> Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0<br /> Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)<br /> 5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10
</p>
</blockquote>
<p>If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.</p>
<p>Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up? I recently found that about 40GB of disk space on my laptop was being used by spacy models I’d downloaded and forgotten about.</p>
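<p>If you want to check your own machine, here is a rough Python sketch, not a polished tool. It assumes that all of your virtualenvs live under a single directory and that model packages can be recognised by a name prefix such as <strong>en_core_web</strong>; the function names (<strong>dir_size</strong>, <strong>find_model_dirs</strong>, <strong>report</strong>) are made up for this illustration.</p>
<pre><code class="language-python">import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinked files so shared data is not counted again.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def find_model_dirs(envs_root, prefix="en_core_web"):
    """Yield directories under envs_root whose name starts with prefix.

    The prefix is an assumption about how model packages are named.
    """
    for root, dirs, _files in os.walk(envs_root):
        for name in dirs:
            if name.startswith(prefix):
                yield os.path.join(root, name)
        # Do not descend into matched directories, so a versioned data
        # folder nested inside a model package is not reported twice.
        dirs[:] = [d for d in dirs if not d.startswith(prefix)]


def report(envs_root):
    """Return (path, size_in_bytes) pairs, largest first."""
    sizes = [(path, dir_size(path)) for path in find_model_dirs(envs_root)]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
</code></pre>
<p>Calling <strong>report</strong> with the folder that holds your virtualenvs returns the matching directories with their sizes in bytes, largest first.</p>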
<p>Fear not – spacy link offers you salvation from this wasteful use of disk space.</p>
<p>Spacy link essentially allows you to link your virtualenv copy of spacy to a copy of the model you already downloaded. Say you installed your desired spacy model to your global python3 installation – somewhere like <strong><em>/usr/lib/python3/site-packages/spacy/data</em></strong>.</p>
<p>Spacy link will let you link your existing model into a virtualenv to save redownloading (and using extra disk space). From your virtualenv you can do:</p>
<p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/&lt;name_of_model&gt; &lt;name_of_model&gt;</em></strong></p>
<p>For example, if we wanted to make <strong>en_core_web_lg</strong> the default English model in our virtualenv, we could do:</p>
<p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/en_core_web_lg en</em></strong></p>
<p>Presto! Now when we do <strong>spacy.load('en')</strong> inside our virtualenv we get the large model!</p>
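<p>Why does this save space? As far as I understand it, the link is essentially a symlink placed in the virtualenv’s spacy data directory that points at the existing model, so only one copy lives on disk. The sketch below illustrates that idea with plain Python and temporary directories; it is not spacy’s actual implementation, and <strong>link_model</strong> and <strong>demo</strong> are hypothetical names.</p>
<pre><code class="language-python">import os
import tempfile


def link_model(model_dir, data_dir, link_name):
    """Create data_dir/link_name as a symlink to an existing model_dir."""
    os.makedirs(data_dir, exist_ok=True)
    link_path = os.path.join(data_dir, link_name)
    # Replace a stale link of the same name, but never a real directory.
    if os.path.islink(link_path):
        os.unlink(link_path)
    os.symlink(model_dir, link_path, target_is_directory=True)
    return link_path


def demo():
    """Build a fake shared model and link it into a fake virtualenv."""
    base = tempfile.mkdtemp()
    # Stand-in for the globally installed model directory.
    shared = os.path.join(base, "global", "en_core_web_lg")
    os.makedirs(shared)
    with open(os.path.join(shared, "meta.json"), "w") as handle:
        handle.write("{}")
    # Stand-in for the data directory inside a virtualenv.
    venv_data = os.path.join(base, "venv", "spacy", "data")
    link = link_model(shared, venv_data, "en")
    return shared, link
</code></pre>
<p>After <strong>demo()</strong> runs, the files are reachable through the link inside the fake virtualenv, yet they exist only once on disk.</p>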
</div>
<div class="tags">
<ul class="flat">
<li><a href="/tags/nlp">nlp</a></li>
<li><a href="/tags/python">python</a></li>
</ul>
</div><div id="disqus_thread"></div>
<script type="text/javascript">
(function () {
if (window.location.hostname == "localhost")
return;
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
var disqus_shortname = 'brainsteam';
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<a href="http://disqus.com/" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</div>
</div>
<div class="footer wrapper">
<nav class="nav">
<div>2021 © James Ravenscroft 2020 | <a href="https://github.com/knadh/hugo-ink">Ink</a> theme on <a href="https://gohugo.io">Hugo</a></div>
</nav>
</div>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-186263385-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
<script>feather.replace()</script>
</body>
</html>