
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Spacy Link or “How not to keep downloading the same files over and over” - Brainsteam</title><meta name="viewport" content="width=device-width, initial-scale=1">
<meta itemprop="name" content="Spacy Link or “How not to keep downloading the same files over and over”">
<meta itemprop="description" content="If you're a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you're lucky and you have a decent internet connection then great, if not it's time to make a cup of tea.
Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using?"><meta itemprop="datePublished" content="2019-01-15T18:14:16&#43;00:00" />
<meta itemprop="dateModified" content="2019-01-15T18:14:16&#43;00:00" />
<meta itemprop="wordCount" content="235">
<meta itemprop="keywords" content="nlp,python," /><meta property="og:title" content="Spacy Link or “How not to keep downloading the same files over and over”" />
<meta property="og:description" content="If you're a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you're lucky and you have a decent internet connection then great, if not it's time to make a cup of tea.
Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using?" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/" /><meta property="article:section" content="posts" />
<meta property="article:published_time" content="2019-01-15T18:14:16&#43;00:00" />
<meta property="article:modified_time" content="2019-01-15T18:14:16&#43;00:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Spacy Link or “How not to keep downloading the same files over and over”"/>
<meta name="twitter:description" content="If you're a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you're lucky and you have a decent internet connection then great, if not it's time to make a cup of tea.
Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using?"/>
<link href='https://fonts.googleapis.com/css?family=Playfair+Display:700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/normalize.css" />
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/main.css" />
<link id="dark-scheme" rel="stylesheet" type="text/css" href="https://brainsteam.co.uk/css/dark.css" />
<script src="https://brainsteam.co.uk/js/feather.min.js"></script>
<script src="https://brainsteam.co.uk/js/main.js"></script>
</head>
<body>
<div class="container wrapper">
<div class="header">
<div class="avatar">
<a href="https://brainsteam.co.uk/">
<img src="/images/avatar.png" alt="Brainsteam" />
</a>
</div>
<h1 class="site-title"><a href="https://brainsteam.co.uk/">Brainsteam</a></h1>
<div class="site-description"><p>The irregular mental expulsions of a PhD student and CTO of Filament; my views are my own and do not represent my employers in any way.</p><nav class="nav social">
<ul class="flat"><li><a href="https://twitter.com/jamesravey/" title="Twitter" rel="me"><i data-feather="twitter"></i></a></li><li><a href="https://github.com/ravenscroftj" title="Github" rel="me"><i data-feather="github"></i></a></li><li><a href="/index.xml" title="RSS" rel="me"><i data-feather="rss"></i></a></li></ul>
</nav></div>
<nav class="nav">
<ul class="flat">
<li>
<a href="/">Home</a>
</li>
<li>
<a href="/tags">Tags</a>
</li>
<li>
<a href="https://jamesravey.me">About Me</a>
</li>
</ul>
</nav>
</div>
<div class="post">
<div class="post-header">
<div class="meta">
<div class="date">
<span class="day">15</span>
<span class="rest">Jan 2019</span>
</div>
</div>
<div class="matter">
<h1 class="title">Spacy Link or “How not to keep downloading the same files over and over”</h1>
</div>
</div>
<div class="markdown">
<p>If you're a frequent user of spacy and virtualenv, you may be all too familiar with the following:</p>
<pre><code>python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10
</code></pre>
<p>If you're lucky and have a decent internet connection, great; if not, it's time to make a cup of tea.</p>
<p>Even if your internet connection is good, have you ever stopped to look at how much disk space your Python virtual environments are using? I recently found that about 40GB of disk space on my laptop was being used by spacy models I'd downloaded and forgotten about.</p>
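<p>If you want to check your own machine, here is a rough sketch that walks a directory of virtualenvs and adds up the size of every model directory it finds. The helpers <code>dir_size</code> and <code>model_disk_usage</code> are hypothetical, and the sketch assumes that models are installed as directories whose names start with <code>en_core_web_</code> and that your environments live under <code>~/.virtualenvs</code> (virtualenvwrapper's default); adjust both for your setup.</p>

```python
import os


def dir_size(path):
    """Total size in bytes of the regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip symlinks so a linked model is not counted twice
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def model_disk_usage(root, prefixes=("en_core_web_",)):
    """Map every model directory found under root to its size in bytes.

    Hypothetical helper: assumes models are installed as directories whose
    names start with one of the given prefixes.
    """
    usage = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in list(dirnames):
            full_path = os.path.join(dirpath, name)
            if name.startswith(prefixes) and not os.path.islink(full_path):
                usage[full_path] = dir_size(full_path)
                # Already measured, so don't walk into it again
                dirnames.remove(name)
    return usage


if __name__ == "__main__":
    # Assumed location: virtualenvwrapper keeps environments in ~/.virtualenvs
    root = os.path.expanduser("~/.virtualenvs")
    usage = model_disk_usage(root)
    for path, size in sorted(usage.items()):
        print(f"{size / 1e9:6.2f} GB  {path}")
    print(f"{sum(usage.values()) / 1e9:6.2f} GB in total")
```

<p>Symlinked directories are deliberately skipped, so models that are merely linked into an environment are not counted as extra disk usage.</p>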
<p>Fear not: spacy link offers salvation from this wasteful use of disk space.</p>
<p>Spacy link essentially lets you link the copy of spacy in your virtualenv to a model you have already downloaded. Say you installed your desired spacy model into your global python3 installation somewhere like <code>/usr/lib/python3/site-packages/spacy/data</code>.</p>
<p>Spacy link lets you reuse that existing model inside a virtualenv instead of re-downloading it (and using extra disk space). From inside your virtualenv, run:</p>
<pre><code>python -m spacy link /usr/lib/python3/site-packages/spacy/data/&lt;name_of_model&gt; &lt;link_name&gt;</code></pre>
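<p>The reason this saves space is that linking creates a symbolic link rather than a copy. The sketch below illustrates that idea with a hypothetical <code>link_model</code> function; it is a simplified illustration, not spaCy's own code, and it assumes a platform where the user may create symlinks (on Windows this can require extra privileges). This all applies to spaCy 2.x: the <code>link</code> command and shortcut names such as <code>en</code> were removed in spaCy 3.</p>

```python
from pathlib import Path


def link_model(model_path, data_dir, link_name):
    """Make data_dir/link_name a symlink to an already-downloaded model.

    A simplified illustration of the idea behind `spacy link` (spaCy 2.x),
    not spaCy's own implementation.
    """
    model_path = Path(model_path).resolve()
    if not model_path.is_dir():
        raise FileNotFoundError(f"No model directory at {model_path}")
    link_path = Path(data_dir) / link_name
    # is_symlink() also catches dangling links that exists() would miss
    if link_path.is_symlink() or link_path.exists():
        raise FileExistsError(f"{link_path} already exists")
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Only the link itself is written; the model's files are not copied
    link_path.symlink_to(model_path, target_is_directory=True)
    return link_path
```

<p>Refusing to overwrite an existing link is a choice made for this sketch; the real command accepts a <code>--force</code> flag for that case.</p>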
<p>For example, to make <strong>en_core_web_lg</strong> the default English model in our virtualenv, we could run:</p>
<pre><code>python -m spacy link /usr/lib/python3/site-packages/spacy/data/en_core_web_lg en</code></pre>
<p>Presto! Now when we call <code>spacy.load('en')</code> inside our virtualenv, we get the large model!</p>
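<p>Why does <code>'en'</code> now load the large model? Roughly speaking, spaCy 2.x first checks whether the name matches a shortcut link in its data directory and, if so, loads from wherever that link points. The hypothetical <code>resolve_model_name</code> below sketches a simplified version of that lookup; the real loader also checks for installed model packages, which this sketch leaves out.</p>

```python
from pathlib import Path


def resolve_model_name(name, data_dir):
    """Return the directory a model name would be loaded from.

    Hypothetical, simplified lookup: a shortcut link in data_dir is tried
    first, then the name is treated as a filesystem path.
    """
    link = Path(data_dir) / name
    if link.exists():
        # exists() follows the symlink, so a dangling link is skipped
        return link.resolve()
    path = Path(name)
    if path.exists():
        return path.resolve()
    raise OSError(f"Can't find model '{name}'")
```

<p>With an <code>en</code> link pointing at <code>en_core_web_lg</code>, this lookup returns the large model's directory, even though no model files live inside the virtualenv itself.</p>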
</div>
<div class="tags">
<ul class="flat">
<li><a href="/tags/nlp">nlp</a></li>
<li><a href="/tags/python">python</a></li>
</ul>
</div><div id="disqus_thread"></div>
<script type="text/javascript">
(function () {
if (window.location.hostname == "localhost")
return;
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
var disqus_shortname = 'brainsteam';
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript>
<a href="http://disqus.com/" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</div>
</div>
<div class="footer wrapper">
<nav class="nav">
<div>2021 © James Ravenscroft 2020 | <a href="https://github.com/knadh/hugo-ink">Ink</a> theme on <a href="https://gohugo.io">Hugo</a></div>
</nav>
</div>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-186263385-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
<script>feather.replace()</script>
</body>
</html>