<!DOCTYPE html>
	<meta charset="utf-8" />
	<meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Spacy Link or “How not to keep downloading the same files over and over” - Brainsteam</title><meta name="viewport" content="width=device-width, initial-scale=1">
	<meta itemprop="name" content="Spacy Link or “How not to keep downloading the same files over and over”">
<meta itemprop="description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
 python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10  If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea.
Even if your internet connection is good. Did you ever stop to look at how much disk space your python virtual environments were using up?"><meta itemprop="datePublished" content="2019-01-15T18:14:16&#43;00:00" />
<meta itemprop="dateModified" content="2019-01-15T18:14:16&#43;00:00" />
<meta itemprop="wordCount" content="235">
<meta itemprop="keywords" content="nlp,python," /><meta property="og:title" content="Spacy Link or “How not to keep downloading the same files over and over”" />
<meta property="og:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
 python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10  If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea.
Even if your internet connection is good. Did you ever stop to look at how much disk space your python virtual environments were using up?" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/" /><meta property="article:section" content="posts" />
<meta property="article:published_time" content="2019-01-15T18:14:16&#43;00:00" />
<meta property="article:modified_time" content="2019-01-15T18:14:16&#43;00:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Spacy Link or “How not to keep downloading the same files over and over”"/>
<meta name="twitter:description" content="If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
 python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10  If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea.
Even if your internet connection is good. Did you ever stop to look at how much disk space your python virtual environments were using up?"/>
<link href='https://fonts.googleapis.com/css?family=Playfair+Display:700' rel='stylesheet' type='text/css'>
	<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/normalize.css" />
	<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/main.css" />

        <link id="dark-scheme" rel="stylesheet" type="text/css" href="https://brainsteam.co.uk/css/dark.css" />

	<script src="https://brainsteam.co.uk/js/feather.min.js"></script>
		<script src="https://brainsteam.co.uk/js/main.js"></script>

	<div class="container wrapper">
		<div class="header">
    <div class="avatar">
        <a href="https://brainsteam.co.uk/">
            <img src="/images/avatar.png" alt="Brainsteam" />
    <h1 class="site-title"><a href="https://brainsteam.co.uk/">Brainsteam</a></h1>
    <div class="site-description"><p>The irregular mental expulsions of a PhD student and CTO of Filament, my views are my own and do not represent my employers in any way.</p><nav class="nav social">
            <ul class="flat"><li><a href="https://twitter.com/jamesravey/" title="Twitter" rel="me"><i data-feather="twitter"></i></a></li><li><a href="https://github.com/ravenscroftj" title="Github" rel="me"><i data-feather="github"></i></a></li><li><a href="/index.xml" title="RSS" rel="me"><i data-feather="rss"></i></a></li></ul>

	<nav class="nav">
		<ul class="flat">
				<a href="/">Home</a>
				<a href="/tags">Tags</a>
				<a href="https://jamesravey.me">About Me</a>

		<div class="post">
			<div class="post-header">
					<div class="meta">
						<div class="date">
							<span class="day">15</span>
							<span class="rest">Jan 2019</span>
				<div class="matter">
					<h1 class="title">Spacy Link or “How not to keep downloading the same files over and over”</h1>
			<div class="markdown">
				<p>If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:</p>
<blockquote class="wp-block-quote">
    python -m spacy download en_core_web_lg<br />
    Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0<br />
    Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)<br />
    5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10
</blockquote>
<p>If you’re lucky and have a decent internet connection then great; if not, it’s time to make a cup of tea.</p>
<p>Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments are using up? I recently found that about 40GB of disk space on my laptop was being used by spacy models I’d downloaded and forgotten about.</p>
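<p>If you want to check this on your own machine, something like the following rough sketch could help. It walks a directory tree and adds up the size of every real (non-symlinked) directory whose name starts with a given prefix. The function names and the <strong>en_core_web</strong> prefix are assumptions made for illustration and are not part of spacy.</p>

```python
import os


def dir_size_bytes(root):
    """Total size in bytes of the regular files under root (symlinked files are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def find_model_dirs(search_root, prefix="en_core_web"):
    """Return {path: size_in_bytes} for real directories whose name starts with prefix."""
    found = {}
    for dirpath, dirnames, _filenames in os.walk(search_root):
        matches = [d for d in dirnames if d.startswith(prefix)]
        # Do not descend into matched directories, so nested copies are not counted twice.
        dirnames[:] = [d for d in dirnames if d not in matches]
        for name in matches:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # a linked model takes up no extra space
                found[path] = dir_size_bytes(path)
    return found
```

<p>Calling <strong>find_model_dirs(os.path.expanduser('~'))</strong> would scan your home directory, which may take a while if it is large. Small package metadata directories that share the prefix will also show up in the result.</p>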
<p>Fear not – spacy link offers you salvation from this wasteful use of disk space.</p>
<p>Spacy link essentially allows you to link your virtualenv copy of spacy to a copy of the model you already downloaded. Say you installed your desired spacy model to your global python3 installation – somewhere like <strong><em>/usr/lib/python3/site-packages/spacy/data</em></strong>.</p>
<p>Spacy link will let you link your existing model into a virtualenv to save redownloading (and using extra disk space). From your virtualenv you can do:</p>
<p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/&lt;name_of_model&gt; &lt;name of model&gt;</em></strong></p>
<p>For example, if we wanted to make <strong>en_core_web_lg</strong> the default English model in our virtualenv, we could do:</p>
<p>python -m spacy link <strong><em>/usr/lib/python3/site-packages/spacy/data/en_core_web_lg en</em></strong></p>
<p>Presto! Now when we do <strong>spacy.load('en')</strong> inside our virtualenv we get the large model!</p>
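<p>Conceptually, the link command leaves a symbolic link inside the virtualenv’s <strong>spacy/data</strong> directory that points at the existing model, so no model files are copied. The sketch below imitates that idea with the standard library only. It assumes spacy 2.x (where the link command is available) and a filesystem that supports symlinks; <strong>link_model</strong> is a hypothetical helper written for illustration, not spacy’s own implementation.</p>

```python
from pathlib import Path


def link_model(model_dir, data_dir, link_name):
    """Create data_dir/link_name as a symlink pointing at model_dir and return its path."""
    model_dir = Path(model_dir).resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"No model directory at {model_dir}")

    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    link_path = data_dir / link_name

    if link_path.is_symlink():
        # Replace an existing link so the name can be pointed at a different model.
        link_path.unlink()
    elif link_path.exists():
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")

    link_path.symlink_to(model_dir, target_is_directory=True)
    return link_path
```

<p>Because only a link is created, every virtualenv that points at the same model shares a single copy on disk.</p>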


			<div class="tags">
						<ul class="flat">
							<li><a href="/tags/nlp">nlp</a></li>
							<li><a href="/tags/python">python</a></li>
			</div>
	<div class="footer wrapper">
	<nav class="nav">
		<div>2021  © James Ravenscroft 2020 |  <a href="https://github.com/knadh/hugo-ink">Ink</a> theme on <a href="https://gohugo.io">Hugo</a></div>
