<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Programmatically Downloading Open Access Papers - Brainsteam</title><meta name="viewport" content="width=device-width, initial-scale=1">
<meta itemprop="name" content="Programmatically Downloading Open Access Papers">
<meta itemprop="description" content="(Cover image “Unlocked” by Sean Hobson)
If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of £ just to read a scientific paper. It’s ok if you’re affiliated with a university that has access to that journal but it can sometimes be luck of the draw as to whether your institute has access and even if they do, sometimes the SAML login processes don’t work and you still can’t see the paper."><meta itemprop="datePublished" content="2018-04-13T16:04:47&#43;00:00" />
<meta itemprop="dateModified" content="2018-04-13T16:04:47&#43;00:00" />
<meta itemprop="wordCount" content="357">
<meta itemprop="keywords" content="open access,scientific papers,unpaywall," /><meta property="og:title" content="Programmatically Downloading Open Access Papers" />
<meta property="og:description" content="(Cover image “Unlocked” by Sean Hobson)
If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of £ just to read a scientific paper. It’s ok if you’re affiliated with a university that has access to that journal but it can sometimes be luck of the draw as to whether your institute has access and even if they do, sometimes the SAML login processes don’t work and you still can’t see the paper." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/" /><meta property="article:section" content="posts" />
<meta property="article:published_time" content="2018-04-13T16:04:47&#43;00:00" />
<meta property="article:modified_time" content="2018-04-13T16:04:47&#43;00:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Programmatically Downloading Open Access Papers"/>
<meta name="twitter:description" content="(Cover image “Unlocked” by Sean Hobson)
If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of £ just to read a scientific paper. It’s ok if you’re affiliated with a university that has access to that journal but it can sometimes be luck of the draw as to whether your institute has access and even if they do, sometimes the SAML login processes don’t work and you still can’t see the paper."/>
<link href='https://fonts.googleapis.com/css?family=Playfair+Display:700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/normalize.css" />
<link rel="stylesheet" type="text/css" media="screen" href="https://brainsteam.co.uk/css/main.css" />
<link id="dark-scheme" rel="stylesheet" type="text/css" href="https://brainsteam.co.uk/css/dark.css" />
<script src="https://brainsteam.co.uk/js/feather.min.js"></script>
<script src="https://brainsteam.co.uk/js/main.js"></script>
</head>
<body>
<div class="container wrapper">
<div class="header">
<div class="avatar">
<a href="https://brainsteam.co.uk/">
<img src="/images/avatar.png" alt="Brainsteam" />
</a>
</div>
<h1 class="site-title"><a href="https://brainsteam.co.uk/">Brainsteam</a></h1>
<div class="site-description"><p>The irregular mental expulsions of a PhD student and CTO of Filament, my views are my own and do not represent my employers in any way.</p><nav class="nav social">
<ul class="flat"><li><a href="https://twitter.com/jamesravey/" title="Twitter" rel="me"><i data-feather="twitter"></i></a></li><li><a href="https://github.com/ravenscroftj" title="Github" rel="me"><i data-feather="github"></i></a></li><li><a href="/index.xml" title="RSS" rel="me"><i data-feather="rss"></i></a></li></ul>
</nav></div>
<nav class="nav">
<ul class="flat">
<li>
<a href="/">Home</a>
</li>
<li>
<a href="/tags">Tags</a>
</li>
<li>
<a href="https://jamesravey.me">About Me</a>
</li>
</ul>
</nav>
</div>
<div class="post">
<div class="post-header">
<div class="meta">
<div class="date">
<span class="day">13</span>
<span class="rest">Apr 2018</span>
</div>
</div>
<div class="matter">
<h1 class="title">Programmatically Downloading Open Access Papers</h1>
</div>
</div>
<div class="markdown">
<p><em><a href="https://www.flickr.com/photos/seanhobson/6216334720/in/photolist-atjkJQ-QuYgDA-cb9bGo-4o84DP-9GAeQ5-5dopRY-hyQV19-ngTMst-4rRwgg-qQr5Sy-e4XhCg-mQJpZ-6ttPLT-6zQxh2-dsE6bM-qQcUxd-6msKYB-4HRo5J-8W2ryV-4B5rRC-xj9C8-2V5HKa-7zS5wE-Ldsdy-bwMFxR-nibhxt-5mKLS5-5m2URM-7CsC9C-4nJ5jt-a4mQik-6GPYgf-cb9c8s-363XxR-8R4jGd-4qHxrv-T4A8wx-T1NyJG-4tR45P-f5bde-4tV62J-cDEZ9L-Te2m9S-NLeKd-orGJh5-4j53Za-T4Abnn-fqPY88-T1NwPE-7deVVp" target="_blank" rel="noopener">(Cover image “Unlocked” by Sean Hobson)</a></em></p>
<p>If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of pounds just to read a single paper. It’s OK if you’re affiliated with a university that has access to that journal, but it can be the luck of the draw whether your institute has access and, even if it does, sometimes the SAML login process doesn’t work and you still can’t see the paper. Thankfully, the guys at <a href="http://unpaywall.org/">Unpaywall</a> (actually built by <a href="http://impactstory.org/">Impact Story</a>) have been doing a fantastic job of making open access papers much more easily available to interested academics in the browser. If you end up at a publisher paywall and Unpaywall know about a legitimate free copy of the paper you’re trying to read, they’ll link you straight to it for direct download. Problem solved.</p>
<p>For me, as someone interested in text mining on large volumes of scientific papers, getting hold of high-quality, peer-reviewed open access papers that I can analyse can be a pain. I previously wrote about <a href="https://papro.org.uk/2013/02/26/plosget-py/">downloading batches of papers from PLOS One</a> for data mining purposes, but I’m currently interested in downloading papers that get mentioned and linked to in the news. Although that can sometimes include PLOS journals, it also includes many other publishers, both open access and closed. Thankfully, Unpaywall come to the rescue again.</p>
<p>Unpaywall.org provide a free API that takes in a DOI and spits out any and all known free versions of that paper. That makes my life a lot easier: all I have to do is find a long list of DOIs that I’m interested in analysing and run them through the API.</p>
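<p>To give a feel for what "takes in a DOI and spits out free versions" looks like in practice, here is a minimal sketch. It assumes the v2 REST endpoint at api.unpaywall.org, which expects an email query parameter, and a JSON record containing an oa_locations list whose entries carry url and url_for_pdf fields. The function names are made up for illustration, so check the field names against the Unpaywall documentation before relying on them.</p>

```python
from urllib.parse import quote

# Assumed base URL of the Unpaywall v2 REST API.
UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_unpaywall_url(doi, email):
    """Return the lookup URL for one DOI (hypothetical helper).

    The email parameter identifies the caller to the API.
    """
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?email={quote(email, safe='@')}"


def free_locations(record):
    """Collect the URL of every free copy listed in an Unpaywall-style record.

    Prefers the direct PDF link and falls back to the landing page URL.
    Duplicates are dropped while keeping the original order.
    """
    urls = []
    for location in record.get("oa_locations") or []:
        url = location.get("url_for_pdf") or location.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls
```

<p>Neither helper touches the network: the first only builds the request URL and the second only reads a record that has already been parsed from JSON, which makes both easy to try out in a notebook.</p>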
<p>I’ve provided a gist of the Python function I’ve written that wraps this API. I’ve been using it in a Jupyter notebook (which I’m not ready to publish just yet). Feel free to use it in your project. It might save you an hour or two of development time (it took me a while to work out which errors I needed to catch).</p>
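<p>The gist itself is not reproduced on this page, so what follows is only a rough sketch of what such a wrapper might look like, using nothing but the standard library. The function name, the opener parameter (there so the network call can be swapped out in tests) and the particular set of exceptions caught are all assumptions for illustration, not necessarily the errors the original function handles.</p>

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote


def fetch_oa_record(doi, email, opener=urllib.request.urlopen, timeout=10):
    """Look up one DOI and return the parsed JSON record, or None on failure.

    Hypothetical wrapper: assumes the Unpaywall v2 endpoint and an email
    query parameter identifying the caller.
    """
    url = (
        "https://api.unpaywall.org/v2/"
        f"{quote(doi, safe='/')}?email={quote(email, safe='@')}"
    )
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server answered with an error status, e.g. an unknown DOI.
        print(f"{doi}: HTTP error {err.code}")
    except urllib.error.URLError as err:
        # The request never got a response (DNS failure, refused connection).
        print(f"{doi}: network error: {err.reason}")
    except ValueError as err:
        # Covers malformed JSON and undecodable bytes in the response body.
        print(f"{doi}: unreadable response: {err}")
    return None
```

<p>Returning None instead of raising means a loop over a long list of DOIs can carry on past the odd failure and simply skip the papers that could not be looked up.</p>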
</div>
<div class="tags">
<ul class="flat">
<li><a href="/tags/open-access">open access</a></li>
<li><a href="/tags/scientific-papers">scientific papers</a></li>
<li><a href="/tags/unpaywall">unpaywall</a></li>
</ul>
</div><div id="disqus_thread"></div>
<script type="text/javascript">
(function () {
if (window.location.hostname == "localhost")
return;
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
var disqus_shortname = 'brainsteam';
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the comments.</noscript>
<a href="http://disqus.com/" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</div>
</div>
<div class="footer wrapper">
<nav class="nav">
<div>2021 © James Ravenscroft 2020 | <a href="https://github.com/knadh/hugo-ink">Ink</a> theme on <a href="https://gohugo.io">Hugo</a></div>
</nav>
</div>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-186263385-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
<script>feather.replace()</script>
</body>
</html>