title: Programmatically Downloading Open Access Papers
author: James
type: post
date: 2018-04-13T16:04:47+00:00
url: /2018/04/13/programmatically-downloading-open-access-papers/
featured_image: /wp-content/uploads/2018/04/6216334720_54e29fc13c_o-825x510.jpg
tags: open access, scientific papers, Open Source, phd

(Cover image “Unlocked” by Sean Hobson)

If you're an academic or you've got an interest in reading scientific papers, you've probably run into paywalls that demand tens or even hundreds of pounds just to read a single paper. It's OK if you're affiliated with a university that has access to that journal, but it can be the luck of the draw as to whether your institution subscribes, and even if it does, sometimes the SAML login process doesn't work and you still can't see the paper. Thankfully, the guys at Unpaywall (actually built by Impactstory) have been doing a fantastic job of making open access papers much more easily available to interested academics in the browser. If you end up at a publisher paywall and Unpaywall know about a legitimate free copy of the paper you're trying to read, they'll link you straight to it for direct download. Problem solved.

For me, as someone interested in text mining large volumes of scientific papers, getting hold of high-quality, peer-reviewed open access papers that I can analyse can be a pain. I previously wrote about downloading batches of papers from PLOS One for data mining purposes, but I'm currently interested in papers that get mentioned and linked to in the news. Although that can sometimes include PLOS journals, it also includes many other publishers, both open access and closed. Thankfully, Unpaywall come to the rescue again.

Unpaywall.org provide a free API that takes in a DOI and spits out any and all known free versions of that paper. That makes my life a lot easier: all I have to do is find a long list of DOIs that I'm interested in analysing and run them through the API.
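To give a rough idea of what "takes in a DOI and spits out free versions" looks like in practice, here is a minimal sketch using only the standard library. The base URL, the `email` query parameter and the response field names (`oa_locations`, `url_for_pdf`, `url`) reflect my understanding of the v2 API and should be checked against the current Unpaywall documentation. The function names are hypothetical, made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Unpaywall v2 REST API; verify against the docs.
API_BASE = "https://api.unpaywall.org/v2/"


def build_url(doi, email):
    """Build the lookup URL for one DOI, with a contact email as a query parameter."""
    query = urllib.parse.urlencode({"email": email})
    return API_BASE + urllib.parse.quote(doi) + "?" + query


def free_copy_urls(record):
    """Collect the URL of every free copy listed in one parsed API response."""
    urls = []
    for location in record.get("oa_locations") or []:
        # Prefer a direct PDF link and fall back to the landing page.
        url = location.get("url_for_pdf") or location.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls


def lookup(doi, email):
    """Fetch and parse the record for one DOI (this performs a network request)."""
    with urllib.request.urlopen(build_url(doi, email), timeout=30) as response:
        return json.load(response)
```

With a list of DOIs in hand, something like `free_copy_urls(lookup(doi, "you@example.org"))` would then give the candidate download links for each paper.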

I've provided a gist of the Python function I've written that wraps this API. I've been using it in a Jupyter notebook (which I'm not ready to publish just yet). Feel free to use it in your project. It might save you an hour or two of development time (it took me a while to work out what errors I needed to try and catch).
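This is not the gist itself, just a minimal sketch of the sort of failures a wrapper like this probably has to cope with: unknown DOIs, network hiccups and responses that aren't JSON. The `fetch` argument is a hypothetical callable (DOI in, raw response body out) that I've introduced so the error handling can be tried without touching the network. Which errors the real API actually produces is an assumption on my part.

```python
import json
import urllib.error


def safe_lookup(doi, fetch):
    """Return the parsed record for a DOI, or None if the lookup fails.

    `fetch` is a hypothetical callable that takes a DOI and returns the raw
    response body as a string or bytes.
    """
    try:
        return json.loads(fetch(doi))
    except urllib.error.HTTPError as err:
        # Must come before URLError, which is its parent class.
        print(f"{doi}: HTTP {err.code}")
    except urllib.error.URLError as err:
        # DNS failures, refused connections and similar.
        print(f"{doi}: network error ({err.reason})")
    except TimeoutError:
        print(f"{doi}: timed out")
    except json.JSONDecodeError:
        print(f"{doi}: response was not valid JSON")
    return None


def lookup_many(dois, fetch):
    """Map each DOI to its record, leaving out the ones that failed."""
    results = {}
    for doi in dois:
        record = safe_lookup(doi, fetch)
        if record is not None:
            results[doi] = record
    return results
```

Returning `None` instead of raising means one bad DOI doesn't kill a long batch run, which matters when you're feeding in thousands of DOIs scraped from news articles.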