---
author: James
date: 2018-04-13 16:04:47+00:00
featured_image: /wp-content/uploads/2018/04/6216334720_54e29fc13c_o-825x510.jpg
medium_post:
- O:11:"Medium_Post":11:{s:16:"author_image_url";s:69:"https://cdn-images-1.medium.com/fit/c/200/200/0*naYvMn9xdbL5qlkJ.jpeg";s:10:"author_url";s:30:"https://medium.com/@jamesravey";s:11:"byline_name";N;s:12:"byline_email";N;s:10:"cross_link";s:2:"no";s:2:"id";s:12:"9cbbb57ab932";s:21:"follower_notification";s:3:"yes";s:7:"license";s:19:"all-rights-reserved";s:14:"publication_id";s:2:"-1";s:6:"status";s:6:"public";s:3:"url";s:91:"https://medium.com/@jamesravey/programmatically-downloading-open-access-papers-9cbbb57ab932";}
post_meta:
- date
preview: /social/04b7658a82b7347fa52c0afd9672bf3bcbb6fbe2af79dacf685edea7b2f5cd73.png
tags:
- open access
- scientific papers
- Open Source
- phd
title: Programmatically Downloading Open Access Papers
type: posts
url: /2018/04/13/programmatically-downloading-open-access-papers/
---

_(Cover image “Unlocked” by Sean Hobson)_

If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of pounds just to read a single paper. That’s fine if you’re affiliated with a university that subscribes to the journal, but it can be the luck of the draw whether your institution has access, and even when it does, the SAML login process sometimes fails and you still can’t see the paper.

Thankfully, the team at [Unpaywall][1] (actually built by [Impact Story][2]) have been doing a fantastic job of making open access papers much more easily available to interested academics in the browser. If you end up at a publisher paywall and Unpaywall know about a legitimate free copy of the paper you’re trying to read, they’ll link you straight to it for direct download. Problem solved.

For me, as someone interested in text mining on large volumes of scientific papers, getting hold of high quality, peer reviewed open access papers that I can analyse can be a pain. I previously wrote about [downloading batches of papers from PLOS One][3] for data mining purposes, but I’m currently interested in papers that get mentioned and linked to in the news. Although that can sometimes include PLOS journals, it also includes many other publishers, both open access and closed.

Thankfully, Unpaywall come to the rescue again. Unpaywall.org provide a free API that takes in a DOI and spits out any and all known free versions of that paper. That makes my life a lot easier: all I have to do is find a long list of DOIs that I’m interested in analysing and run them through the API.

I’ve provided a gist of the Python function I’ve written that wraps this API. I’ve been using it in a Jupyter notebook (which I’m not ready to publish just yet). Feel free to use it in your project. It might save you an hour or two of development time (it took me a while to work out which errors I needed to catch).

[1]: http://unpaywall.org/
[2]: http://impactstory.org/
[3]: https://papro.org.uk/2013/02/26/plosget-py/
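
To give a feel for the "DOI in, free versions out" idea, here is a minimal sketch of such a wrapper using only the standard library. It is not the gist itself. The endpoint (`https://api.unpaywall.org/v2/`), the `email` query parameter and the `oa_locations` / `url_for_pdf` response fields are assumptions taken from Unpaywall's public API documentation, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the v2 endpoint described in Unpaywall's public docs.
UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def build_url(doi, email):
    """Build the lookup URL for one DOI (the API asks for a contact email)."""
    path = urllib.parse.quote(doi, safe="/")
    query = urllib.parse.urlencode({"email": email})
    return UNPAYWALL_API + path + "?" + query


def free_pdf_urls(record):
    """Collect the distinct PDF links from a parsed API response."""
    urls = []
    # Assumed response shape: a list of locations, each possibly
    # carrying a "url_for_pdf" entry (which may be missing or null).
    for location in record.get("oa_locations") or []:
        pdf = location.get("url_for_pdf")
        if pdf and pdf not in urls:
            urls.append(pdf)
    return urls


def lookup_doi(doi, email, timeout=10):
    """Return the free PDF URLs for a DOI, or an empty list on failure."""
    try:
        with urllib.request.urlopen(build_url(doi, email), timeout=timeout) as resp:
            record = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (e.g. an unknown DOI) and
        # socket timeouts; ValueError covers a body that is not valid JSON.
        return []
    return free_pdf_urls(record)
```

Running a list of DOIs through it is then just a loop over `lookup_doi(doi, "you@example.org")`, ideally with a short pause between requests to be polite to the API. Returning an empty list on any failure keeps a long batch running, at the cost of hiding why a particular DOI failed, so you may prefer to log the exception instead.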