<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>PhD on Brainsteam</title>
<link>https://brainsteam.co.uk/categories/phd/</link>
<description>Recent content in PhD on Brainsteam</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<copyright>© James Ravenscroft 2020</copyright>
<lastBuildDate>Tue, 15 Jan 2019 18:14:16 +0000</lastBuildDate><atom:link href="https://brainsteam.co.uk/categories/phd/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Spacy Link or “How not to keep downloading the same files over and over”</title>
<link>https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/</link>
<pubDate>Tue, 15 Jan 2019 18:14:16 +0000</pubDate>
<guid>https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/</guid>
<description>If you're a frequent user of spaCy and virtualenv, you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you're lucky and you have a decent internet connection then great; if not, it's time to make a cup of tea.
Even if your internet connection is good, did you ever stop to look at how much disk space your Python virtual environments were using up?</description>
</item>
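<!--
A minimal sketch of the fix the post describes, assuming spaCy 2.x: download the model once into a shared environment, then use "spacy link" to symlink it into each virtualenv instead of pulling the 852 MB archive down again. The link name "en" follows the spaCy 2.x convention.

import subprocess
import sys

# One-off download into the shared/system environment:
#   python -m spacy download en_core_web_lg
# Then, inside each new virtualenv, symlink the shared copy:
subprocess.run(
    [sys.executable, "-m", "spacy", "link", "en_core_web_lg", "en"],
    check=True,
)

# The model now loads by its short link name, with no fresh download:
import spacy
nlp = spacy.load("en")
-->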
<item>
<title>Uploading HUGE files to Gitea</title>
<link>https://brainsteam.co.uk/2018/10/20/uploading-huge-files-to-gitea/</link>
<pubDate>Sat, 20 Oct 2018 10:09:41 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/10/20/uploading-huge-files-to-gitea/</guid>
<description>I recently stumbled upon and fell in love with Gitea, a lightweight self-hosted GitHub and GitLab alternative written in the Go programming language. One of my favourite things about it, other than the speed and efficiency that mean you can even run it on a Raspberry Pi, is the built-in LFS support. For the unfamiliar, LFS is a protocol initially introduced by GitHub that allows users to version control large binary files, something that Git is traditionally pretty poor at.</description>
</item>
<item>
<title>Don't forget your life jacket: the dangers of diving in deep at the deep end with deep learning</title>
<link>https://brainsteam.co.uk/2018/10/18/dont-forget-your-life-jacket-the-dangers-of-diving-in-deep-at-the-deep-end-with-deep-learning/</link>
<pubDate>Thu, 18 Oct 2018 14:35:05 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/10/18/dont-forget-your-life-jacket-the-dangers-of-diving-in-deep-at-the-deep-end-with-deep-learning/</guid>
<description>Deep Learning is a powerful technology, but you might want to try some “shallow” approaches before you dive in. Neural networks are made up of neurones and synapses. It's unquestionable that over the last decade, deep learning has changed the machine learning landscape for the better. Deep Neural Networks (DNNs), first popularised by Yann LeCun, Yoshua Bengio and Geoffrey Hinton, are a family of machine learning models that are capable of learning to see and categorise objects, predict stock market trends, understand written text and even play video games.</description>
</item>
<item>
<title>Programmatically Downloading Open Access Papers</title>
<link>https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/</link>
<pubDate>Fri, 13 Apr 2018 16:04:47 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/</guid>
<description>(Cover image “Unlocked” by Sean Hobson)
If you're an academic or you've got an interest in reading scientific papers, you've probably run into paywalls that demand tens or even hundreds of pounds just to read a scientific paper. It's OK if you're affiliated with a university that has access to that journal, but it can sometimes be the luck of the draw as to whether your institute has access, and even if they do, sometimes the SAML login processes don't work and you still can't see the paper.</description>
</item>
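<!--
A minimal sketch of one way to do this, using the Unpaywall REST API to look up a legal open access copy of a paper by its DOI; this is an assumption about approach, not necessarily the method the post uses, and the email address and DOI below are placeholders.

import requests

def find_oa_pdf(doi, email="you@example.com"):
    """Return a PDF URL for an open access copy of the DOI, or None."""
    resp = requests.get(
        "https://api.unpaywall.org/v2/" + doi,
        params={"email": email},  # Unpaywall asks callers to identify themselves
        timeout=30,
    )
    resp.raise_for_status()
    best = resp.json().get("best_oa_location") or {}
    return best.get("url_for_pdf")

print(find_oa_pdf("10.1234/example.doi"))
-->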
<item>
<title>Part time PhD: Mini-Sabbaticals</title>
<link>https://brainsteam.co.uk/2018/04/05/phd-mini-sabbaticals/</link>
<pubDate>Thu, 05 Apr 2018 13:08:51 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/04/05/phd-mini-sabbaticals/</guid>
<description>Avid readers amongst you will know that I'm currently in the third year of my PhD in Computational Linguistics at the University of Warwick whilst also serving as CTO at Filament: an incredibly exciting pair of positions that certainly have their challenges and would be untenable without an incredibly supportive set of PhD supervisors (Amanda Clare and Maria Liakata) and an equally supportive and understanding pair of company directors (Phil and Doug).</description>
</item>
<item>
<title>Why I keep going back to Evernote</title>
<link>https://brainsteam.co.uk/2017/08/03/182/</link>
<pubDate>Thu, 03 Aug 2017 08:27:53 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/08/03/182/</guid>
<description>As the CTO for a London machine learning startup and a PhD student at the Warwick Institute for the Science of Cities, to say I'm busy is an understatement. At any given point in time, my mind is awash with hundreds of ideas around Filament tech strategy, a cool app I'd like to build, ways to measure scientific impact, wondering what the name of that new song I heard on the radio was, or some combination thereof.</description>
</item>
<item>
<title>Dialect Sensitive Topic Models</title>
<link>https://brainsteam.co.uk/2017/07/25/dialect-sensitive-topic-models/</link>
<pubDate>Tue, 25 Jul 2017 11:02:42 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/07/25/dialect-sensitive-topic-models/</guid>
<description>As part of my PhD I'm currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that can compare topics discussed in different dialectical styles, such as scientific papers versus newspaper articles? If you're new to the concept of topic modelling then this article can give you a quick primer.
Vanilla LDA [diagram: how the latent variables in an LDA model are connected]. Vanilla topic models such as Blei's LDA are great but start to fall down when the wording around one particular concept varies too much.</description>
</item>
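<!--
A primer-level sketch of "vanilla" LDA using gensim with toy documents; this illustrates the baseline the post mentions, not the dialect-aware model it goes on to discuss.

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Two toy vocabularies of content: biology and finance
docs = [
    ["gene", "expression", "protein", "cell"],
    ["stocks", "market", "shares", "prices"],
    ["protein", "cell", "membrane", "gene"],
    ["market", "prices", "trading", "stocks"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit a two-topic model and inspect the word distributions
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
-->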
<item>
<title>Exploring Web Archive Data CDX Files</title>
<link>https://brainsteam.co.uk/2017/06/05/exploring-web-archive-data-cdx-files/</link>
<pubDate>Mon, 05 Jun 2017 07:24:22 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/06/05/exploring-web-archive-data-cdx-files/</guid>
<description>I have recently been working in partnership with the UK Web Archive in order to identify and parse large amounts of historic news data for an NLP task that I will blog about in the future. The NLP portion of this task will surely present its own challenges, but for now there is the small matter of identifying news data amongst the noise of 60TB of web archive dumps of the rest of the .</description>
</item>
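<!--
A minimal sketch of reading a CDX index, assuming the classic space-separated format in which a header line names each column with a letter code and every record line carries the same columns; the file name and the subset of letter codes decoded here are illustrative.

# Letter codes from the classic CDX legend (subset)
FIELD_NAMES = {
    "N": "massaged_url",
    "b": "timestamp",
    "a": "original_url",
    "m": "mime_type",
    "s": "status_code",
}

def read_cdx(path):
    with open(path) as f:
        letters = f.readline().split()[1:]  # header: "CDX N b a m s ..."
        names = [FIELD_NAMES.get(code, code) for code in letters]
        for line in f:
            yield dict(zip(names, line.rstrip("\n").split(" ")))

# e.g. print the original URL of every HTML capture in the index
for record in read_cdx("news-crawl.cdx"):
    if record.get("mime_type") == "text/html":
        print(record.get("original_url"))
-->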
<item>
<title>timetrack improvements</title>
<link>https://brainsteam.co.uk/2016/12/10/timetrack-improvements/</link>
<pubDate>Sat, 10 Dec 2016 09:33:41 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/12/10/timetrack-improvements/</guid>
<description>I've just added a couple of improvements to timetrack that allow you to append to existing time recordings (either with an amount like 15m, or using live to time additional minutes spent and append them).
You can also remove entries using timetrack rm instead of remove; saving keystrokes is what programming is all about.
You can find the updated code over at GitHub.</description>
</item>
<item>
<title>AI can't solve all our problems, but that doesn't mean it isn't intelligent</title>
<link>https://brainsteam.co.uk/2016/12/08/ai-cant-solve-all-our-problems-but-that-doesnt-mean-it-isnt-intelligent/</link>
<pubDate>Thu, 08 Dec 2016 10:08:13 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/12/08/ai-cant-solve-all-our-problems-but-that-doesnt-mean-it-isnt-intelligent/</guid>
<description>Thomas Hobbes, perhaps most famous for his thinking on western politics, was also thinking about how the human mind “computes things” nearly 400 years ago. A recent opinion piece I read on Wired called for us to stop labelling our current specific machine learning models AI because they are not intelligent. I respectfully disagree.
AI is not a new concept. The idea that a computer could think like a human and one day pass for a human has been around since Turing, and in some form long before him.</description>
</item>
<item>
<title>We need to talk about push notifications (and why I stopped wearing my smartwatch)</title>
<link>https://brainsteam.co.uk/2016/11/27/we-need-to-talk-about-push-notifications-and-why-i-stopped-wearing-my-smartwatch/</link>
<pubDate>Sun, 27 Nov 2016 12:59:22 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/11/27/we-need-to-talk-about-push-notifications-and-why-i-stopped-wearing-my-smartwatch/</guid>
<description>I own a Pebble Steel which I got for Christmas a couple of years ago, and I've been very happy with it so far. I can control my music player from my wrist, get notifications and see a summary of my calendar. Recently, however, I've stopped wearing it. The reason is that constant streams of notifications stress me out and interrupt my workflow; not wearing it makes me feel calmer and more in control, and allows me to be more productive.</description>
</item>
<item>
<title>ElasticSearch: Turning analysis off and why it's useful</title>
<link>https://brainsteam.co.uk/2015/11/29/elasticsearch-turning-analysis-off-and-why-its-useful/</link>
<pubDate>Sun, 29 Nov 2015 14:59:06 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/29/elasticsearch-turning-analysis-off-and-why-its-useful/</guid>
<description>I have recently been playing with Elasticsearch a lot for my PhD and started trying to do some more complicated queries and pattern matching using the DSL syntax. I have an index on my local machine called impact_studies which contains all 6637 REF 2014 impact case studies in a JSON format. One of the fields is “UOA”, which contains the title of the unit of assessment that the case study belongs to.</description>
</item>
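<!--
A minimal sketch of the "analysis off" idea against the Elasticsearch of that era (1.x/2.x string mappings), using the index and field names from the post; the document type name and the example unit title are assumptions.

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Map UOA as not_analyzed so the whole title is kept as a single token
es.indices.create(index="impact_studies", body={
    "mappings": {
        "case_study": {
            "properties": {
                "UOA": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
})

# Exact term matches on the full unit title now behave predictably
result = es.search(index="impact_studies", body={
    "query": {"term": {"UOA": "Clinical Medicine"}}
})
print(result["hits"]["total"])
-->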
<item>
<title>Freecite python wrapper</title>
<link>https://brainsteam.co.uk/2015/11/22/freecite-python-wrapper/</link>
<pubDate>Sun, 22 Nov 2015 19:20:19 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/22/freecite-python-wrapper/</guid>
<description>I've written a simple wrapper around the Brown University citation parser FreeCite. I'm planning to use the service to pull out author names from references in REF impact studies and try to link them back to investigators listed on RCUK funding applications.
The code is here and is MIT licensed. It provides a simple method which takes a string representing a reference and returns a dict with each field separated. There is also a parse_many function which takes an array of reference strings and returns an array of dicts.</description>
</item>
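<!--
A hypothetical usage sketch of the wrapper as described: parse_many is named in the post, but the single-reference function name, the import path and the reference string below are assumptions for illustration.

from freecite import parse, parse_many

# A dummy reference string, just for demonstration
ref = "Smith, J. and Jones, A. (2010) An example paper title. Journal of Examples 1(2), pp. 3-10."

print(parse(ref))         # one dict with each field separated
print(parse_many([ref]))  # one dict per input reference string
-->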
<item>
<title>Scrolling in ElasticSearch</title>
<link>https://brainsteam.co.uk/2015/11/21/scrolling-in-elasticsearch/</link>
<pubDate>Sat, 21 Nov 2015 09:41:19 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/21/scrolling-in-elasticsearch/</guid>
<description>I know I'm doing a lot of flip-flopping between Solr and Elasticsearch at the moment; I'm trying to figure out the key similarities and differences between them and where one is more suitable than the other.
The following is an example of how to map a function f onto an entire set of indexed data in Elasticsearch using the scroll API.
If you use Elasticsearch, it is possible to do paging by adding a size and a from parameter.</description>
</item>
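<!--
A sketch of mapping a function f over every indexed document with the scroll API via elasticsearch-py, reusing the impact_studies index from the earlier post; note that elasticsearch.helpers.scan wraps this same loop if you prefer a one-liner.

from elasticsearch import Elasticsearch

def f(doc):
    print(doc.get("UOA"))

es = Elasticsearch()

# Open a scroll context and take the first page
resp = es.search(index="impact_studies", scroll="2m", size=100,
                 body={"query": {"match_all": {}}})

while True:
    hits = resp["hits"]["hits"]
    if not hits:
        break  # every page has been consumed
    for hit in hits:
        f(hit["_source"])
    # Hand the scroll id back to fetch the next page
    resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="2m")
-->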
<item>
<title>Keynote at YDS 2015: Information Discovery, Partridge and Watson</title>
<link>https://brainsteam.co.uk/2015/11/02/keynote-at-yds-2015-information-discovery-partridge-and-watson/</link>
<pubDate>Mon, 02 Nov 2015 21:07:28 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/02/keynote-at-yds-2015-information-discovery-partridge-and-watson/</guid>
<description>Here is a recording of my recent keynote talk at York Doctoral Symposium on the power of natural language processing through Watson and my academic/PhD topic, Partridge.
0-11 minutes: the history of mankind, invention and the acceleration of scientific progress (warming people to the idea that farming out your scientific reading to a computer is a much better idea than trying to read every paper written)
11-26 minutes: my personal academic work: scientific paper annotation and cognitive scientific research using NLP
26-44 minutes: Watson: Jeopardy, MSK and Ecosystem
44-48 minutes: Q&amp;A on Watson and Partridge
Please don't cringe too much at my technical explanation of Watson, especially those of you who know much more about WEA and the original DeepQA setup than I do!</description>
</item>
<item>
<title>SAPIENTA Web Service and CLI</title>
<link>https://brainsteam.co.uk/2015/11/01/sapienta-web-service-and-cli/</link>
<pubDate>Sun, 01 Nov 2015 19:50:52 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/01/sapienta-web-service-and-cli/</guid>
<description>Hoorah! After a number of weeks I've finally managed to get SAPIENTA running inside Docker containers on our EBI cloud instance. You can try it out at http://sapienta.papro.org.uk/.
The project was previously running via a number of very precarious scripts that had a habit of stopping and not coming back up. Hopefully the new Docker environment should be a lot more stable.
Another improvement I've made is to create a WebSocket interface for calling the service and a Python-based command-line client.</description>
</item>
<item>
<title>CUSP Challenge Week 2015</title>
<link>https://brainsteam.co.uk/2015/08/30/cusp-challenge-week-2015/</link>
<pubDate>Sun, 30 Aug 2015 16:52:59 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/08/30/cusp-challenge-week-2015/</guid>
<description>Warwick CDT intake 2015, from left to right: at the front, Jacques, Zakiyya, Corinne, Neha and myself; rear: David, John, Stephen (CDT director), Mo, Vaggelis, Malkiat and Greg. Hello again readers! Those of you who follow me on other social media (Twitter, Instagram, Facebook etc.) probably know that I've just returned from a week in New York City as part of my PhD. My reason for visiting was a kind of ice-breaking activity called the CUSP (Centre for Urban Science + Progress) Challenge Week.</description>
</item>
<item>
<title>SSSplit Improvements</title>
<link>https://brainsteam.co.uk/2015/07/15/sssplit-improvements/</link>
<pubDate>Wed, 15 Jul 2015 19:33:29 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/07/15/sssplit-improvements/</guid>
<description>Introduction: As part of my continuing work on Partridge, I've been working on improving the sentence-splitting capability of SSSplit, the component used to split academic papers from PLOS ONE and PubMed Central into separate sentences.
Papers arrive in our system as big blocks of text with the occasional diagram or formula, and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends.</description>
</item>
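<!--
SSSplit itself is the project's custom component, but as a point of comparison, a minimal sketch with an off-the-shelf splitter (NLTK's punkt) illustrates the sentence boundary task; abbreviations such as "Fig." are exactly where generic splitters tend to struggle on scientific text.

import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize

text = ("The cell cultures were incubated for 24 hours. "
        "Fig. 1 shows the results. Expression increased threefold.")
for i, sentence in enumerate(sent_tokenize(text)):
    print(i, sentence)
-->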
<item>
<title>Tidying up XML in one click</title>
<link>https://brainsteam.co.uk/2015/06/28/tidying-up-xml-in-one-click/</link>
<pubDate>Sun, 28 Jun 2015 10:24:33 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/06/28/tidying-up-xml-in-one-click/</guid>
<description>When I'm working on Partridge and SAPIENTA, I find myself dealing with a lot of badly formatted XML. I used to manually run xmllint --format against every file before opening it, but that gets annoying very quickly (even if you have it saved in your bash history). So I decided to write a Nemo script that does it automatically for me.
#!/bin/bash
for xmlfile in $NEMO_SCRIPT_SELECTED_FILE_PATHS; do
    if [[ $xmlfile == *.</description>
</item>
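<!--
A Python sketch of what the full Nemo script plausibly does, assuming Nemo's convention of passing the selected paths newline separated in the NEMO_SCRIPT_SELECTED_FILE_PATHS environment variable, and using lxml here in place of shelling out to xmllint.

import os
from lxml import etree

paths = os.environ.get("NEMO_SCRIPT_SELECTED_FILE_PATHS", "")
for path in paths.splitlines():
    if not path.endswith(".xml"):
        continue
    # Drop ignorable whitespace so pretty_print can reindent cleanly
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(path, parser)
    tree.write(path, pretty_print=True, xml_declaration=True,
               encoding="utf-8")
-->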
</channel>
</rss>