brainsteam.co.uk/new_files/posts/index.xml

2021-12-21 13:30:09 +00:00
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Posts on Brainsteam</title>
<link>https://brainsteam.co.uk/posts/</link>
<description>Recent content in Posts on Brainsteam</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<copyright>© James Ravenscroft 2020</copyright>
<lastBuildDate>Mon, 12 Apr 2021 20:21:11 +0000</lastBuildDate><atom:link href="https://brainsteam.co.uk/posts/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>An opinionated guide to Python environments in 2021</title>
<link>https://brainsteam.co.uk/2021/04/01/opinionated-guide-to-virtualenvs/</link>
<pubDate>Mon, 12 Apr 2021 20:21:11 +0000</pubDate>
<guid>https://brainsteam.co.uk/2021/04/01/opinionated-guide-to-virtualenvs/</guid>
<description>A person overwhelmed by boxes by Cottonbro
Note: If you don&amp;rsquo;t want to read the blah-blah context and history stuff then you can jump to the recommendations. The Problem: The need for virtual Python environments becomes fairly obvious early in most Python developers&#39; careers when they switch between two projects and realise that they have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs scikit-learn-0.24). Unlike other mainstream languages like JavaScript (Node.</description>
</item>
<item>
<title>Reproducing &#39;ancient&#39; experiments with Pytorch inside docker</title>
<link>https://brainsteam.co.uk/2021/03/01/running-old-pytorch-docker/</link>
<pubDate>Mon, 01 Mar 2021 20:21:11 +0000</pubDate>
<guid>https://brainsteam.co.uk/2021/03/01/running-old-pytorch-docker/</guid>
<description>A beige analog compass by Ylanite Koppens
Introduction: Open machine learning research is undergoing something of a reproducibility crisis. In fairness it&amp;rsquo;s not usually the authors&#39; fault - or at least not entirely. We&amp;rsquo;re a fickle industry and the tools and frameworks that were &amp;lsquo;in vogue&amp;rsquo; and state of the art a couple of years ago are now obsolete. Furthermore, academics and open source contributors are under no obligation to keep their code up to date.</description>
</item>
<item>
<title>Pickle 5 Madness with MLFlow and Python 3.6/3.7</title>
<link>https://brainsteam.co.uk/2021/01/14/pickle-5-madness-with-mlflow/</link>
<pubDate>Thu, 14 Jan 2021 11:42:28 +0000</pubDate>
<guid>https://brainsteam.co.uk/2021/01/14/pickle-5-madness-with-mlflow/</guid>
<description>A jar of pickles by Ksenia Charnaya
I recently came across an infuriating problem where an MLFlow python model I had trained on one system using Python 3.6 would not load on another system with an identical version of Python.
The exact problem was that when I ran mlflow models serve -m &amp;lt;url/to/model/in/bucket&amp;gt; the service would crash saying that the model could not be unserialized because ValueError: unsupported pickle protocol: 5.</description>
</item>
<item>
<title>Serving NLP Models with MLflow</title>
<link>https://brainsteam.co.uk/2020/12/29/serving-nlp-models-with-mlflow/</link>
<pubDate>Tue, 29 Dec 2020 09:50:28 +0000</pubDate>
<guid>https://brainsteam.co.uk/2020/12/29/serving-nlp-models-with-mlflow/</guid>
<description>MLflow is a powerful open source MLOps platform with a built-in framework for serving your trained ML models as REST APIs. The REST framework will load data provided in a JSON or CSV format compatible with pandas and pass this directly into your model. This can be handy when your model is expecting a tabular list of numerical and categorical features. However, it is less clear how to serve models and pipelines that expect unstructured text data as their primary input.</description>
</item>
<item>
<title>DVC and Backblaze B2 for Reliable &amp; Reproducible Data Science</title>
<link>https://brainsteam.co.uk/2020/11/27/dvc-and-backblaze-b2-for-reliable-reproducible-data-science/</link>
<pubDate>Fri, 27 Nov 2020 15:43:48 +0000</pubDate>
<guid>https://brainsteam.co.uk/2020/11/27/dvc-and-backblaze-b2-for-reliable-reproducible-data-science/</guid>
<description>Introduction: When you&#39;re working with large datasets, storing them in git alongside your source code is usually not an optimal solution. Git is famously not well suited to large files and, whilst general purpose solutions exist (Git LFS being perhaps the most famous and widely adopted), DVC is a powerful alternative that does not require a dedicated LFS server and can be used directly with a range of cloud storage systems as well as traditional NFS and SFTP-backed filestores all listed out here.</description>
</item>
<item>
<title>Dark Recommendation Engines: Algorithmic curation as part of a healthy information diet.</title>
<link>https://brainsteam.co.uk/2020/09/04/dark-recommendation-engines-algorithmic-curation-as-part-of-a-healthy-information-diet/</link>
<pubDate>Fri, 04 Sep 2020 15:30:19 +0000</pubDate>
<guid>https://brainsteam.co.uk/2020/09/04/dark-recommendation-engines-algorithmic-curation-as-part-of-a-healthy-information-diet/</guid>
<description>In an ever-growing digital landscape filled with more content than a person can consume in their lifetime, recommendation engines are a blessing but can also be a curse, and understanding their strengths and weaknesses is a vital skill as part of a balanced media diet. If you remember when connecting to the internet involved a squawking modem and images that took 5 minutes to load then you probably discovered your favourite musician after hearing them on the radio, reading about them in NME or being told about them by a friend.</description>
</item>
<item>
<title>PyTorch 1.X.X and Pipenv and Specific versions of CUDA</title>
<link>https://brainsteam.co.uk/2020/02/02/pytorch-1-x-x-and-pipenv-and-specific-versions-of-cuda/</link>
<pubDate>Sun, 02 Feb 2020 14:40:46 +0000</pubDate>
<guid>https://brainsteam.co.uk/2020/02/02/pytorch-1-x-x-and-pipenv-and-specific-versions-of-cuda/</guid>
<description>I recently ran into an issue where the newest version of Torch (as of writing 1.4.0) requires a newer version of CUDA/Nvidia Drivers than I have installed.
Last time I tried to upgrade my CUDA version it took me several hours/days so I didn&#39;t really want to have to spend lots of time on that.
As it happens PyTorch has an archive of compiled python whl objects for different combinations of Python version (3.</description>
</item>
<item>
<title>How can AI practitioners reduce our carbon footprint?</title>
<link>https://brainsteam.co.uk/2019/06/20/how-can-ai-practitioners-reduce-our-carbon-footprint/</link>
<pubDate>Thu, 20 Jun 2019 09:18:40 +0000</pubDate>
<guid>https://brainsteam.co.uk/2019/06/20/how-can-ai-practitioners-reduce-our-carbon-footprint/</guid>
<description>In recent weeks and months the impending global climate catastrophe has been at the forefront of many people&#39;s minds. Thanks to movements like Extinction Rebellion and high profile environmentalists like Greta Thunberg and David Attenborough as well as damning reports from the IPCC, it finally feels like momentum is building behind significant reduction of carbon emissions. That said, knowing how we can help on an individual level beyond driving and flying less still feels very overwhelming.</description>
</item>
<item>
<title>Why I&#39;m excited about Kubernetes &#43; Google Anthos: the Future of Enterprise AI deployment</title>
<link>https://brainsteam.co.uk/2019/04/24/why-im-excited-about-kubernetes-google-anthos-the-future-of-enterprise-ai-deployment/</link>
<pubDate>Wed, 24 Apr 2019 10:33:24 +0000</pubDate>
<guid>https://brainsteam.co.uk/2019/04/24/why-im-excited-about-kubernetes-google-anthos-the-future-of-enterprise-ai-deployment/</guid>
<description>Filament build and deploy enterprise AI applications on behalf of incumbent institutions in finance, biotech, facilities management and other sectors. James Ravenscroft, CTO at Filament, writes about the challenges of enterprise software deployment and the opportunities presented by Kubernetes and Google&#39;s Anthos offering. It is a big myth that bringing a software package to market starts and ends with developers and testers. One of the most important, complex and time consuming parts of enterprise software projects is around packaging up the code and making it run across lots of different systems: commonly and affectionately termed “DevOps” in many organisations.</description>
</item>
<item>
<title>Spacy Link or “How not to keep downloading the same files over and over”</title>
<link>https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/</link>
<pubDate>Tue, 15 Jan 2019 18:14:16 +0000</pubDate>
<guid>https://brainsteam.co.uk/2019/01/15/spacy-link-or-how-not-to-keep-downloading-the-same-files-over-and-over/</guid>
<description>If you&#39;re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz (852.3MB)
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10
If you&#39;re lucky and you have a decent internet connection then great; if not, it&#39;s time to make a cup of tea.
Even if your internet connection is good, did you ever stop to look at how much disk space your python virtual environments were using up?</description>
</item>
<item>
<title>Applied AI in 2019</title>
<link>https://brainsteam.co.uk/2019/01/06/applied-ai-in-2019/</link>
<pubDate>Sun, 06 Jan 2019 09:52:35 +0000</pubDate>
<guid>https://brainsteam.co.uk/2019/01/06/applied-ai-in-2019/</guid>
<description>Looking back at some of the biggest AI and ML developments from 2018 and how they might influence applied AI in the coming year. 2018 was a pretty exciting year for AI developments. It&#39;s true to say there is still a lot of hype in the space but it feels like people are beginning to really understand where AI can and can&#39;t help them solve practical problems.
In this article we&#39;ll take a look at some of the AI innovation that came out of academia and research teams in 2018 and how they might affect practical AI use cases in the coming year.</description>
</item>
<item>
<title>🤐🤐Can Bots Keep Secrets? The Future of Chatbot Security and Conversational “Hacks”</title>
<link>https://brainsteam.co.uk/2018/12/09/%F0%9F%A4%90%F0%9F%A4%90can-bots-keep-secrets-the-future-of-chatbot-security-and-conversational-hacks/</link>
<pubDate>Sun, 09 Dec 2018 10:36:34 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/12/09/%F0%9F%A4%90%F0%9F%A4%90can-bots-keep-secrets-the-future-of-chatbot-security-and-conversational-hacks/</guid>
<description>As adoption of chatbots and conversational interfaces continues to grow, how will businesses keep their brand safe and their customers&#39; data safer?
From deliberate infiltration of systems to bugs that cause accidental data leakage, these days, the exposure or loss of personal data is a large part of what occupies almost every self-respecting CIO&#39;s mind. Especially since the EU has just slapped its first defendant with a GDPR fine.
Over the last 10-15 years, through the rise of the “interactive” web and social media, many companies have learned the hard way about the importance of techniques like hashing passwords stored in databases and sanitising user input before it is used for querying databases.</description>
</item>
<item>
<title>Why is Tmux crashing on start?</title>
<link>https://brainsteam.co.uk/2018/11/07/why-is-tmux-crashing-on-start/</link>
<pubDate>Wed, 07 Nov 2018 07:40:45 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/11/07/why-is-tmux-crashing-on-start/</guid>
<description>I spent several hours trying to get to the bottom of why tmux was crashing as soon as I ran it on Fedora. It turns out there&#39;s a simple fix. When tmux starts it uses /dev/ptmx to create a new TTY (virtual terminal) that the user can interact with. If your user does not have permission to access this device then tmux will silently die. A good way to verify this is to try running screen too.</description>
</item>
<item>
<title>Uploading HUGE files to Gitea</title>
<link>https://brainsteam.co.uk/2018/10/20/uploading-huge-files-to-gitea/</link>
<pubDate>Sat, 20 Oct 2018 10:09:41 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/10/20/uploading-huge-files-to-gitea/</guid>
<description>I recently stumbled upon and fell in love with Gitea, a lightweight self-hosted GitHub and GitLab alternative written in the Go programming language. One of my favourite things about it, other than the speed and efficiency that mean you can even run it on a Raspberry Pi, is the built-in LFS support. For the unfamiliar, LFS is a protocol initially introduced by GitHub that allows users to version control large binary files, something that Git is traditionally pretty poor at.</description>
</item>
<item>
<title>Don&#39;t forget your life jacket: the dangers of diving in deep at the deep end with deep learning</title>
<link>https://brainsteam.co.uk/2018/10/18/dont-forget-your-life-jacket-the-dangers-of-diving-in-deep-at-the-deep-end-with-deep-learning/</link>
<pubDate>Thu, 18 Oct 2018 14:35:05 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/10/18/dont-forget-your-life-jacket-the-dangers-of-diving-in-deep-at-the-deep-end-with-deep-learning/</guid>
<description>Deep Learning is a powerful technology but you might want to try some &amp;#8220;shallow&amp;#8221; approaches before you dive in. Neural networks are made up of neurones and synapses. It&amp;#8217;s unquestionable that over the last decade, deep learning has changed the machine learning landscape for the better. Deep Neural Networks (DNNs), first popularised by Yann LeCun, Yoshua Bengio and Geoffrey Hinton, are a family of machine learning models that are capable of learning to see and categorise objects, predict stock market trends, understand written text and even play video games.</description>
</item>
<item>
<title>GPUs are not just for images any more…</title>
<link>https://brainsteam.co.uk/2018/05/13/gpus-are-not-just-for-images-any-more/</link>
<pubDate>Sun, 13 May 2018 07:26:12 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/05/13/gpus-are-not-just-for-images-any-more/</guid>
<description>As a machine learning professional specialising in computational linguistics (helping machines to extract meaning from human text), I have confused people on multiple occasions by suggesting that their document processing problem could be solved by neural networks trained using a Graphics Processing Unit (GPU). You&#39;d be well within your rights to be confused. To the uninitiated what I just said was “Let&#39;s solve this problem involving reading lots of text by building a system that runs on specialised computer chips designed specifically to render images at high speed”.</description>
</item>
<item>
<title>Programmatically Downloading Open Access Papers</title>
<link>https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/</link>
<pubDate>Fri, 13 Apr 2018 16:04:47 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/</guid>
<description>(Cover image “Unlocked” by Sean Hobson)
If you&#39;re an academic or you&#39;ve got an interest in reading scientific papers, you&#39;ve probably run into paywalls that demand tens or even hundreds of £ just to read a scientific paper. It&#39;s ok if you&#39;re affiliated with a university that has access to that journal but it can sometimes be luck of the draw as to whether your institute has access and even if they do, sometimes the SAML login processes don&#39;t work and you still can&#39;t see the paper.</description>
</item>
<item>
<title>Part time PhD: Mini-Sabbaticals</title>
<link>https://brainsteam.co.uk/2018/04/05/phd-mini-sabbaticals/</link>
<pubDate>Thu, 05 Apr 2018 13:08:51 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/04/05/phd-mini-sabbaticals/</guid>
<description>Avid readers amongst you will know that I&#39;m currently in the third year of my PhD in Computational Linguistics at the University of Warwick whilst also serving as CTO at Filament. An incredibly exciting pair of positions that certainly have their challenges and would be untenable without an incredibly supportive set of PhD supervisors (Amanda Clare and Maria Liakata) and an equally supportive and understanding pair of company directors (Phil and Doug).</description>
</item>
<item>
<title>Re-using machine learning models and the “no free lunch” theorem</title>
<link>https://brainsteam.co.uk/2018/03/21/re-using-machine-learning-models-and-the-no-free-lunch-theorem/</link>
<pubDate>Wed, 21 Mar 2018 11:26:27 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/03/21/re-using-machine-learning-models-and-the-no-free-lunch-theorem/</guid>
<description>Why re-use machine learning models? Model re-use can be a huge cost saver when developing AI systems. But how well will your models perform in their new environment? You can get a lot of value out of training a machine learning model to solve a single use case, like predicting emotion in your customer chatbot transcripts and putting the angry ones through to real humans. However, you might be able to extract even more value out of your model by using it in more than one use case.</description>
</item>
<item>
<title>How I became a gopher over christmas</title>
<link>https://brainsteam.co.uk/2018/01/27/how-i-became-a-gopher/</link>
<pubDate>Sat, 27 Jan 2018 10:09:34 +0000</pubDate>
<guid>https://brainsteam.co.uk/2018/01/27/how-i-became-a-gopher/</guid>
<description>Happy new year to one and all. It&#39;s been a while since I posted and life continues onwards at a crazy pace. I meant to publish this post just after Christmas but have only found time to sit down and write now.
If anyone is wondering what&#39;s with the crazy title: a gopher is someone who practices the Go programming language (just as those who write in Python refer to themselves as pythonistas.</description>
</item>
<item>
<title>Why I keep going back to Evernote</title>
<link>https://brainsteam.co.uk/2017/08/03/182/</link>
<pubDate>Thu, 03 Aug 2017 08:27:53 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/08/03/182/</guid>
<description>As the CTO for a London machine learning startup and a PhD student at Warwick Institute for the Science of Cities, to say I&#39;m busy is an understatement. At any given point in time, my mind is awash with hundreds of ideas around Filament tech strategy, a cool app I&#39;d like to build, ways to measure scientific impact, wondering what the name of that new song I heard on the radio was or some combination thereof.</description>
</item>
<item>
<title>Dialect Sensitive Topic Models</title>
<link>https://brainsteam.co.uk/2017/07/25/dialect-sensitive-topic-models/</link>
<pubDate>Tue, 25 Jul 2017 11:02:42 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/07/25/dialect-sensitive-topic-models/</guid>
<description>As part of my PhD I&#39;m currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that can compare topics discussed in different dialectical styles, such as scientific papers versus newspaper articles? If you&#39;re new to the concept of topic modelling then this article can give you a quick primer.
Vanilla LDA A diagram of how latent variables in LDA model are connected Vanilla topic models such as Blei&#39;s LDA are great but start to fall down when the wording around one particular concept varies too much.</description>
</item>
<item>
<title>Exploring Web Archive Data - CDX Files</title>
<link>https://brainsteam.co.uk/2017/06/05/exploring-web-archive-data-cdx-files/</link>
<pubDate>Mon, 05 Jun 2017 07:24:22 +0000</pubDate>
<guid>https://brainsteam.co.uk/2017/06/05/exploring-web-archive-data-cdx-files/</guid>
<description>I have recently been working in partnership with UK Web Archive in order to identify and parse large amounts of historic news data for an NLP task that I will blog about in the future. The NLP portion of this task will surely present its own challenges, but for now there is the small matter of identifying news data amongst the noise of 60TB of web archive dumps of the rest of the .</description>
</item>
<item>
<title>timetrack improvements</title>
<link>https://brainsteam.co.uk/2016/12/10/timetrack-improvements/</link>
<pubDate>Sat, 10 Dec 2016 09:33:41 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/12/10/timetrack-improvements/</guid>
<description>I&#39;ve just added a couple of improvements to timetrack that allow you to append to existing time recordings (either with an amount like 15m or using live to time additional minutes spent and append them).
You can also remove entries using timetrack rm instead of remove - saving keystrokes is what programming is all about.
You can find the updated code over at GitHub.</description>
</item>
<item>
<title>AI can&#39;t solve all our problems, but that doesn&#39;t mean it isn&#39;t intelligent</title>
<link>https://brainsteam.co.uk/2016/12/08/ai-cant-solve-all-our-problems-but-that-doesnt-mean-it-isnt-intelligent/</link>
<pubDate>Thu, 08 Dec 2016 10:08:13 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/12/08/ai-cant-solve-all-our-problems-but-that-doesnt-mean-it-isnt-intelligent/</guid>
<description>Thomas Hobbes, perhaps most famous for his thinking on western politics, was also thinking about how the human mind &amp;#8220;computes things&amp;#8221; 500 years ago. A recent opinion piece I read on Wired called for us to stop labelling our current specific machine learning models AI because they are not intelligent. I respectfully disagree.
AI is not a new concept. The idea that a computer could think like a human and one day pass for a human has been around since Turing and even in some form long before him.</description>
</item>
<item>
<title>We need to talk about push notifications (and why I stopped wearing my smartwatch)</title>
<link>https://brainsteam.co.uk/2016/11/27/we-need-to-talk-about-push-notifications-and-why-i-stopped-wearing-my-smartwatch/</link>
<pubDate>Sun, 27 Nov 2016 12:59:22 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/11/27/we-need-to-talk-about-push-notifications-and-why-i-stopped-wearing-my-smartwatch/</guid>
<description>I own a Pebble Steel which I got for Christmas a couple of years ago. I&#39;ve been very happy with it so far. I can control my music player from my wrist, get notifications and a summary of my calendar. Recently, however, I&#39;ve stopped wearing it. The reason is that constant streams of notifications stress me out and interrupt my workflow, whereas not wearing it makes me feel more calm and in control and allows me to be more productive.</description>
</item>
<item>
<title>timetrack - a simple time tracking application for developers</title>
<link>https://brainsteam.co.uk/2016/11/23/timetrack-a-simple-time-tracking-application-for-developers/</link>
<pubDate>Wed, 23 Nov 2016 14:43:58 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/11/23/timetrack-a-simple-time-tracking-application-for-developers/</guid>
<description>I&#39;ve written a small command line application for tracking my time on my PhD and other projects. We use Harvest at Filament which is great if you&#39;ve got a huge team and want the complexity (and of course license charges) of an online cloud solution for time tracking.
If, like me, you&#39;re just interested to see how much time you are spending on your different projects and you don&#39;t have any requirement for fancy web interfaces or client billing, then timetrack might be for you.</description>
</item>
<item>
<title>The builder, the salesman and the property tycoon</title>
<link>https://brainsteam.co.uk/2016/11/12/the-builder-the-salesman-and-the-property-tycoon/</link>
<pubDate>Sat, 12 Nov 2016 11:43:24 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/11/12/the-builder-the-salesman-and-the-property-tycoon/</guid>
<description>A testament to marketers around the world is the myth that their AI platform X, Y or Z can solve all your problems with no effort. Perhaps it is this, combined with developers and data scientists often being hidden out of sight and out of mind that leads people to think this way.
Unfortunately, the truth of the matter is that ML and AI involve blood, sweat and tears - especially if you are building things from scratch rather than using APIs.</description>
</item>
<item>
<title>#BlackgangPi - a Raspberry Pi Hack at Blackgang Chine</title>
<link>https://brainsteam.co.uk/2016/06/05/blackgangpi-a-raspberry-pi-hack-at-blackgang-chine/</link>
<pubDate>Sun, 05 Jun 2016 07:59:40 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/06/05/blackgangpi-a-raspberry-pi-hack-at-blackgang-chine/</guid>
<description>I was very excited to be invited along with some other IBMers to the Blackgang Pi event run by Dr Lucy Rogers on a semi regular basis at the Blackgang Chine theme park on the Isle of Wight.
Blackgang Chine is a theme park on the southern tip of the Isle of Wight and holds the title of oldest theme park in the United Kingdom. We were lucky enough to be invited along to help them modernise some of their animatronic exhibits, replacing some of the aging bespoke PCBs and controllers with Raspberry Pis running Node-RED and communicating using MQTT/Watson IOT.</description>
</item>
<item>
<title>Cognitive Quality Assurance Pt 2: Performance Metrics</title>
<link>https://brainsteam.co.uk/2016/05/29/cognitive-quality-assurance-pt-2-performance-metrics/</link>
<pubDate>Sun, 29 May 2016 09:41:26 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/05/29/cognitive-quality-assurance-pt-2-performance-metrics/</guid>
<description>EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.
Last time we discussed some good practices for collecting data and then splitting it into test and train in order to create a ground truth for your machine learning system.</description>
</item>
<item>
<title>IBM Watson - It&#39;s for data scientists too!</title>
<link>https://brainsteam.co.uk/2016/05/01/ibm-watson-its-for-data-scientists-too/</link>
<pubDate>Sun, 01 May 2016 11:28:13 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/05/01/ibm-watson-its-for-data-scientists-too/</guid>
<description>Last week, my colleague Olly and I gave a talk at a data science meetup on how IBM Watson can be used for data science applications.
We had an amazing time and got some really great feedback from the event. We will definitely be doing more talks at events like these in the near future so keep an eye out for us!
I will also be writing a little bit more about the experiment I did around Core Scientific Concepts and Watson Natural Language Classifier in a future blog post.</description>
</item>
<item>
<title>Cognitive Quality Assurance - An Introduction</title>
<link>https://brainsteam.co.uk/2016/03/29/cognitive-quality-assurance-an-introduction/</link>
<pubDate>Tue, 29 Mar 2016 08:50:29 +0000</pubDate>
<guid>https://brainsteam.co.uk/2016/03/29/cognitive-quality-assurance-an-introduction/</guid>
<description>EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.
This article has a slant towards the IBM Watson Developer Cloud Services but the principles and rules of thumb expressed here are applicable to most cognitive/machine learning problems.</description>
</item>
<item>
<title>ElasticSearch: Turning analysis off and why it&#39;s useful</title>
<link>https://brainsteam.co.uk/2015/11/29/elasticsearch-turning-analysis-off-and-why-its-useful/</link>
<pubDate>Sun, 29 Nov 2015 14:59:06 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/29/elasticsearch-turning-analysis-off-and-why-its-useful/</guid>
<description>I have recently been playing with ElasticSearch a lot for my PhD and started trying to do some more complicated queries and pattern matching using the DSL syntax. I have an index on my local machine called impact_studies which contains all 6637 REF 2014 impact case studies in a JSON format. One of the fields is “UOA” which contains the title of the unit of impact that the case study belongs to.</description>
</item>
<item>
<title>Home automation with Raspberry Pi and Watson</title>
<link>https://brainsteam.co.uk/2015/11/28/watson-home-automation/</link>
<pubDate>Sat, 28 Nov 2015 10:57:14 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/28/watson-home-automation/</guid>
<description>I&#39;ve recently been playing with trying to build a Watson powered home automation system using my Raspberry Pi and some other electronic bits that I have on hand.
There are already a lot of people doing work in this space. One of the most successful projects is JASPER, which uses speech to text and an always-on background listening microphone to talk to you and carry out actions when you ask it things in natural language like “What&#39;s the weather going to be like tomorrow?</description>
</item>
<item>
<title>Freecite python wrapper</title>
<link>https://brainsteam.co.uk/2015/11/22/freecite-python-wrapper/</link>
<pubDate>Sun, 22 Nov 2015 19:20:19 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/22/freecite-python-wrapper/</guid>
<description>I&#39;ve written a simple wrapper around the Brown University Citation parser FreeCite. I&#39;m planning to use the service to pull out author names from references in REF impact studies and try to link them back to investigators listed on RCUK funding applications.
The code is here and is MIT licensed. It provides a simple method which takes a string representing a reference and returns a dict with each field separated. There is also a parse_many function which takes an array of reference strings and returns an array of dicts.</description>
</item>
<item>
<title>Scrolling in ElasticSearch</title>
<link>https://brainsteam.co.uk/2015/11/21/scrolling-in-elasticsearch/</link>
<pubDate>Sat, 21 Nov 2015 09:41:19 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/21/scrolling-in-elasticsearch/</guid>
<description>I know I&#39;m doing a lot of flip-flopping between SOLR and Elastic at the moment - I&#39;m trying to figure out key similarities and differences between them and where one is more suitable than the other.
The following is an example of how to map a function f onto an entire set of indexed data in elastic using the scroll API.
If you use elastic, it is possible to do paging by adding a size and a from parameter.</description>
</item>
<item>
<title>Spellchecking in retrieve and rank</title>
<link>https://brainsteam.co.uk/2015/11/17/spellchecking-in-retrieve-and-rank/</link>
<pubDate>Tue, 17 Nov 2015 21:41:09 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/17/spellchecking-in-retrieve-and-rank/</guid>
<description>Introduction: Being able to deal with typos and incorrect spellings is an absolute must in any modern search facility. Humans can be lazy and clumsy and I personally often search for things with incorrect terms due to my sausage fingers. In this article I will explain how to turn on spelling suggestions in retrieve and rank so that if your users ask your system for something with a clumsy query, you can suggest spelling fixes for them so that they can submit another, more fruitful question to the system.</description>
</item>
<item>
<title>Retrieve and Rank and Python</title>
<link>https://brainsteam.co.uk/2015/11/16/retrieve-and-rank-and-python/</link>
<pubDate>Mon, 16 Nov 2015 18:25:39 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/16/retrieve-and-rank-and-python/</guid>
<description>Introduction Retrieve and Rank (R&amp;amp;R), if you hadn&#39;t already heard about it, is IBM Watson&#39;s new web service component for information retrieval and question answering. My colleague Chris Madison has summarised how it works at a high level here.
R&amp;amp;R is based on the Apache SOLR search engine with a machine learning result ranking plugin that learns what answers are most relevant given an input query and presents them in the learnt “relevance” order.</description>
</item>
<item>
<title>Keynote at YDS 2015: Information Discovery, Partridge and Watson</title>
<link>https://brainsteam.co.uk/2015/11/02/keynote-at-yds-2015-information-discovery-partridge-and-watson/</link>
<pubDate>Mon, 02 Nov 2015 21:07:28 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/02/keynote-at-yds-2015-information-discovery-partridge-and-watson/</guid>
<description>Here is a recording of my recent keynote talk on the power of Natural Language Processing through Watson and my academic/PhD topic &amp;#8211; Partridge &amp;#8211; at York Doctoral Symposium. 0-11 minutes &amp;#8211; history of mankind, invention and the acceleration of scientific progress (warming people to the idea that farming out your scientific reading to a computer is a much better idea than trying to read every paper written) 11-26 minutes &amp;#8211; My personal academic work &amp;#8211; scientific paper annotation and cognitive scientific research using NLP 26-44 minutes &amp;#8211; Watson &amp;#8211; Jeopardy, MSK and Ecosystem 44-48 minutes &amp;#8211; Q&amp;amp;A on Watson and Partridge. Please don&#39;t cringe too much at my technical explanation of Watson, especially those of you who know much more about WEA and the original DeepQA setup than I do!</description>
</item>
<item>
<title>SAPIENTA Web Service and CLI</title>
<link>https://brainsteam.co.uk/2015/11/01/sapienta-web-service-and-cli/</link>
<pubDate>Sun, 01 Nov 2015 19:50:52 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/11/01/sapienta-web-service-and-cli/</guid>
<description>Hoorah! After a number of weeks I&#39;ve finally managed to get SAPIENTA running inside docker containers on our EBI cloud instance. You can try it out at http://sapienta.papro.org.uk/.
The project was previously running via a number of very precarious scripts that had a habit of stopping and not coming back up. Hopefully the new docker environment should be a lot more stable.
Another improvement I&#39;ve made is to create a websocket interface for calling the service and a Python-based command-line client.</description>
</item>
<item>
<title>A week in Austin, TX Watson Labs</title>
<link>https://brainsteam.co.uk/2015/10/22/a-week-in-austin-tx-watson-labs/</link>
<pubDate>Thu, 22 Oct 2015 18:10:57 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/10/22/a-week-in-austin-tx-watson-labs/</guid>
<description>At the beginning of the month, I was lucky enough to spend a week embedded in the Watson Labs team in Austin, TX. These mysterious and enigmatic members of the Watson family have a super secret bat-cave known as “The Garage” located on the IBM Austin site, to which access is prohibited for normal IBMers unless accompanied by a labs team member.
During the week I was helping out with a couple of the internal projects but also got the chance to experiment with some of the new Watson Developer Cloud APIs to create some new tools for internal use.</description>
</item>
<item>
<title>CUSP Challenge Week 2015</title>
<link>https://brainsteam.co.uk/2015/08/30/cusp-challenge-week-2015/</link>
<pubDate>Sun, 30 Aug 2015 16:52:59 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/08/30/cusp-challenge-week-2015/</guid>
<description>Warwick CDT intake 2015: From left to right &amp;#8211; at the front Jacques, Zakiyya, Corinne, Neha and myself. Rear: David, John, Stephen (CDT director), Mo, Vaggelis, Malkiat and Greg. Hello again, readers. Those of you who follow me on other social media (Twitter, Instagram, Facebook etc.) probably know that I&#39;ve just returned from a week in New York City as part of my PhD. My reason for visiting was a kind of ice-breaking activity called the CUSP (Centre for Urban Science + Progress) Challenge Week.</description>
</item>
<item>
<title>SSSplit Improvements</title>
<link>https://brainsteam.co.uk/2015/07/15/sssplit-improvements/</link>
<pubDate>Wed, 15 Jul 2015 19:33:29 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/07/15/sssplit-improvements/</guid>
<description>Introduction As part of my continuing work on Partridge, I&#39;ve been working on improving the sentence splitting capability of SSSplit, the component used to split academic papers from PLosOne and PubMedCentral into separate sentences.
Papers arrive in our system as big blocks of text with the occasional diagram or formula, and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends.</description>
</item>
<item>
<title>Bedford Place Vintage Festival</title>
<link>https://brainsteam.co.uk/2015/06/28/bedford-place-vintage-festival/</link>
<pubDate>Sun, 28 Jun 2015 10:36:28 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/06/28/bedford-place-vintage-festival/</guid>
<description>Last week a bunch of my lindyhop group went and performed at the Bedford Place Vintage Festival in Southampton. It&#39;s an annual event that I&#39;ve been to twice now, and we had an absolute ball.
I think I enjoyed it that much more this year purely because I&#39;ve been dancing twice as long now and I can hold my own on the social dance floor.
Here&#39;s a video of our crew performing the Shim Sham to “Mama do the hump”</description>
</item>
<item>
<title>Tidying up XML in one click</title>
<link>https://brainsteam.co.uk/2015/06/28/tidying-up-xml-in-one-click/</link>
<pubDate>Sun, 28 Jun 2015 10:24:33 +0000</pubDate>
<guid>https://brainsteam.co.uk/2015/06/28/tidying-up-xml-in-one-click/</guid>
<description>When I&#39;m working on Partridge and SAPIENTA, I find myself dealing with a lot of badly formatted XML. I used to manually run xmllint --format against every file before opening it, but that gets annoying very quickly (even if you have it saved in your bash history). So I decided to write a Nemo script that does it automatically for me.
#!/bin/sh for xmlfile in $NEMO_SCRIPT_SELECTED_FILE_PATHS; do if [[ $xmlfile == *.</description>
</item>
</channel>
</rss>