<description>A person overwhelmed by boxes by Cottonbro
Note: If you don&rsquo;t want to read the blah-blah context and history stuff then you can jump to the recommendations. The Problem The need for virtual python environments becomes fairly obvious early in most Python developers' careers when they switch between two projects and realise that they have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs scikit-learn-0.24). Unlike other mainstream languages like Javascript (Node.</description>
</item>
<item>
<title>Reproducing 'ancient' experiments with PyTorch inside docker</title>
<description>A beige analog compass by Ylanite Koppens
Introduction Open machine learning research is undergoing something of a reproducibility crisis. In fairness it&rsquo;s not usually the authors' fault - or at least not entirely. We&rsquo;re a fickle industry and the tools and frameworks that were &lsquo;in vogue&rsquo; and state of the art a couple of years ago are now obsolete. Furthermore, academics and open source contributors are under no obligation to keep their code up to date.</description>
</item>
<item>
<title>Pickle 5 Madness with MLFlow and Python 3.6/3.7</title>
<description>I recently came across an infuriating problem where an MLFlow python model I had trained on one system using Python 3.6 would not load on another system with an identical version of Python.
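As background, here is a minimal sketch of the underlying protocol mismatch (not of MLFlow's own serialisation code; the model_like dict below is a stand-in for a real model object): Python 3.8 introduced pickle protocol 5, and a 3.6/3.7 interpreter without the pickle5 backport cannot read payloads written with it, so pinning a lower protocol at dump time keeps an artefact portable.

```python
import pickle

# Protocol 5 arrived in Python 3.8 (and via the `pickle5` backport on
# 3.6/3.7); a reader without it fails with
# "ValueError: unsupported pickle protocol: 5".
# Pinning an older protocol when dumping sidesteps the mismatch.
model_like = {"coef": [0.1, 0.2], "intercept": 0.5}  # stand-in object
blob = pickle.dumps(model_like, protocol=2)  # readable by any modern Python
print(pickle.loads(blob) == model_like)  # True
```

The catch described in this post is that the *writer* chose protocol 5 implicitly, so the fix has to happen on the environment that dumps the model, not the one that loads it.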
The exact problem was that when I ran mlflow models serve -m &lt;url/to/model/in/bucket&gt; the service would crash saying that the model could not be unserialized because ValueError: unsupported pickle protocol: 5.</description>
<description>MLFlow is a powerful open source MLOps platform with a built-in framework for serving your trained ML models as REST APIs. The REST framework will load data provided in a JSON or CSV format compatible with pandas and pass this directly into your model. This can be handy when your model is expecting a tabular list of numerical and categorical features. However, it is less clear how to serve models and pipelines that are expecting unstructured text data as their primary input.</description>
</item>
<item>
<title>DVC and Backblaze B2 for Reliable &amp; Reproducible Data Science</title>
<description>Introduction When you’re working with large datasets, storing them in git alongside your source code is usually not an optimal solution. Git is famously not well suited to large files, and whilst general purpose solutions exist (Git LFS being perhaps the most famous and widely adopted), DVC is a powerful alternative that does not require a dedicated LFS server and can be used directly with a range of cloud storage systems as well as traditional NFS and SFTP-backed filestores, all listed out here.</description>
</item>
<item>
<title>‘Dark’ Recommendation Engines: Algorithmic curation as part of a ‘healthy’ information diet.</title>
<description>In an ever-growing digital landscape filled with more content than a person can consume in their lifetime, recommendation engines are a blessing but can also be a curse, and understanding their strengths and weaknesses is a vital skill as part of a balanced media diet. If you remember when connecting to the internet involved a squawking modem and images that took 5 minutes to load then you probably discovered your favourite musician after hearing them on the radio, reading about them in NME or being told about them by a friend.</description>
</item>
<item>
<title>PyTorch 1.X.X and Pipenv and Specific versions of CUDA</title>
<description>I recently ran into an issue where the newest version of Torch (as of writing 1.4.0) requires a newer version of CUDA/Nvidia Drivers than I have installed.
Last time I tried to upgrade my CUDA version it took me several hours/days so I didn’t really want to have to spend lots of time on that.
As it happens PyTorch has an archive of compiled python whl objects for different combinations of Python version (3.</description>
</item>
<item>
<title>How can AI practitioners reduce our carbon footprint?</title>
<description>In recent weeks and months the impending global climate catastrophe has been at the forefront of many people’s minds. Thanks to movements like Extinction Rebellion and high profile environmentalists like Greta Thunberg and David Attenborough as well as damning reports from the IPCC, it finally feels like momentum is building behind significant reduction of carbon emissions. That said, knowing how we can help on an individual level beyond driving and flying less still feels very overwhelming.</description>
</item>
<item>
<title>Why I’m excited about Kubernetes + Google Anthos: the Future of Enterprise AI deployment</title>
<description>Filament build and deploy enterprise AI applications on behalf of incumbent institutions in finance, biotech, facilities management and other sectors. James Ravenscroft, CTO at Filament, writes about the challenges of enterprise software deployment and the opportunities presented by Kubernetes and Google’s Anthos offering. It is a big myth that bringing a software package to market starts and ends with developers and testers. One of the most important, complex and time consuming parts of enterprise software projects is around packaging up the code and making it run across lots of different systems: commonly and affectionately termed “DevOps” in many organisations.</description>
</item>
<item>
<title>Spacy Link or “How not to keep downloading the same files over and over”</title>
<description>If you’re a frequent user of spacy and virtualenv you might well be all too familiar with the following:
python -m spacy download en_core_web_lg
Collecting en_core_web_lg==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz#egg=en_core_web_lg==2.0.0
5% |█▉ | 49.8MB 11.5MB/s eta 0:01:10 If you’re lucky and you have a decent internet connection then great, if not it’s time to make a cup of tea.
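The trick the post's title alludes to can be sketched like this (spaCy 2.x syntax; note that spacy link was removed in spaCy 3, and the exact model version here is just the one shown above): install the model package once in a shared location, then symlink it into each environment instead of downloading it again.

```shell
# Install the model package once, outside any project virtualenv
pip install --user https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz

# In each virtualenv, create a shortcut link to the shared copy
# instead of re-downloading it
python -m spacy link en_core_web_lg en_core_web_lg
```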
Even if your internet connection is good, did you ever stop to look at how much disk space your python virtual environments were using up?</description>
<description>Looking back at some of the biggest AI and ML developments from 2018 and how they might influence applied AI in the coming year. 2018 was a pretty exciting year for AI developments. It’s true to say there is still a lot of hype in the space but it feels like people are beginning to really understand where AI can and can’t help them solve practical problems.
In this article we’ll take a look at some of the AI innovation that came out of academia and research teams in 2018 and how they might affect practical AI use cases in the coming year.</description>
</item>
<item>
<title>🤐🤐Can Bots Keep Secrets? The Future of Chatbot Security and Conversational “Hacks”</title>
<description>As adoption of chatbots and conversational interfaces continues to grow, how will businesses keep their brand safe and their customers’ data safer?
From deliberate infiltration of systems to bugs that cause accidental data leakage, these days, the exposure or loss of personal data is a large part of what occupies almost every self-respecting CIO’s mind. Especially since the EU has just slapped its first defendant with a GDPR fine.
Over the last 10-15 years, through the rise of the “interactive” web and social media, many companies have learned the hard way about the importance of techniques like hashing passwords stored in databases and sanitising user input before it is used for querying databases.</description>
<description>I recently stumbled upon and fell in love with Gitea – a lightweight self-hosted Github and Gitlab alternative written in the Go programming language. One of my favourite things about it – other than the speed and efficiency that mean you can even run it on a raspberry pi – is the built in LFS support. For the unfamiliar, LFS is a protocol initially introduced by GitHub that allows users to version control large binary files – something that Git is traditionally pretty poor at.</description>
</item>
<item>
<title>Don’t forget your life jacket: the ‘dangers’ of diving in deep at the deep end with deep learning</title>
<description>Deep Learning is a powerful technology but you might want to try some &#8220;shallow&#8221; approaches before you dive in. Neural networks are made up of neurones and synapses. It&#8217;s unquestionable that over the last decade, deep learning has changed the machine learning landscape for the better. Deep Neural Networks (DNNs), first popularised by Yann LeCun, Yoshua Bengio and Geoffrey Hinton, are a family of machine learning models that are capable of learning to see and categorise objects, predict stock market trends, understand written text and even play video games.</description>
</item>
<item>
<title>GPUs are not just for images any more…</title>
<description>As a machine learning professional specialising in computational linguistics (helping machines to extract meaning from human text), I have confused people on multiple occasions by suggesting that their document processing problem could be solved by neural networks trained using a Graphics Processing Unit (GPU). You’d be well within your rights to be confused. To the uninitiated what I just said was “Let’s solve this problem involving reading lots of text by building a system that runs on specialised computer chips designed specifically to render images at high speed”.</description>
</item>
<item>
<title>Re-using machine learning models and the “no free lunch” theorem</title>
<description>Why re-use machine learning models? Model re-use can be a huge cost saver when developing AI systems. But how well will your models perform in their new environment? You can get a lot of value out of training a machine learning model to solve a single use case, like predicting emotion in your customer chatbot transcripts and putting the angry ones through to real humans. However, you might be able to extract even more value out of your model by using it in more than one use case.</description>
</item>
<item>
<title>How I became a gopher over christmas</title>
<description>Happy new year to one and all. It’s been a while since I posted and life continues onwards at a crazy pace. I meant to publish this post just after Christmas but have only found time to sit down and write now.
If anyone is wondering what’s with the crazy title – a gopher is someone who programs in the Go programming language (just as those who write in Python refer to themselves as pythonistas).</description>
<description>As the CTO for a London machine learning startup and a PhD student at Warwick Institute for the Science of Cities, to say I’m busy is an understatement. At any given point in time, my mind is awash with hundreds of ideas around Filament tech strategy, a cool app I’d like to build, ways to measure scientific impact, wondering what the name of that new song I heard on the radio was or some combination thereof.</description>
</item>
<item>
<title>AI can’t solve all our problems, but that doesn’t mean it isn’t intelligent</title>
<description>Thomas Hobbes, perhaps most famous for his thinking on western politics, was also thinking about how the human mind &#8220;computes things&#8221; nearly 400 years ago. A recent opinion piece I read on Wired called for us to stop labelling our current specific machine learning models AI because they are not intelligent. I respectfully disagree.
AI is not a new concept. The idea that a computer could ‘think’ like a human and one day pass for a human has been around since Turing and even in some form long before him.</description>
</item>
<item>
<title>We need to talk about push notifications (and why I stopped wearing my smartwatch)</title>
<description>I own a Pebble Steel which I got for Christmas a couple of years ago. I’ve been very happy with it so far. I can control my music player from my wrist, get notifications and a summary of my calendar. Recently, however, I’ve stopped wearing it. The reason is that constant streams of notifications stress me out and interrupt my workflow; not wearing it makes me feel calmer and more in control and allows me to be more productive.</description>
</item>
<item>
<title>The builder, the salesman and the property tycoon</title>
<description>A testament to marketers around the world is the myth that their AI platform X, Y or Z can solve all your problems with no effort. Perhaps it is this, combined with developers and data scientists often being hidden out of sight and out of mind, that leads people to think this way.
Unfortunately, the truth of the matter is that ML and AI involve blood, sweat and tears – especially if you are building things from scratch rather than using APIs.</description>
</item>
<item>
<title>#BlackgangPi – a Raspberry Pi Hack at Blackgang Chine</title>
<description>I was very excited to be invited along with some other IBMers to the Blackgang Pi event run by Dr Lucy Rogers on a semi-regular basis at the Blackgang Chine theme park on the Isle of Wight.
Blackgang Chine is a theme park on the southern tip of the Isle of Wight and holds the title of oldest theme park in the United Kingdom. We were lucky enough to be invited along to help them modernise some of their animatronic exhibits, replacing some of the aging bespoke PCBs and controllers with Raspberry Pis running Node-RED and communicating using MQTT/Watson IOT.</description>
<description>EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.
Last time we discussed some good practices for collecting data and then splitting it into test and train in order to create a ground truth for your machine learning system.</description>
</item>
<item>
<title>IBM Watson – It’s for data scientists too!</title>
<description>Last week, my colleague Olly and I gave a talk at a data science meetup on how IBM Watson can be used for data science applications.
We had an amazing time and got some really great feedback from the event. We will definitely be doing more talks at events like these in the near future so keep an eye out for us!
I will also be writing a little bit more about the experiment I did around Core Scientific Concepts and Watson Natural Language Classifier in a future blog post.</description>
</item>
<item>
<title>Cognitive Quality Assurance – An Introduction</title>
<description>EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.
This article has a slant towards the IBM Watson Developer Cloud Services but the principles and rules of thumb expressed here are applicable to most cognitive/machine learning problems.</description>
</item>
<item>
<title>Home automation with Raspberry Pi and Watson</title>
<description>I’ve recently been playing with trying to build a Watson powered home automation system using my Raspberry Pi and some other electronic bits that I have on hand.
There are already a lot of people doing work in this space. One of the most successful projects is JASPER, which uses speech to text and an always-on background listening microphone to talk to you and carry out actions when you ask it things in natural language like “What’s the weather going to be like tomorrow?”</description>
<description>Introduction Being able to deal with typos and incorrect spellings is an absolute must in any modern search facility. Humans can be lazy and clumsy and I personally often search for things with incorrect terms due to my sausage fingers. In this article I will explain how to turn on spelling suggestions in Retrieve and Rank so that if your users ask your system for something with a clumsy query, you can suggest spelling fixes for them so that they can submit another, more fruitful question to the system.</description>
<description>Introduction Retrieve and Rank (R&amp;R), if you hadn’t already heard about it, is IBM Watson’s new web service component for information retrieval and question answering. My colleague Chris Madison has summarised how it works in a high level way here.
R&amp;R is based on the Apache Solr search engine with a machine learning result-ranking plugin that learns which answers are most relevant given an input query and presents them in the learnt “relevance” order.</description>
</item>
<item>
<title>Keynote at YDS 2015: Information Discovery, Partridge and Watson</title>
<description>Here is a recording of my recent keynote talk on the power of Natural Language Processing through Watson and my academic/PhD topic &#8211; Partridge &#8211; at York Doctoral Symposium. 0&#8211;11 minutes &#8211; history of mankind, invention and the acceleration of scientific progress (warming people to the idea that farming out your scientific reading to a computer is a much better idea than trying to read every paper written); 11&#8211;26 minutes &#8211; my personal academic work &#8211; scientific paper annotation and cognitive scientific research using NLP; 26&#8211;44 minutes &#8211; Watson &#8211; Jeopardy, MSK and Ecosystem; 44&#8211;48 minutes &#8211; Q&amp;A on Watson and Partridge. Please don’t cringe too much at my technical explanation of Watson – especially those of you who know much more about WEA and the original DeepQA setup than I do!</description>
<description>At the beginning of the month, I was lucky enough to spend a week embedded in the Watson Labs team in Austin, TX. These mysterious and enigmatic members of the Watson family have a super secret bat-cave known as “The Garage” located on the IBM Austin site – to which access is prohibited for normal IBMers unless accompanied by a labs team member.
During the week I was helping out with a couple of the internal projects but also got the chance to experiment with some of the new Watson Developer Cloud APIs to create some new tools for internal use.</description>
<description>Introduction As part of my continuing work on Partridge, I’ve been working on improving the sentence splitting capability of SSSplit – the component used to split academic papers from PLoS ONE and PubMedCentral into separate sentences.
Papers arrive in our system as big blocks of text with the occasional diagram or formula, and in order to apply CoreSC annotations to the sentences we need to know where each sentence starts and ends.</description>
<description>When I’m working on Partridge and SAPIENTA, I find myself dealing with a lot of badly formatted XML. I used to manually run xmllint --format against every file before opening it but that gets annoying very quickly (even if you have it saved in your bash history). So I decided to write a Nemo script that does it automatically for me.
#!/bin/sh
for xmlfile in $NEMO_SCRIPT_SELECTED_FILE_PATHS; do
if [[ $xmlfile == *.</description>