Open Source on Brainsteam
https://brainsteam.co.uk/categories/open-source/
Recent content in Open Source on Brainsteam
© James Ravenscroft 2020
Mon, 12 Apr 2021 20:21:11 +0000

An opinionated guide to Python environments in 2021
https://brainsteam.co.uk/2021/04/01/opinionated-guide-to-virtualenvs/
Mon, 12 Apr 2021 20:21:11 +0000
(Cover image: a person overwhelmed by boxes, by Cottonbro)
Note: If you don’t want to read the blah-blah context and history stuff then you can jump to the recommendations.
The Problem
The need for virtual Python environments becomes fairly obvious early in most Python developers' careers, when they switch between two projects and realise that the projects have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs scikit-learn-0.24). Unlike other mainstream languages like Javascript(Node.

Reproducing 'ancient' experiments with Pytorch inside docker
https://brainsteam.co.uk/2021/03/01/running-old-pytorch-docker/
Mon, 01 Mar 2021 20:21:11 +0000
(Cover image: a beige analog compass, by Ylanite Koppens)
Introduction
Open machine learning research is undergoing something of a reproducibility crisis. In fairness it’s not usually the authors' fault - or at least not entirely. We’re a fickle industry, and the tools and frameworks that were ‘in vogue’ and state of the art a couple of years ago are now obsolete. Furthermore, academics and open source contributors are under no obligation to keep their code up to date.

Pickle 5 Madness with MLFlow and Python 3.6/3.7
https://brainsteam.co.uk/2021/01/14/pickle-5-madness-with-mlflow/
Thu, 14 Jan 2021 11:42:28 +0000
(Cover image: a jar of pickles, by Ksenia Charnaya)
I recently came across an infuriating problem where an MLflow Python model I had trained on one system using Python 3.6 would not load on another system with an identical version of Python. The exact problem was that when I ran mlflow models serve -m <url/to/model/in/bucket> the service would crash, saying that the model could not be unserialized because of ValueError: unsupported pickle protocol: 5.

Serving NLP Models with MLflow
https://brainsteam.co.uk/2020/12/29/serving-nlp-models-with-mlflow/
Tue, 29 Dec 2020 09:50:28 +0000
MLflow is a powerful open source MLOps platform with a built-in framework for serving your trained ML models as REST APIs. The REST framework will load data provided in a JSON or CSV format compatible with pandas and pass this directly into your model. This can be handy when your model is expecting a tabular list of numerical and categorical features. However, it is less clear how to serve models and pipelines that expect unstructured text data as their primary input.

Why is Tmux crashing on start?
https://brainsteam.co.uk/2018/11/07/why-is-tmux-crashing-on-start/
Wed, 07 Nov 2018 07:40:45 +0000
I spent several hours trying to get to the bottom of why tmux was crashing as soon as I ran it on Fedora. It turns out there’s a simple fix. When tmux starts it uses /dev/ptmx to create a new TTY (virtual terminal) that the user can interact with. If your user does not have permission to access this device then tmux will silently die. A good way to verify this is to try running screen too.

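The /dev/ptmx permission check described in the tmux entry above can also be done directly. The following is only a minimal sketch, not the fix from the original post: the function name can_use_ptmx is hypothetical, and it assumes that read and write access to the device is what matters.

```python
import os


def can_use_ptmx(path: str = "/dev/ptmx") -> bool:
    """Return True if the current user can read and write the given device.

    The default path comes from the post; the function name and the choice
    of checking both read and write access are illustrative assumptions.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if can_use_ptmx():
        print("/dev/ptmx is readable and writable for this user")
    else:
        print("/dev/ptmx is missing or not accessible; tmux may fail to start")
```

If the check reports that the device is not accessible, that points at the permissions problem the post describes rather than at tmux itself.
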
Programmatically Downloading Open Access Papers
https://brainsteam.co.uk/2018/04/13/programmatically-downloading-open-access-papers/
Fri, 13 Apr 2018 16:04:47 +0000
(Cover image “Unlocked” by Sean Hobson)
If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of £ just to read a scientific paper. It’s ok if you’re affiliated with a university that has access to that journal, but it can sometimes be luck of the draw as to whether your institute has access, and even if it does, sometimes the SAML login processes don’t work and you still can’t see the paper.

Dialect Sensitive Topic Models
https://brainsteam.co.uk/2017/07/25/dialect-sensitive-topic-models/
Tue, 25 Jul 2017 11:02:42 +0000
As part of my PhD I’m currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that can compare topics discussed in different dialectical styles, such as scientific papers versus newspaper articles? If you’re new to the concept of topic modelling then this article can give you a quick primer.
Vanilla LDA
(Figure: a diagram of how latent variables in an LDA model are connected)
Vanilla topic models such as Blei’s LDA are great but start to fall down when the wording around one particular concept varies too much.

timetrack improvements
https://brainsteam.co.uk/2016/12/10/timetrack-improvements/
Sat, 10 Dec 2016 09:33:41 +0000
I’ve just added a couple of improvements to timetrack that allow you to append to existing time recordings (either with an amount like 15m or using live to time additional minutes spent and append them).
You can also remove entries using timetrack rm instead of remove – saving keystrokes is what programming is all about. You can find the updated code over at GitHub.

timetrack – a simple time tracking application for developers
https://brainsteam.co.uk/2016/11/23/timetrack-a-simple-time-tracking-application-for-developers/
Wed, 23 Nov 2016 14:43:58 +0000
I’ve written a small command line application for tracking my time on my PhD and other projects. We use Harvest at Filament, which is great if you’ve got a huge team and want the complexity (and of course license charges) of an online cloud solution for time tracking. If, like me, you’re just interested in seeing how much time you are spending on your different projects and you don’t have any requirement for fancy web interfaces or client billing, then timetrack might be for you.