---
date: 2021-12-31 12:03:11+00:00
description: As everyone else seems to at this time of year, I thought I would write
  a quick post about how my year's gone
post_meta:
- date
preview: /social/4739e4305cec7d835da280ed2cc2775ec5ab85a39f574599bcb285cd9ad3e68b.png
resources:
- name: feature
  src: images/wedding.jpg
tags:
- meta
- personal
title: "A personal retrospective of 2021 \U0001F492\U0001F4D4\U0001F916\U0001F579️\U0001F382"
type: posts
url: /2021/12/31/opinionated-guide-to-virtualenvs/
---

As everyone else seems to at this time of year, I thought I would write a quick post about how my year's gone. I will follow up with some ambitions for 2022 tomorrow.

## 💒 Getting Married

{{<figure src="images/wedding.jpg" caption="We got married">}}

My biggest personal achievement this year was getting married. My wife and I have been together since 2015 and we'd set our sights on a big traditional wedding. Due to COVID, we realised that was unlikely to happen any time soon, and it also made us reprioritise what we wanted to spend money on - big weddings are notoriously expensive. Instead, we opted for a small family wedding in [Winchester](https://www.visitwinchester.co.uk/), which was a lovely experience and meant we got to spend time with all of our guests.

## 📔 Publishing at EACL

In January I found out that my paper [CD2CR: Co-reference Resolution Across Documents and Domains](https://arxiv.org/abs/2101.12637) had been accepted at EACL 2021.

Co-reference resolution is basically knowing that the *he* in "James is an IT professional. *He* lives in England" refers to James. The premise of my work was resolving co-references between different types of documents. For example, if a news article says "new species of dinosaur discovered" and links to a scientific paper that says "we discovered Triceratops horridus in a fossil on the coast of Dorset", the task would be to know that "dinosaur" and "Triceratops horridus" refer to the same thing.

This work has applications in fact checking and in spotting when a news article's paraphrase of scientific work changes its meaning.

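
To make the cross-document idea a bit more concrete, here is a minimal Python sketch of how pairwise "these two mentions refer to the same thing" decisions can be grouped into co-reference chains. It is not the CD2CR model: it assumes some model has already produced the pairwise links (they are hard-coded below), and the `Mention`, `UnionFind` and `cluster` names are hypothetical, made up for illustration. The grouping step uses a standard union-find (disjoint-set) structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention type: a span of text in a named document."""
    doc_id: str
    text: str


class UnionFind:
    """Disjoint-set forest that merges pairwise links into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Unseen items start out as the root of their own cluster.
        self.parent.setdefault(item, item)
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        # Attach a's root under b's root so both end up in one cluster.
        self.parent[self.find(a)] = self.find(b)


def cluster(mentions, links):
    """Group mentions into co-reference chains from pairwise 'same entity' links.

    `links` stands in for the output of a pairwise co-reference model; here it
    is supplied by hand. Links are treated as transitive: if A~B and B~C then
    A, B and C end up in the same chain.
    """
    uf = UnionFind()
    for m in mentions:
        uf.find(m)
    for a, b in links:
        uf.union(a, b)
    chains = {}
    for m in mentions:
        chains.setdefault(uf.find(m), []).append(m)
    return list(chains.values())


# Mentions taken from the news article / scientific paper example.
dinosaur = Mention("news_article", "dinosaur")
triceratops = Mention("paper", "Triceratops horridus")
fossil = Mention("paper", "a fossil")
coast = Mention("paper", "the coast of Dorset")

# Hypothetical, hard-coded stand-in for a model's decision: only these two co-refer.
links = [(dinosaur, triceratops)]

chains = cluster([dinosaur, triceratops, fossil, coast], links)
for chain in chains:
    print([f"{m.doc_id}: {m.text}" for m in chain])
```

Union-find is a natural fit for this step because "refers to the same thing" is transitive, so a handful of pairwise links can be merged into whole chains cheaply. A real cross-document system would also have to decide which mention pairs to compare at all and how confident a link must be before merging; both are outside this sketch.
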
## ⚗️🪶 Revitalising SAPIENTA & Partridge

[SAPIENTA](http://www.sapientaproject.com/) was a project by my PhD supervisor [Maria Liakata](https://www.turing.ac.uk/people/researchers/maria-liakata) which uses machine learning to identify different sections in a scientific paper (e.g. background, methodology, objectives, conclusions). I built my undergraduate degree project [Partridge](https://papro.org.uk/) on top of SAPIENTA. It was a sort of prototype [Semantic Scholar](https://www.semanticscholar.org/) that made scientific papers searchable via their sections (technical name: Core Scientific Concepts) as identified by SAPIENTA.

Earlier this year I took some time to get SAPIENTA and Partridge running again on a cheap VPS over at [OVH](https://ovh.com/). As part of this work I rewrote the code, which was originally written in Python 2, in Python 3-compatible syntax and modernised some of the processing pipelines (I replaced my home-grown XML-RPC-based background workers with [Dramatiq](https://dramatiq.io/)). I also created a new [simplified command-line interface](https://github.com/ravenscroftj/sapientacli) for using SAPIENTA locally. You can also run the whole stack locally via a [Docker image](https://hub.docker.com/r/ravenscroftj/sapienta), which is probably overkill for one or two papers but worthwhile for a large collection. SAPIENTA is available as an API [here](https://sapienta.papro.org.uk/).

I've also rebuilt and modernised the backend of Partridge (although the frontend could do with some love) - an instance is running [here](https://beta.papro.org.uk/).

## 🏆 Winning Best KTP Award

In September, my colleague Cynthia won the Best KTP Award for her collaboration with the University of Essex on CIELO - a tool that tries to train the best machine learning model via parameter optimisation or, as she aptly puts it, [it's like trying to bake the perfect cake](https://medium.com/filament-ai/baking-the-perfect-ml-model-d1ede84ce88b). As Cyn's team leader and manager, I was excited to go along with her to the KTP event at Essex and share in the glory, but she did all of the hard work and rightly deserves the lion's share of the credit.

## 🤖 MLFlow Adoption & Python Environment Standards

At work I led the adoption of [MLFlow](https://mlflow.org/) for storing all of our machine learning experiments and results. This was a huge win for productivity, reproducibility and transparency in the data science team, as it means we always know which models were trained, when, by whom, with which data, where that data is, what parameters were used and what performance was achieved. I [wrote a post about some of the challenges of using MLFlow with NLP models](/2020/12/29/serving-nlp-models-with-mlflow/) at the end of last year.

We've also adopted [DVC](https://dvc.org/) for tracking large data files (e.g. training datasets) without committing the data itself to git. This means we know exactly which data was used to run a given script or train a given model, without that data clogging up our git repositories (which slows down checking projects out). The data is also secure (even if you have access to our git server, you still need separate credentials to access the data bucket), and access to it is auditable in a pinch (we can use S3 buckets with paranoid logging). At the end of last year I also [wrote a little about using DVC with Backblaze](/2020/11/27/dvc-and-backblaze-b2-for-reliable-reproducible-data-science/), which is the setup I use for personal projects and my PhD work. I've started using DVC for tracking and reproducing script runs as well, but I've still got to write that up into a blog post and some guidelines for my team.

I also formalised some guidelines on best practices for Python development within the data science team at work. Python dependency management can be a real PITA. I've been doing Python dev since 2005 and things have really come on in leaps and bounds in the last few years with the introduction of tools like [Poetry](https://python-poetry.org/) and [pipenv](https://pipenv.pypa.io/en/latest/). Earlier in the year I published [some of my thoughts](/2021/04/01/opinionated-guide-to-virtualenvs/) on how best to handle Python environments and dependencies, which we've now adopted within Filament.

## 🌳♻️ Environmental Efforts

I've been putting a lot more conscious effort into environmental stuff this year.

- Firstly, I try to reduce what we consume by buying less stuff and choosing "eco-friendly" options where possible. I've been using our local [refill and eco shop](https://allgoodthingseco.co.uk/), which opened this year, for store cupboard staples and cleaning products. If you're in the South Hampshire/Solent area I can't rate Nina and her shop highly enough.
- Our local council only collects cardboard and some types of plastic kerbside, but I've found that the plastic recycling bins at the entrances of local supermarkets now take all soft plastics, including crisp packets and cat food pouches, so I take ours along whenever I need to nip into town for something.
- I try to be mindful about replacing/upgrading stuff - do I really need to, or is what I have "good enough" already? I recently and reluctantly replaced my Pixel 3A because I was finding it sluggish and I didn't want to root/re-image it and endure lots of headaches with banking apps etc. My mum's had my 3A off me, factory reset it and is using it as her main phone, so it won't end up as e-waste just yet.
- We had another go at growing our own food. The harvest wasn't great this year, but we got a few potatoes, onions and strawberries out of the garden.

## 📚🕹️📺 Entertainment

I've consumed a lot of books, TV shows and video games this year.

### 📚 Reading

- The biggest chunk of reading I've done this year has been books from the [Malazan Book of the Fallen](https://en.wikipedia.org/wiki/Malazan_Book_of_the_Fallen) series - an epic high-fantasy series spanning 10 volumes. It's notoriously divisive because of its narrative style, but I love it. This year I've read books 4, 5 and 6 and I'm about halfway through book 7.
- In January I finished Brandon Sanderson's latest Stormlight Archive offering, [Rhythm of War](https://en.wikipedia.org/wiki/Rhythm_of_War), which my wife got for me in signed hardback last Christmas.
- In March I read Brandon Sanderson's [Warbreaker](https://www.goodreads.com/book/show/1268479.Warbreaker) - a standalone book within his bigger [Cosmere](https://stormlightarchive.fandom.com/wiki/Cosmere) universe.
- In April I read Brandon Sanderson's [Arcanum Unbounded](https://www.goodreads.com/book/show/28595941-arcanum-unbounded) - a collection of short stories set in the [Cosmere](https://stormlightarchive.fandom.com/wiki/Cosmere) universe. My favourite short story in the collection was *Shadows for Silence in the Forests of Hell*. It was a bit different to his usual writing style and a tense, thrilling read - I couldn't put the book down until I'd finished it.

I've read a few shorter non-fiction books in between longer novels this year:

- [Notes on a Nervous Planet](https://www.goodreads.com/book/show/40404801-notes-on-a-nervous-planet) - a collection of optimistic, heart-warming notes, essays and stories from Matt Haig, an amazing author who has [spoken openly and honestly about his anxiety and depression](https://www.theguardian.com/lifeandstyle/2018/nov/17/matt-haig-i-wanted-to-end-it-all-surviving-and-thriving-is-the-lesson-i-pass-on) and done a lot for mental health advocacy in the last few years.
- [Being The Change](https://peterkalmus.net/books/read-by-chapter-being-the-change/) - a book by NASA climate scientist [Peter Kalmus](https://twitter.com/ClimateHuman) about practical (and some less practical) steps we can take to reduce our carbon footprints.
- [The Bullet Journal Method](https://bulletjournal.com/pages/book) by Ryder Carroll - the original handbook in which the inventor of the Bullet Journal notebook system explains how he uses his journal.

### 🕹️ Gaming

- I discovered [Dyson Sphere Program](https://store.steampowered.com/app/1366540/Dyson_Sphere_Program/) this year and have played over 190 hours of it. It's a factory builder set in space with really pretty graphics, an inspiring and uplifting soundtrack and peaceful, stress-relieving gameplay.
- I've played about 40 hours of [Satisfactory](https://store.steampowered.com/app/526870/Satisfactory/) - a 3D factory builder with a beautiful planet to explore and build factories and trains across.
- I've played about 18 hours of [The Ascent](https://store.steampowered.com/app/979690/The_Ascent) - a top-down sci-fi shooter set on a dystopian space station where you're caught in the crossfire between squabbling mega-corporations.
- I've pumped a few hours into [Control](https://www.nintendo.co.uk/Games/Nintendo-Switch-download-software/Control-Ultimate-Edition-Cloud-Version-1864865.html) on my Nintendo Switch. It's a sci-fi/noir game where you explore a supernatural government facility that somehow feels like a cross between The X-Files and [SCP](http://scp-wiki.wikidot.com/). The Switch version of the game streams gameplay to your device from a cloud server, which works surprisingly well and means that the full beauty of the game and its ray-tracing capabilities can be experienced on the Switch without taxing its hardware too much.
- I've played a few hours of [Dragon Quest XI](https://www.nintendo.com/games/detail/dragon-quest-xi-s-echoes-of-an-elusive-age-definitive-edition-switch/) - the latest in the Dragon Quest JRPG series. I did enjoy what I played of it, but I found it got a bit samey.

### 📺 TV

Although this year has been a bit better than 2020, we did spend a lot of it locked down, so there was plenty of opportunity to watch TV box sets. Some of my highlights were:

- [Upload](https://en.wikipedia.org/wiki/Upload_(TV_series)) - a sci-fi comedy/drama from [Greg Daniels](https://en.wikipedia.org/wiki/Greg_Daniels) of The Office and Parks & Rec fame, about what it would be like if you could upload your consciousness to what is essentially an [MMORPG](https://en.wikipedia.org/wiki/Massively_multiplayer_online_role-playing_game) after you die and live forever in virtual reality. It's really well done and has some pseudo-political points to make about the rich/poor divide. It reminded me a lot of the uncharacteristically uplifting Black Mirror episode [San Junipero](https://en.wikipedia.org/wiki/San_Junipero), which had a similar premise.
- [Motherland](https://en.wikipedia.org/wiki/Motherland_(TV_series)) - a British sitcom about the perils of middle-class motherhood in London. You don't have to be a parent to appreciate the humour - it's full of those oh-so-cringy, overtly British, passive-aggressive social interactions that many of us can relate to. The [UK Government COVID-19 spoof with the head lice](https://www.youtube.com/watch?v=4vYDufbCWXU) was spot on.
- [Ted Lasso](https://en.wikipedia.org/wiki/Ted_Lasso) - a show that has received a lot of media attention lately. It's basically about an American football coach brought over to manage a UK football (soccer) team as an act of post-divorce sabotage by the former owner's ex-wife, who won the club in the split. Weirdly, you don't have to be a fan of football to appreciate the show (I'm not). I'd describe it as aggressively wholesome, in the sense that it forces the warm and fuzzy feelings down your throat and you have no choice but to feel optimistic and happy whilst watching. [Football is life](https://youtu.be/KuM8VGvBIVk?t=35).
- [Taskmaster](https://en.wikipedia.org/wiki/Taskmaster_(TV_series)) - a comedy "game show" where contestants (usually celebrities or comedians) are recorded completing weird and wonderful tasks, and then they all watch the footage back together in the studio while [Greg Davies](https://en.wikipedia.org/wiki/Greg_Davies) critiques them. It may sound like an odd premise for a show, but it's highly entertaining. It's been on for a while, but we only really discovered it and got into it this year. There are some great highlights in [this video](https://www.youtube.com/watch?v=8osXVhoSelM).

## 🏠🚗🌴 Misc

- We've been working on the house a fair bit this year. We re-gravelled our driveway and replaced our rotten old decking in the garden with new composite decking made from [recycled plastic and reclaimed timber](https://uk.trex.com/why-trex/how-to-choose-a-deck/eco-friendly-decking/), which should last years and years with minimal maintenance. We were also able to recycle/mulch the old deck.
- We took a mini-break after our wedding in the summer, during which we stayed at home but took a series of day trips to local eateries and attractions, and even Longleat zoo.
- We ended up going to a number of other weddings after COVID restrictions started to ease in the UK, which was super fun, and it was nice not to be in the hot seat so soon after our own wedding.
- Not so much an achievement, but I turned 30 this year. We were in lockdown on my birthday, but outdoor attractions were open, so we went to the zoo.

<a href="https://brid.gy/publish/twitter"></a>
<a href="https://brid.gy/publish/mastodon"></a>