From 477fd5cfb130df8eb2523d3967577110dc18427b Mon Sep 17 00:00:00 2001
From: James Ravenscroft
Date: Fri, 31 Dec 2021 16:30:40 +0000
Subject: [PATCH] add sapienta and partridge

---
 .../posts/2021/12/2021-12-31-retrospective/index.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/brainsteam/content/posts/2021/12/2021-12-31-retrospective/index.md b/brainsteam/content/posts/2021/12/2021-12-31-retrospective/index.md
index 8ab4558..e9ca0a0 100644
--- a/brainsteam/content/posts/2021/12/2021-12-31-retrospective/index.md
+++ b/brainsteam/content/posts/2021/12/2021-12-31-retrospective/index.md
@@ -35,6 +35,13 @@ Co-reference resolution is basically knowing that the *he* in "James is an IT pr
 
 This work has applications in fact checking and understanding when news articles paraphrase scientific work which could change the meaning.
 
+## ⚗️ Revitalising SAPIENTA & Partridge
+
+[SAPIENTA](http://www.sapientaproject.com/) was a project by my PhD supervisor [Maria Liakata](https://www.turing.ac.uk/people/researchers/maria-liakata) which uses machine learning to identify different sections in a scientific paper (e.g. background, methodology, objectives, conclusions). I built my undergraduate degree project [Partridge](https://papro.org.uk/) on top of SAPIENTA. It was a sort of prototype [Semantic Scholar](https://semanticscholar.org/) that makes scientific papers searchable via their sections (technical name: Core Scientific Concepts) as identified by SAPIENTA.
+
+Earlier this year I took some time to get SAPIENTA and Partridge running again on a cheap VPS over at [OVH](https://ovh.com/). As part of this work I rewrote the Python 2 code in Python 3-compatible syntax and modernised some of the processing pipelines (I replaced my home-grown XML-RPC-based background workers with [Dramatiq](https://dramatiq.io/)). I also created a new [simplified command-line interface](https://github.com/ravenscroftj/sapientacli) for using SAPIENTA locally. You can also run the whole stack locally via a [docker image](https://hub.docker.com/r/ravenscroftj/sapienta), which is probably overkill for one or two papers but worthwhile for a large collection. SAPIENTA is also available as an API [here](https://sapienta.papro.org.uk/).
+
+I've also rebuilt and modernised the backend of Partridge (although the frontend could do with some love) - an instance is running [here](https://beta.papro.org.uk/).
 ## 🏆 Winning Best KTP Award
 
 In September, my colleague Cynthia won the Best KTP Award for her collaboration with the University of Essex on CIELO - a tool that tries to train the best machine learning model via parameter optimisation or as she aptly writes - [it's like trying to bake the perfect cake](https://medium.com/filament-ai/baking-the-perfect-ml-model-d1ede84ce88b). As Cyn's team leader & manager I was excited to go along with her to the KTP Event at Essex in september and share in the glory but she did all of the hard work and rightly deserves the lion's share of the credit.