brainsteam.co.uk/brainsteam/content/posts/2024/11/03-annomemo-telegram-bot.md

55 lines
4.6 KiB
Markdown
Raw Normal View History

2024-11-03 15:12:32 +00:00
---
2024-11-03 15:15:32 +00:00
date: 2024-11-03 15:11:22+00:00
2024-11-03 15:12:32 +00:00
description: short summary
2024-11-03 15:15:32 +00:00
draft: false
2024-11-03 15:12:32 +00:00
mp-syndicate-to:
2024-11-03 15:15:32 +00:00
- https://brid.gy/publish/mastodon
- https://brid.gy/publish/twitter
preview: /social/5bd70d94e7878eff876db94c35f6885650dd021de239a952395c1de0bf1e4e4e.png
2024-11-03 15:12:32 +00:00
tags:
2024-11-03 15:15:32 +00:00
- post
- ai
- llms
- nlp
- python
- softeng
title: Simplified Handwriting OCR with AnnoMemo
type: posts
url: /2024/11/3/03-annomemo-telegram-bot
2024-11-03 15:12:32 +00:00
---
Earlier this year [I wrote about using VLM models to do OCR on my terribly scribbly hand writing](https://brainsteam.co.uk/2024/04/02/finding-the-best-ai-powered-handwriting-ocr/). Models like GPT-4o are actually quite good at interpreting my rubbish writing and converting it to markdown. However, my workflow for using these models was a bit fiddly.
I have just finished an early version of AnnoMemo, a telegram bot that can receive images of handwritten notes and respond with a transcription of them. AnnoMemo also integrates with the popular memos app. It will automatically upload the photo of your hand written notes alongside the transcription as a new note and include a link to that note in its telegram response.
AnnoMemo is a portmanteau of Annotation and Memorandum.
## Motivation
I went through a phase of manually uploading photos to ChatGPT or [my self hosted LLM portal](https://brainsteam.co.uk/2024/07/08/ditch-that-chatgpt-subscription-moving-to-pay-as-you-go-ai-usage-with-open-web-ui/) and copying and pasting the resulting text into my notes app. There are a few friction points in this process including the need to take the photo with my phone's camera app before opening [Open Web UI](https://openwebui.com/) since it currently doesn't provide a way of launching the camera in-app. I also need to highlight and copy the response and paste it into my memos app of choice. Another fairly major annoyance is that if the OCR model does get some words wrong I have to go and find the page to make sure I remember what I actually wrote.
AnnoMemo simplifies this process by allowing me to simply open Telegram, take a photo in-app and send it. The bot takes care of the rest of the process including sending the image and the prompt to the model, sending a copy of the response and the initial input image to my memos instance and finally sending the transcription and link to the newly created memo back to me inside telegram.
## How I built it
It was actually very quick to get a prototype of AnnoMemo together. It only took me an afternoon. I used telegram's Python SDK and [LiteLLM's Python SDK](https://docs.litellm.ai/docs/set_keys) to manage calls to the models. Since [I host my own LiteLLM proxy server](https://brainsteam.co.uk/2024/07/08/ditch-that-chatgpt-subscription-moving-to-pay-as-you-go-ai-usage-with-open-web-ui/), I wanted AnnoMemo to integrate with that rather than having to pass API keys to it directly. The LiteLLM Python SDK provides some flexibility here since it can talk to it's proxy counterpart or directly to an external model depending on what environment variables are passed into it
## You can use it too
I'm not currently opening my telegram bot up for others to use since I'd end up paying lots of £££ to Anthropic and/or OpenAI for other people's annotations. That said I'm thinking about providing a hosted/managed version of my tool where you can have your notes transcribed for some nominal monthly fee (these APIs are currently very cheap so it'd probably be like £3-4/month). Let me know if you're interested.
For now though, you can get a copy of the project and instructions on how to set it up from my [git repository](https://github.com/ravenscroftj/annomemo/). You just need a Telegram bot API key, an OpenAI API key and somewhere to run the bot. I've provided a pre-built docker container and instructions for running the bot on "bare metal" if you'd prefer.
## Coming Soon
### Using Local VLMs
I've been doing some testing and I've actually found that [Qwen-2 VL 2B Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) can read my handwriting perfectly, in fact possibly better than Claude Sonnet and GPT-4o. This is a tiny model (comparatively speaking). Colour me impressed! I plan on baking in support for calling this model locally rather than having AnnoMemo call out directly to
### Other PKMS Integrations
I love memos but I also make extensive use of other Personal Knowledge Management Systems (PKMS) like Obsidian and Joplin. Therefore, I may end up providing integrations with those tools at some point too.
## Conclusion
2024-11-03 15:15:32 +00:00
I'm already finding AnnoMemo to be a very useful tool. Please let me know if you end up having a go and finding it useful or if you think you'd make use of a hosted/managed version in exchange for some small nominal monthly fee.