brainsteam.co.uk/brainsteam/content/posts/2022/01/debugging-bridgy-for-brainsteam.md at main


Intro to Brid.gy

Back when I started reading about the IndieWeb movement and playing with Micropub/Microsub around Christmas time, I found brid.gy, a service by Ryan Barrett (a.k.a. snarfed) which transparently links your Micropub website with social media silos like Twitter, Mastodon, Instagram and others, in order to facilitate both POSSE (publishing on your own site and syndicating elsewhere) and backfeed (pulling the responses your syndicated copies receive back to your own site).

I found it really cool that I could publish a blog post, a note (or 'tweet' or 'toot' in the parlance of social media silos) or a photo on my site, have it automatically syndicated to where my existing friendship networks are and then have any reactions or comments pulled back to my site automatically. This has noticeably increased the number of reactions my posts receive by reducing the friction involved in responding to them: instead of navigating to my site, creating an account and posting a comment on my page, users can just reply to my tweet/toot directly and their reply appears on my comments page.

The Problem

I noticed in December that when I posted a photo and tried to have brid.gy syndicate it to Mastodon, it errored with an HTTP 403 (Forbidden). That was weird, since I had granted brid.gy access to syndicate to all my accounts and there's no auth on anything on my website: I am using the Hugo static site generator in combination with Git, a CI pipeline and some home-brewed Micropub software.

So I was puzzled: why was this happening? Then I remembered I'd seen this before when making requests to my website during development of my Micropub software. I quickly opened up a Python interpreter and tried making a request to my site via requests to confirm my suspicions.

Python 3.7.11 (default, Jul 27 2021, 14:32:16) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get("https://brainsteam.co.uk")
<Response [403]>

As I suspected... I realised that my hosting provider has a blocking rule set up on its web traffic load balancer that blocks requests sent with a User-Agent header containing python-requests, which is what the requests library sends by default. If I change my User-Agent to something else, I can happily retrieve content from my site:

Python 3.7.11 (default, Jul 27 2021, 14:32:16) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get("https://brainsteam.co.uk", headers={"User-Agent":"BrainSteamBot (https://brainsteam.co.uk)"})
<Response [200]>
>>>

If you're developing an application that makes HTTP requests it is generally web etiquette to set this header to something more specific that identifies your app (so that, for example, a server admin could see that a lot of traffic is coming from your application and contact you to try and figure out why that might be). Some server admins (like the one who runs my shared hosting server) have blanket bans on applications that do not follow this etiquette as it may signal that the author has bad intent (e.g. some kind of malicious bot/scraper).
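The kind of blanket rule described above can be pictured with a small sketch. This is not my hosting provider's actual configuration, which I have never seen; the function names and the deny-list are hypothetical, and only the python-requests marker comes from what I observed.

```python
# Hypothetical deny-list of substrings found in default HTTP-library User-Agents.
# Only "python-requests" is taken from the post; "python-urllib" is an assumed
# extra entry, added purely for illustration.
BLOCKED_SUBSTRINGS = ("python-requests", "python-urllib")


def is_blocked(user_agent):
    """Return True if the User-Agent is missing or looks like a library default."""
    if not user_agent:
        return True  # assumption: requests without a User-Agent are rejected too
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BLOCKED_SUBSTRINGS)


def status_for(user_agent):
    """Map a User-Agent to the status code such a rule would produce."""
    return 403 if is_blocked(user_agent) else 200
```

Under these assumptions, `status_for("python-requests/2.26.0")` gives 403, while an identifying string such as `"BrainSteamBot (https://brainsteam.co.uk)"` gives 200, which matches the two interpreter sessions above.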

As a model web netizen himself, Ryan had set up brid.gy to identify itself using its own User-Agent of Bridgy (https://brid.gy/about) but for whatever reason, some part of the application was sending some requests to my server with a default user-agent and being kicked out. In this case, it seemed to be the code responsible for downloading the photo from my server and re-uploading it to Mastodon. I needed to track down what was broken and see if I could fix it.

Coming Up With a Fix... Sort Of...

So I did a bit of digging... Brid.gy relies on some other libraries developed by Ryan: granary, which converts between common web data formats (e.g. from microformats markup to JSON or RSS); oauth-dropins, a nice set of "off the shelf" OAuth implementations for a number of popular websites like Twitter, Reddit and Facebook; and finally webutil, which provides common utility functions for making web requests from Python.

A lot of the outgoing HTTP requests made by brid.gy actually end up as calls to requests wrapped up in utility functions provided by the webutil library. Until now, there was no default User-Agent set in those functions, so I added one to webutil (https://github.com/snarfed/webutil/) and simulated sending a webmention of a photo to my Mastodon account. Hey presto, it worked!
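The general shape of a "default User-Agent in a utility wrapper" fix might look like the sketch below. It is not the actual webutil code: the function name and the User-Agent value are made up for illustration, and it uses the standard library's urllib instead of requests so that it runs without any third-party packages.

```python
import urllib.request

# Hypothetical default; the real value used by webutil is not reproduced here.
DEFAULT_USER_AGENT = "example-webutil (https://example.com/about)"


def build_request(url, headers=None):
    """Build a request, adding a default User-Agent only if the caller gave none."""
    merged = dict(headers or {})
    # HTTP header names are case-insensitive, so compare in lower case.
    if not any(name.lower() == "user-agent" for name in merged):
        merged["User-Agent"] = DEFAULT_USER_AGENT
    return urllib.request.Request(url, headers=merged)
```

Callers that already identify themselves keep their own header; everything else picks up the default rather than the HTTP library's built-in one.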

Next I spent some time adjusting the unit tests in oauth-dropins and granary which started failing because they were a bit shocked to suddenly see a new header in all of their HTTP requests.
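To illustrate why tests like these break, here is a minimal sketch using unittest.mock from the standard library. The `fetch` helper, the URL and the User-Agent value are hypothetical stand-ins, not code from oauth-dropins or granary.

```python
from unittest import mock

# Hypothetical default User-Agent, for illustration only.
USER_AGENT = "example-webutil (https://example.com/about)"


def fetch(url, http_get):
    """Call the injected HTTP function, always sending the default User-Agent."""
    return http_get(url, headers={"User-Agent": USER_AGENT})


def old_expectation_still_passes(url):
    """Check a pre-fix style expectation (no headers) against the new behaviour."""
    fake_get = mock.Mock(return_value="ok")
    fetch(url, fake_get)
    try:
        fake_get.assert_called_once_with(url)  # written before the header existed
    except AssertionError:
        return False
    return True


def new_expectation_passes(url):
    """Check an updated expectation that includes the new header."""
    fake_get = mock.Mock(return_value="ok")
    fetch(url, fake_get)
    fake_get.assert_called_once_with(url, headers={"User-Agent": USER_AGENT})
    return True
```

A mock records the exact arguments it was called with, so an expectation written before the change no longer matches once the extra `headers` argument appears, and each such expectation has to be updated.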

I opened PRs for my fixes and within a couple of hours Ryan had already responded to them and merged my changes. I sent this post to Mastodon to verify that it was fixed and, boom, it worked!

So why 'Sort Of'?

The fix I submitted isn't perfect; in fact, it's far from it. The parts of brid.gy that were leaking HTTP requests without a user-agent set are now making requests with the webutil value even though they're part of brid.gy, and the same goes for granary. Ideally each of these services would use its own user-agent consistently for all of its requests. In order to make some of these fixes, it seems like the code that deals with each of the social media silos specifically may need to be checked and updated, which is a fairly hefty job. I'm hoping I can help with this effort if I get some spare time in the coming weeks.
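One possible direction, sketched here purely as an assumption and not as anything that exists in brid.gy or webutil, is for each application to build its headers through a small helper bound to its own user-agent. The class name is hypothetical; the Bridgy string is the one brid.gy already uses.

```python
class UserAgentHeaders:
    """Builds request headers that always carry one application's User-Agent."""

    def __init__(self, user_agent):
        self.user_agent = user_agent

    def build(self, extra=None):
        """Return headers with any other User-Agent replaced by this application's."""
        # Drop caller-supplied User-Agent entries regardless of their casing.
        headers = {
            name: value
            for name, value in (extra or {}).items()
            if name.lower() != "user-agent"
        }
        headers["User-Agent"] = self.user_agent
        return headers


# Each application creates one instance and uses it for every outgoing request.
bridgy_headers = UserAgentHeaders("Bridgy (https://brid.gy/about)")
```

The result of `bridgy_headers.build()` could then be passed as the `headers=` argument of an HTTP call, the same keyword used in the interpreter session earlier in this post.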

Conclusion

So what did I get from this experience? Well, it was a nice chance to dive deep into the codebase for something I'd consider to be key infrastructure for the IndieWeb movement. Of course, selfishly, it's great that I can now seamlessly syndicate media posts across Mastodon and Twitter once again. Finally, it was really nice to have a chance to interact with Ryan, who is a super nice and polite chap, and to be able to give something back to the brid.gy project. Consistent user-agent strings are not exactly a sexy or particularly important feature, but maybe it will help others who were experiencing mysterious 403 Forbidden errors when sharing posts.
