Cory D. Dransfeldt

Evolving my personal music scrobbler

I've rewritten nearly my entire site over the past few months. First, I refactored the frontend into a Laravel application that leveraged the same PostgREST endpoints my long-running 11ty site used. Next, I wrote a new administrative application in Filament and migrated off of Directus. If I did this right, the changes went largely unnoticed. Managing the site is now more seamless, and a single, unified template syntax makes writing dynamic frontend templates simpler.

I've been scrobbling my listening history to my own site for over a year now and it's gone well. It's also undergone significant change. I started it as an experiment and as part of an effort to own more of my own data. I've long been a fan of last.fm, but it often felt minimally maintained and has now been acquired by new owners (as part of the Paramount sale) that, to be frank, I have absolutely no faith in to steward the platform rather than discard it or mine it for AI training data.

My scrobbling application started out by reading from and writing to massive JSON blobs in Netlify's blob storage; neither the approach nor the tool was suited to the task. I soon migrated the data to Supabase, and Postgres, naturally, worked better for structured data. At that point, I was using Plex and Plexamp to listen to my music, which, conveniently, emitted a scrobble event. I pointed that event at a Netlify edge function and wrote each listen to Postgres.
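The handler from that era might have looked something like the sketch below. The `media.scrobble` event name and the `Metadata` field names follow Plex's documented webhook payload; the listen row shape and the idea of a standalone `extractListen` helper are my assumptions, not the actual implementation.

```typescript
// Sketch: turn a Plex webhook payload into a row for a listens table.
// Plex webhooks POST a JSON "payload" whose event field is "media.scrobble"
// when a track crosses the scrobble threshold.
type PlexPayload = {
  event: string;
  Metadata?: {
    grandparentTitle?: string; // artist
    parentTitle?: string;      // album
    title?: string;            // track
  };
};

type Listen = { artist: string; album: string; track: string; listenedAt: number };

// Pure helper: a listen row for scrobble events, null for everything else.
function extractListen(payload: PlexPayload, now: number): Listen | null {
  if (payload.event !== "media.scrobble" || !payload.Metadata) return null;
  const { grandparentTitle, parentTitle, title } = payload.Metadata;
  if (!grandparentTitle || !parentTitle || !title) return null;
  return { artist: grandparentTitle, album: parentTitle, track: title, listenedAt: now };
}
```

The edge function itself would then just parse the request body, call this helper and, on a non-null result, insert the row into Postgres.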

With the data in Postgres, I queried Supabase's API during 11ty builds. This was slow: as the amount of data and the complexity of its relationships grew, so did query times. Learning as I went, I wrote optimized views that significantly reduced the time these API calls took, improving build times in turn. I later migrated from Supabase to a Postgres database I host myself, and build times remained consistent.

At this point, the scrobbler looked like this:

  • A dedicated music page with my top artists, albums and track plays for the current week.
  • An identical page, but rendered with the data for the last month.
  • A page for each artist I'd scrobbled listens for.

Relatively simple, right? I had skipped pages for each album, knowing they would dramatically increase the number of pages being built. But build times still grew as artists were added: each new artist meant another page to write, another artist image for 11ty to optimize and a whole discography of album images to optimize. Perfectly workable, but not sustainable over the long term.

I eventually moved my music listening to Jellyfin. This worked fairly well, but made scrobbling more difficult. Rather than a single scrobble event, Jellyfin would send playback ticks to a webhook (again, an edge function). I revised the receiving edge function to accept the playback ticks, perform some simple math and scrobble the track. But, without fail, it would stop sending events for longer songs.
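The "simple math" on those ticks can be sketched as follows. Jellyfin reports playback positions in ticks (100-nanosecond units); the 50-percent-or-four-minutes threshold below is the classic last.fm scrobbling rule, which I'm assuming here rather than quoting from the actual edge function.

```typescript
// Jellyfin ticks are 100ns units: 10,000,000 ticks per second.
const TICKS_PER_SECOND = 10_000_000;

// Decide whether a playback-progress event qualifies as a scrobble.
// Assumed rule: at least half the track, or at least four minutes, played.
function shouldScrobble(positionTicks: number, runtimeTicks: number): boolean {
  if (runtimeTicks <= 0) return false;
  const playedSeconds = positionTicks / TICKS_PER_SECOND;
  const playedFraction = positionTicks / runtimeTicks;
  return playedFraction >= 0.5 || playedSeconds >= 240;
}
```

The catch described above is that a rule like this only works if the ticks keep arriving; when Jellyfin stopped sending events mid-song, the threshold was never reached.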

Enter Navidrome. This is where I've landed, and happily. It's reliable, performant and easy to manage. I've enjoyed it enough to write and maintain my own client. The catch: Navidrome supports scrobbling to ListenBrainz and last.fm, but doesn't emit events to webhooks or custom endpoints. It does, however, have a dynamic frontend that makes API calls to the application backend. I poked around a bit and found a private API that returns exactly the data I need. It doesn't send events; I simply fetch the data from my instance regularly.1
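Polling instead of receiving events means deduplication matters: the same recent plays come back on every fetch. A sketch of that shape is below; the endpoint URL and response fields are placeholders (the API is private and undocumented), but the high-water-mark logic is the interesting part.

```typescript
// Placeholder shape for a play returned by the (private, undocumented) API.
type Play = { id: string; playedAt: number };

// Pure helper: keep only plays strictly newer than the last timestamp we
// recorded, oldest first, so they can be written in order.
function newPlays(plays: Play[], lastSeen: number): Play[] {
  return plays
    .filter((p) => p.playedAt > lastSeen)
    .sort((a, b) => a.playedAt - b.playedAt);
}

// The poll loop itself (hypothetical URL, illustrative only):
// setInterval(async () => {
//   const res = await fetch("https://navidrome.example/private/recent-plays");
//   const plays: Play[] = await res.json();
//   for (const p of newPlays(plays, lastSeen)) await recordListen(p);
//   lastSeen = plays.reduce((m, p) => Math.max(m, p.playedAt), lastSeen);
// }, 60_000);
```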

Enter Laravel. Rewriting the frontend of my site became necessary as the amount of data and rendered pages grew. It required a significant effort to write, migrate the hosting and develop a caching strategy. But, it solved the ever-growing build times. I'd started with a personal site and blog, hit constraints and grew it into an application.2

Now, with the build time constraint no longer a concern, I've added pages for each album. I already had album records, which allowed me to create pages by writing a migration to add a slug field and programmatically populate it as /album-name. This slug is then appended to the artist slug and used to resolve the appropriate album for the requested route.
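The slug scheme can be sketched like this. The exact slugify rules (diacritic stripping, separator handling) are my assumption; the only thing taken from the text above is that the album slug is stored as /album-name and appended to the artist slug.

```typescript
// Assumed slugify: lowercase, strip diacritics, collapse everything
// non-alphanumeric into single hyphens, trim stray hyphens.
function slugify(value: string): string {
  return value
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining diacritical marks
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Compose the album route: artist slug plus the stored /album-name slug.
function albumRoute(artistSlug: string, albumName: string): string {
  return `${artistSlug}/${slugify(albumName)}`;
}
```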

Albums need a track list though. Up to this point I hadn't created a dedicated tracks table or incorporated track data into my scrobbler schema. Listens were connected to albums, albums to artists. Having album pages and wanting track lists left me two options: add static track lists without attributing listens to tracks, or derive tracks from my listen records (e.g. normalize song titles and match artist, album and listens to get unique tracks). The latter wouldn't give me track numbers, which doesn't make for much of a track list. Instead, I wrote a Node script that went through every single release I have in Navidrome, wrote the necessary data to a new tracks table and connected each track to the appropriate album.
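Navidrome speaks the Subsonic API, so one plausible route for a script like that is getAlbum, which returns each album's songs with titles, track numbers and durations. The mapping step might look like the sketch below; the target row shape for the tracks table is an assumption.

```typescript
// Subset of a song object as returned by the Subsonic getAlbum endpoint.
type SubsonicSong = { id: string; title: string; track?: number; duration?: number };

// Assumed row shape for the new tracks table.
type TrackRow = {
  albumId: number;
  navidromeId: string;
  title: string;
  trackNumber: number | null;
  durationSeconds: number | null;
};

// Pure helper: map one album's Subsonic songs to rows for the tracks table,
// linking each to the local album record.
function toTrackRows(albumId: number, songs: SubsonicSong[]): TrackRow[] {
  return songs.map((s) => ({
    albumId,
    navidromeId: s.id,
    title: s.title,
    trackNumber: s.track ?? null,
    durationSeconds: s.duration ?? null,
  }));
}
```

The script would then walk every album, fetch its songs and bulk-insert the mapped rows.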

Once tracks were populated and linked, I wrote yet another script to associate listens with the appropriate track record. Out of roughly 40,000 records, I ended up with about 600 misses to correct. Some were malformed and discarded; others were valid but went unlinked due to differences in capitalization or some other edge case. With listens linked to tracks, I was able to populate a play count field for each track based on the count of linked records. Now, when a track is scrobbled, a Postgres function updates the total plays attributed to the track, album, artist and genre.
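The matching pass can be sketched as a normalize-and-lookup, which also explains the capitalization misses: anything the normalizer doesn't cover ends up in the miss pile for manual correction. The exact normalization rules here are my guess.

```typescript
// Assumed normalization: lowercase, drop quote characters, collapse whitespace.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[“”"'’‘]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

type ListenRecord = { id: number; trackTitle: string };
type TrackRecord = { id: number; title: string };

// Match listens to tracks by normalized title; report matches and misses.
function linkListens(listens: ListenRecord[], tracks: TrackRecord[]) {
  const byTitle = new Map<string, number>();
  for (const t of tracks) byTitle.set(normalizeTitle(t.title), t.id);

  const matched: Array<{ listenId: number; trackId: number }> = [];
  const misses: ListenRecord[] = [];
  for (const l of listens) {
    const trackId = byTitle.get(normalizeTitle(l.trackTitle));
    if (trackId === undefined) misses.push(l);
    else matched.push({ listenId: l.id, trackId });
  }
  return { matched, misses };
}
```

In a real pass the lookup key would also include artist and album, since track titles alone aren't unique across a library.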

I've also written a bespoke Navidrome importer. When I log into Filament, I have a widget with buttons that let me post a status, manually fetch scrobbles, manually update play totals, re-deploy the site and import books, movies, shows, artists and upcoming episodes for shows I'm actively watching. Clicking to import an artist opens a modal with a single input for their Navidrome ID. I paste in that ID and, for new artists, it creates an artist record, all of their associated albums and all of their associated tracks. This data is guaranteed to match new scrobbles because it's sourced from the same place I scrobble from. If it's an existing artist, it will only import new albums. If it's an upcoming album I don't have tracks for yet, it will import and attach those. After an import, I briefly verify the data to make sure that artist and album art is correct.

If my importer happens to create a duplicate artist or album record (this happens occasionally with legacy records, when the slugify method I use to create artist and album keys produces a key that doesn't match the old one), I have album and artist correction fields in their respective edit views in Filament. I supply the old record's key to the correction field on the new record and, when saved, a service moves all data and related records to the new album or artist.

If I've added an upcoming album that doesn't have a track list I can import, I'll take a look at MusicBrainz to see if the track list and durations are available there. If they are, I've written a MusicBrainz service that will import tracks from their API when provided with the appropriate relationships.
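MusicBrainz's release endpoint (/ws/2/release/&lt;mbid&gt;?inc=recordings&amp;fmt=json) returns tracks grouped by medium, with positions and lengths in milliseconds, so an import like the one described comes down to flattening that response. The output row shape below is my assumption; the response fields are MusicBrainz's documented ones.

```typescript
// Subset of MusicBrainz's release response with inc=recordings.
type MbTrack = { position: number; title: string; length?: number };
type MbRelease = { media: Array<{ tracks: MbTrack[] }> };

// Assumed shape for an imported track.
type ImportedTrack = { trackNumber: number; title: string; durationSeconds: number | null };

// Flatten all media (discs) into one track list, converting ms to seconds.
// Note: positions restart per disc; a multi-disc import would also want to
// carry the medium index along.
function tracksFromRelease(release: MbRelease): ImportedTrack[] {
  return release.media.flatMap((m) =>
    m.tracks.map((t) => ({
      trackNumber: t.position,
      title: t.title,
      durationSeconds: t.length != null ? Math.round(t.length / 1000) : null,
    }))
  );
}
```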

If a scrobble happens to not find a listen record, my scrobbler emails me via forwardemail.net's API with the data I need to identify and correct the record. This will happen if there's any missing data in the listen -> track -> album -> artist chain.

With tracks for each artist and each album, this site has roughly 5,000 pages dedicated to this scrobbler implementation.

Recently I, somewhat jokingly, posted a granular breakdown of my workflow for adding new music:

  1. Buy music
  2. Tag music
  3. Add artist image to root (if new artist)
  4. Export album covers to mp3 and flac folders
  5. Copy album covers to screen saver directory
  6. Use rclone to sync music to S3 storage
  7. ssh into Navidrome server
  8. Restart Navidrome + rclone mount to invalidate cache and initiate a scan
  9. Import artist/album data into site from Navidrome after the scan
  10. Verify art for artist + albums
  11. Format/update/write artist bio
  12. Play

This is accurate, and much quicker than you might expect. It's also much faster than the manual data entry my earlier implementations entailed.

My initial scrobbler implementation was both naive and poorly designed. I've learned a lot as I've evolved this across new storage mechanisms, platforms, players and languages. What I have now is mature and robust. Imports are reliable. Scrobbling itself is reliable. Errors are reported automatically and easy to correct.

This also means that I own all of my music, play it from a server I control, using storage I control, via a client I've written, with all of the data stored on my own infrastructure. I author my own charts and can do as much or as little as I want with the data. I enjoy being able to view my own listening habits and run whatever granular queries I want. I've integrated concert tracking with artist pages, added support for tracking upcoming albums and exposed a calendar subscription.

It's my data, in my control, on my infrastructure, with the freedom to do what I want with it.


  1. This has the added benefit of persisting the data so it can be re-fetched should my consuming application experience an outage. 

  2. I also track books I'm reading, movies and shows I'm watching and concerts attended here, but those are separate posts.