archived · Feb 2026 · Sound
MusicClustering
A self-healing scraper agent that walks DJ catalogs, plus the clustering that finds structure in what it collects.
What it is
A two-process agent for collecting tracklist data at scale and then making sense of it. One process iterates over DJs and parses their sets; a monitor process keeps it alive, restarting it on failure and rotating ProtonVPN exits when it gets rate-limited. The collected tracks are then embedded and clustered to surface the natural groupings (scenes, eras, sub-genres) hiding in the catalog.
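To make the embed-then-cluster step concrete, here is a minimal sketch. It is not the project's actual pipeline: the text above does not say which embedding or clustering algorithm was used, so plain k-means stands in for the clustering, and the track names and two-dimensional vectors are invented toy values standing in for real embeddings.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return one cluster label per point (assumed stand-in algorithm)."""
    # Deterministic farthest-point initialisation: start from the first
    # point, then keep adding the point farthest from the chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: _dist2(p, centroids[i])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = _mean(members)
    return labels


# Hypothetical toy embeddings: two tracks near each corner of the space.
embeddings = {
    "track_a": [0.90, 0.10],
    "track_b": [0.85, 0.15],
    "track_c": [0.10, 0.90],
    "track_d": [0.20, 0.80],
}
labels = kmeans(list(embeddings.values()), k=2)
clusters = dict(zip(embeddings, labels))
```

With real data the vectors would come from whatever embedding the tracks were given, and the number of clusters would need to be chosen or estimated rather than fixed at two.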
Why two processes
Long scrapes fail in boring ways: a timeout, a blocked IP, a malformed page. Splitting the worker from a supervisor that owns liveness and resumability meant the scraper could run unattended for hours and resume exactly where it left off.
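The worker/supervisor split can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the checkpoint format, the function names (`run_worker`, `supervise`, `on_failure`), and the restart limit are all hypothetical. The `on_failure` hook marks where a VPN exit rotation would go; no real ProtonVPN command is shown.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional


def load_checkpoint(path: Path) -> int:
    """Return the index of the next DJ to process (0 if no checkpoint yet)."""
    if path.exists():
        return json.loads(path.read_text())["next_index"]
    return 0


def save_checkpoint(path: Path, next_index: int) -> None:
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_index": next_index}))
    tmp.replace(path)


def run_worker(djs: List[str], checkpoint: Path, scrape: Callable[[str], None]) -> None:
    """Worker: walk the DJ list from the checkpoint, saving after each DJ."""
    for i in range(load_checkpoint(checkpoint), len(djs)):
        scrape(djs[i])  # may raise on timeout, block, or malformed page
        save_checkpoint(checkpoint, i + 1)


def supervise(
    launch: Callable[[], int],
    max_restarts: int = 5,
    backoff_seconds: float = 0.0,
    on_failure: Optional[Callable[[int], None]] = None,
) -> bool:
    """Supervisor: relaunch the worker until it exits with code 0.

    `launch` runs the worker once and returns its exit code. With a real
    child process it could be
    `lambda: subprocess.run([sys.executable, "worker.py"]).returncode`
    (the script name is hypothetical).
    """
    for attempt in range(max_restarts + 1):
        if launch() == 0:
            return True
        if on_failure is not None:
            on_failure(attempt)  # e.g. rotate the VPN exit (assumed hook)
        time.sleep(backoff_seconds)
    return False
```

Because the worker only advances the checkpoint after a DJ has been fully scraped, a restart repeats at most the one DJ that was in flight, and the supervisor needs no knowledge of scraping at all.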