git-annex as a podcatcher
As a Sunday diversion, I wrote 150 lines of code and turned git-annex into a podcatcher!
I've been using hpodder, a podcatcher written in Haskell. But John Goerzen hasn't had time to maintain it, and it fell out of Debian a while ago. John suggested I maintain it, but I have not found the time, and it'd be another mass of code for me to learn and worry about.
Also, hpodder has some misfeatures common to the "podcatcher" genre:
- It has some kind of database of feeds and what files have been downloaded from them. And this requires an interface around adding feeds, removing feeds, changing urls, etc.
- Due to it using a database, there's no particularly good way to run it on the same feeds on multiple computers and sync the results in some way.
- It doesn't use git annex addurl to register the url where a file came from, so when I check files in with git-annex after the fact they're missing that useful metadata and I can't just git annex get them to re-download them from the podcast.
So, here's a rethink of the podcatcher genre:
cd annex; git annex importfeed http://url/to/podcast http://another/podcast
There is no database of feeds at all. Although of course you can check a list of them right into the same git repository, next to the files it adds. git-annex already keeps track of urls associated with content, so it reuses that to know which urls it's already downloaded. So when you're done with a podcast file and delete it, it won't download it again.
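The "skip whatever is already known" idea can be sketched in a few lines. This Python is only an illustration, not git-annex's actual code (which is Haskell). It assumes an RSS 2.0 feed whose items carry enclosure elements, and the names enclosure_urls, new_urls and FEED are made up for the example.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(feed_xml):
    """Return the enclosure urls of an RSS 2.0 feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [e.get("url") for e in root.iter("enclosure") if e.get("url")]


def new_urls(feed_xml, known_urls):
    """Keep only the urls that are not already known (hypothetical helper)."""
    known = set(known_urls)
    return [u for u in enclosure_urls(feed_xml) if u not in known]


# A tiny made-up feed with two episodes.
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>One</title><enclosure url="http://example.com/1.mp3" type="audio/mpeg"/></item>
<item><title>Two</title><enclosure url="http://example.com/2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

if __name__ == "__main__":
    # Episode 1 was fetched (and perhaps deleted) earlier, so only 2 is new.
    print(new_urls(FEED, ["http://example.com/1.mp3"]))
```

Because the set of known urls is the only state, deleting a downloaded file changes nothing: its url is still known, so it is not fetched again.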
This is a podcatcher that doesn't need to actually download podcast files!
With --fast, it only records the existence of files in git, so a later
git annex get will download them from the web (or perhaps from
a nearer location that git-annex knows about).
Took just 3 hours to write, and that's including full control over
the filenames it uses
and automatic resuming of interrupted downloads. Most of what I needed
was already available in git-annex's utility libraries or Hackage.
Technically, the only part of this that was hard at all was efficiently querying the git repository for a list of all known urls. I found a pretty fast way to do it, but might add a local cache file later on.
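For a rough idea of what such a query involves, here is a naive Python sketch. It is not the fast method mentioned above and not git-annex's own code. It assumes that urls are recorded on the git-annex branch in files ending in .log.web, one entry per line in the form "timestamp 1-or-0 url". The function names urls_from_log and known_urls are hypothetical.

```python
import subprocess


def urls_from_log(text):
    """Parse one url log (assumed line format: '<timestamp> <1|0> <url>').

    Returns the urls whose newest entry has status 1.
    """
    newest = {}  # url -> (timestamp, status)
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        stamp, status, url = parts
        try:
            when = float(stamp.rstrip("s"))  # tolerate a trailing "s"
        except ValueError:
            continue
        if url not in newest or when >= newest[url][0]:
            newest[url] = (when, status)
    return {url for url, (_, status) in newest.items() if status == "1"}


def known_urls(repo="."):
    """Collect urls from every *.log.web file on the git-annex branch.

    Runs one `git show` per log file, so it is simple rather than fast.
    """
    names = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "--name-only", "git-annex"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    urls = set()
    for name in names:
        if name.endswith(".log.web"):
            blob = subprocess.run(
                ["git", "-C", repo, "show", "git-annex:" + name],
                check=True, capture_output=True, text=True,
            ).stdout
            urls |= urls_from_log(blob)
    return urls
```

Spawning a git process per file is exactly the kind of cost that makes this slow on a big repository. Streaming all the blobs through a single long-running git process, or keeping a local cache file, would be the obvious ways to speed it up.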