11 Dec 2012 sness   » (Journeyer)

How to process a million songs in 20 minutes « Music Machinery

How to process a million songs in 20 minutes « Music Machinery: "The recently released Million Song Dataset (MSD), a collaborative project between The Echo Nest and Columbia’s LabROSA, is a fantastic resource for music researchers. It contains detailed acoustic and contextual data for a million songs. However, getting started with the dataset can be a bit daunting. First of all, the dataset is huge (around 300 GB), which is more than most people want to download. Second, it is such a big dataset that processing it in a traditional fashion, one track at a time, is going to take a long time. Even if you can process a track in 100 milliseconds, it is still going to take over a day to process all of the tracks in the dataset. Luckily there are some techniques such as Map/Reduce that make processing big data scalable over multiple CPUs. In this post I shall describe how we can use Amazon’s Elastic Map Reduce to easily process the million song dataset."
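
To make the "over a day" figure concrete, here is a rough sketch in Python of the arithmetic in the quote, followed by a toy version of the map/reduce pattern it mentions. This is not the code from the Music Machinery post and it does not use Elastic Map Reduce. The track dictionaries, the "artist" field, and all function names are hypothetical, chosen only for illustration, and the parallel estimate assumes an ideal speedup with no overhead.

```python
from collections import Counter
from functools import reduce

NUM_TRACKS = 1_000_000      # size of the dataset, from the quote
SECONDS_PER_TRACK = 0.1     # 100 milliseconds per track, from the quote


def sequential_hours(num_tracks=NUM_TRACKS, seconds_per_track=SECONDS_PER_TRACK):
    """Hours needed to process every track one at a time."""
    return num_tracks * seconds_per_track / 3600


def parallel_hours(workers, num_tracks=NUM_TRACKS, seconds_per_track=SECONDS_PER_TRACK):
    """Idealised estimate: the work splits evenly over `workers` with no overhead."""
    return sequential_hours(num_tracks, seconds_per_track) / workers


def map_track(track):
    """Map step: turn one track record into a partial count keyed by artist.

    The "artist" key is an assumption for this sketch, not the MSD schema.
    """
    return Counter({track["artist"]: 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_tracks_per_artist(tracks):
    """Run the map step over every track, then fold the results together."""
    return reduce(reduce_counts, map(map_track, tracks), Counter())


if __name__ == "__main__":
    print(f"one CPU:  {sequential_hours():.1f} hours")
    print(f"100 CPUs: {parallel_hours(100) * 60:.1f} minutes")
    sample = [{"artist": "A"}, {"artist": "B"}, {"artist": "A"}]
    print(count_tracks_per_artist(sample))
```

Because each call to map_track depends only on its own track, the map step can be spread across many machines, and only the small partial counts need to be merged afterwards. That independence is what lets a framework such as Map/Reduce scale the job over multiple CPUs.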


Syndicated 2012-12-11 19:00:00 from sness
