Introducing Apache Mahout
Machine Learning, etc: Log loss or hinge loss?
Machine Learning, etc: Log loss or hinge loss?: "Hinge loss is less sensitive to exact probabilities. In particular, minimizer of hinge loss over probability densities will be a function that returns 1 over the region where true p(y=1|x) is greater than 0.5, and 0 otherwise. If we are fitting functions of the form above, then once hinge-loss minimizer attains the minimum, adding extra degrees of freedom will never increase approximation error."
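The quoted claim about hinge-loss minimizers can be checked numerically. This is a minimal sketch, not code from the post: it assumes labels in {-1, +1}, a real-valued score f, and a single input x with true probability p = P(y=+1|x), then brute-forces the score that minimizes each expected loss on a grid (the grid search and the helper names are illustrative choices). The hinge-loss minimizer comes out as a hard sign, +1 when p > 0.5 and -1 otherwise (the quote's 1/0 indicator after relabelling; at exactly p = 0.5 every f in [-1, 1] ties), while the log-loss minimizer is log(p/(1-p)), whose sigmoid recovers p itself.

```python
import math

def expected_hinge(f, p):
    """Expected hinge loss of score f when P(y=+1|x) = p, labels in {-1, +1}."""
    return p * max(0.0, 1.0 - f) + (1.0 - p) * max(0.0, 1.0 + f)

def expected_log_loss(f, p):
    """Expected logistic (log) loss of score f when P(y=+1|x) = p."""
    # log(1 + e^{-y f}) for y = +1 and y = -1, weighted by p and 1 - p
    return p * math.log1p(math.exp(-f)) + (1.0 - p) * math.log1p(math.exp(f))

def argmin_on_grid(loss, p, lo=-5.0, hi=5.0, steps=1000):
    """Brute-force the score f in [lo, hi] (step 0.01 by default) minimizing loss(f, p)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: loss(f, p))

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.6, 0.9):
        f_hinge = argmin_on_grid(expected_hinge, p)
        f_log = argmin_on_grid(expected_log_loss, p)
        # The hinge minimizer is only a sign; the sigmoid of the log-loss minimizer tracks p.
        print(f"p={p:.1f}  hinge f*={f_hinge:+.2f}  "
              f"log f*={f_log:+.2f}  sigmoid(log f*)={1 / (1 + math.exp(-f_log)):.2f}")
```

The hinge column stays pinned at -1 or +1 however p moves on either side of 0.5, which is the insensitivity to exact probabilities the quote describes; the log-loss column moves with p.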
Pattern Recognition and Machine Learning (Information Science and Statistics): Christopher M. Bishop: 9780387310732: Amazon.com: Books: "This is the first textbook on pattern recognition to present the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible."
Christopher M. Bishop
Christopher M. Bishop: "This leading textbook provides a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners."
iPhone USB / Bluetooth Tethering With Linux ~ Web Upd8: Ubuntu / Linux blog
CS6240: Parallel Data Processing in MapReduce
CS6240: Parallel Data Processing in MapReduce: "Pig; MapReduce design patterns. Read the Pig paper. Read chapters 8 and 11 in the Tom White book."
How to process a million songs in 20 minutes « Music Machinery
How to process a million songs in 20 minutes « Music Machinery: "The recently released Million Song Dataset (MSD), a collaborative project between The Echo Nest and Columbia’s LabROSA, is a fantastic resource for music researchers. It contains detailed acoustic and contextual data for a million songs. However, getting started with the dataset can be a bit daunting. First of all, the dataset is huge (around 300 GB), which is more than most people want to download. Second, it is such a big dataset that processing it in a traditional fashion, one track at a time, is going to take a long time. Even if you can process a track in 100 milliseconds, it is still going to take over a day to process all of the tracks in the dataset. Luckily there are some techniques such as Map/Reduce that make processing big data scalable over multiple CPUs. In this post I shall describe how we can use Amazon’s Elastic Map Reduce to easily process the million song dataset."
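The "over a day" figure follows from 1,000,000 tracks × 0.1 s = 100,000 s, roughly 27.8 hours. The sketch below checks that arithmetic and then mimics the map, shuffle and reduce phases in a single Python process. It is not Elastic MapReduce and not the post's code: the record fields `year` and `duration`, the per-decade average, and all function names are hypothetical, chosen only to show the data flow.

```python
from collections import defaultdict
from itertools import chain

TRACKS = 1_000_000          # tracks in the Million Song Dataset
SECONDS_PER_TRACK = 0.1     # the post's "100 milliseconds" per track

def serial_hours(tracks=TRACKS, seconds_per_track=SECONDS_PER_TRACK):
    """Wall-clock hours to process every track one at a time."""
    return tracks * seconds_per_track / 3600

def mapper(track):
    """Emit (decade, duration) for one track; the record fields are hypothetical."""
    decade = (track["year"] // 10) * 10
    yield decade, track["duration"]

def reducer(decade, durations):
    """Average the durations collected for one decade."""
    return decade, sum(durations) / len(durations)

def run_local_mapreduce(records):
    """Single-process stand-in for a map -> shuffle -> reduce job."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)  # shuffle: group mapper output by key
    return dict(reducer(k, v) for k, v in sorted(groups.items()))

if __name__ == "__main__":
    print(f"{serial_hours():.1f} hours if processed serially")
    sample = [  # hypothetical track records
        {"year": 1975, "duration": 240.0},
        {"year": 1979, "duration": 200.0},
        {"year": 1994, "duration": 300.0},
    ]
    print(run_local_mapreduce(sample))
```

On a real cluster the framework performs the shuffle and runs many mapper and reducer instances in parallel across machines; the local version only shows how records flow between the phases.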