21 Nov 2012 sness   » (Journeyer)

File Format Integrations

File Format Integrations: "Importer 'bin/mahout' jobs
Run these with --help to see options

bin/mahout arff.vector
bin/mahout lucene.vector
bin/mahout seqdirectory
    turns text files into sequence files, one file per key/value pair
bin/mahout SequenceFilesFromMailArchives
    parses mailboxes and emits one text body per mail message
bin/mahout regexconverter
    reads text lines and emits the regex output lines into SequenceFiles."
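
To make "one file per key/value pair" concrete, here is a minimal Python sketch of the idea behind seqdirectory. It is not Mahout's code and it does not write Hadoop SequenceFiles. The function name files_to_pairs, the charset parameter, and the choice of a slash-prefixed relative path as the key are assumptions made for illustration only.

```python
import os


def files_to_pairs(root, charset="utf-8"):
    """Yield one (key, value) pair per text file found under root.

    Assumption for this sketch: the key is the file's path relative to
    root, written with forward slashes and a leading "/", and the value
    is the whole decoded text of the file.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories in a stable order.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, encoding=charset) as handle:
                yield "/" + relative, handle.read()
```

Calling list(files_to_pairs("some/text/dir")) gives one pair per file, which is roughly the shape of data a later vectorizing step would consume. The real job and its options are best checked with bin/mahout seqdirectory --help.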


Syndicated 2012-11-21 19:53:00 from sness
