Older blog entries for sness (starting at number 4694)

Logistic Regression

Logistic Regression: "Logistic Regression (SGD)
Logistic regression is a model used to predict the probability that an event occurs. It makes use of several predictor variables that may be either numerical or categorical.

Logistic regression is the standard industry workhorse underlying many production fraud detection, advertising quality, and targeting products. The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow large training sets to be used.

For a more detailed analysis of the approach, have a look at the thesis of Paul Komarek:

http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en

See MAHOUT-228 for the main JIRA issue for SGD.

"

'via Blog this'
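
The quoted description is brief, so here is a minimal sketch of what a single SGD step for binary logistic regression looks like. This is illustrative Java under my own naming, not Mahout's actual OnlineLogisticRegression class, which is more elaborate than this core update.

// Minimal SGD for binary logistic regression (illustrative sketch, not Mahout code).
public class LogisticSgd {
    private final double[] w;   // one weight per predictor variable
    private double bias;
    private final double rate;  // learning rate

    public LogisticSgd(int numFeatures, double rate) {
        this.w = new double[numFeatures];
        this.rate = rate;
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Predicted probability that the event occurs for feature vector x.
    public double predict(double[] x) {
        double z = bias;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return sigmoid(z);
    }

    // One SGD step on a single labelled example (label is 0 or 1), so the
    // full training set never has to fit in memory at once.
    public void train(double[] x, int label) {
        double error = label - predict(x);  // gradient of the log-likelihood
        for (int i = 0; i < w.length; i++) w[i] += rate * error * x[i];
        bias += rate * error;
    }
}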

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "
In order to find the matrix B for which L is minimised, a Quasi-Newton Method is used to search for the optimized values of the m*(k-1) variables. Note that before we use the optimization procedure, we 'squeeze' the matrix B into an m*(k-1) vector. For details of the optimization procedure, please check the weka.core.Optimization class.

Although original logistic regression does not deal with instance weights, we modify the algorithm slightly to handle them.

le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics. 41(1):191-201."

'via Blog this'
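
As a concrete picture of the 'squeeze' step: the optimizer works on a flat vector of m*(k-1) numbers, so the parameter matrix has to be reshaped before and after each call. Below is a small Java sketch; the helper names and the column-major layout are my own assumptions (Weka's actual ordering is internal to weka.classifiers.functions.Logistic).

// Reshape the m x (k-1) parameter matrix B into the flat vector the
// quasi-Newton optimizer expects, and back again. Illustrative only.
static double[] flatten(double[][] b, int m, int kMinus1) {
    double[] v = new double[m * kMinus1];
    for (int j = 0; j < kMinus1; j++)
        for (int i = 0; i < m; i++)
            v[j * m + i] = b[i][j];  // assumed layout: one class block at a time
    return v;
}

static double[][] unflatten(double[] v, int m, int kMinus1) {
    double[][] b = new double[m][kMinus1];
    for (int j = 0; j < kMinus1; j++)
        for (int i = 0; i < m; i++)
            b[i][j] = v[j * m + i];
    return b;
}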

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.
"

'via Blog this'
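
To see why B is m*(k-1) rather than m*k: one class (conventionally the k-th) serves as the reference class, so only the other k-1 classes need their own column of coefficients. In standard multinomial logistic notation (mine, not necessarily Weka's exact parameterisation):

\[
P(c = j \mid x) = \frac{\exp(x^\top B_j)}{1 + \sum_{l=1}^{k-1} \exp(x^\top B_l)},
\qquad j = 1, \dots, k-1,
\]
\[
P(c = k \mid x) = \frac{1}{1 + \sum_{l=1}^{k-1} \exp(x^\top B_l)},
\]

where B_j is the j-th column of B and x is the vector of m attribute values. The ridge estimator of le Cessie and van Houwelingen then adds a penalty proportional to the squared norm of B to the negative log-likelihood L being minimised.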

Syndicated 2012-12-01 00:02:00 from sness

WEKA - Convert from arff to csv from command line?

WEKA - Convert from arff to csv from command line?: " weka.core.converters.CSVSaver -i -o "

'via Blog this'

java -Xmx1500m -classpath /usr/share/java/weka.jar weka.core.converters.CSVSaver -i test.arff -o test.csv

Syndicated 2012-11-30 18:48:00 from sness

Getting Started

Getting Started: "/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ..."

'via Blog this'

Syndicated 2012-11-30 18:41:00 from sness

Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY

Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY: "Regardless, large amounts of data – especially data about users intended for use in an online system such as an e-commerce site, gaming platform, or ad network – are stored in HBase, and data scientists must be able to perform investigative analysis on this information to better understand their business and improve these online processes. And the read/write model of HBase offers advantages over HDFS to the data scientist building complex analysis pipelines."

'via Blog this'
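
The read/write point is easiest to see in code: HBase lets a client read and write individual cells by row key, where HDFS only offers whole-file streaming. Below is a sketch against the 0.94-era HBase Java client; the table name, column family, and row-key scheme are made up, and the client API has changed in later HBase versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UserProfileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "users");  // hypothetical table

        // Point write: update one cell for one user, no batch job required.
        Put put = new Put(Bytes.toBytes("user42"));
        put.add(Bytes.toBytes("profile"), Bytes.toBytes("last_seen"),
                Bytes.toBytes("2012-11-29"));
        table.put(put);

        // Point read: fetch that user back by row key, rather than
        // scanning a whole file as an HDFS pipeline would.
        Result result = table.get(new Get(Bytes.toBytes("user42")));
        byte[] value = result.getValue(Bytes.toBytes("profile"),
                                       Bytes.toBytes("last_seen"));
        System.out.println(Bytes.toString(value));

        table.close();
    }
}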

Syndicated 2012-11-29 23:30:00 from sness

Software Engineer, Data Infrastructure Engineering | Facebook Careers

Software Engineer, Data Infrastructure Engineering | Facebook Careers: "Facebook is seeking a Software Engineer to join the Data team. The ideal candidate will dream about distributed systems for the parallel processing of massive quantities of data, be familiar with Hadoop/Pig/HBase and MapReduce/Sawzall/Bigtable, and frequently think to themselves, 'Yeah, that works for 500 MB of data; what about 500 TB?' This position is full-time and based in our New York office."

'via Blog this'

Syndicated 2012-11-29 23:27:00 from sness

4685 older entries...