# Older blog entries for sness (starting at number 4695)

Newton Institute Seminar : van Houwelingen, JC, 17/06/2008

Newton Institute Seminar : van Houwelingen, JC, 17/06/2008: "Global testing of association and/or predictability in regression problems with p>>n predictors"

'via Blog this'

Syndicated 2012-12-01 01:02:00 from sness

Logistic Regression

Logistic Regression: "Logistic Regression (SGD)
Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several predictor variables that may be either numerical or categorical.

Logistic regression is the standard industry workhorse that underlies many production fraud detection and advertising quality and targeting products. The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow large training sets to be used.

For a more detailed analysis of the approach, have a look at the thesis of Paul Komarek:

http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en

See MAHOUT-228 for the main JIRA issue for SGD.

"
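Mahout's own classes are not reproduced here. As a rough, hedged illustration of what "logistic regression trained with SGD" means, here is a minimal numpy sketch. `sigmoid` and `train_logistic_sgd` are hypothetical names, not Mahout's API, and the fixed learning rate and epoch count are arbitrary choices. Categorical predictors would first need a numeric encoding (for example one-hot columns).

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def train_logistic_sgd(X, y, learning_rate=0.1, epochs=20, seed=0):
    """Binary logistic regression fitted by plain stochastic gradient descent.

    X: (n, m) numeric features (include a column of ones for an intercept).
    y: (n,) labels in {0, 1}. Each update looks at a single example.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    beta = np.zeros(m)
    for _ in range(epochs):
        for i in rng.permutation(n):                   # shuffle the visiting order each epoch
            p = sigmoid(X[i] @ beta)                   # predicted P(y_i = 1 | x_i)
            beta += learning_rate * (y[i] - p) * X[i]  # ascent step on the log-likelihood
    return beta
```

Because each update reads only one row, the inner loop could just as well consume a stream of examples instead of an in-memory array, which is what makes SGD attractive for very large training sets.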

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "
In order to find the matrix B for which L (the ridge-penalized negative log-likelihood) is minimised, a quasi-Newton method is used to search for the optimized values of the m*(k-1) variables. Note that before we use the optimization procedure, we 'squeeze' the matrix B into an m*(k-1) vector. For details of the optimization procedure, see the weka.core.Optimization class.

Although the original logistic regression does not deal with instance weights, we modify the algorithm slightly to handle them.

le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics. 41(1):191-201."
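To make the "squeeze B into a vector, then run a quasi-Newton search" step concrete, here is a minimal numpy sketch under stated assumptions. It is not weka.core.Optimization: `neg_log_lik`, `bfgs` and `fit_ridge_logistic` are hypothetical helpers, the optimizer is a small textbook BFGS with a backtracking line search, the intercept is assumed to be a column of ones in X, and the last class is assumed to be the reference class. Instance weights enter as per-example multipliers on the log-likelihood terms.

```python
import numpy as np

def neg_log_lik(theta, X, y, w, k, ridge):
    """Weighted, ridge-penalized multinomial negative log-likelihood and gradient.

    theta is the (m, k-1) parameter matrix B 'squeezed' into a flat vector.
    y holds integer class labels 0..k-1; class k-1 is the reference (score 0).
    """
    n, m = X.shape
    B = theta.reshape(m, k - 1)
    scores = np.hstack([X @ B, np.zeros((n, 1))])      # shape (n, k)
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    log_P = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    P = np.exp(log_P)
    Y = np.eye(k)[y]                                   # one-hot labels
    loss = -np.sum(w * log_P[np.arange(n), y]) + ridge * np.sum(B ** 2)
    grad = X.T @ (w[:, None] * (P - Y))[:, : k - 1] + 2 * ridge * B
    return loss, grad.ravel()

def bfgs(f, x0, max_iter=200, tol=1e-6):
    """Minimize f with BFGS; f(x) must return (value, gradient)."""
    x = np.array(x0, dtype=float)
    fx, g = f(x)
    I = np.eye(x.size)
    H = I.copy()                                       # inverse-Hessian approximation
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                                     # quasi-Newton search direction
        t = 1.0
        while True:                                    # backtracking (Armijo) line search
            f_new, g_new = f(x + t * p)
            if f_new <= fx + 1e-4 * t * (g @ p) or t < 1e-10:
                break
            t *= 0.5
        s, yk = t * p, g_new - g
        x, fx, g = x + s, f_new, g_new
        sy = s @ yk
        if sy > 1e-12:                                 # update only if curvature condition holds
            rho = 1.0 / sy
            H = (I - rho * np.outer(s, yk)) @ H @ (I - rho * np.outer(yk, s)) + rho * np.outer(s, s)
    return x

def fit_ridge_logistic(X, y, k, weights=None, ridge=0.1):
    """Fit B with shape (m, k-1); the ridge value here is an arbitrary example."""
    n, m = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    theta = bfgs(lambda t: neg_log_lik(t, X, y, w, k, ridge), np.zeros(m * (k - 1)))
    return theta.reshape(m, k - 1)                     # 'unsqueeze' back into B
```

BFGS only needs the loss and its gradient; it builds up curvature information from successive gradient differences, which avoids forming the full Hessian of the m*(k-1) variables.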

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.
"
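Why does B need only k-1 columns? A common convention, and the one assumed here, is to treat the last class as a reference class whose score is fixed at zero, so its probability follows from the other k-1. A minimal numpy sketch; `class_probabilities` is a hypothetical helper, not Weka's API, and an intercept is assumed to be supplied as a column of ones in X.

```python
import numpy as np

def class_probabilities(X, B):
    """P(class j | x) for a multinomial logistic model whose B has k-1 columns.

    X has shape (n, m) and B has shape (m, k-1). The k-th class is the reference
    class with its score fixed at 0, so its probability is
    1 / (1 + sum_j exp(x . B_j)).
    """
    n = X.shape[0]
    scores = np.hstack([X @ B, np.zeros((n, 1))])  # shape (n, k)
    scores -= scores.max(axis=1, keepdims=True)    # stability; the result is unchanged
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)
```

For example, with x = [1, 0] and B = [[ln 2, 0], [0, 0]] the scores are [ln 2, 0, 0], giving probabilities [0.5, 0.25, 0.25].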

Syndicated 2012-12-01 00:02:00 from sness

WEKA - Convert from arff to csv from command line?

WEKA - Convert from arff to csv from command line?: " weka.core.converters.CSVSaver -i -o "

java -Xmx1500m -classpath /usr/share/java/weka.jar weka.core.converters.CSVSaver -i test.arff -o test.csv
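The CSVSaver command above is the Weka way to do it. For a sense of what the conversion involves, here is a minimal pure-Python sketch, assuming a simple dense ARFF file: no sparse `{...}` instances, attribute names without spaces or quotes, and data values with no quotes or embedded commas. `arff_to_csv` is a hypothetical helper, not part of Weka, and CSVSaver may differ in details such as quoting and missing-value handling.

```python
import csv
import io

def arff_to_csv(arff_text):
    """Convert a simple dense ARFF document (as a string) to CSV text."""
    header, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):       # skip blank lines and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            header.append(line.split()[1])         # attribute name is the second token
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(next(csv.reader([line], skipinitialspace=True)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)                        # one CSV column per ARFF attribute
    writer.writerows(rows)
    return out.getvalue()
```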

Syndicated 2012-11-30 18:48:00 from sness

Getting Started

Getting Started: "/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ..."

Syndicated 2012-11-30 18:41:00 from sness

Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY

Performing Data Science with HBase: Strata Conference + Hadoop World - O'Reilly Conferences, October 23 - 25, 2012, New York, NY: "Regardless, large amounts of data – especially data about users intended for use in an online system such as an e-commerce site, gaming platform, or ad network – are stored in HBase, and data scientists must be able to perform investigative analysis on this information to better understand their business and improve these online processes. And the read/write model of HBase offers advantages over HDFS to the data scientist building complex analysis pipelines."

Syndicated 2012-11-29 23:30:00 from sness
