# Older blog entries for sness (starting at number 4696)

RepyTutorial – Seattle

RepyTutorial – Seattle: "This guide provides an introduction to using the Repy sandbox environment. It describes what restrictions are placed upon the sandboxed code with examples. At the end of reading this document you should be able to write Repy programs, manage the restrictions on programs, and understand whether Repy is appropriate for a specific task or program.

It is assumed that you have a basic understanding of network programming concepts such as sockets, ports, and IP addresses. A basic understanding of HTML is also useful but not required. Lastly, you need a basic understanding of the Python programming language. If you lack one, you might want to first read through the Python tutorial at http://www.python.org/doc/ or the Python tutorial on this site. You do not need to be a Python expert to use Repy, but as Repy is a subset of Python, being able to write a simple Python program is essential.

"

'via Blog this'

Syndicated 2012-12-01 20:10:00 from sness

Newton Institute Seminar : van Houwelingen, JC, 17/06/2008

Newton Institute Seminar : van Houwelingen, JC, 17/06/2008: "Global testing of association and/or predictability in regression problems with p>>n predictors"

'via Blog this'

Syndicated 2012-12-01 01:02:00 from sness

Logistic Regression

Logistic Regression: "Logistic Regression (SGD)
Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several predictor variables that may be either numerical or categorical.

Logistic regression is the standard industry workhorse that underlies many production fraud detection and advertising quality and targeting products. The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow large training sets to be used.

For a more detailed analysis of the approach, have a look at the thesis of Paul Komarek:

http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en

See MAHOUT-228 for the main JIRA issue for SGD.

"

'via Blog this'
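
To make the SGD idea in the excerpt concrete, here is a minimal sketch in plain Python. It is not Mahout's code: the function names, the learning rate, the epoch count, and the toy data are all my own assumptions. The point it illustrates is that each update touches a single training example, so the full training set never has to be held in one big matrix.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sgd(rows, labels, epochs=200, rate=0.1, seed=0):
    """Fit logistic regression weights (bias first), one example per update."""
    rng = random.Random(seed)
    weights = [0.0] * (len(rows[0]) + 1)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            x = [1.0] + list(rows[i])  # prepend the bias input
            p = sigmoid(sum(w * v for w, v in zip(weights, x)))
            error = labels[i] - p  # gradient of the log-likelihood for this example
            weights = [w + rate * error * v for w, v in zip(weights, x)]
    return weights

def predict_proba(weights, row):
    """Predicted probability that the event occurs for one row."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))

# Hypothetical toy data: one numerical predictor, event occurs for larger values.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]
weights = train_sgd(rows, labels)
```

Calling `predict_proba(weights, [0.0])` and `predict_proba(weights, [5.0])` should give a low and a high probability respectively. A categorical predictor would need to be encoded as numbers first (one-hot, for example), which this sketch leaves out.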

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "
In order to find the matrix B for which L is minimised, a Quasi-Newton Method is used to search for the optimized values of the m*(k-1) variables. Note that before we use the optimization procedure, we 'squeeze' the matrix B into a m*(k-1) vector. For details of the optimization procedure, please check the weka.core.Optimization class.

Although the original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights.

le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics. 41(1):191-201."

'via Blog this'
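
The excerpt does not show what L looks like, so the following is only a sketch under my own assumptions: a softmax over k-1 scored classes with the last class as a zero-score reference, instance weights applied by multiplying each instance's log-likelihood term, and a ridge penalty on every coefficient. None of the names below come from Weka. It shows the 'squeeze' of B into a flat m*(k-1) vector, which is the form a quasi-Newton routine would take as input; the optimizer itself is not included.

```python
import math

def flatten(B):
    """Squeeze an m x (k-1) matrix (list of rows) into one flat vector."""
    return [value for row in B for value in row]

def unflatten(vec, m, k):
    """Rebuild the m x (k-1) matrix from the flat vector."""
    cols = k - 1
    return [vec[r * cols:(r + 1) * cols] for r in range(m)]

def class_probabilities(B, x):
    """Softmax over k-1 scored classes plus a reference class with score 0."""
    cols = len(B[0])
    scores = [sum(x[a] * B[a][j] for a in range(len(x))) for j in range(cols)]
    scores.append(0.0)  # assumed reference (last) class
    top = max(scores)   # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def objective(vec, X, y, weights, m, k, ridge):
    """Weighted negative log-likelihood plus ridge * sum of squared coefficients."""
    B = unflatten(vec, m, k)
    nll = 0.0
    for x, label, w in zip(X, y, weights):
        nll -= w * math.log(class_probabilities(B, x)[label])
    return nll + ridge * sum(v * v for v in vec)

# Hypothetical data: n=3 instances, m=2 attributes, k=3 classes.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0, 1, 2]
w = [1.0, 1.0, 1.0]
B0 = [[0.0, 0.0], [0.0, 0.0]]
value = objective(flatten(B0), X, y, w, m=2, k=3, ridge=1e-8)
```

With B all zeros every class gets probability 1/3 and the ridge term vanishes, so `value` is 3*ln(3). An optimizer would repeatedly call `objective` on candidate flat vectors and then `unflatten` the best one back into B.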

Syndicated 2012-12-01 00:59:00 from sness

Logistic

Logistic: "Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.
"

'via Blog this'

Syndicated 2012-12-01 00:02:00 from sness

WEKA - Convert from arff to csv from command line?

WEKA - Convert from arff to csv from command line?: " weka.core.converters.CSVSaver -i -o "

'via Blog this'

java -Xmx1500m -classpath /usr/share/java/weka.jar weka.core.converters.CSVSaver -i test.arff -o test.csv

Syndicated 2012-11-30 18:48:00 from sness

Getting Started

Getting Started: "/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ..."

'via Blog this'

Syndicated 2012-11-30 18:41:00 from sness
