Older blog entries for arauzo (starting at number 18)

24 Jul 2006 (updated 19 Jul 2008 at 23:37 UTC) »
Have you ever created thousands of files in /tmp?

We have created up to 2 million files:


arauzo@brain:/tmp $ ls | wc
2099630 2099630 43839565

It was by mistake. You know, that commented line that should have been uncommented but was not. I'm starting to think I do very strange things, and they are becoming a very strong stress test for Linux.

Anyway, it is not so simple to delete a big bunch of files. The first thing you probably think of is:


arauzo@neuron2:/tmp$ rm *.net
-bash: /bin/rm: Argument list too long

Yes, I know it is long, but I NEED to remove those files. Let's try another thing:


arauzo@neuron2:/tmp$ for f in *.net; do rm -f $f; done
removed `mlp85_57_24-NI6MxY.net'
removed `mlp85_57_24-NbyBLS.net'
removed `mlp85_57_24-Nc7WVw.net'
...

Nice! This works for thousands of files. But now, what happens with our 2 million files?


arauzo@brain:/tmp $ rm *.net
Connection to brain closed.
...
arauzo@brain:/tmp $ for f in *.net; do rm -f $f; done
Connection to brain closed.

It crashes! :-( Looks like a 'bug' in bash... :-?

Finally, we managed to remove the 2 million files in groups by their prefix: 32*.net, 33*.net, 34*.net ...

P.S. A more intelligent solution (as it does not need to store the list of files anywhere), suggested by wtanaka and redi:


find /tmp -name "*.net" -print0 | xargs -0 rm -f

It can be 'simplified' to:


find -name "*.net" -exec rm -f \{\} \;

P.S. 2: The simplification has the overhead of creating one process per file, while xargs creates one process for a group of files.
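
The same idea (never building the whole list of names) can also be sketched in Python. This is only an illustration under some assumptions: `remove_matching`, `directory` and `suffix` are hypothetical names, not something from the commands above, and unlike `find` the sketch does not descend into subdirectories.

```python
import os


def remove_matching(directory, suffix=".net"):
    """Delete regular files in `directory` whose names end with `suffix`.

    os.scandir yields directory entries one at a time, so the full list
    of names is never stored in memory or passed on a command line.
    Returns the number of files removed. Subdirectories are not visited.
    """
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip directories and symlinks; only plain files are deleted.
            if entry.name.endswith(suffix) and entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)
                removed += 1
    return removed
```

Calling `remove_matching("/tmp")` would then play the role of the find command for the top level of /tmp, with everything done inside a single process.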

How to print short-edge duplex when short-edge duplex does not work in your printer

  1. Everything you print in Linux usually takes the form of a PostScript document. Just use "print to file" in your application, or pdf2ps, to get your file.ps.

  2. Then you can convert your file.ps to another .ps with the even pages turned upside-down with this command (this uses A4 paper; change the size to fit your paper):

     pstops "2:0,1U(21cm,29.7cm)" file.ps >fileTurned.ps 

  3. You can see the results with ggv or print it with lp, xpp...
10 May 2006 (updated 10 May 2006 at 17:23 UTC) »

I do not have a blog in Spanish, so I am writing this here, though it is probably not interesting to those who do not read Spanish.

This Sunday, 14th of May, there is a demonstration, called by email, in all major cities of Spain to ask for decent homes and to protest against the real estate bubble.

It is time to think about what we are asking for in these demos. I think we should ask for more tax relief for those who buy a house to live in.

Demonstrations are being held in Europe against software patents. Among them, we have the photo demonstration, where more than 2800 people appear, and the web demonstration, where more than 320 sites are registered (and many others unregistered but following the demo).

We need more help. Are US citizens doing something to make people see how bad software patents are? Is anybody trying to change US law?

What do you expect to find, if you search for an "orange diagram"?

Wrong! You find my module to use SNNS neural networks from Orange data mining software. These are the strange things that you discover by looking at the web logs.

By the way, I forgot to post that this fantastic piece of software (OrangeSNNS) was available. ;-)

Late at night, waiting for my script to finish... Well, I did not have anything more stupid to do, so I found out that my weird quotient was 86.

A long time without writing here... I hope I can avoid this in the future. Now just a short note to break the ice.

I have found that blogs can be really useful. Today's personal discovery is a simple and fast implementation of the argmax function in Python from Daniel Lemire's blog.
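
For readers who have not met argmax: it returns the position of the largest item instead of the item itself. A minimal sketch of one common way to write it in plain Python follows; it is not necessarily the implementation from that blog post, and the function name and tie-breaking rule are choices made here for illustration.

```python
def argmax(values):
    """Return the index of the largest item in a non-empty sequence.

    If the largest value appears more than once, the index of its first
    occurrence is returned. An empty sequence raises ValueError.
    """
    # max() walks over the indices and compares the items they point to.
    return max(range(len(values)), key=values.__getitem__)
```

For example, `argmax([3, 9, 2, 9])` gives the index 1, since 9 first appears in the second position.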

1 May 2004 (updated 1 May 2004 at 19:02 UTC) »

What I called host spam blocking lists are better known as Realtime Blackhole Lists (RBLs). Here is a thoroughly researched text against RBLs from someone who also had problems with them.

There is also a public statement against RBLs from the Electronic Frontier Foundation (EFF), an organization that I trust. Knowing that they agree with me on this makes me happy. ;-)

Do not use host spam blocking lists

Have you ever felt that you may be losing incoming email? It is a bad feeling. Even more so when you think how rude you must seem, not answering emails that you never received but whose senders think you did.

Please, please, please, if you are a sys admin, please, do NOT use host blocking lists. While their intention is good, their results may be awful.

There are much better ways to fight spam:

  1. Do not publish your email address in a web page. (Use the "user at provider.com" form.)
  2. If you have a domain name, use something like spam shield, to keep spammers from taking your email address from the registration records.
  3. Mask email addresses in mailing list archives, if you are the admin. If you are not, avoid writing to a list which does not mask email addresses.
  4. If you are a web admin and email addresses must appear in a web page, use SugarPlum.
  5. Use safe operating systems like GNU/Linux, as on other unsafe systems there are viruses collecting email addresses to spam.

I was looking for a Python function to get the mean of a list. I thought there would be some simple statistical functions in the main distribution, but there are not. There are not YET. :-) There is a stat module already in the sandbox.
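
For anyone reading this later: a mean function takes only a few lines to write by hand, and since Python 3.4 the standard library ships a `statistics` module with its own `mean`. The sketch below shows the hand-written version; the function name and the error raised for an empty list are choices made here for illustration.

```python
import statistics


def mean(values):
    """Return the arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)  # allow generators as well as lists
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


# The standard library computes the same quantity (Python 3.4 and later).
assert mean([1, 2, 3, 4]) == statistics.mean([1, 2, 3, 4])
```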
