Older blog entries for remle (starting at number 13)

It's been a while since I've written. For those trying to reduce spam, check out the company Habeas. It uses copyright law to fight spam. Good stuff. Basically, it lets you add a copyrighted haiku header to your messages if you say you are not a spammer. Others can use those headers to filter good mail into one pile, in effect a whitelist. It's a different approach from trying to identify spam: it simply identifies good mail instead.

Fun
groom checked out my site migratus and tried to email me about it, but I had blocked .fr along with a bunch of other domains one day when I was getting too much spam. Anyhow, being a clever fellow, he was able to send me mail from another domain. I've removed the block on .fr.

This weekend I wrote a device driver for Linux. It's very simple and doesn't actually talk to any devices. It simply increments a global variable by one every time it is read from. The device returns 2 integers: a generation and a sequence. The generation should be incremented every time the module is loaded, while the sequence is incremented every time the device is read from. There are 2 devices one can read from. One device is meant for C programs that read the data into integers. It will never return an 'EOF'. Therefore if you want 10 sequences, just read 10 x (2 * sizeof(int)) bytes, then close it. The other device is meant for scripts and returns an ASCII string in the form generation:sequence. It returns 'EOF' after just one sequence is read. Writing to the device sets the device to whatever value you write to it.
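
For anyone wondering what reading such a device looks like from user space, here is a rough sketch. It is not the driver's own code: the names `seq_pair`, `read_pairs` and `parse_pair` are made up for illustration, and it assumes the generation comes first and the sequence second, both as native ints.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* One record from the device. Assumed order: generation, then sequence. */
struct seq_pair {
    int generation;
    int sequence;
};

/* Read `count` pairs from an already opened descriptor (the binary device).
 * Returns the number of complete pairs read, or -1 on a read error. */
static int read_pairs(int fd, struct seq_pair *out, int count)
{
    int done = 0;

    assert(out != NULL && count >= 0);
    while (done < count) {
        char *p = (char *)&out[done];
        size_t want = sizeof(out[done]);   /* 2 * sizeof(int) */

        while (want > 0) {
            ssize_t n = read(fd, p, want);
            if (n < 0)
                return -1;
            if (n == 0)        /* EOF: not expected from the binary device */
                return done;
            p += n;
            want -= (size_t)n;
        }
        done++;
    }
    return done;
}

/* Parse the "generation:sequence" text form meant for scripts.
 * Returns 0 on success, -1 if the text does not match. */
static int parse_pair(const char *text, struct seq_pair *out)
{
    assert(text != NULL && out != NULL);
    if (sscanf(text, "%d:%d", &out->generation, &out->sequence) != 2)
        return -1;
    return 0;
}
```

A caller would open the device node (whatever path it was created under), call `read_pairs(fd, pairs, 10)` to get 10 sequences, and close the descriptor.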

With axehind's prodding, I've been looking into openMosix. It is an extremely simple system to set up. He wants to update the userland tools. I didn't want to start that till I understood the oMFS file system. Now that I do (after many questions to the list) I should be able to start some work on it. First thing is to get the kernel-headers RPM working properly.

Work
After much research, I've decided not to use Apache 2.0 as the basis for my next SMTP server. I'm convinced that async I/O is the way to go. Thanks raph for mentioning that topic in your diary. My SMTP server system will use ReiserFS for a filesystem and perhaps /dev/epoll for event notification. I'm not worried about portability yet. I'm more interested in raw performance, and /dev/epoll seems the best way to go. I've requested that this be open source so others can join in the development. I'll be posting snippets either way.
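
To give a feel for event notification, here is a small sketch. Note the assumption: it uses the epoll system calls (`epoll_create1`, `epoll_ctl`, `epoll_wait`) found in current Linux kernels, not the /dev/epoll device interface mentioned above, and the helper name `wait_readable` is invented for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait until `fd` is readable or `timeout_ms` elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 };
    struct epoll_event fired;
    int epfd, n;

    assert(fd >= 0);
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    ev.events = EPOLLIN;          /* interested in "data available" */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    /* Only one descriptor is registered, so the result is 0 or 1. */
    n = epoll_wait(epfd, &fired, 1, timeout_ms);
    close(epfd);
    return n;
}
```

A real server would create the epoll descriptor once, register every client socket with it, and loop on `epoll_wait`; creating one per call as done here only keeps the example short.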

18 Jun 2002 (updated 18 Jun 2002 at 17:58 UTC) »
raph: Here's an idea. When someone uses a person tag, write a recent-log into the person's directory with the content being the author's name. Then add a 'recent citations' section to the home page that keys off the authenticated user.

If any Advogato users are bird watchers, I have a bird watching site based on Advogato code. It's at www.migratus.com. There are some user interface issues that I need to work on, but any feedback would be nice.

It's been a while since I've posted. I'm ready to write another version of the SMTP server I wrote for work, but I would like to use Apache 2.x as the underlying system. I haven't been able to convince work that this should be an open source project. I'm the only hardcore C programmer here, so having others look at the code would be of tremendous help.

I've been playing with the mod_virgule code. StevenRainwater has been very responsive about my needs. I even submitted a patch for a bug, but Steven found that the bug is much deeper and has posted an even bigger patch. I'm playing with this code so I can use it for a birding site that I've been trying to do for years. It would be cool if anything could be rated (i.e. diary entries, articles, etc.). However, it seems like that is going to take a while to do.

I bought 'The GNU C Library Reference Manual' 2 weeks ago and did a fast read of it. I like how glibc2 added dynamic buffer allocation for things like sprintf (i.e. asprintf). Also of interest were the hash and tree functions as well as argp (a getopt alternative). There are a bunch of useful functions in glibc2 that my Linux system doesn't have man pages for, so the books (it is 2 volumes) are going to be very useful. Oh, one thing that interested me very much was that all the stream functions (e.g. fprintf) do intrinsic locking so that those functions are thread safe. It seems you get that even if you aren't writing a threaded program. There are corresponding _unlocked functions if you don't want locking.
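
As a small illustration of the two features mentioned above, here is a sketch using `asprintf` and the `_unlocked` stream calls. The function names `format_mailbox` and `put_line_unlocked` are made up for the example; `asprintf`, `flockfile`, `funlockfile` and `putc_unlocked` are the real library calls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asprintf() is a GNU extension */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build a "name <address>" string of any length without guessing a
 * buffer size. The caller frees the result. Returns NULL on failure. */
static char *format_mailbox(const char *name, const char *address)
{
    char *out = NULL;

    assert(name != NULL && address != NULL);
    if (asprintf(&out, "%s <%s>", name, address) < 0)
        return NULL;
    return out;
}

/* Write a line while taking the stream lock once, instead of once per
 * character as the ordinary putc() would. */
static void put_line_unlocked(FILE *fp, const char *s)
{
    assert(fp != NULL && s != NULL);
    flockfile(fp);
    while (*s != '\0')
        putc_unlocked(*s++, fp);
    putc_unlocked('\n', fp);
    funlockfile(fp);
}
```

Typical use would be `char *m = format_mailbox("Jeff", "jeff@example.com"); put_line_unlocked(stdout, m); free(m);`.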

19 Feb 2002 (updated 19 Feb 2002 at 20:45 UTC) »

Seems like I only post when I need help!

I've been running my SMTP server for quite a while now (a month) and I've noticed the following behaviour. Every now and then a client closes the socket during the DATA phase. I catch that and discard the message. However, after re-reading the RFC, I get 2 conflicting ways to handle this:

one part of the rfc states: 4.1.1.5 RESET (RSET)

... There are circumstances, contrary to the intent of this specification, in which an SMTP server may receive an indication that the underlying TCP connection has been closed or reset. To preserve the robustness of the mail system, SMTP servers SHOULD be prepared for this condition and SHOULD treat it as if a QUIT had been received before the connection disappeared.

ok, that says treat it as though a quit happened, which just doesn't sound right to me. If DATA was issued and I'm waiting to see a CRLF.CRLF, and the client closes the connection, I would think that it would be better to assume the entire message hasn't been sent.
however later I see:

4.1.1.10 QUIT (QUIT)
... If the connection is closed prematurely due to violations of the above or system or network failure, the server MUST cancel any pending transaction, but not undo any previously completed transaction, and generally MUST act as if the command or transaction in progress had received a temporary error (i.e., a 4yz response).

which doesn't quite make sense to me because basically it sounds like this:
DATA -> oops! -> back to MAIL state -> return a 4xx code but wait, the socket is closed!

So does anybody have some words of wisdom?

I'm thinking the correct behaviour is the following: when the client closes the socket, the server treats it as a RSET followed by a QUIT.

I'm reachable at jeff @ virtualbuilder dot com

I'm moving on up!

Been a while since I've posted. Basically I've been optimizing the code. I've replaced several write() calls with writev(), trying to minimize system calls as much as possible. I'm now starting on better RFC conformance. For example, I must accept postmaster without a domain as a recipient. Also been thinking of how to get management to let me open source the code.
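
Here is roughly what the write() to writev() change looks like. It is a sketch rather than the server's actual code: `send_reply` and `set_iov` are invented names, and short writes are not handled.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Point one iovec entry at a NUL-terminated string. */
static void set_iov(struct iovec *v, const char *s)
{
    v->iov_base = (void *)s;   /* writev() does not modify the buffer */
    v->iov_len = strlen(s);
}

/* Send "<code> <host> <text>\r\n" with a single system call instead of
 * one write() per piece. Returns bytes written, or -1 on error. */
static ssize_t send_reply(int fd, const char *code, const char *host,
                          const char *text)
{
    struct iovec iov[6];

    assert(code != NULL && host != NULL && text != NULL);
    set_iov(&iov[0], code);
    set_iov(&iov[1], " ");
    set_iov(&iov[2], host);
    set_iov(&iov[3], " ");
    set_iov(&iov[4], text);
    set_iov(&iov[5], "\r\n");
    return writev(fd, iov, 6);
}
```

Besides saving system calls, gathering the pieces this way means the reply goes out as one unit rather than as several small writes.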

I signed up to a bunch of yahoo lists to see how the MTA is. So far no memory leaks. I've had a few timeouts, so I now change the process title for each SMTP phase, much like sendmail does. I also had 60k messages sent to it, and it handled them in under an hour (~16/sec) with 50 children. Thing is most of the children were idle. 4 machines running sendmail fed it. I was expecting the machine to be overloaded, but it didn't even burp. The test machine is an old ibm pentium II 350MHz with 256MB ram.

Thanks to all who responded to my request for help. Looks like I need to learn how to read. My smtp server now works with sendmail. Sendmail has this weird loop detection based on the 2nd word of the response to helo/ehlo. I've fixed the code and it works. Now for some more testing.
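
For the record, the fix amounts to putting the server's own host name, not the client's, as the second word of the reply, which is also what the RFC 2821 grammar for the EHLO response asks for. A sketch, with made-up function names and a deliberately crude model of the check (exact, case-sensitive match on the second word):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the first line of a HELO/EHLO reply. The second word is the
 * server's own host name; the client's name only shows up in the
 * greeting text. Returns the length written, or -1 if it did not fit. */
static int format_helo_reply(char *buf, size_t size,
                             const char *server_host, const char *client_host)
{
    int n;

    assert(buf != NULL && server_host != NULL && client_host != NULL);
    n = snprintf(buf, size, "250 %s Hello %s\r\n", server_host, client_host);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Rough model of the loop check described above: is the second word of
 * the reply equal to `name`? Returns 1 if so, 0 otherwise. */
static int second_word_is(const char *reply, const char *name)
{
    const char *p = strchr(reply, ' ');
    size_t len = strlen(name);

    if (p == NULL)
        return 0;
    p++;
    return strncmp(p, name, len) == 0 &&
           (p[len] == ' ' || p[len] == '\r' || p[len] == '\0');
}
```

With the old reply, `second_word_is()` matches the connecting sendmail's own name, which would explain the "mail loops back to me" complaint.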

Ok, I am now testing MTA interaction with my smtpd server. Sendmail barfs. Here is what I posted to comp.mail.sendmail:
Hi,
I've written a custom MTA and I'm testing interaction with it from other MTAs.
It seems sendmail (8.11.6 and whatever AOL is using) doesn't like my MTA, but Yahoo and Exim do. Here's an SMTP log:

/usr/lib/sendmail -v jeff.test@somehost.e-dialog.com
Subject: testing sendmail

This is a test from sendmail
.
jeff.test@somehost.e-dialog.com... Connecting to somehost.e-dialog.com. via esmtp...
220 e-dialog smtpd server $Id: smtp.c,v 1.14 2001/12/18 19:56:38 me Exp $
>>> EHLO server1.somedomain.com
250 server1.somedomain.com Hello, how are you?
somehost.e-dialog.com. config error: mail loops back to me (MX problem?)
>>> QUIT
250 Good bye
jeff.test@somehost.e-dialog.com... Local configuration error
/home/jeff/dead.letter... Saved message in /home/jeff/dead.letter
Closing connection to somehost.e-dialog.com.


I don't understand why I'm getting the MX error. That is printed out by sendmail, I'm not generating it in my SMTP server. The only thing I can think of that may be tripping up sendmail is that I'm doing 3 writes, one for the 250, one for the server name, and one for the message "Hello, how are you?" when I send a response to ehlo.
Thanks in advance for any pointers.

Like the message states, any pointers would be great. You can send email to me at jeff+advogato@virtualbuilder.com.

It's been a while since I posted.

The smtpd server is coming along nicely. I've added a simple blocking list capability. It reads a file of bad internet addresses. The server now properly tells its children to shut down, and the children will shut down once they are done with the current SMTP transaction.
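
A block list of this kind can be very small. The sketch below is not the server's code: it assumes one dotted-quad IPv4 address per line with '#' comments, a fixed-size table, and invented names (`blocklist`, `blocklist_add`, `blocklist_load`, `blocklist_contains`).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKED 1024

struct blocklist {
    struct in_addr addr[MAX_BLOCKED];
    size_t count;
};

/* Add one dotted-quad address. Returns 0 on success, -1 if the text is
 * not a valid IPv4 address or the table is full. */
static int blocklist_add(struct blocklist *bl, const char *text)
{
    assert(bl != NULL && text != NULL);
    if (bl->count >= MAX_BLOCKED)
        return -1;
    if (inet_pton(AF_INET, text, &bl->addr[bl->count]) != 1)
        return -1;
    bl->count++;
    return 0;
}

/* Load a file with one address per line. Blank lines and lines starting
 * with '#' are skipped (an assumed format). Returns the number loaded. */
static size_t blocklist_load(struct blocklist *bl, FILE *fp)
{
    char line[128];
    size_t loaded = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        if (blocklist_add(bl, line) == 0)
            loaded++;
    }
    return loaded;
}

/* Returns 1 if the connecting peer's address is in the list, else 0. */
static int blocklist_contains(const struct blocklist *bl, struct in_addr peer)
{
    size_t i;

    for (i = 0; i < bl->count; i++)
        if (bl->addr[i].s_addr == peer.s_addr)
            return 1;
    return 0;
}
```

The server would call `blocklist_contains()` with the address returned by accept() and drop the connection on a match. A linear scan is fine for a short list; a big one would want a hash or a sorted array.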

I guess I should outline how this server works. It is very minimal and is meant for bulk processing. It's just a simple pre-forking server that understands SMTP. It writes all messages it receives to disk. It doesn't validate users. It accepts all mail it gets unless the source is in its block list. Messages are written to disk and after that it returns OK. Prepended to each message are a couple of lines:

  • Forward-Path:
  • Return-Path:
  • Received:

Forward-Path reflects the value given to the RCPT command.

Anyhow, the server simply creates a new directory every minute and puts the message in a sub-directory of the current 'minute' directory. That sub-directory is the domain portion of the RCPT command. The message file itself is named in the form PIDSEQ. After the file is written, it's renamed PIDSEQ.msg. No locking required. This should also keep the number of messages per directory manageable. I'm gambling that the server doesn't need to write more than 50 messages a second. So how does mail get to its final destination? That's up to other programs.
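
The write-then-rename trick can be sketched like this. The exact names are guesses: the YYYYMMDDHHMM directory name, the "pid-seq" file name and the functions `spool_path` and `spool_commit` are assumptions for the example, and the recipient is assumed to be a bare user@domain string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build "<root>/<YYYYMMDDHHMM>/<domain>/<pid>-<seq>" for a recipient.
 * Returns the path length, or -1 if the recipient has no domain or the
 * buffer is too small. */
static int spool_path(char *buf, size_t size, const char *root, time_t now,
                      const char *rcpt, long pid, unsigned long seq)
{
    const char *at;
    struct tm tm;
    char minute[16];
    int n;

    assert(buf != NULL && root != NULL && rcpt != NULL);
    at = strrchr(rcpt, '@');
    if (at == NULL || at[1] == '\0')
        return -1;
    if (gmtime_r(&now, &tm) == NULL)
        return -1;
    if (strftime(minute, sizeof minute, "%Y%m%d%H%M", &tm) == 0)
        return -1;
    n = snprintf(buf, size, "%s/%s/%s/%ld-%lu", root, minute, at + 1, pid, seq);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Publish a finished file by renaming it to "<path>.msg". rename() is
 * atomic within one filesystem, so a reader that only looks at *.msg
 * files never sees a half-written message. Returns 0 or -1. */
static int spool_commit(const char *path)
{
    char final[512];
    int n;

    assert(path != NULL);
    n = snprintf(final, sizeof final, "%s.msg", path);
    if (n < 0 || (size_t)n >= sizeof final)
        return -1;
    return rename(path, final);
}
```

Since each child uses its own PID and its own counter, no two processes ever pick the same file name, which is why no locking is needed. A real server would also want to reject domains containing '/' before using them as a directory name.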

jfleck: Personally that post did wonders for me. Why spend time agonizing about things when the code is going to evolve anyway? Some of my co-workers were horrified with it though. The other Unix guy besides me loved it. The others are Windows/ColdFusion weenies. :-)
