IlyaM is currently certified at Journeyer level.

Name: Ilya Martynov
Member since: 2003-02-27 11:31:21
Last Login: 2007-10-31 13:09:36

FOAF RDF Share This

Homepage: http://martynov.org/

Projects

Recent blog entries by IlyaM

Syndication: RSS 2.0

8 Apr 2009 »

Running Puppet on big scale

This is a rehash of my comment in slashdot discussion and my comment on Alexey Kovrygin's blog post.

We run Puppet on hundreds of servers in two datacenters and it was pain to get it working right. There are many issues which show up here and there: memory leaks in both client (puppetd) and server (puppetmaster), periodic lock ups and even file corruption. Besides it is quite slow. These problems are being slowly fixed with each new release but right now using Puppet for big installations is a source of constant problems. Unfortunately you do not notice these problems until you get many servers to manage; on smaller installations it seems to work without problems or at least they happen much less often to be noticable. In our case number of servers we managed increased slowly so we felt into the trap and now rely on Puppet too much and now it is too late to change. At the end we have managed to work around of most of issues in the Puppet we have met so combined with monitoring to catch problems it works good enough for us. On the other hand if I were to start from scratch I would evaluate something different for the project. Perhaps I would use Cfengine. It is not as flexible and nice as puppet but probably is more stable simply because it is much more old. I talked to people who used Cfengine on much bigger scale (thousands of servers) and they did not recall stability problems with it. In the long run Puppet will be probably ok too as it is being developed actively but right now I'd consider it to be in “beta” state. Or maybe even in "alpha".

For anyone interested in how to get Puppet work for real work load this is what we do:

  • We run Puppet under Apache+Mongrel. By default it runs using WEBrick what breaks easily under any moderate load. So we use Apache+Mongrel instead of it. Another benefit of using Apache is that you can run multiple backends. This helps if you have multi-core server for puppetmaster as by itself it can use only one core. Alternatively you can use Nginx+Mongrel or another other web server with proxying capabilities + Mongrel.
  • Because Puppet is slow we load balance it across two boxes in each datacenter.
  • We restart backends from time to time because they leak memory. We have a cron job to do this every 15 minutes (yes, it is that bad).
  • Puppetmaster has a cache which we saw to get corrupted sometimes. Our "fix" is to delete it before each restart. This might be fixed in later version: I've seen some closed bug reports which loooked relevant but we still have this cache clean up just in case.
  • We do not run puppet client as daemon. We run it as a cron job. Puppet client when run as daemon leaks memory and gets stuck from time to time. In our cron job we add random sleep before starting client to make sure requests do not hit server at the same time and overload it.
  • We never serve big files over puppet using its fileserver. Puppet does a number of stupid things with big files like say reading them into memory first before serving it to puppet client. If you need to distribute big files use other means (HTTP, FTP, NFS, etc).

Syndicated 2009-04-08 12:50:00 from Ilya Martynov's blog

25 Mar 2009 »

STOMP messaging for non-Java programmers on top of Apache ActiveMQ

Recently I was researching available options for messaging between Perl programs. In the past I had quite a lot of experience with Spread and I don't want to repeat. I hated Spread as it was buggy and unstable. So I looked into other alternatives: XMPP, STOMP and AMQP. AMQP has no Perl client so it was out. STOMP and XMPP are closely tied in my view but then STOMP looked simplier so I decided to go with STOMP. There is very good Perl client library for STOMP: Net::STOMP.

Then there is a choose of the server. This is quite an important choice and here is why: STOMP is theoretically language agnostic protocol but in reality you are very likely to depend on semantics of specific STOMP server implementation. For example like I mention below STOMP protocol doesn't really define any rules of message delivery.

There are several servers which support STOMP but Apache ActiveMQ looked to me like one of the most robust implementations. While Apache ActiveMQ supports a wide range of interfaces its design is centered around JMS and it helps to understand basic concepts of JMS even if you use STOMP only. This was a problem for me and as I don't really program in Java and all JMS concepts were alien to me. Moreover most of documentation on STOMP and ActiveMQ takes for granted that you know JMS basics.

So I'm recording all my finding on STOMP/ActiveMQ from viewpoint of non-Java programmer. I hope it might be helpful for other non-Java programmers. Word of warning: all below might be specific to Apache ActiveMQ implementation of STOMP server. I didn't bother to check other STOMP servers.

Basic model

As I mentioned earlier STOMP protocol by itself doesn't specify rules of message delivery. It is up to STOMP server to define the rules. This is where JMS API model becomes important as STOMP implementation is basically just a mapping of JMS model to non-Java specific protocol. Below is the short summary of API model which is relevant to STOMP clients (this is mostly based on my reading of JMS tutorial, STOMP protocol description and description of JMS extensions to STOMP).

There are two distinct ways to organize messaging:
  1. Use queues. If one message gets into queue, only one of subscribers gets it. If there are no subscribers then server stores the message until someone shows up.
  2. Use topics. For each message sent to the topic all active (i.e. connected) subscribers get a copy of it. Actually non-active subscribers can get a copy as well if they register their subscription as durable in advance. If there are no subscribers message gets lost.

How do use queues and topics in STOMP client? It is all controlled by destination you specify when subscribing to messages or sending messages. Destinations like /queue/* act as queues. Destinations like /topic/* act as topics.

There is also a concept of temporary queues and topics in JMS. The idea is that they are visible only to connection which creates them so that client might have private queues and topics. I'm not sure if this is exposed to STOMP clients at all. It might be - I haven't researched this as I don't need it in my application.

Control over reliability of messaging

JMS API gives you some control over reliability of messaging and at least some of it is exposed to STOMP layer.

Message acknowledgement: STOMP client on subscription tells if it acknowledges messages automatically or not. Automatic means that messages is considered delivered even if subscriber doesn't actually read it. I guess there are cases when it makes sense but I'd argue that default behavior should be opposite as for most applications it doesn't.

Message persistence: if STOMP server dies it either losses undelivered messages or it rereads them from some permanent storage. Message persistence controls this.

Message priority: in theory JMS provider tries to deliver higher-priority messages before lower-priority. In practice I have no idea - I didn't research how ActiveMQ implements this as it is not important for my application. Anyway this bit is exposed into STOMP protocol as well.

Message expiration: this defines for how long time server keeps undelivered messages.

Transactions: not sure about this one. Both JMS and STOMP support concept of transactions but I'm not sure what is the exact overlap. I might look into this later but for my application transactions are probably not important.

Configuring ActiveMQ as a STOMP server

Latest version (5.2) seems to support STOMP out of box without need for any additional configuration. As a quick test you can run the following program. It is just a copy&paste from Net::STOMP perldoc - I'm adding it here in case they change perldoc later:

# send a message to the queue 'foo'
use Net::Stomp;
my $stomp = Net::Stomp->new( { hostname => 'localhost', port => '61613' } );
$stomp->connect( { login => 'hello', passcode => 'there' } );
$stomp->send(
    { destination => '/queue/foo', body => 'test message' } );
$stomp->disconnect;

# subscribe to messages from the queue 'foo'
use Net::Stomp;
my $stomp = Net::Stomp->new( { hostname => 'localhost', port => '61613' } );
$stomp->connect( { login => 'hello', passcode => 'there' } );
$stomp->subscribe(
    {   destination             => '/queue/foo',
        'ack'                   => 'client',
        'activemq.prefetchSize' => 1
    }
);
while (1) {
  my $frame = $stomp->receive_frame;
  warn $frame->body; # do something here
  $stomp->ack( { frame => $frame } );
}
$stomp->disconnect;

Default installation doesn't seem to do any authorization so any login/passcode works.

Syndicated 2009-03-25 15:07:00 from Ilya Martynov's blog

18 Nov 2008 »

Erlang debugging tips

I've just started playing with Erlang so I have a lot to discover but so far I've found several things which help me to debug my programs:

  1. I tried to write my programs using OTP principles but the problem for me was that by default it often causes Erlang to hide most of the problems. The faultly process just get silently restarted by its supervisor or even worse - the whole application just exits with unclear "shutdown temporary" message. The solution is simple: start sasl application and it'll start logging all crashes. For development starting Erlang shell as erl -boot start_sasl does the trick.
  2. If you compile your modules with debug_info switch then you can use quite nifty visual debugger to step through your program. Quick howto: you open debugger window with Erlang console command im() and then you add modules for inspection via menu Module/Interpret. Then you can either add breakpoints manually or configure debugger to auto attach on one of conditions (say on first call). Instead of clicking menus you can also use Erlang console commands to control the debugger. See i:help().
  3. With command appmon:start() you can launch visual application monitor which shows all active applications. One particular useful thing is ability to click on application what shows a tree of processes it consist of. Then you can enable tracing of individual processes. When tracing is enabled it seems to be showing messages send or recieved by the traced process.

Syndicated 2008-11-17 11:44:00 from Ilya Martynov's blog

6 Dec 2007 »

STL strings vs C strings for parsing

I'm working on a project where I need to build custom high performance HTTP server. One piece of this server is a parser for URLs in incoming requests. It is very simple and on the first glance it shouldn't be that slow compared with other parts of the server. Yet it was taking quite a lot of CPU according to the profiler. The parser is using STL and basically does several string::find() calls to find parts of URL. So I thought maybe string::find() is too slow and decided to benchmark it against strchr(). This is my benchmark code:

#include <string.h>
#include <string>
#include <time.h>
#include <iostream>

using std::string;
using std::cout;

int main() {
const char* str1 = " a ";
const string& str2 = str1;

const unsigned long iterations = 500000000l;

{
clock_t start = clock();

for (unsigned long i = 0; i < iterations; ++i) {
char* pos = strchr(str1, 'a');
}

clock_t end = clock();
double totalTime = ((double) (end - start)) / CLOCKS_PER_SEC;
double iterTime = totalTime / iterations;
double rate = 1 / iterTime;

cout << "Total time: " << totalTime << " sec\n";
cout << "Iterations: " << iterations << " it\n";
cout << "Time per iteration: " << iterTime * 1000 << " msec\n";
cout << "Rate: " << rate << " it/sec\n";
}

{
clock_t start = clock();

for (unsigned long i = 0; i < iterations; ++i) {
string::size_type pos = str2.find('a');
}

clock_t end = clock();
double totalTime = ((double) (end - start)) / CLOCKS_PER_SEC;
double iterTime = totalTime / iterations;
double rate = 1 / iterTime;

cout << "Total time: " << totalTime << " sec\n";
cout << "Iterations: " << iterations << " it\n";
cout << "Time per iteration: " << iterTime * 1000 << " msec\n";
cout << "Rate: " << rate << " it/sec\n";
}
}

Turns out strchr is much faster as long as the benchmark code is compiled with optimizations on:

ilya@denmark:~$ g++ -O3 test.cc && ./a.out
Total time: 0 sec
Iterations: 500000000 it
Time per iteration: 0 msec
Rate: inf it/sec
Total time: 15.5 sec
Iterations: 500000000 it
Time per iteration: 3.1e-05 msec
Rate: 3.22581e+07 it/sec

ilya@denmark:~$ g++ -O2 test.cc && ./a.out
Total time: 0 sec
Iterations: 500000000 it
Time per iteration: 0 msec
Rate: inf it/sec
Total time: 15.76 sec
Iterations: 500000000 it
Time per iteration: 3.152e-05 msec
Rate: 3.17259e+07 it/sec

ilya@denmark:~$ g++ -O1 test.cc && ./a.out
Total time: 0 sec
Iterations: 500000000 it
Time per iteration: 0 msec
Rate: inf it/sec
Total time: 19.23 sec
Iterations: 500000000 it
Time per iteration: 3.846e-05 msec
Rate: 2.6001e+07 it/sec

ilya@denmark:~$ g++ -O0 test.cc && ./a.out
Total time: 18.64 sec
Iterations: 500000000 it
Time per iteration: 3.728e-05 msec
Rate: 2.6824e+07 it/sec
Total time: 16.89 sec
Iterations: 500000000 it
Time per iteration: 3.378e-05 msec
Rate: 2.96033e+07 it/sec

I checked the same code with callgrind and from call graph it looks like strchr() call was inlined while string::find() wasn't. It could be the reason for the difference in the performance. Maybe compiler is even smarter and optimized whole cycle with strchr() out. I'm not sure that the benchmark is completly fair. Anyway one thing is certain: I'll should try to rewrite my URL parser using strchr() and see if the real code is faster.

Syndicated 2007-12-06 13:06:00 from Ilya Martynov's blog

21 Sep 2007 »

Beyound XSS and SQL injections

What is common about HTML, XML and CSV files, SQL and LDAP queries, filenames and shell commands? All these things are based on text which is often generated by programs. And one commonly observed flaw in such programs is encoding rules are not being followed. These days many developers are aware about SQL injection and XSS problems as many books, online tutorials, blogs, coding standards, etc speak about them. Yet I'm not sure there is enough education so that developers use correct methods to protect their code from these problems. And besides this there is a lack of awareness that it is not just SQL and HTML. Definitely developers should think more broadly: if you generate programmatically any kind of text you must think about proper encoding of all data used in the generated text.

Talking about correct methods to secure code from text encoding related problems one my pet peeve is when people try to strip input data when they really should be thinking about protecting output. Nitesh Dhanjani covers this really well in his blog "Repeat After Me: Lack of Output Encoding Causes XSS Vulnerabilities". Quote:
The most common mistake committed by developers (and many security experts, I might add) is to treat XSS as an input validation problem. Therefore, I frequently come across situations where developers fix XSS problems by attempting to filter out meta-characters (<, >, /, “, ‘, etc). At times, if an exhaustive list of meta-characters is used, it does solve the problem, but it makes the application less friendly to the end user – a large set of characters are deemed forbidden. The correct approach to solving XSS problems is to ensure that every user supplied parameter is HTML Output Encoded
A good example of wrong approach is PHP's invention called magic quotes. I have mixed feelings about this thing. On one hand it was probably a good thing because so many web based software is developed by dilettantes so overall we are living in a slightly better world as magic quotes do somewhat limit damage from bad code. On the other hand it teaches bad habits while not fixing all problems in bad code. Also it causes everybody else to suffer. Good news is that they are getting rid of this abomination in PHP6.

Now let's take a look for some examples how not to generate text which I saw in real life. I'll skip HTML and SQL as this is well covered elsewhere and I'll take a look on other things I mentioned in the beginning of this article.

XML files: bad code which generates XML often shares similar problems as bad code which generates HTML - after all these two are closely related. But as XML is a more generic tool it is used in many domains other then web development where developers are not "blessed" with knowledge of XSS like problems. Moreover I noticed even web developers for some reason often consider XML to be something very different then HTML and suddenly forget they have to escape data. I'm especially amused when that many people are not aware that you cannot put arbitrary binary data in XML. You have to either encode it into text (base64 encoding is quite popular for this) or put it outside of the XML document.

CSV files: this format is still quite popular for exchange of tabular data between programs. Guess what? I've seen so many naive CSV producers and parsers that ignore reserved characters and which break later when these programs get real data. No, to write CSV file you cannot just do
print join ",", @columns
What if one of columns contains say "," (comma)?

LDAP queries: being text based query language it is a target of very similar problems as SQL. But while many developers are aware of SQL injection problem, not many are aware that you have exactly the same problem with LDAP queries too. Also it doesn't help that while nearly all SQL libraries provide tools to escape data in SQL queries it doesn't always seem to be the case for LDAP libraries. For example: PHP's LDAP extension - there is no API to escape data at all.

Using shell to execute commands: if you are running a command using system() in C, Perl, PHP or any other language and you are constructing the command from your data you again should treat this as a problem of proper encoding. The example below is from mozilla's source code
sprintf(cmd, "cp %s %s", orig_filename, dest_filename);
system(cmd);
Guess what happens if any of these filenames were not escaped for characters which are special for shell?

While I'm at this I'd mention that it is probably a good idea to avoid APIs which use shell to execute commands at all. Simply because shell programming is too hard to get right.

What would help a lot if tools would support developers better when writing correct code which deals with text based APIs. Sometimes it is just lack of documentation on encoding rules. For example a month ago I was learning Facebook APIs. One of the provided APIs is API to execute so called FQL queries. This is an SQL like query language and naturally I'd expect FQL injections to be covered in documentation. They don't, it is not even documented how to escape string data in FQL queries! I played with different queries in FQL console and it seems like standard SQL-like method (i.e. using "\" (backslash)) does work as an escape character in strings but why do I have to find this on my own? It is also shame when libraries built around text APIs do not provides means to properly encode data for used text formats. I mentioned one such example above: PHP's LDAP extension provides no functions to escape data for LDAP queries. How hard is it to add this? If you are creating text based APIs or libraries around such APIs it is your duty to help developers who will be using them. So do document encoding rules and do provide tools to automatically encode data!

Syndicated 2007-09-21 22:55:00 from Ilya Martynov's blog

6 older entries...

 

IlyaM certified others as follows:

  • IlyaM certified cwinters as Journeyer
  • IlyaM certified ask as Journeyer
  • IlyaM certified chromatic as Journeyer
  • IlyaM certified jmcnamara as Journeyer
  • IlyaM certified IlyaM as Journeyer
  • IlyaM certified pudge as Journeyer
  • IlyaM certified merlyn as Master
  • IlyaM certified japhy as Journeyer
  • IlyaM certified Simon as Master
  • IlyaM certified Spoon as Journeyer
  • IlyaM certified autarch as Journeyer
  • IlyaM certified petdance as Journeyer
  • IlyaM certified markjugg as Apprentice
  • IlyaM certified chaoticset as Apprentice
  • IlyaM certified adrianh as Journeyer

Others have certified IlyaM as follows:

  • dtucker certified IlyaM as Apprentice
  • IlyaM certified IlyaM as Journeyer
  • Spoon certified IlyaM as Journeyer
  • markjugg certified IlyaM as Journeyer
  • adrianh certified IlyaM as Master
  • chaoticset certified IlyaM as Journeyer
  • autarch certified IlyaM as Journeyer

[ Certification disabled because you're not logged in. ]

New Advogato Features

FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!

X
Share this page