Older blog entries for jas (starting at number 8)

Real-world Performance Tuning with Callgrind

This post describe the process of identifying and profiling an inefficient part of GnuTLS. The tool I’m using is callgrind. I won’t describe the tool in detail since I’m not a callgrind expert, instead the focus is on the methodology in finding and fixing a problem. My hope is that this is useful as an insight to how maintainers go about fixing a performance-related problem. It also demonstrates how immensely useful tools like valgrind and callgrind are.

Today I stumbled over something as rare as a post that contains example code to reproduce a problem. The post was written by Edgar Fuß on the openldap list: link to edgars post. First, there is something to be learned here: if there hadn’t been code in that post, I would most likely have ignored it because it is too much trouble for me to understand and duplicate the problem. Especially when this isn’t a real GnuTLS bug report, but just discussion on a non-GnuTLS mailing list. By posting such example code, it was easy for me to compile and run it. As it turned out, the code was very slow and my curiosity peeked. Edgar’s bug triggering code was (somewhat modified for readability):

d = opendir("/etc/ssl/certs");
while ((dent = readdir(d)) != NULL) {
  sprintf(ca_file, "/etc/ssl/certs/%s", dent->d_name);
  stat(ca_file, &s);
  if (!S_ISREG(s.st_mode)) continue;
  gnutls_certificate_set_x509_trust_file(ca_list, ca_file,

If you aren’t familiar with GnuTLS, I’ll describe what the code is intended to do. The code will iterate over all files in /etc/ssl/certs/ and calls gnutls_certificate_set_x509_trust_file for each file. That function will read one or more X.509 CA certificates stored in the file, and add it to the CA trust store inside GnuTLS. The files are typically small (1-2kb) but contain base64 encoded ASN.1 data which is decoded by GnuTLS. The CA trust store is used to determine whether to trust a client or server’s certificate or not. While this is typically only done at startup, and not during each TLS connection, it is not particulary performance critical. Still, if it is excessively slow it will slow down application startup.

On my system (x86 Debian testing) the /etc/ssl/certs directory contains 206 files. I compiled the test code and ran it:

jas@mocca:~$ gcc -o foo foo.c -lgnutls
jas@mocca:~$ time ./foo /etc/ssl/certs/

real 0m40.821s
user 0m40.743s
sys 0m0.036s

40 seconds! Delaying startup of an application by that amount is pretty significant, so I understand this was considered a problem.

I became curious how much memory the process was using. If my machine had started paging (unlikely with 2GB but you never know), I could understand if it was this slow. However, the top output was relatively stable:

6538 jas 20 0 5548 3668 752 …

In other words, the virtual size was about 6MB, which seemed normal. My next step was to run the binary under valgrind, to possibly detect any memory corruption or other problem that might explain the slowness. Valgrind slows down execution significantly, and after around 7 minutes I gave up and modified the code slightly so that it only iterated over the first couple of files. After running valgrind once, I discovered that the code didn’t call the needed gnutls_certificate_free_credentials() and gnutls_global_deinit(). You can download the updated example code gnutls-callgrind.c. The valgrind suppressions file to shut up some known libgcrypt internal memory leaks is available as libgcrypt.supp.

jas@mocca:~$ gcc -o foo foo.c -lgnutls
jas@mocca:~$ time ./foo

real 0m0.222s
user 0m0.216s
sys 0m0.004s
jas@mocca:~$ time valgrind –suppressions=foo.supp –quiet ./foo

real 0m7.619s
user 0m7.488s
sys 0m0.108s

Nice! No memory leaks. We can now be relatively certain that the problem is really with the intention of the code, rather than some memory related bug. Given that we use C here, you want to rule out such problems early on because they are a nightmare to debug.

Now for the performance tuning session. Let’s run callgrind on the application. First, we must make sure we use a GnuTLS compiled with debugging information. The easiest way to do this is to compile it against the static libgnutls. Proceed as follows.

jas@mocca:~$ gcc -o foo foo.c /usr/lib/libgnutls.a -lgcrypt -ltasn1 -lz
jas@mocca:~$ time valgrind –tool=callgrind –quiet ./foo

real 0m18.292s
user 0m18.129s
sys 0m0.084s
jas@mocca:~$ ls -la callgrind.out.8164
-rw——- 1 jas jas 69654 2008-02-26 21:46 callgrind.out.8164
jas@mocca:~$ kcachegrind &

As you can see, execution time is even slower than with valgrind. The profile data output file is quite small. Running kcachegrind yields this output (click to enlarge):

The amount of information in kcachegrind can be overwhelming at first. However, the interesting values are in the percentages and call counts columns to the left. It tells us that gnutls_certificate_set_x509_trust_file() was invoked 22 times, and that 98% of the time is spent there. (If you are surprised by the 22, look at the code, the variable i starts at 0, and the loop is run until it is larger than 20, i.e., it is 21, which makes for 22 iterations in the loop.) The code for that function looks like:

  int ret, ret2;
  size_t size;
  char *data = read_binary_file (cafile, &size);

  if (data == NULL)
      gnutls_assert ();
      return GNUTLS_E_FILE_ERROR;

  if (type == GNUTLS_X509_FMT_DER)
    ret = parse_der_ca_mem (&res->x509_ca_list,
			    data, size);
    ret = parse_pem_ca_mem (&res->x509_ca_list,
			    data, size);

  free (data);

  if (ret 

The callgrind output tells us that 96% of the programs time is spent inside the 22 calls to generate_rdn_seq(), which in turn call gnutls_x509_crt_get_raw_dn a total of 506 times. The calls to gnutls_x509_crt_get_raw_dn make up 95% of the program’s execution time.

We can now conclude that the problem is inside the generate_rdn_seq() function, and not in the rest of the gnutls_certificate_set_x509_trust_file() function body. Given that 95% of the time is actually in a function that generate_rdn_seq calls (i.e., gnutls_x509_crt_get_raw_dn), the problem is either that the function is called too many times or that it is too slow.

Some words about what generate_rdn_seq() is intended to do: it pre-computes a list of names of the CA certificates. In TLS servers, this list of names of trusted CAs is sent to clients. The list is used by the client to find a client certificate issued by a CA that the server recognize. To avoid computing the list every time it is needed (i.e., for every connection), GnuTLS pre-computes this and store it in the credential structure. One credential structure can be associated with one or more TLS sessions.

We are now prepared to look at the code for generate_rdn_seq:

static int
generate_rdn_seq (gnutls_certificate_credentials_t res)
  gnutls_datum_t tmp;
  int ret;
  unsigned size, i;
  opaque *pdata;

  /* Generate the RDN sequence
   * This will be sent to clients when a certificate
   * request message is sent.

  /* FIXME: in case of a client it is not needed
   * to do that. This would save time and memory.
   * However we don't have that information available
   * here.

  size = 0;
  for (i = 0; i x509_ncas; i++)
      if ((ret = gnutls_x509_crt_get_raw_dn
                      (res->x509_ca_list[i], &tmp)) x509_rdn_sequence.data != NULL)
    gnutls_free (res->x509_rdn_sequence.data);

  res->x509_rdn_sequence.data =
          gnutls_malloc (size);
  if (res->x509_rdn_sequence.data == NULL)
      gnutls_assert ();
  res->x509_rdn_sequence.size = size;

  pdata = res->x509_rdn_sequence.data;

  for (i = 0; i x509_ncas; i++)
      if ((ret = gnutls_x509_crt_get_raw_dn
                       (res->x509_ca_list[i], &tmp)) x509_rdn_sequence);
	  gnutls_assert ();
	  return ret;

      _gnutls_write_datum16 (pdata, tmp);
      pdata += (2 + tmp.size);
      _gnutls_free_datum (&tmp);

  return 0;

Take some time to read and understand this code. Really. I’ll be here waiting to explain it when you are finished.

The res->x509_ca_list variable contains the entire list of CA’s stored in memory so far. It is initially empty. After reading the first file, it will contain one certificate. After reading the second file, it will contain two certificates, and so on.

What is happening here is that FOR EVERY certificate to add, the entire list of certificates is iterated, not just once but twice! The computational complexity of invoking the function is O(2*n^2) where n is the number of certificate names to be added. The first iteration is to compute the size of the string to hold the CA names. The actual data is discarded. The second iteration calls the same function, for the same data, but now store the output in the appropriate place.

The first step to optimize this is to realize that you don’t need to iterate through all CAs every time a new one has been added. Adding the name for the most recently added certificate should be sufficient. Since more than one CA can be added at each time, the function needs to take another parameter: the number of recently added CAs for which to pre-compute the names for.

Our new code will look like:

static int
add_new_crt_to_rdn_seq (gnutls_certificate_credentials_t res, int new)
  gnutls_datum_t tmp;
  int ret;
  size_t newsize;
  unsigned char *newdata;
  unsigned i;

  /* Add DN of the last added CAs to the RDN sequence
   * This will be sent to clients when a certificate
   * request message is sent.

  /* FIXME: in case of a client it is not needed
   * to do that. This would save time and memory.
   * However we don't have that information available
   * here.
   * Further, this function is now much more efficient,
   * so optimizing that is less important.

  for (i = res->x509_ncas - new; i x509_ncas; i++)
      if ((ret = gnutls_x509_crt_get_raw_dn
                        (res->x509_ca_list[i], &tmp)) x509_rdn_sequence.size +
                       2 + tmp.size;
      if (newsize x509_rdn_sequence.size)
	  gnutls_assert ();
	  _gnutls_free_datum (&tmp);

      newdata = gnutls_realloc
                        (res->x509_rdn_sequence.data, newsize);
      if (newdata == NULL)
	  gnutls_assert ();
	  _gnutls_free_datum (&tmp);

      _gnutls_write_datum16 (newdata +
            res->x509_rdn_sequence.size, tmp);
      _gnutls_free_datum (&tmp);

      res->x509_rdn_sequence.size = newsize;
      res->x509_rdn_sequence.data = newdata;

  return 0;

I changed the function name, to reflect that it now just update the list of names for a set of recently added certificates. That also helps to find and update all callers of the old function.

As you can see, this function should be of complexity O(n) instead, since it just adds the name of the most recently added CA’s.

But is it faster in practice? Let’s check it! First, remove the if (i++ > 20) break; statement in the test code, so that we do the full test. First link against our old libgnutls.a and run the old test case again, for illustrative purposes.

jas@mocca:~$ gcc -o foo foo.c /usr/lib/libgnutls.a -lgcrypt -ltasn1 -lz
jas@mocca:~$ time ./foo

real 0m40.756s
user 0m40.679s
sys 0m0.052s

Again, it takes around 40 seconds. Now recompile against our modified libgnutls and run the same test:

jas@mocca:~$ gcc -o foo foo.c /home/jas/src/gnutls/lib/.libs/libgnutls.a -lgcrypt -ltasn1 -lz
jas@mocca:~$ time ./foo

real 0m0.288s
user 0m0.280s
sys 0m0.012s

Wow! Down from 40 seconds to 0.3 seconds. Let’s take a look at the callgrind output now.

The code now spends around 60 % of the time in add_new_crt_to_rdn_seq() which seems reasonable. There are 206 files in /etc/ssl/certs, which explains the call count. Some of the files actually contain several CA certificates (in particular /etc/ssl/certs/ca-certificates.crt contains a lot of certificates), which explains why gnutls_x509_crt_get_raw_dn is called 408 times.

At this point, I stopped optimizing the code further.

Syndicated 2008-02-26 23:17:45 from Simon Josefsson's blog

IDNA flaws with regard to U+2024

In a bug report against libidn, Erik van der Poel gives an example of an internationalized domain name that is handled differently by different implementation. Another example of one such string is:

‘räksmörgås’ U+2024 ‘com’

If your browser supports Unicode, the string is: räksmörgås․com. Use cut’n'paste of the string into your browser and see what it tries to lookup (please let me know what you notice!).

The problem with this string is that it is on the form “[non-ASCII][DOT-Like code point]com”. Here ‘räksmörgås’ represents the non-ASCII string, which can be any non-ASCII string. Further, the U+2024 represent one character which looks like a dot, there are others that also contain dot-like characters.

The IDNA algorithm (section 3.1) implies that applications should treat the string as one label. The U+2024 character is not one of the dot-like characters that needs to be treated as a label separator. The ASCII string which is output after applying the IDNA algorithm is:


Note that the string contains an ASCII dot ‘.’ (0×0E). If applications are not careful how they resolv the name in the DNS, they will request information in a non-existing top-level domain ‘com-l8as9u’. This is because the DNS do not use ‘.’ to separate labels, but instead uses a length-value pair for each label. Thus the wrong string to lookup would be:


Whereas the right string to lookup would be:


Using DNS master file syntax, the name to lookup is xn--rksmrgs\.com-l8as9u.

What’s interesting here is that some implementations, such as Microsoft Internet Explorer and Firefox implements IDNA not according to the standard. Instead, they compute the following string:


Arguable, this is a better approach than what is specified by RFC 3490. MSIE/Firefox recognize that U+2024 is a “dot-like” character, by using NFKC. What is debatable is whether U+2024 will actually occur in practice, Unicode expert Kenneth Whistler says U+2024 will not be entered accidentally.

As the maintainer of GNU Libidn, I’m not yet sure about what to do about the situation. The conservative approach is to do nothing until the RFCs are updated. I have come up with a patch to add a new IDNA flag that treat U+2024 as a dot-like character early on. This would at least make it possible to produce the same (RFC non-conforming) output that MSIE/Firefox computes.

Syndicated 2008-01-14 15:38:27 from Simon Josefsson's blog

PAM module for Yubico

During the autumn, in Yubico, we have been working on a PAM module for the Yubikey. It allows you to use the Yubikey to login to your machine, to unlock the screensaver, and so on. I decided to let Google Code host this project, which is the first time I’ve used them. It will be interesting to see how working with their site is going to turn out.

ObLink: code.google.com/p/yubico-pam/

You can buy Yubikeys on our web shop. If you have an interesting idea about what can be done with the key, let me know and I may be able to arrange a good deal for you. :)

Syndicated 2008-01-14 10:39:22 from Simon Josefsson's blog

Response to GnuTLS in Exim Debate

Marc Haber blogs about GnuTLS in Exim4, and it suggests there is a long list of technical issues in GnuTLS. Given my involvement in GnuTLS, I decided to analyze each bug to see what we can learn and possibly improve.

I looked at the all bugs tagged with gnutls in the exim4 bug tracker. My impression is that Marc Haber has done a very good job as Exim4 maintainer in dealing with these GnuTLS related problems. Some of the frustration seems to be because submitters don’t respond to questions. Also it seems different problems are discussed at the same time, which makes it very difficult to help isolate and solve the problem. The only serious problem I’ve identified is the entropy depletion problem, and the GnuTLS team will try to address it. To me, the concern seems more of a volunteer time issue than a technical one.

Quick Summary

Bug #348046 is so complex that I cannot judge it. If the submitters are willing, it may be best to re-submit each problem separately. The problem with TheBat is interesting, but given the non-free status of TheBat and no other reports, it doesn’t seem serious. To reduce entropy consumption is something we should work on, but it is a ‘wishlist’ kind of bug, and to some extent may have already been solved by removing the DH generation code which depleats the entropy pool quickly. The rest appears to be already solved or should be tagged as ‘wontfix’.

Bug #316522

When the email client TheBat talks with exim4 4.50-8, gnutls (in exim4) will log (gnutls_handshake): An error was encountered at the TLS Finished packet calculation. Other clients than TheBat reportedly works. An older version of Exim4, specifically 4.32-2, worked though. It is unclear whether the version of GnuTLS changed when exim4 was upgraded from 4.32-2 to 4.50-8. There is no discussion of whether changing to OpenSSL would solve the problem.

Conclusion: The problem with TheBat warrants debugging. However, this do not seem to be a widely reported problem. Further, TheBat is not free software so we cannot help debug it.

Questions: The reported said earlier versions worked, which GnuTLS versions was this? Can the problem be pin-pointed to a specific GnuTLS release or Exim4 release? Does the problem go away with OpenSSL?

Bug #348046

This is a long bug report by several submitters. The initial report from Martin A. Brooks is stalled when he doesn’t respond to the (appropriate and relevant) questions that Marc Haber asks. The problem that Ian Zimmerman reports seems to be different, now GnuTLS clients work fine but OpenSSL clients fail to connect to the Exim4/GnuTLS server. The OpenSSL errors may suggest it only wants to talk SSLv2 for some reason (local configuration?). Debugging the OpenSSL problem further would be the appropriate response, and should likely be treated as an OpenSSL bug (!) until more evidence can be gathered. Later, Andrew McGlashan reports a problem where neither GnuTLS and OpenSSL works against the ‘Incredimail’ MUA.

Conclusion: The bug should really be forked into several problems, one for the initial reports where the submitter stopped responding, one for the OpenSSL problem, and one for the Incredimail problem (and as Incredimail isn’t a Debian package, it’s not Debian’s problem). Caveat: I may have missunderstood some parts of this bug report because different problems are discussed at the same time. Once that is done we can try to address each problem separetely.

Bug #338319

Report about an old version of exim4, seems to have been solved by removing the (buggy) exim4 code to re-generate DH parameters file needlessly. Last message says it will be closed in a few weeks.

Conclusion: Solved.

Bug #343085

Seems the same as the previous one, about exim4 in sarge. Not clear why this is open?

Conclusion: Solved.

Bug #390712

Appears to be triggered by GnuTLS implementing MAC padding to solve a security problem in TLS. OpenSSL reportedly does not implement the same work around, and would thus appear to be vulnerable to that problem.

Conclusion: Appears to be a ‘wontfix’ bug. Personally, I think GnuTLS could provide a simpler mechanism to disable MAC padding if applications deem this necessary. Someone could double check how important the MAC padding security concern is.

Bug #426013

Appears to be an unreprodicible problem with a specific certificate/key which the user cannot reveal. Another certificate/key from the same CA works fine. Theory: could it be CRLF problems? Other non-ASCII characters in the file? Nothing indicates a real GnuTLS problem here.

Conclusion: Likely not a GnuTLS problem.

Bug #446036

There is two technical claims here: that GnuTLS consumes too much entropy, and this would be a wishlist item that we could work on. The other claim is that ‘openssl actually supports full certificate chain lookups, so you can be guaranteed that this cert was signed was signed by that ca. gnutls does not, to the best of my knowledge.’. As far as I can understand with Stephen Gran refers to, that is simply false. It is suggested that GnuTLS’ performce is worse than OpenSSL, but there is no measurements to support that.

Syndicated 2007-11-09 11:00:14 from Simon Josefsson's blog


A free software conference in Sweden? That’s a rare one. Organized by the FSFE and Henrik Sandklef, it will be held on the 7-8 December 2007. I hope we’ll see more of this in Sweden. I’m proud to have been invited to talk about both GnuTLS and OpenID. I’m happy to see that there is a OpenMoko talk as well. If you want to participate, there is an early bird discount if you register now. If someone is going and would like to chat, drop me an email.

Syndicated 2007-10-23 10:29:02 from Simon Josefsson's blog


The TLS-AUTHZ document (protocol spec here) describes a mechanism to add support for authorization in the TLS protocol. The idea is part of a patent application, see the patent notification to the IETF. The protocol has a complicated history in the IETF. Right now a third last call is open to request feedback from the community. I’ve written about TLS-AUTHZ before.

RedPhoneSecurity is now trying to circumvent the IETF standardization process by trying to get the document published as an ‘experimental standard’. The document earlier failed to get consensus for publication on the standards track.

The responsible IETF Area Director, Tim Polk, argues that because there exists independent implementations, the community benefits from having the document published. The argument is silly because the only independent implementation is mine and I’m opposed to publication of the standard. Further, the document will remain accessible to anyone in the community with access to the Internet since it has been published as an Internet Draft. To clarify that we have no interest in a standard with patent claims, we have decided to remove the tls-authz implementation from GnuTLS. Together with the FSF we came up with the following statement which is part of the GnuTLS 2.0.2 release announcement:

** TLS authorization support removed.
This technique may be patented in the future, and it is not of crucial importance for the Internet community. After deliberation we have concluded that the best thing we can do in this situation is to encourage society not to adopt this technique. We have decided to lead the way with our own actions.

If you are concerned about having patented standards adopted by the IETF, now is a very good time to make your voice heard! The last call ends on October 23th. Please read about the issue, and familiarize yourself with the IETF process (RFC 2026, with updates related to patents in RFC 3989) and send your feedback to ietf@ietf.org.

Syndicated 2007-10-18 14:23:15 from Simon Josefsson's blog

Home Audio Server

Procrastinating real work, I documented my home audio server setup. I needed a cross-platform solution, and as a first step, I settled with MPD. The setup is only a few days old, and I may decide to change software eventually. But the current setup works under Gnome, Windows, Mac OS X and even on my Nokia 6233.

Home Audio Server

What may be missing is FM/DAB Radio and streaming of TV, but I’m not sure the little NSLU2 is up to it. We’ll see.

The writeup on how to do this is long, so I put it at a separate page:

(This is a continuation of my series to document the devices that run my home, the first was the internet setup).

Syndicated 2007-09-25 19:19:43 from Simon Josefsson's blog

GnuTLS v2.0

I released GnuTLS v2.0 yesterday, the announcement is available.

So now we can start thinking of nice stuff to have in the v2.1.x series. Integrating the PKCS#11 support is one. ECC support? TLS 1.2 may go into v2.0.x. Opaque PRF input support is planned. Some benchmarking and optimization could be interesting. Other ideas?

Syndicated 2007-09-05 07:08:58 from Simon Josefsson's blog

Building GnuTLS and GNU SASL without running ./configure

Sometimes it can be useful to build things without the autoconf ./configure machinery, and just use a simple and hand-maintained makefile and config.h. This is needed to build things in older uClinux environments. I wrote some instructions on how to build GnuTLS and GNU SASL, and their dependencies (libgpg-error, libgcrypt, libtasn1) without running ./configure, see:


The makefile/config.h aren’t specific to uClinux, so if you for some reason need to build these projects in some other environment, without autoconf, the files may be useful.

(Although if you want to build GnuTLS/GSASL properly under a modern uClinux, you’ll be better of reading an earlier post.)

Syndicated 2007-08-21 12:26:16 from Simon Josefsson's blog

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!