Older blog entries for aleix (starting at number 38)

Integer comparisons

This may be something basic, but I lost a bit of time last week trying to find a bug at work, so I thought it was worth mentioning it.

When comparing signed and unsigned expressions of the same size, the compiler produces what it might be unexpected results. Suppose you have this code:

#include <stdio.h>

int
main (void)
{
  unsigned short int a = -12;
  signed short int b = -12;

  printf ("%s\n", (a == b) ? "OK" : "failed");

  return 0;
}

Here, you would probably expect an “OK” output, but surprisingly you get “failed“. Why is this? If you know about integer promotion, you may skip to the end if you don’t want to follow all the process.

Fortunately, I received a copy of K&R last week, so I started digging into the issue to try to understand why this was happening.

About equality operators, K&R section A7.10 reads

The equality operators follow the same rules as the relational operators…

Section A7.9, about relational operator, reads

The usual arithmetic conversions are performed on arithmetic operands…

So, how these arithmetic conversions work?

This is also clearly explained in K&R section A6.5 (the same rules apply to the C99 standard, section 6.3.1.8, Usual arithmetic conversions). The part that interests us is after having evaluated all the real type conversions, when it says

Otherwise, the integer promotions are performed on both operands…

So, before performing the comparison of our operands, both undergo integer promotion. Integer promotion (K&R, section A6.2) says that

If an int can represent all the values of the original type, then the value is converted to int; otherwise the value is converted to unsigned int.

An int can represent all the values of our operands, so after the integer promotion is performed, both operands have int type. Having a look back to our operands, we know that the a variable holds the value 0xFFF4, and after applying the integer promotion, it maintains the value. The same happens with variable b, that holds 0xFFFFFFF4 to represent -12. Clearly, both values are different and the check fails.

At the end of K&R section A6.5 this is explicitly explained

The new rules are slightly more complicated, but reduce somewhat the suprises that may occur when an unsigned quantity meets signed. Unexpected results may still occur whan an unsigned expression is compared to a signed expression of the same size.

Note, that this issue does not occur with 32-bit values, as both operands would be “0xFFFFFFF4″.

So, be careful when comparing unsigned and signed types.

Syndicated 2007-09-16 17:33:32 from axelio

Simple macros for Check

During my vacation I have had the opportunity to add unit testing to SCEW (Simple C Expat Wrapper). I looked at various C unit testing frameworks and I decided to use Check. Most of them follow the xUnit approach, but I chose Check because tests run in a separate address space other than the test runner.

I found that writing test cases was a bit hard using Check’s syntax, for example following the manual you can write this integer check:

fail_unless (money_amount (m) == 5,
             "Amount not set correctly on creation");

This is fine if you are reading the code, but if the check fails the output doesn’t show you what the actual or expected values are, so the manual suggests changing it for:

fail_unless(money_amount (m) == 5,
            "Amount was %d, instead of 5", money_amount (m));

which is quite better than the first one, but to painful if you have to write it for every check. So, why not write a helper macro that checks for integers, prints the actual and expected values and also shows you the check that is being done?

#define CHECK_INT(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      sprintf (buffer, MSG, ##__VA_ARGS__);                             \
      fail_unless ((A) == (B),                                          \
                   "(%s) == (%s) \n  Actual: %d \n  Expected: %d \n  %s", \
                   #A, #B, A, B, buffer);                               \
    }                                                                   \
  while (0);

With this macro you can now write code like this:

CHECK_INT (scew_list_size (list), N_ELEMENTS,
           "Number of children found searching by name");

which is really easy to read and in the test’s output you can see the actual and expected values, the performed test and the user message clarifying the intention of the check.

element.c:205:F:Core:test_search:0: (scew_list_size (list)) == (N_ELEMENTS)
  Actual: 1
  Expected: 12
  Number of children found searching by name

The same happens with strings, so instead of writing this:

fail_if (strcmp (money_currency (m), "USD") != 0,
         "Currency not set correctly on creation");

or this:

if (strcmp (money_currency (m), "USD") != 0)
  {
    fail ("Currency not set correctly on creation");
  }

we can write a string checking macro that shows the actual and expected strings and the check being done:

#define CHECK_STR(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      if (strcmp ((A), (B)) != 0)                                       \
        {                                                               \
          enum { CHECK_MAX_BUFFER = 250 };                              \
          static char buffer[CHECK_MAX_BUFFER];                         \
          sprintf (buffer, MSG, ##__VA_ARGS__);                         \
          fail ("(%s) == (%s) \n  Actual: %s \n  Expected: %s \n  %s",  \
                #A, #B, A, B, buffer);                                  \
        }                                                               \
    }                                                                   \
  while (0);

As before, this would be the new output:

element.c:97:F:Core:test_accessors:0: (scew_element_name (element)) == (NAME)
  Actual: new_root
  Expected: root
  Element name do not match

Well, this is not a big deal, but I have found it quite useful. Below, is the list of macros I am using right now:

#define CHECK_INT(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      sprintf (buffer, MSG, ##__VA_ARGS__);                             \
      fail_unless ((A) == (B),                                          \
                   "(%s) == (%s) \n  Actual: %d \n  Expected: %d \n  %s", \
                   #A, #B, A, B, buffer);                               \
    }                                                                   \
  while (0);

#define CHECK_STR(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      if (strcmp ((A), (B)) != 0)                                       \
        {                                                               \
          enum { CHECK_MAX_BUFFER = 250 };                              \
          static char buffer[CHECK_MAX_BUFFER];                         \
          sprintf (buffer, MSG, ##__VA_ARGS__);                         \
          fail ("(%s) == (%s) \n  Actual: %s \n  Expected: %s \n  %s",  \
                #A, #B, A, B, buffer);                                  \
        }                                                               \
    }                                                                   \
  while (0);

#define CHECK_PTR(A, MSG, ...)                                          \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      sprintf (buffer, MSG, ##__VA_ARGS__);                             \
      fail_unless ((A) != NULL, "(%s) != NULL \n  %s", #A, buffer);     \
    }                                                                   \
  while (0);

#define CHECK_NULL_PTR(A, MSG, ...)                                     \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      sprintf (buffer, MSG, ##__VA_ARGS__);                             \
      fail_unless ((A) == NULL, "(%s) == NULL \n  %s", #A, buffer);     \
    }                                                                   \
  while (0);

Update 2007/08/11: I have updated the macros so variable number of arguments are allowed (see variadic macros).

Syndicated 2007-08-06 16:07:23 from axelio

Be careful with packed structures!

If you use the C language, you may have probably wanted to pack your structures so no alignment is done by the compiler. This can be useful for example to build network packets. You can find the basics of packed structures using GCC, here or here.

If everything seems clear, why am I writing this? Today, a coworker has found a bug related to packed structures. The issue was with internal structures (i.e. a substructure). A year ago, or so, I knew that substructures were not packed even if its enclosing structure is packed, but it seems I forgot about it, so I have decided that it was worth writing it here so I do not forget it again (I’m sure I will).

I will take the example found in GCC documentation (only since version 3.4.0). Suppose you have the following code:

struct my_unpacked_struct
{
  char c;
  int i;
};

struct my_packed_struct
{
  char c;
  int  i;
  struct my_unpacked_struct s;
} __attribute__ ((__packed__));

struct my_packed_struct my = {
  .c = 10,
  .i = 20,
  .s.c = 30,
  .s.i = 40
};

If we generate the assembly for this (I have omitted some things not needed for the example), we will get:

        .globl _my
        .data
_my:
        .byte   10  <--- c
        .long   20  <--- i
        .byte   30  <--- s.c
        .space 3    <--- 3 bytes of alignment
        .long   40  <--- s.i

As you can see, the compiler has not aligned the internal structure, but the enclosing one. So, what you need to do if you want the internal structure also packed is to pack my_unpacked_struct:

struct my_unpacked_struct
{
  char c;
  int i;
} __attribute__ ((__packed__));

Now, we get what we initially expected:

        .globl _my
        .data
_my:
        .byte   10  <--- c
        .long   20  <--- i
        .byte   30  <--- s.c
        .long   40  <--- s.i

Packing the whole structure my_unpacked_struct is fine if you do not use it anywhere else, but it would be great to use variable attributes (we have used type attributes so far), so we could only pack the internal substructure variable like this (it doesn’t work):

struct my_packed_struct
{
  char c;
  int  i;
  struct my_unpacked_struct s __attribute__ ((__packed__));
} __attribute__ ((__packed__));

Update 2007/07/25: read the first comment to understand why the variable attribute is not working in this case.

By the way, in the example I have initialized the structure my using designated initializers.

And remember, be careful with packed structures!

Syndicated 2007-07-24 18:07:25 from axelio

Erlang R11B-5 in Fink

My friend jao has asked me to update the Erlang package in Fink to the latest version. It is still under validation, so until it is committed you can use this file. To install it, just type:

$ tar zxvf erlang-otp-R11B-5-fink.tar.gz
$ sudo cp erlang-otp-R11B-5-fink/*
          /sw/fink/dists/unstable/main/finkinfo/languages
$ fink index; fink rebuild erlang-otp; fink install erlang-otp

It’s better if you backup your old Erlang Fink files, just in case.

Update 2007/07/06: committed to Fink unstable.
Update 2007/06/28: added unixodbc2 and ncurses5 dependencies.

Syndicated 2007-06-19 22:30:15 from axelio

ECL in Fink

For anyone interested in ECL (Embeddable Common-Lisp), I have created the package for Fink. It is still under validation, so until it is committed you can use this file. To install it, just type:

$ tar zxvf ecl-0.9i-fink.tar.gz
$ sudo cp ecl-0.9i-fink/* /sw/fink/dists/local/main/finkinfo
$ fink index; fink install ecl

Update 2007/07/06: committed to Fink unstable.

Syndicated 2007-06-19 08:26:50 from axelio

Packing and unpacking bit structures in Python

Last week, I released the first version of the BitPacket Python module which allows you to pack and unpack data like the struct and array modules, but in an object-oriented way. At work I needed an easy way to create network packets and at that time I did not know the existence of the struct and array modules, so I googled a bit and I found out the BitVector class for a memory-efficient packed representation of bit arrays, which I decided to use for my purpose.

I implemented three classes, BitField, BitStructure and BitVariableStructure (the lastest two are derived from BitField). A network packet would be represented by the BitStructure class, which at creation does not contain any field, and the idea is that any BitField subclass might be added to it.

I’ll will show you the most basic example. Suppose, you need a simple network packet like the one below:

+---------------+-------------------+
|  id (1 byte)  |  address (4 byte) |
+---------------+-------------------+

You could easily create a network packet using BitStructure, like this:

>>> bs = BitStructure('mypacket')
>>> bs.append(BitField('id', BYTE_SIZE, 0x54))
>>> bs.append(BitField('address', INTEGER_SIZE, 0x10203040))

and print its contents:

>>> print bs
>>> (mypacket =
>>>   (id = 0x54)
>>>   (address = 0x10203040))

In order to unpack an incoming packet, we could use the variable created above or a new one without default values:

>>> bs = BitStructure('mypacket')
>>> bs.append(BitField('id', BYTE_SIZE))
>>> bs.append(BitField('address', INTEGER_SIZE))

In order to unpack an incoming array of bytes, we would do the following:

>>> data = array.array('B', [0x38, 0x87, 0x34, 0x21, 0x40])
>>> bs.set_stream(data)

We can then access to the packet fields by their name:

>>> print '0x%X' % bs['id']
0x38
>>> print '0x%X' % bs['address']
0x87342140

There are a lot more possibilities to pack and unpack bit field structures by using BitStructure and BitVariableStructure. You can see all of them in the module’s online documentation.

Syndicated 2007-06-16 13:52:32 from axelio

Honeymoon and code poetry

code poetryDuring my honeymoon in Egypt (yes, I got married three weeks ago!) I had the chance to visit the new Bibliotheca Alexandrina. Before the guided tour, we spent some time walking around the different exhibitions inside the library building. In one of them, I found a set of punch cards and the great thing is what I could read in them:

“It gives a feeling of completeness”

“A great code is probably as rare as diamonds”

“Like a good poem, there are levels of insight to be gained at each reading”

“Beautiful software reflects a profound understanding of the world in some way”

“Like a good lecture, it leaves me with a desire to use what I’ve just learned”

I don’t know if this was a common behavior in the punch cards days or if people were used to it (if anyone knows I would be really interested in it). I found it really refreshing, it seems that in those days people loved what they were doing, they loved to write software.

Syndicated 2007-05-26 13:36:43 from axelio

Easy acronyms generation

At work, we needed a way to easily generate a list of acronyms of our documents, so I wrote an script (tex-acronyms.py) that parses the acronyms in a specified LaTeX file and tries to find a description (specified as a nomencl definition) in the files (*.tex) located in a directory. The script will create two files: one with acronyms and their descriptions and the other with conflicts, that is, acronyms with no or duplicate description.

The following example is a sample of an acronyms list file. Note that the LaTeX document must use the nomencl package, since the acronyms are defined using the nomencl syntax.

\nomenclature{ADDR}{Address}
\nomenclature{ANSI}{American National Standards Institute}
\nomenclature{API}{Application Programming Interface}
...

It is also possible to provide a list of words to be excluded in the exclude_acronyms.txt file, that must be located in the data directory by default (a different file can be specified using -x). For example:

CHAPTERS
CLOSED
DUPLICATE
FIRST
FIXED
BOTH
BUS
...

It is easy to find words to be excluded because they will be treated as errors (and written to the errors file) as no definition is available for them.

Finally, it is possible to parse the input files recursively passing the -r argument. This will parse all the files included with \input.

A possible program call could be:

tex-acronyms.py -r -d ~/acronyms -i article.tex -o acronyms.tex 
                -e acronyms.errors

where the acronyms found in the article.tex file (and its dependencies) will be matched against the acronyms found in ~/acronyms/*.tex. The matched acronyms will be written to acronyms.tex and the errors in acronyms.errors. Now, you just need to include acronyms.tex in your document.

Update 2007/02/23: GlossTeX does the same (and much more) but à la TeX way.

Syndicated 2007-02-22 15:01:50 from axelio

Designated initializers

Last year, I discovered, thanks to the book “C, A Reference Manual“, a great C99 feature: designated initializers. Designated initializers allow you to initialize components of an aggregate (structure, union or array) by specifying their names within an initializer list.

Arrays initialization

What most people normally use to initialize an array is the following idiom:

int v[4] = { 4, 2, 1, -5 };

in which you need to initialize each component of the array sequentially. Designated initializers allow you to specify which component of the array you want to initialize. Thus, we could write the line above as:

int v[4] = { [1] = 2, [2] = 1, [0] = 4, [3] = -5 };

Note that we have specified the component indexes which has allowed us to initialize the array with our desired order. If we do not initialize all the components, those not initialized will get 0 values. We can also mix both methods, so the line below would be also correct:

int v[4] = { [1] = 2, 1, [3] = -5 };

in which the component not referenced goes right after the named one.

A possible use of this kind of initializations would be a mapping between a list of identifiers and a list of strings.

// The public interface

typedef enum {
  id_one,
  id_two,
  id_three
} id_t;

extern char const* string_by_id (id_t id);

// The private implementation

static char const* strings[] =
{
  [id_one] = "identifier one",
  [id_two] = "identifier two",
  [id_three] = "identifier three"
};

char const*
string_by_id (id_t id)
{
  return strings[id];
}

Structures and unions initialization

Designated initializers are also useful to initialize components of structures and unions by their name. In this case, the component to be initialized takes the form .c, where c is the name of the component. So, suppose we have the following structure:

struct point { float x; float y; float z; };

we could initialize each component of a struct point variable like this:

struct point my_point =
{
  .x = 0.34,
  .y = 0.98,
  .z = 1.56
};

With unions, we will use the same method, so having the following union:

union integer
{
  unsigned char int_8;
  unsigned short int int_16;
  unsigned long int_32;
};

we can initialize it by any of its components:

union integer value = { .int_16 = 24000 };

Finally, we can merge both cases, so we can have arrays of structures or unions that can be initialized using designated initializers:

struct point pointvector[3] =
{
  [0].x = 0.34, [0].y = 1.78, [0].z = 3.18,
  [1] = { .x = 3.5, .y = 6.89 },
  [2] = { .y = 2.8, 1.23 }
};

Syndicated 2007-02-19 09:02:46 from axelio

Split directives into multiple files

Yesterday, I found out that I needed an application to split the virtual hosts of an Apache 1.3 configuration into separate files so I could use them in the sites-available/sites-enabled Debian’s Apache 2 way. I googled just a bit and I did not find anything so I did my own one (vhost-split.py).

Just pass it the configuration file and the script will generate a bunch of files named with the ServerName variable found in each virtual host. Note that repeated entries will generate separate files (www.mydomain.com, www.mydomain.com-1, …). The script will also report commented entries.

Syndicated 2007-02-15 18:57:09 from axelio

29 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!