Older blog entries for DV (starting at number 165)

libxml2/libxslt

Made a new set of releases over the week-end: libxml2-2.6.9 and libxslt-1.1.6 are out. I pushed them to the build system, along with the xmlsec1-1.2.5 release from Aleksey Sanin; they should all show up in Rawhide soon. There are a number of bug fixes in libxml2, plus the xml:id implementation. The libxslt release only fixes a couple of key-related bugs.

the blog effect

I hadn't googled for libxml2 recently, and when I tried yesterday I noticed a clear change from what I was used to: blog entries now make up a significant part of the references to the project. I also started to play with Orkut a bit. Though the project is fading (popularity-wise), it holds a huge amount of metadata, and it's clear Google will be able to use it. This is both promising and a bit scary: they can now build personal profiles, associate them with home pages and infer a lot of valuable relations from that data...

mailman bounces

As usual William Brack delivered what he promised: there is now a Python version of the mailing-list admin script. I also extended the bounce config file over the last week.

Boston

I will be near Boston from the beginning of next week until mid-May, visiting the other Red Hat folks there. I hope to see other people too, notably the Ximians, and William, who will fly in from California so we can finally meet face to face :-) . BTW clarkbw, never underestimate the horrible traffic mess that is Boston; as far as I can remember, the name "Turnpike" really means "big mess"!

xml:id

The first working draft is out. The first implementation hit libxml2 CVS tonight; it took less than an hour, including setting up basic regression tests. It's not finished, of course, and we will see how long it takes to bring that spec to REC, if it ever gets there.

I answered a few questions about it on the mailing list too.
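For readers new to xml:id: the point of the draft is that any attribute named xml:id acts as an ID without needing a DTD. The following is a minimal Python sketch of that idea using the standard library's ElementTree, not the libxml2 implementation (which is C). It trims and collapses whitespace as is done for ID-typed attributes, then reports duplicates and invalid names. The NCName check is a simplified, ASCII-only assumption; the real production allows many more Unicode characters.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Simplified ASCII-only approximation of the NCName production (assumption)
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def collect_xml_ids(xml_text):
    """Return (id_map, errors) for every xml:id attribute in the document."""
    root = ET.fromstring(xml_text)
    ids = {}
    errors = []
    for elem in root.iter():  # document order
        raw = elem.get(XML_ID)
        if raw is None:
            continue
        # Normalize like an ID-typed attribute: trim and collapse whitespace
        value = " ".join(raw.split())
        if not NCNAME.match(value):
            errors.append("invalid xml:id %r" % value)
        elif value in ids:
            errors.append("duplicate xml:id %r" % value)
        else:
            ids[value] = elem
    return ids, errors
```

For example, `<doc><sec xml:id=" intro "/><sec xml:id="intro"/><p xml:id="1bad"/></doc>` yields one ID, `intro`, plus a duplicate error and an invalid-name error.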

The Germans get it

While searching the web I found a CVS web repository at http://cvs.zeit.de/code/view/mod_xslt/cms_extensions.html?rev=HEAD . Die Zeit is one of the largest German newspapers; it of course does a lot of electronic publishing and has a CMS. Big business, industry leader, not computer-related, *but* since they developed extensions to Apache and Zope for their CMS, they documented their architecture and created a public CVS repository! They understand that keeping it internal is not in their business interest and that giving changes back is important.

SAP, a leading German IT company, also gave back its (aging but solid) database, which MySQL is apparently supporting now. It seems to me that German industry understood the concepts and perspectives of Open Source and Free Software earlier, and reading their IT press reinforces that feeling.

mailman bounces handling

The Perl script I hacked last week-end works fairly well; it reduced the work to handling the few bounces it doesn't catch and approving valid but blocked posts. In the meantime William Brack rewrote the script in Python. I haven't checked it out yet, but I will certainly switch, since I feel more comfortable maintaining Python code.

TheReg and XSLT publishing

The Register is among the news sites I look at from time to time, and at the beginning of this week I noticed that the format had changed. I later discovered that they are using libxslt to build the pages. I loved the statement "We're proud that The Register uses valid XHTML and CSS on its pages", along with the invitation to report breakages. Even if TheReg specializes in treating topics in a sensational way, their technical attitude deserves a mention too :-)

In general, [Database ->] XML -> XHTML processing through XSLT seems one of the best ways to generate well-formed and even valid XHTML, and as long as the transformation is either static or cached, the cost is reasonable. The whole xmlsoft.org site is generated that way: the pages are produced by XSLT from the Makefile in the libxml2 (and libxslt) doc directory, and are also automatically validated against the XHTML1 DTDs (which are in the catalog if you installed the xhtml1 RPM). That way I'm sure that even if the doc content is not ideal and is certainly outdated, at least its structure and presentation are guaranteed to be clean.
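The "static or cached" condition above boils down to a make-style freshness check: only rerun the transform when the source or the stylesheet is newer than the generated page. Here is a minimal Python sketch under stated assumptions. `transform(source, stylesheet)` is a hypothetical callable standing in for the real XSLT step (Python's standard library has no XSLT processor). The post-check only verifies well-formedness and an XHTML root element, which is weaker than the DTD validation used for xmlsoft.org.

```python
import os
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def needs_rebuild(source, stylesheet, output):
    """Make-style check: rebuild when output is missing or older than an input."""
    if not os.path.exists(output):
        return True
    out_time = os.path.getmtime(output)
    return any(os.path.getmtime(p) > out_time for p in (source, stylesheet))


def build_page(source, stylesheet, output, transform):
    """Regenerate output only when stale; return True if it was rebuilt.

    `transform` is a hypothetical callable returning the XHTML page as a str.
    """
    if not needs_rebuild(source, stylesheet, output):
        return False  # the cached page is still fresh
    html = transform(source, stylesheet)
    root = ET.fromstring(html)  # raises ET.ParseError if not well-formed
    if root.tag != "{%s}html" % XHTML_NS:
        raise ValueError("root element is not an XHTML <html> element")
    with open(output, "w", encoding="utf-8") as f:
        f.write(html)
    return True
```

Keeping the staleness test separate from the transform is what makes the cost "reasonable": a request or a build run that finds a fresh page does no XSLT work at all.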

Formatting

sdodji pointed me at Prince, an XML formatter that uses CSS for its rules, generates PDF and PostScript, and uses libxml2. I tend to agree with him that if commercial implementations are being developed, there is a need for such a formatter, and Open Source community work like libcroco and sewfox may have a bright future. I just hope that the xmlroff project will continue too; I'm not sure Sun is still actively supporting it. Formatting is hard, but we have most of the infrastructure. Maybe aiming at full XSL is just too hard and a CSS-based tool is more likely to get finished; well, I hope so.

Mailman lists handling

dsandras pointed me at mladmin.pl to help handle the bounces. To me it wasn't very useful, because the only automatic processing was to discard everything or accept everything, plus it's written in Perl ... markmc thought I was a bit too strongly biased against the language and that the code had potential, but I really didn't want to learn Perl ... so far.

But considering how painful handling my lists has been, I really needed a tool, so I started looking at the Perl tutorial online and hacked the thing. The results are:

  • perl definitely sucks as a language, thank you !
  • an augmented version of mladmin.pl which will read $HOME/.bounces and filter stuff by default.
  • an example configuration file
  • about a thousand fewer pending bounces on mail.gnome.org
  • a couple of crontab entries on key machines >:->
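The key change in the augmented script is "read a per-user rule file and filter by default" instead of all-or-nothing. A small Python sketch of that idea follows. The rule format ("discard" or "accept" followed by a regular expression on the sender) is invented purely for illustration and is not the format of the actual $HOME/.bounces file; the sketch also does not talk to Mailman. Anything no rule matches is deferred for a human to look at.

```python
import re


def load_rules(text):
    """Parse a hypothetical rule file: one 'action pattern' pair per line.

    action is 'discard' or 'accept'; pattern is a regular expression
    searched in the sender address. Lines starting with '#' are comments.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        action, pattern = line.split(None, 1)
        if action not in ("discard", "accept"):
            raise ValueError("unknown action %r" % action)
        rules.append((action, re.compile(pattern, re.IGNORECASE)))
    return rules


def classify(sender, rules, default="defer"):
    """Return the action of the first matching rule, else leave it to a human."""
    for action, regex in rules:
        if regex.search(sender):
            return action
    return default
```

First-match-wins keeps the rule file easy to reason about: specific exceptions go near the top, broad catch-alls at the bottom.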

Of course William was on it too :

<bill> DV: you cleared the xml-bindings admin page?
<DV> bill: better :-)
<DV> bill: I'm finishing up the perl-based :-( bounces cleaner
<bill> DV: I'm working on a Python-based version, and having a great "learning experience" concerning cookies....
<DV> bill: I afraid I beat you time wise :-)
<bill> DV: but all of a sudden my testing list (xml-bindings) had no more messages :-)

William also pointed out that today is 4/4/4, which is apparently a very bad date, since 4 is a really unlucky number in Chinese.

I still hope he will come up with a maintainable Python equivalent :-)

Spring, pictures and travels

Spring is really starting here, so I took my bicycle yesterday and had a nice ride in the valley. I took a few pictures of a very nice blooming plum tree. I really like the pictures taken by the S50. I'm buying an extra CF card and battery, since I expect to go to Jamaica after the trip to the USA next month and I will really try to get good underwater pictures. Plus there are a lot of orchids growing there; I'm really looking forward to that vacation.

libxml2/libxslt

Made a new set of releases, 2.6.8/1.1.5, mostly bug fixes. The xmlsoft.org website is now hosted on a separate machine at INRIA here in Grenoble; the new processor makes searches far more responsive, and the dedicated bandwidth helps too. Traffic has increased significantly lately due to the references from the PHP main page.

The languages dilemma

So Sun's CEO doesn't want to see Java Open Sourced. This is a major screwup IMHO! It means the Open Source world will never be able to use Java as defined by Sun, only reimplementations (likely to be a limited and to some extent incompatible subset) or a different language like Mono/C#. If the goal is to avoid fragmentation, it can only fail; if the goal is to keep Java as a "Sun asset", it might succeed, but its value will diminish. This is not the reaction of a company confident in its technical expertise; it sounds more like keeping the family jewels in the safe in case of disaster.

W.r.t. Microsoft granting Royalty Free licences for the ECMA 334/335 standards, all I have seen so far is a mail archive saying that Microsoft "will" do this, which doesn't sound that great.

One thing is sure to me: dealing with a dying Sun Microsystems over patents or Java branding might not be any easier than dealing with Microsoft over C# rights.

Red Hat

I'm moving to the Red Hat Desktop group internally. I'm quite happy to join that group along with some of the other GNOME hackers; this is exciting, but it also means I will have less time left for libxml2 and libxslt, so don't expect XPath2 or XSLT2 implementations in the near future.

18 Mar 2004 (updated 18 Mar 2004 at 12:43 UTC)

Mono

Quick entry because Miguel wrote:

Microsoft has granted RAND+Royalty Free licenses to any patents they might own that are required to implement the ECMA 334/335 standards. So at least our core VM, classes and compilers are safe from any litigation from *Microsoft*.

Where??? RAND is not sufficient, and though I have been looking at all the information posted on this topic so far, I have never seen a written statement from Microsoft explicitly granting a Royalty Free licence (and saying to whom). This information must come from a Microsoft spokesperson to have any legal value, right?

ncm, don't worry: a number of people still value lean, fast and robust libraries written in C. When there is a lot of reuse this makes sense, but for developing applications on top of them it's clear those new languages are more efficient from a programmer's perspective (though memory hungry).

Madrid

Pure barbarism. Nothing can justify this, nobody will benefit from it either, it's just pointless cruelty. I'm very pessimistic about stopping this though: there will always be fanatics, most people need to believe in some dogma, and sometimes it just goes out of control. I'm afraid this inhuman behaviour is precisely one of the worst aspects of mankind; it has always existed throughout history, except that technology now allows it to be done anonymously.

End of Rufus.W3.Org, a.k.a. rpmfind.net

Tuesday morning the box rebooted; it had lost 3 drives from the array. I was expecting to take it down soon anyway, since W3C/MIT couldn't really host it anymore. Still, this was a bit of a shock: I set that box up in 98 before going back to France, and it was a good box. No big issue with it except losing a drive from time to time, something I could usually handle remotely, and the initial mistake of using the infamous BP6 motherboard. Some of the remains will be sent back to France to help consolidate fr.rpmfind.net. INRIA will now host xmlsoft.org, which was also on the same box, and I switched the main rpmfind.net DNS entry to the server at Speakeasy. Apparently it didn't like that too much: on Wednesday the load was around 650 (but the box was still responsive to ssh). Expect some service trouble until I really finish setting all this up...

FOSDEM

I noticed shortly after posting my last entry that I forgot to thank dsandras and the FOSDEM staff for an excellent conference. Very geeky, lots of people, good talks (Keith made a hit; everybody is waiting for the next generation of the X framework, impressive demo!), a bunch of good friends and truly good beer on top of this: what a nice week-end. My only regret was missing some of the discussion on free JVM integration, because he had to drive back early to Paris to avoid the traffic jams.

XML holidays

Heading for 3 days of holidays in Cannes on the French Riviera ... to go to a W3C XML meeting! Right now I see snow falling in Grenoble, and I'm a bit worried since I will drive there through part of the Alps. Hopefully we will have some kind of connectivity there. I am looking forward to seeing Norm Walsh and Liam Quinn, and a lot of other "XML standard" and corporate geeks.

Comparing the FOSDEM Free Software and OSS geek meeting with the W3C corporate geek meeting will be interesting; I will try to bring a few photos back. Corporate geeks are far more formal, but they can be as enthusiastic or even crazy at times. A W3C group meeting room looks far quieter and more prepared than, say, the KDE developer room at FOSDEM, but on the tables you will find the same number of laptops, the same desperate attempts to get a wireless IP and usually the same mess of ethernet cables all around :-)

bugsquad

Thanks John ! Thanks Luis, and happy birthday to you and have fun ! (IIRC Flat Top is that pool place close to MIT, and I enjoyed it a lot too)

libxml2 and libxslt

More releases, more bugs found, more bugs fixed. Yesterday I cleaned up the use of the _private field in the libraries, which should ease the life of people using it for wrappers like the PHP and Perl bindings. I also wanted to check that the recent changes didn't break the very large file support; it still works fine (RHEL AS 3, Celeron, 4.8 GByte XML file):

server:~/XML -> ls -l db24000000.xml
-rw-rw-r--    1 veillard vcsa     4843680040 Feb 27 15:57 db24000000.xml
server:~/XML -> xmllint --stream --timing db24000000.xml
Parsing took 1153361 ms
server:~/XML -> xmllint --stream --timing --relaxng db.rng db24000000.xml
Compiling the schemas took 1 ms
Parsing and validating took 2348820 ms
db24000000.xml validates
server:~/XML -> ./testSAX --timing db24000000.xml
768000006 callbacks generated
Parsing took 468294 ms
server:~/XML -> grep -C 2  MHz /proc/cpuinfo
model name      : Celeron (Coppermine)
stepping        : 6
cpu MHz         : 701.620
cache size      : 128 KB
fdiv_bug        : no
server:~/XML ->
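What makes a 4.8 GB run possible is that the streaming and SAX modes never build the whole tree, so memory use stays flat whatever the file size. A rough Python analogue is below: it uses the standard library's incremental SAX interface (expat underneath, not libxml2), feeds the input in fixed-size chunks, and counts callbacks. Which events testSAX actually counts is an assumption here, and text callbacks are parser-dependent, since expat may split one run of character data into several calls.

```python
import io
import time
import xml.sax


class CallbackCounter(xml.sax.ContentHandler):
    """Counts every SAX callback it receives."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startDocument(self):
        self.count += 1

    def endDocument(self):
        self.count += 1

    def startElement(self, name, attrs):
        self.count += 1

    def endElement(self, name):
        self.count += 1

    def characters(self, content):
        # expat may deliver one text node as several calls
        self.count += 1


def count_callbacks(stream, chunk_size=64 * 1024):
    """Parse a binary stream chunk by chunk; return (callbacks, elapsed ms)."""
    handler = CallbackCounter()
    parser = xml.sax.make_parser()  # an IncrementalParser with feed()/close()
    parser.setContentHandler(handler)
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # only one chunk is held in memory at a time
    parser.close()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return handler.count, elapsed_ms
```

Usage: `count_callbacks(open("db24000000.xml", "rb"))`. For scale, the figures in the transcript work out to roughly 4.2 MB/s for `xmllint --stream`, about 2.1 MB/s with RELAX NG validation, and about 10.3 MB/s (some 1.64 million callbacks per second) for testSAX on that 700 MHz Celeron, in decimal megabytes.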

OSS job

lauris, you're right, it's hard to get paid for OSS jobs; the industry is built around the fact that companies pay only if they are forced to. I think that's the lesson I take from the Red Hat Linux end of life (i.e. the company had to force corporate users to buy support -- well, that's my own analysis of the Red Hat Enterprise Linux/Fedora change), and I think it's unfortunate that short-term thinking about investment is the common rule. It doesn't mean that OSS can't feed you, but it's tricky. Things like GPL'ing and offering dual licences seem the most common way to reach the corporate wallet. Your point about fame vs. income is partly right: releasing under a liberal licence also allows you to grow mindshare, and then the potential customer base is much larger. It seems to me the two possible approaches are really:

  • Strict GPL licencing: smaller mindshare, but it allows you to sell under a dual licence; this fits better if you target a smaller, specific market.
  • Very liberal licencing (Apache/BSD/MIT): larger mindshare, but selling your services is harder; it's better suited for very common pieces if you can get a big piece of the "market", Apache being a prime example.

About paying for support, I'm still a bit dubious about this model, in the sense that I see it as requiring a lot of resources to be viable. A large organization can achieve economies of scale by servicing a large number of users in a similar way, but for a small business based on OSS, it seems to me you're better off keeping the number of bugs as small as possible, limiting the time spent dealing with trivial issues, and instead trying to cash in on new feature requests, optimization or special development. That sounds like a better use of the time and deep knowledge of the code, which are the developer's main assets.

Trying to live off OSS in a small structure still looks extremely difficult to me, even if the corporate world is starting to understand what OSS and Free Software are.

FOSDEM

Heading to FOSDEM tomorrow, travelling part of the way with sdodji and strider; that should be fun :-)

libxml2 and libxslt

Released libxslt-1.1.3. It breaks the latest yelp version, but we have 2 fixes; the problem will be solved in the next version of either yelp or libxslt, and the fix is in CVS too. Thanks skvidal for pointing me at the last Dive into Mark entry; it's nice to get some good feedback (and no, I didn't sell my soul to anybody ... or I just didn't notice). I exchanged a couple of mails with Mark Pilgrim, and he seems to be a nice person. I also got in touch with the PHP folks; apparently the integration in PHP5 is progressing nicely, that's good :-)

Advogato

The "preview" for diaries looks instantaneous, I don't know if this is related to the hardware change but it's a significant improvement for me.
