Older blog entries for badger (starting at number 82)

Wanted: C++ Programmer to work with Inkscape upstream

One of the things to emerge from the hallway track at the Google Summer of Code Mentor Summit was the need for a robust, featureful, free software whiteboarding tool. This would allow people to collaboratively work on project design, model workflows, and do things more visually than the current round of instant messaging, pastebins, collaborative text editors, and VoIP allows.

Currently, I know of two potential candidates for this. The first is Coccinella, a Tcl program that does free-form drawing with a few caveats. Here's what mizmo, one of the main Fedora Design Team members, has to say about it:

For free-form drawing, Jabber-based Coccinella gets me close, but it's a little clunky and when people join a meeting late they don't get to see what was drawn on the whiteboard before they joined. I'd like it to automatically snapshot the whiteboard at various points and synchronize the snaps with the text conversation and automatically email me a report.

Additionally, Coccinella lacks many of the tools that make diagramming, flow charting, and other, more structured drawings easier. For these, many artists use Inkscape. Inkscape lets artists and designers make mockups and quickly prototype new designs. At least a few open source developers also use it to make charts and diagrams that visualize their programs' structure and execution. It would be great if we could collaborate on these over the Internet using Inkscape's rich toolset. This is where the Inkscape whiteboard plugin enters the picture.

The whiteboard plugin, Inkboard, was written as a GSoC project in 2005. Although there's been some work on it since then, development has not kept pace with the rest of Inkscape, and it is currently disabled in the configure script because it doesn't work. However, I talked with Inkscape developer Jon A. Cruz at the Mentor Summit and found that all is not lost. Someone is still needed to step up and bring Inkboard back, but recent changes in the core of Inkscape will make it easier to implement: the id tags that bloated the SVG and caused potential conflicts between two synchronizing Inkscape instances have been removed, and a new XMPP implementation has been incorporated. Both should make the next version of Inkboard easier to write and more robust.

Now where do you come in? From time to time someone will write me an email that says, "I've been using Linux for years and now I want to give back to the community. I've got programming experience in C++, how can I help?" This is your chance to step up! Contact Jon or subscribe directly to the Inkscape developers mailing list. Check out the Inkscape code from svn. And then get hacking!

Adel Gadllah (dragoo1) ran my script on his computer with a couple of other compressors: pbzip2 (a parallel implementation of bzip2) and pigz (a parallel implementation of gzip). His computer is a quad core with 6GB of RAM, a definite upgrade from the machine I tested on (a dual core with 1GB of RAM). The results are quite interesting.

Since no new algorithms were introduced, just new implementations, the compression ratios didn't change much. But the times for the parallel implementations were very interesting: pbzip2 runs faster than gzip, and pigz -9 runs faster than lzop -1! If compression were the only process running on the machine, the parallel implementations would definitely be worthwhile.
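
For anyone who wants to reproduce this, the parallel tools take the same flags as their serial counterparts, so swapping them into the test is straightforward. A minimal sketch, assuming pigz and pbzip2 are installed:

pigz -9 /var/tmp/test.dump      # like gzip: replaces test.dump with test.dump.gz, but uses all cores
pbzip2 -9 /var/tmp/test.dump    # same idea for bzip2
pigz -d /var/tmp/test.dump.gz   # decompression mirrors gzip -d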

Well, after reading this message from notting about the speeds and sizes of xz compression at various levels, I got curious about where gzip falls in the picture. So I wrote a little script, found a 64MB text file (an SQL database dump), and ran a naive benchmark. First, the script, so you can all see what horrible assumptions I'm making:


#!/bin/sh

LZOP='lzop -U'
GZIP='gzip'
BZIP='bzip2'
XZ='xz'

TESTFILE='/var/tmp/test.dump'

for program in "$LZOP" "$GZIP" "$BZIP" "$XZ" ; do
    case $program in
        gz*) ext='.gz' ;;
        bz*) ext='.bz2' ;;
        xz*) ext='.xz' ;;
        lz*) ext='.lzo' ;;
        *) echo 'error! No configured compressor extension'
           exit
           ;;
    esac

    COMPRESSEDFILE="$TESTFILE$ext"

    for lvl in `seq 1 9` ; do
        c_time=`/usr/bin/time -f '%E' 2>&1 $program -$lvl $TESTFILE`
        c_size=`ls -l $COMPRESSEDFILE | awk '{print $5}'`
        d_time=`/usr/bin/time -f '%E' 2>&1 $program -d $COMPRESSEDFILE`
        printf '%-10s %10s %10s %10s\n' "$program -$lvl" $c_time $c_size $d_time
    done
done

As you can see, I'm not flushing caches between runs or doing anything fancy to make this a truly rigorous test. I'm also running it on my desktop (although I wasn't actively doing anything on that machine, it was logged into a normal X session, with all the wakeups and polling that that implies). I also used only a single input file as data; binary files or tarballs with a mixture of text, images, and executables could certainly give different results. Grab the script and try this out on your own sample data. And if you get radically different results, post them!
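
If you want to try it, something like this should work. The name compress-test.sh is just whatever you saved the script as, and mydata.sql is a stand-in for your own sample file; since the compressors replace the file in place, point TESTFILE at an expendable copy:

cp mydata.sql /var/tmp/test.dump
sh compress-test.sh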


Compressor       Compress   Size (bytes)   Decompress
----------       --------   ------------   ----------
none [*]_         0:00.43       67348587      0:00.00

lzop -U -1        0:00.57       16293912      0:00.35
lzop -U -2        0:00.62       16292914      0:00.40
lzop -U -3        0:00.62       16292914      0:00.34
lzop -U -4        0:00.57       16292914      0:00.42
lzop -U -5        0:00.57       16292914      0:00.42
lzop -U -6        0:00.67       16292914      0:00.41
lzop -U -7        0:13.53       12824930      0:00.30
lzop -U -8        0:39.71       12671642      0:00.32
lzop -U -9        0:41.92       12669217      0:00.28

gzip -1           0:01.96       11743900      0:01.02
gzip -2           0:02.04       11397943      0:00.92
gzip -3           0:02.77       11054616      0:00.89
gzip -4           0:02.59       10480013      0:00.82
gzip -5           0:03.42       10157139      0:00.78
gzip -6           0:05.44        9972864      0:00.77
gzip -7           0:06.71        9703170      0:00.76
gzip -8           0:13.64        9592825      0:00.91
gzip -9           0:15.89        9588291      0:00.76

bzip2 -1          0:20.17        7695217      0:04.73
bzip2 -2          0:21.68        7687633      0:03.69
bzip2 -3          0:23.48        7709616      0:03.63
bzip2 -4          0:26.00        7710857      0:03.69
bzip2 -5          0:25.45        7715717      0:04.09
bzip2 -6          0:26.95        7716582      0:03.95
bzip2 -7          0:28.13        7733192      0:04.23
bzip2 -8          0:29.71        7756200      0:04.36
bzip2 -9          0:31.39        7809732      0:04.50 [@]_

xz -1             0:08.21        7245616      0:01.86
xz -2             0:10.75        7195168      0:02.23
xz -3             0:59.45        5767852      0:01.90
xz -4             1:01.75        5739644      0:01.83
xz -5             1:09.70        5705752      0:02.60
xz -6             1:46.23        5443748      0:02.09
xz -7             1:50.37        5431004      0:02.19
xz -8             2:02.41        5417436      0:02.19
xz -9 [#]_        2:18.12        5421508      0:02.55

.. _[*]: Time to copy the file.
.. _[@]: What's up with bzip2? Why does the size increase with higher levels?
.. _[#]: Note, xz -9 is unfair on two counts: 1) it pushed me into swap.
   2) As for the size, xz had this output during that run::

       Adjusted LZMA2 dictionary size from 64 MiB to 35 MiB to not exceed
       the memory usage limit of 397 MiB

My conclusions based upon entirely too little data :-)

  • If you want transparent compression, use lzop at one of the lower compression settings. I got 25% of the size at 100 MB/s with lzop -2.
  • Do not use lzop with -7 or higher. If you want more compression than -2/3/4/5/6 give (the algorithm is currently the same for all of those levels), use gzip instead: you'll get better compression at better speed.
  • The only reason to use bzip2 is if you need a smaller size than gzip produces and you can't deploy xz on the other end. If you don't need the smaller size, or the remote side can get xz, then bzip2 is a waste. This applies to distributing source tarballs in two formats, for instance: if you're going to release in two formats, use tar.gz and tar.xz instead of tar.gz and tar.bz2 (see the example after this list).
  • xz gets the smallest size, but it's versatile in other ways too: xz -2 is faster than gzip -9 with a better compression ratio.
  • gzip beats xz at decompression, but not nearly as badly as it beats bzip2.
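
As an example of the two-format release from the last point: GNU tar can produce both archives directly. The -J flag for xz is in newer GNU tar releases (older ones may need --use-compress-program=xz), and myproject-1.0 is a stand-in for your source tree:

tar czf myproject-1.0.tar.gz myproject-1.0/   # gzip-compressed tarball
tar cJf myproject-1.0.tar.xz myproject-1.0/   # xz-compressed tarball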

So thanks to cdfrey, I'm a little closer on two fronts.

First, the problem as given has a solution for hack #2 but apparently not for hack #1. Here's the new sequence of commands:


git checkout base_url
git log
# Manually find the last commit in staging before I branched
git rebase --onto master [COMMIT ID FROM ABOVE]
git checkout master
git merge base_url

So no more patches, yay! However, you probably noticed that we still have to use git log to find the branchpoint. After some discussion of this, it seems that if we have merged the feature branch back into the branch it came from, there's no way around it: git does not maintain a history of where a branch came from and where it merged back. It holds onto the heads and then follows the chain of commits backwards. So once we're merged, there's no branchpoint anymore... the trees are the same.
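
(A side note: before the merge back happens, git can find that commit itself. git merge-base prints the nearest common ancestor of two heads, so in the pre-merge case the manual git log step could have been:

git merge-base staging base_url
# or folded directly into the rebase:
git rebase --onto master $(git merge-base staging base_url) base_url

Once base_url has been merged back into staging, though, the merge-base is just base_url's own head, which is why the manual search is unavoidable in the problem as given.)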

However, we did figure out a potential way to implement our workflow in the future. Instead of branching from staging, the feature branch should start off branching from master. After the feature has been worked on, it gets merged into staging. But since it started from master, the feature branch still has a clear path of changes to apply to master. Once the changes have been tested in staging, we can merge the feature branch into master, and it's then "okay" for the branchpoint to disappear since the work is complete.
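
In command form, that future workflow might look something like this (new_feature is a hypothetical branch name):

git checkout -b new_feature master   # branch from master, not staging
# ...commit the feature work...
git checkout staging
git merge new_feature                # test the feature in staging
git checkout master
git merge new_feature                # once tested: only the feature's commits land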

Okay, git lovers, I have an incredibly simple problem but so far the only working solution is a kludge. I'm hoping someone can tell me what the elegant way to solve this problem is.

I'm working with three branches that keep configuration information for our environment. master is where our production configs live. staging is a branch where we merge changes and test them in our staging environment. Once tested, changes get cherry-picked to master.

base_url is where I've been working on a new change that spans several commits. It was branched off of staging. After completion, I merged the changes into the staging branch and tested. So far so good.

Now I want to merge my branch into master. How do I do that?

Here's an idealized diagram of the branch relationships. In reality, sometimes changes go into master before staging.


master       staging   base_url
  |             |  merge |  ___
  | cherrypicks +<-------+   ^
  +<------------+        |   |
  |    (cp)     |        |  How do I merge these to master?
  +<------------+ branch |   |
  |    (cp)     +------->+  _V_
  +<------------+
  |   branch    |
  +------------>+
  |
  |
/srv/puppet

So far, everything I've tried with git rebase or git merge ends up sending changes into master that were already in staging before I branched base_url. I don't want that; I did my changes on a separate branch precisely so I could later merge just my changes into both staging and master.
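
For the record, the naive attempt looks like this, and it drags along everything staging had when I branched, because those commits are part of base_url's history:

git checkout master
git merge base_url   # also merges staging's pre-branch commits into master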

Here's the kludge that did work:


git checkout base_url
git log
# Manually find the last commit in staging before I branched
git format-patch [COMMIT ID FROM ABOVE]
git checkout master
git am [patch 0001 0002 0003....etc]

The two things that I find too hacky about this solution are:

  1. using git log to find the branch point. git should know where I branched from staging... I just need to tell it that I want to pull changes from the branch point forward somehow and it should find the proper commit id.
  2. generating patches and then applying them. git should be able to do this without generating "temporary files" like this. The deltas are in the repo, why pull them out into text files?

I have copies of my repository from before I "fixed" it with the patch commands, so send me your recipes and I'll see if any of them work. Once we have a winner, I'll post the strategies that worked and the ones that didn't.

Of course, even after I know how to do this, there are still all sorts of follow-on questions -- like, what happens if this new feature took a long time and I needed to re-merge the base_url branch with staging in the middle?

a.badger@gmail.com or abadger1999 on irc.freenode.net

Do not buy Swingline stapler model #545xx

My wife was having problems stapling today, so I looked inside this one and found that the staples had fallen over inside the stapler. Instead of forming the upside-down "U", all ready for the teeth to punch into the paper being stapled, the staples were positioned with the teeth to the front and the base of the "U" facing the spring-loaded rear of the stapler.

This is just poor design from some misguided engineer trying to cut costs. All the other staplers I've seen have some sort of platform down the middle of the staple feed chamber. This lets the base of the "U" rest supported on the platform instead of depending on the staples standing upright on their feet. Getting rid of that platform means the staples can fall over when the stapler is loaded or when the spring's tension is off.

Or perhaps it isn't poor design -- a little experimentation showed that the stapler has one feature sure to please a company exec, provided it's an exec of Swingline: it's nearly impossible to load small quantities of staples into this design. With nothing to support them, they just fall over and slide underneath the other staples.

Had a productive evening planned out but didn't get to do any of it because of a chicken emergency. First time I've actually seen "it gave a spasm that threw its whole body in the air and died." Parents get back tomorrow night, so hopefully I can get back to working long hours next week.


On 07/21/2009 04:24 AM, Dimitris Glezos wrote:
> For me, Fedora isn't so much what we think it is -- it's
> what our community wants it to be. And if a part of our community
> wants to try new things out, given that the resources needed won't be
> unmanageable, we should encourage them to do so.
> 
Posted on the Fedora Advisory Board list

I think that this is very, very true and something that we need to keep in mind as we go about defining what Fedora is. Thanks, Dimitris, for phrasing it so succinctly!

24 Jun 2009 (updated 24 Jun 2009 at 17:31 UTC) »
FISL in the Morning

The Fedora booth has been well populated by Fedora Ambassadors from all around Latin America, from Brazil to Mexico. For someone from the insular world of the United States, it's awe-inspiring to watch the ambassadors in action. Even though some speak Spanish and others Portuguese, they cheerfully work out their differences in language and laughingly toss jokes at one another. A line of potential Fedora users stretches out from the booth, entertained by interviews with Fedora ambassadors and developers as they wait to sign up for a FAS account and get the Brazilian Fedora 11 spin.

The conference is huge, and very much oriented toward free software. I attended LinuxWorld in San Francisco once, and the crowd was roughly this size. The type of attendee is very different, though: where LinuxWorld seemed to have an abundance of businessmen looking to buy or sell a solution to someone else, FISL seems populated mostly by enthusiasts eager to meet fellow contributors to the projects they're involved in. A very nice crowd to watch and try to interact with, despite my limited Portuguese.
