17 Mar 2008 ClimbNorth   » (Apprentice)

It's been about 4 years since my last post. I have since graduated, taken a few jobs, and am now a freelance developer. I have worked with a handful of technologies and learned a lot about the real world of software development. In the past few years I have worked on great projects, so many I can't list them all. The most rewarding were the ones that seemed impossible before I started working on them. Here are some of my 'speed coding' projects that I completed under the gun, that worked flawlessly on first implementation, and that are still in production today without change (well, last I heard):
Grid App: At about 7pm one evening, the CEO of a former company tasked me with creating a windowing application that manages a set of fields and places them into an area so the web designers could visualize them more easily (they worked exclusively with CSV, so the application would write the area dimensions directly to CSV). I had not worked with MFC in over a year at that point and had never worked with many of the controls enough to be comfortable with them. I designed and built the software exactly to the specifications and had it completed, flawlessly, by the next morning. Unfortunately the end of that story was not so fun: the person who gave me the project was not satisfied (with his own design, because I would have built it differently had I been asked to) and the project was canned (that same day, go figure).
Siteminder Proxy: I created an ISAPI extension that acted as a proxy server for Siteminder authentication, again in record time. From concept to implementation it took about a day and a half. I had heard of and used Siteminder before, but there are no specs out there for doing what I did, so getting it to work properly was mostly guesswork. I can't take complete credit for this one because I had help with some of the troubleshooting and debugging.
Mini Language Interpreter: Another project I felt particularly proud of was the implementation of a very silly scripting language used to calculate the dimensions of areas in a grid, but in a clever way: by creating relations between the dimensions of the other areas. It was not my design, but I made it work, and implemented it as a drop-in replacement DLL for another one that had been in development for weeks. My version was completed in about 2 days and worked flawlessly, whereas the original had many bugs and did not correctly implement the logic.
Huge improvements to DirectShow transform filter: This one wasn't speed coding, although it only took a day; it was extremely rewarding to see huge performance gains from optimization. The company I was working for had simple transcoding software built on DirectShow that was grossly shoehorned into their custom programming environment. I took a look around and saw a transform filter that was performing a very simple overlay on frames of video. I decided to take a stab at making it run fast. Using intrinsics for SIMD extension calls, reducing the complexity of the operation, and trimming the area that actually required calculation down to the occluded area produced tremendous speed improvements. The first step was to measure the effect of rewriting the code with a better-structured set of function calls and faster operators (e.g. shift) in place of slow ones (e.g. multiply). After this first improvement we saw about a 30-40% speed increase! Pixel packing (loading all four RGBA bytes into a register to do a single operation on all of them instead of 4 separate operations) with SIMD extensions via intrinsics (compiler-provided C/C++ functions that map to SIMD instructions) gave an additional 70% increase! I stopped there, but after running tests with hand-written assembly instead of intrinsics, I noticed I probably would have seen another 30% speed increase! The last step, segmenting the video so that the filter only runs on the affected areas, I did not implement. That part was done by another individual, so I am not sure if it was done properly. We did see a small improvement: for an image 1/4 the size of the scene we saw about a 20% speed increase. I noticed there are other transcoders that already do this operation much faster, but I was very happy with my results from spending some time learning about intrinsics and applying that to a real-world problem, speeding up the operation by a huge factor.
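To make the three ideas concrete (shift instead of a slower operator, packing pixels into one register, and only touching the area under the overlay), here is a minimal sketch. It is not the original filter code: the `Frame` type, the `overlay` function, and the choice of a rounded 50/50 blend as the "very simple overlay" are all assumptions made for illustration. It uses the SSE2 intrinsic `_mm_avg_epu8`, which averages 16 bytes (four RGBA pixels) in one operation, and falls back to the scalar path where SSE2 is not available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  // SSE2 intrinsics
#define OVERLAY_HAVE_SSE2 1
#endif

// Hypothetical frame type: tightly packed RGBA, 4 bytes per pixel.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> rgba;  // width * height * 4 bytes
};

// Assumed overlay operation: rounded 50/50 blend of one byte.
// The shift stands in for a division by 2.
inline std::uint8_t blend_byte(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b + 1) >> 1);
}

// Blend `ovl` onto `dst` with the overlay's top-left corner at (x0, y0).
// use_simd = false forces the byte-at-a-time reference path.
void overlay(Frame& dst, const Frame& ovl, int x0, int y0, bool use_simd) {
    assert(dst.rgba.size() == static_cast<std::size_t>(dst.width) * dst.height * 4);
    assert(ovl.rgba.size() == static_cast<std::size_t>(ovl.width) * ovl.height * 4);
    (void)use_simd;  // unused when SSE2 is not available

    // Clip the overlay rectangle to the frame so that only the
    // pixels actually covered by the overlay are processed.
    const int left   = std::max(x0, 0);
    const int top    = std::max(y0, 0);
    const int right  = std::min(x0 + ovl.width, dst.width);
    const int bottom = std::min(y0 + ovl.height, dst.height);
    if (left >= right || top >= bottom) return;

    const std::size_t row_bytes = static_cast<std::size_t>(right - left) * 4;
    for (int y = top; y < bottom; ++y) {
        std::uint8_t* d = dst.rgba.data()
            + (static_cast<std::size_t>(y) * dst.width + left) * 4;
        const std::uint8_t* s = ovl.rgba.data()
            + (static_cast<std::size_t>(y - y0) * ovl.width + (left - x0)) * 4;
        std::size_t i = 0;
#ifdef OVERLAY_HAVE_SSE2
        if (use_simd) {
            // 16 bytes = 4 RGBA pixels per iteration, one averaging op for all.
            for (; i + 16 <= row_bytes; i += 16) {
                __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(d + i));
                __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
                _mm_storeu_si128(reinterpret_cast<__m128i*>(d + i), _mm_avg_epu8(a, b));
            }
        }
#endif
        // Scalar tail (or the whole row on the reference path).
        for (; i < row_bytes; ++i) d[i] = blend_byte(d[i], s[i]);
    }
}
```

Calling `overlay(frame, logo, x, y, true)` and `overlay(frame_copy, logo, x, y, false)` on identical inputs should give byte-identical frames, which is a handy check before timing the two paths against each other.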
So the final numbers: Original: 60 seconds; First enhancement: 42 seconds; Second enhancement: 12.6 seconds; Third enhancement: 10.08 seconds. If I had done the assembly version I would have seen about 7.06 seconds.
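These figures line up if each percentage is read as a reduction in run time applied to the previous result: 30%, then 70%, then 20%, with a hypothetical further 30% for the assembly version. A small sketch of that arithmetic follows; the function name `chain_timings` is made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Apply successive run-time reductions (as fractions, e.g. 0.30 for 30%)
// to a baseline time and return the time after each step.
std::vector<double> chain_timings(double baseline_seconds,
                                  const std::vector<double>& reductions) {
    std::vector<double> times{baseline_seconds};
    for (double r : reductions) {
        assert(r >= 0.0 && r < 1.0);
        // Each step keeps (1 - r) of the previous run time.
        times.push_back(times.back() * (1.0 - r));
    }
    return times;
}
```

With `chain_timings(60.0, {0.30, 0.70, 0.20, 0.30})` the steps come out at roughly 60, 42, 12.6, 10.08 and 7.056 seconds, so the three enhancements that were actually made took the run from 60 to 10.08 seconds, a little under a 6x overall speedup.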
