It's been about four years since my last post. I have
since graduated from school, taken a few jobs, and am now a
freelance developer. I have worked with a handful of
technologies and learned a lot about the real world of
software development. In the past few years I have worked
on great projects, so many I can't list them all. The most
rewarding were the ones that seemed impossible before I
started working on them. Here are some of my 'speed
coding' projects that I completed under the gun, that worked
flawlessly on first implementation, and that are still in
production today without change (well, last I heard):
Grid App: I was tasked by the CEO of a former
company at about 7pm to create a windowing application that
manages a set of fields and places them into an area so
the web designers could visualize them more easily
(they worked exclusively with CSV, so the application would
write directly to CSV to save the area dimensions). I had
not worked with MFC in over a year at that point and had
never worked with many of the controls enough to be
comfortable with them. I designed and built the software
exactly to the specifications and had it finished, working
flawlessly, by the next morning. Unfortunately the end of
that story was not so fun: the person who assigned me the
project was not satisfied (with his own design; I would
have built it differently had I been instructed to) and the
project was canned (that same day, go figure).
Siteminder Proxy:
I created an ISAPI extension that acted as a proxy server
for Siteminder authentication, again in record time. From
concept to implementation it took about a day and a half.
I had heard of and used Siteminder before, but there are no
specs out there for doing what I did, so it was mostly
guesswork to get it working properly. I can't take
complete credit for this one because I had help with some
of the troubleshooting and debugging.
Mini Language Interpreter:
Another project I felt particularly proud of was the
implementation of a very silly scripting language used to
calculate dimensions of areas in a grid, but in a clever
way, by creating relations between the other area
dimensions. It was not my design, but I made it work, and
implemented it as a drop-in replacement DLL for another
one that had been in development for weeks. My version was
completed in about 2 days and worked flawlessly, whereas
the original had many bugs and did not correctly implement
the logic.
Huge improvements to DirectShow transform filter:
This one wasn't speed coding. Although it only took a day,
it was extremely rewarding to see huge performance gains
from optimization. The company I was working for had a
simple transcoding application built on DirectShow that was
grossly shoehorned into their custom programming
environment. I took a look around and saw that there was a
transform filter that was performing a very simple overlay
on frames of video. I decided to take a stab at making it
run fast. By using intrinsics for SIMD extension calls,
reducing the complexity of the operation that was used, and
trimming the area that actually required calculation to the
occluded area, we got tremendous speed improvements.
The first improvement was to rewrite the code so that it
used a better-optimized structure of function calls and
faster operators (e.g. shift) instead of slow ones (e.g.
multiply), and then measure the effect. After
this first improvement we saw about a 30-40% speed increase!
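To give a feel for the shift-for-multiply swap, here is a minimal sketch. It is not the filter's actual code: the function names and the 4-bytes-per-pixel RGBA layout are assumptions made for illustration. Each pair computes the same result, once with multiply or divide and once with a shift.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in a 32-bit RGBA frame, using a multiply.
// strideBytes is the (assumed) number of bytes per row of the frame.
inline std::size_t offsetMul(std::size_t x, std::size_t y, std::size_t strideBytes) {
    return y * strideBytes + x * 4;
}

// Same offset, with x * 4 written as a left shift by 2.
inline std::size_t offsetShift(std::size_t x, std::size_t y, std::size_t strideBytes) {
    return y * strideBytes + (x << 2);
}

// 50/50 mix of two channel values using a divide.
inline std::uint8_t mixDiv(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b) / 2);
}

// Same mix, with the divide by 2 written as a right shift by 1.
inline std::uint8_t mixShift(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b) >> 1);
}
```

Optimizing compilers often make this substitution on their own for constant powers of two, so how much a hand-written shift buys depends on the compiler and its settings.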
Using pixel packing (pushing all four RGBA bytes into a
register to do a single op on all of them instead of 4
separate ops) with SIMD extensions through intrinsics
(intrinsics are compiler-provided functions, callable from
C++, that map to SIMD instructions) gave an additional 70%
increase! I then stopped, but after running tests of
writing assembly instead of intrinsics, I noticed I
probably would have seen another 30% speed increase. The
last operation, segmenting the video so that the filter
only runs on the affected areas, I did not implement. This
part was done by another individual, so I am not sure if it
was done properly. We did see a small improvement: for an
image in the scene that was 1/4 the size of the scene we
saw about a 20% speed increase. I noticed there are other
transcoders that already do this operation much faster, but
I was very happy with my results from spending some time
learning about intrinsics and applying that to a real-world
problem, speeding up the operation roughly 6x overall (the
percentages above are reductions in run time relative to
the previous step). So the final numbers: Original - 60
seconds, First enhancement - 42 seconds, Second
enhancement - 12.6 seconds, Third enhancement - 10.08
seconds. If I had done the assembly version I would have
seen 7.06 seconds.
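The pixel-packing step can be sketched roughly as follows. This is an illustration, not the original filter: the post does not say what the overlay operation was, so a saturating add stands in for it, and the function names are made up. It assumes an x86 target with SSE2, where one 128-bit register holds 16 channel bytes (four RGBA pixels) and a single intrinsic call processes all of them.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Scalar reference: saturating add of overlay onto frame, one byte per step.
void overlayScalar(std::uint8_t* dst, const std::uint8_t* src, std::size_t bytes) {
    for (std::size_t i = 0; i < bytes; ++i) {
        dst[i] = static_cast<std::uint8_t>(std::min(255, dst[i] + src[i]));
    }
}

// Packed version: 16 bytes (four RGBA pixels) per intrinsic call.
void overlaySse2(std::uint8_t* dst, const std::uint8_t* src, std::size_t bytes) {
    std::size_t i = 0;
    for (; i + 16 <= bytes; i += 16) {
        // Unaligned loads, so the buffers need no special alignment.
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // One instruction adds all 16 unsigned bytes, clamping each at 255.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_adds_epu8(d, s));
    }
    // Handle the tail that does not fill a whole register.
    overlayScalar(dst + i, src + i, bytes - i);
}
```

Both functions produce the same bytes; the packed one simply does the work in far fewer instructions, which is where a gain like the one above would come from.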