This is a collection of techniques that I've found useful for working on small coding projects at high speed. I'm interested in other people's suggestions as well.
I should note that I'm being deliberately contrary here - that is, my motivation in writing this is to point out some things that may be in direct opposition to the conventional wisdom. But I'm not being flippant; I have seriously used all of the techniques described.
When I say "small project" I mean one where you can keep the whole detailed design in your head, where you can "design as you go" rather than doing a detailed design up front. More importantly, you can keep all of the requirements specification of the project in your head.
Needless to say, most of these techniques would be a disaster for a large project. (In fact, for large projects I would recommend that you do the exact opposite of the techniques I recommend here.) Because the design of large projects is so much richer and more complicated, it's not possible to visualize the whole design in a single brain except at the most abstract, non-detailed level. Gathering requirements is often one of the most lengthy phases of a large project, often taking many months. Without a detailed specification and design, you will end up writing a whole mess of code and discover that everything you've written is wrong, or have to suffer through continually redesigning your project as new requirements are discovered.
However, with a small project, especially in an area where you have a lot of experience, you can imagine the design at a much more detailed level, without having to write it all down. Of course, there are going to be some portions of code that will have to be re-written, but with a small project (and especially with rapid coding and prototyping techniques), code churn is not a big deal - in fact, a certain amount of churn is a good sign, it means your design is evolving and improving.
The overall approach that one can take with a small project is that of a sketchpad or blackboard - one is constantly erasing, re-writing, sketching out details and then later filling them in, constantly changing the architecture.
Of course, if you can do a detailed design, then there's no reason why you shouldn't. A formal design phase can be useful, even for a small project. Typically the kinds of "designless" techniques I talk about are useful when you are exploring at the edges of your experience, where you have only a vague, indistinct idea of what a good design consists of. They can also be useful when you've written the same app so many times that you've essentially memorized the design. (For example, I've written UI toolkits for games six times, each one an improvement on the last, so at this point I can almost type one out from memory.)
I often find that when I'm writing some new tool, I am unable to tell whether my approach is good or bad until I actually see the code. That is, I have enough experience to look at the code and say "yes, this is an elegant solution", but I can't just write down the solution on paper without trying to actually build it. (The act of writing it down is the same as coding it, or at least equivalent in effort.) Or I will in some cases create a detailed design, only to discover as I'm writing the code that there are a whole set of issues that I forgot to take into account.
1. Keep things in one source file as long as possible.
There's a tendency to want to break up projects into multiple source files, with one "module" per source file. This is especially true in object-oriented programming, where typically a class consists of one source file plus one header file.
However, I often find that when I first create a class, I don't often get the name of the class right. I often find myself re-factoring in mid-project, which is a pain because I not only have to re-name all of the source files I have created, but I also have to adjust the Makefiles/Project files/etc.
Instead, I tend to develop classes or modules in a single "scratchpad" file, and only push them out to individual source files once they become relatively stable. (Unfortunately, this is not an option in Java.) If your scratchpad file gets too big, then pick some portion of the code that you haven't touched in a while, and move it to its own file.
Also, I find I can navigate around a single source file much faster than having multiple files open in the editor. In particular, if I change my mind as to the naming of a variable, I can do a single-file search and replace much more easily than having to do a multiple-file search. Similarly, if I decide to undo a change, I don't have to undo a bunch of changes in separate files.
2. Make data members public instead of creating accessor functions.
Horrors! Whatever happened to encapsulation?
Part of the reason for creating accessor functions is that it allows you to transparently change the implementation without having to change all of the code that uses that variable. However, it takes time to set up those accessor functions - and in a small project, you probably don't need them. Why? Well, remember the distinguishing characteristic of a small project is that you can keep the whole thing in your head. So you probably have a pretty good idea of how that variable is going to be used, and if you guess wrong, it's not that hard to revise the code that uses it. It's not like you're going to be searching through a million lines of code changing "float" to "int".
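As a rough illustration of this trade-off, here is a minimal sketch in Python. The class names, the fields, and the "count the writes" side effect are all invented for the example. The first version uses plain public data members. If a side effect is needed later, Python lets a property stand in for the member without touching the callers. In C++ or Java the same change would mean editing each use, which is the revision cost argued above to be small in a small project.

```python
class Particle:
    """First pass: plain public data members, no accessor functions."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class TrackedParticle:
    """Later revision: 'x' now needs a side effect, so it becomes a property."""

    def __init__(self, x, y):
        self._x = x          # stored directly, so construction is not counted
        self.y = y
        self.x_writes = 0    # hypothetical side effect: count writes to x

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self.x_writes += 1
        self._x = value


def nudge(p, dx):
    # Caller code is identical for both versions of the class.
    p.x = p.x + dx
    return p.x
```

The point of the sketch is only that the cheap version comes first, and the accessor is added on the day it is actually needed.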
3. Cut and Paste first, then re-factor later.
It's considered good practice to isolate common code in a subroutine or class. However, when writing code fast, it's often difficult to tell where the boundaries of the subroutine should be.
I tend to use cut-and-paste programming rather than subroutines at first. Each time I paste, I go ahead and modify the pasted code, changing as needed to fit into the new context. After the code has stabilized, I go through the code and collect all of the pasted copies. I look at how they have mutated from one another (this tells me what arguments need to be passed to the subroutine), and what parts of the pasted code have been edited away (this tells me where the subroutine should begin and end.)
Sometimes it can be tricky trying to name the subroutine, since the "definition" of the subroutine is based on implementation details rather than design semantics. In some cases, I will move code in or out of the subroutine so as to make the "meaning" of the subroutine more comprehensible. Usually a good balance can be struck, if not then I hold off on creating the subroutine.
This process of mechanically collecting the mutated code into subroutines is one that I find rather relaxing - it doesn't require much concentration, and gives me a sense of progress. Sort of like untangling yarn, or knitting chainmail :)
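The collection step described above might look something like the following sketch in Python. The log-filtering task and all of the function names are invented for illustration. The two pasted copies differ in the prefix they test and in how they transform the line, so those two differences become the arguments of the extracted subroutine.

```python
# Stage 1: two pasted copies that have mutated to fit their contexts.
def report_errors_pasted(lines):
    out = []
    for line in lines:
        if line.startswith("ERROR"):
            out.append(line.strip().upper())
    return out


def report_warnings_pasted(lines):
    out = []
    for line in lines:
        if line.startswith("WARN"):
            out.append(line.strip().lower())
    return out


# Stage 2: the parts that differ between the copies (the prefix and the
# transformation) become parameters; the shared loop becomes the body.
def collect(lines, prefix, transform):
    return [transform(line.strip()) for line in lines if line.startswith(prefix)]


def report_errors(lines):
    return collect(lines, "ERROR", str.upper)


def report_warnings(lines):
    return collect(lines, "WARN", str.lower)
```

Keeping the pasted versions around briefly gives a cheap check that the extracted subroutine behaves the same as the copies it replaces.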
4. Don't make everything an object, at least not at first.
This is really a variation on (3). Basically, I tend to use just a few core classes (representing the data model) at first, and do everything else with functional code. After the code has begun to stabilize, I can see where the natural divisions between classes lie, and I will at that point define new classes to replace the functional code. I'll look and see what actions are being performed on a data element, and transform them into methods with meaningful names.
The reason for this is that I have found that while classes can have a greater potential for re-use and data isolation, functional code tends to be more "plastic", more capable of being textually manipulated without the pain of reorganization. Of course, in an object-orientated language, my "functions" are typically object methods written in a functional style.
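A small sketch of that progression, in Python and with invented names: one core data-model class exists from the start, the rest begins as free functions, and only once it is clear which actions cluster around which data do they get folded into a class.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Core data-model class, present from the beginning."""
    lines: list = field(default_factory=list)


# First pass: plain functions that take the data as an argument.
def word_count(doc):
    return sum(len(line.split()) for line in doc.lines)


def longest_line(doc):
    return max(doc.lines, key=len, default="")


# Later, once the code has stabilized: the functions that turned out to
# belong together become methods with meaningful names.
class DocumentStats:
    def __init__(self, doc):
        self.doc = doc

    def word_count(self):
        return word_count(self.doc)

    def longest_line(self):
        return longest_line(self.doc)
```

Whether `DocumentStats` deserves to exist at all is exactly the kind of decision that is easier to make after the functions have been written and used.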
5. Use the class hierarchy that comes with the underlying libraries.
If you are using a user-interface toolkit, chances are that it will supply a class hierarchy. This class hierarchy is more than just an arrangement of classes, it's a conceptual sorting system that can give you a clue as to where all of your own ideas should go. My advice is to drink the kool-aid, and organize your project around the class hierarchy that's supplied with the toolkit, rather than inventing your own "philosophical wrapper" around it. In particular, if you want to code things fast, it will help if you can utilize whatever event/listener model that the UI toolkit has. (If you are not doing a UI, then there may be some equivalent source of structure, such as the networking socket classes or something.)
6. Don't be afraid to rewrite.
Reading this article, you might think that some of these techniques would actually slow you down rather than speed you up. Several times I talk about re-writing, re-factoring, etc. However, in a small project none of these activities take very much time. Most of the rote mechanical activities such as collecting code into a subroutine, or changing the type of a variable, can be done very quickly; the slow part is deciding what new code to write.
Re-writing allows you to test and throw away a whole bunch of different designs quickly. Rather than having to spend the time to calculate what the consequences of a particular design decision would be, you can just see it on the screen. Naturally, in a large project this would not work, since a single design decision can result in a large pile of code. But in a small project, no design feature is so large that it can't be conveniently re-structured.
In a sense this is taking Brooks' advice from "The Mythical Man-Month" (plan to throw one away) to a finer level of granularity: "Prepare to throw each subroutine/class away, you will anyway."
7. Use a late-binding, interpreted language.
I've observed that the more "late-binding" a language is, the faster I can code in it. That is, I can code faster in Java than in C++; I can code faster in Python than either of them. Using a late-binding language means that you can defer a lot of your decisions to runtime; you can modify classes and objects during execution, rather than having to create everything up front. In addition, using an interpreted language means that your edit/debug cycles will be as rapid as possible.
Of course, there is a cost for this in execution speed, but that's seldom a problem for small projects. (In fact, the need for high performance often turns a small project into a large one.)
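To make "modify classes and objects during execution" concrete, here is a minimal Python sketch. The `Shape` class and its attributes are invented for the example. A method is added to a class after an instance already exists, an attribute is attached to a single object, and the method is then swapped for a new version, all without recreating anything.

```python
class Shape:
    def __init__(self, name):
        self.name = name


s = Shape("square")


# Add a method to the class after instances already exist.
def describe(self):
    return "shape: " + self.name


Shape.describe = describe
result_before = s.describe()

# Attach a new attribute to one existing object.
s.color = "red"

# Replace the method with a new version; existing instances pick it up.
Shape.describe = lambda self: f"{getattr(self, 'color', 'plain')} {self.name}"
result_after = s.describe()
```

This is the sort of thing that would need a recompile, and usually a change to a header, in a language that binds earlier.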
8. Look at other people's designs.
The best way to get an intuitive sense of how to structure your code is by reading other people's code. Find some other project that's similar to yours, and start reading. You are free to choose to do it some other way, but in any case you'll be influenced by what you've read.
9. Look at your own designs.
The most important person you should steal code from is, of course, yourself. Rip out that main loop from the graphing program and paste it into the paint program, and start hacking! Grab that point-in-poly routine from one project and slam it in. If you find yourself using it in a bunch of projects, turn it into a library - but don't waste time doing that if it's one of those algorithms that is sensitive to the underlying data types. And don't try making it into a library until you've used it a few times, otherwise you'll be freezing the interface too early.
In general, I have found that "pattern" and "idea" reuse are much more important than re-use of actual code.
I'm sure that most people here have written some code using exactly the methods you've mentioned. ;)
I've found some of them, especially numbers 1 and 3, very useful for getting _something_ to the point where you can tell if it's worthwhile. For small projects, you'll spend more time worrying about where something should go than in actually writing it. As you said, making major changes to the design is easier to carry out if you can get at every bit of code.
Of course, those sorts of changes really shouldn't be going on past the second or third iteration of your design. You've got to do the refactoring and rewriting before things get too far. And you need to do them before anyone else ever has to edit/debug/read your code. But then, for 'small' projects, there probably won't be more iterations than that anyway.
Could any of these things be automated? Like maybe splitting classes out into separate files? If you were consistent in your coding style, I think you could further reduce the time taken by the refactoring.
Skyscrapers contain support beams, but they're still mostly made out of drywall. In my experience the skills applicable to small projects work well on large ones, although they have to be supplemented a bit.
Throwing out your skills on large projects is counterproductive. Particularly bad are such goofy techniques as making everything be an object - Seriously, it's nice to have a lock on my front door, but it would be stupid for me to put one between my living room and bedroom.
This sounds very like what we've done in Mojo Nation, including using Python. :-)
Mojo Nation isn't small...
HACK imp:~/evil$ cat /tmp/allsrc.txt | wc -l
43020
HACK imp:~/evil$ sort -u /tmp/allsrc.txt | wc -l
24251
HACK imp:~/evil$ sort -u /tmp/allsrc.txt | grep -E "(if|while|def|class|apply|map|for|case)" | wc -l
8104
These are the stats for the Python and C/C++ code that we've written, plus there are 1600 unique lines of DTML for the user interface.
By the way, the suggestions you make here have much in common with "Extreme Programming", which is a much better and more sensible method than its "hype compliant" name would suggest.
Wow. This is exactly how i do most of my coding.
This is an especially good technique for exploring a problem, where you don't really know what the best solution is. If you keep the code small, you can try out lots of different approaches and maybe discover some short, elegant solution that wasn't obvious in your initial design.
A couple of additions to your list:
This is a corollary to 6. If there's an alternative way to code something which you think might work better, try it out. Having a single source file makes this easy, and it might let you factor your code down to something even smaller.
(C++ only) STL is your friend, templates are your friend
The C++ STL is one of the neatest libraries out there. Even if i eventually write my own optimized code for something, i often hack up a prototype using STL algorithms.
If i'm not sure what exact type i'm going to use with a class or function, i usually template it (even if i later take out the template, once the design has solidified).
Sorry for not liking those ideas too much...
Actually, I think all 11 "rules" can be summarized by a single one:
Rule 1 of 1: When writing small programs, be lazy and just bang it out.
So far, this is nothing new. You don't even need to tell people to do so. When in need for just a small program which probably is used for just a few days or something, everybody will opt to hack it down. And, no one will try to remember an 11-rule-set just to know how to do so.
The problem remains that some small programs surprisingly turn out to be useful, and start to grow. At this point, our unfortunate hacker would need to spend some thoughts on code design. And this usually is exactly the moment when it turns out that all the cut-and-pasted stuff needs to be changed, but is already slightly different in all its incarnations. And that just this single little member variable needs to be implemented as an accessor/mutator function pair because it now needs side-effects. But unfortunately it's just called 'f', and after the search-and-replace all the cool funor() loops and ifun() statements stop working.
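The search-and-replace hazard is easy to reproduce. The following Python sketch uses an invented three-line snippet and a hypothetical accessor name, `get_f()`. It contrasts a plain substring replace, which mangles `for` and `if`, with a replace restricted to whole identifiers using the `\b` word-boundary anchor from the standard `re` module.

```python
import re

# A tiny piece of code that uses a one-letter variable named 'f'.
source = "for i in range(n):\n    if f > 0:\n        total += f\n"

# Naive substring replace: every letter 'f' is rewritten, keywords included.
naive = source.replace("f", "get_f()")

# Whole-word replace: only a standalone identifier 'f' is rewritten.
careful = re.sub(r"\bf\b", "get_f()", source)
```

Even the careful version is only textual, so it would still rewrite an `f` that appears inside a string or a comment.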
So let's rethink rule 1.
Rule 1 (V2.0): When writing small programs, be lazy and just bang it out, but prepare to quickly pass it to someone you don't like in case the program will continue to exist.
Actually, especially when writing small programs, and there exists the slightest chance that it will grow, you should IMHO take the time to bring up a code design. And, because the program is small, it's not hard to keep that design in your head. You can write it down nearly as fast as a quick hack, and it will help you a lot in later steps. Or, okay, hack it down, but then prepare to rewrite it from scratch a few weeks later.
I believe your summary is subtly incorrect. Allow me to suggest another one -
Rule 1: Get something working
Of course design is important, but most projects simply never ship. An 'imperfect' product which ships is better than a perfect one which doesn't exist.
I have the slight feeling that your rules are somewhat C++ oriented.
2. You're clearly thinking of a programming language in which data encapsulation is closely linked to the class structure. This rule is mostly irrelevant in languages in which the class structure doesn't impose encapsulation, and the latter is done using an ad hoc mechanism (modules).
4. This is my main gripe with the C++ style of object orientation: the tendency to make classes just to put code in (as opposed to having classes describe data structures).
7. This rule is quite obvious for anyone except C++ programmers. (Flame about the choice of Python snipped).
oh, my goodness... how could you possibly advocate such a terrible method... why, i never, uh... except... well... i think you just more or less described how the word hacking was coined. ;-)
I don't know about this. My last small, one-person hobby project now has about 80,000 lines of code and six developers.
HLL->High Level Language
Lisp comes to mind, but Python will do. So might Tcl. I have worked on a medium-sized commercial app (70kloc) written in Tcl and C. Tcl is just plain ugly, but it did the trick.
That said, I recommend Lisp first, particularly if your project is a commercial one and you can pay for licenses to commercial Lisp compilers. The reason is that Lisp can be (is) much much faster than Python, Tcl, etc... The reason is not only that you can add type information to your program to optimize it (after you've got it working), but that with Lisp's macros you can leave a lot of static computation to compile-time all the while extending the language to meet your needs (e.g., keeping your code simple).
Consider how defstruct is really just a macro that creates new functions to build/access structures, but the structures are just fixed size arrays and symbolic field references are quick because symbols are interned, so that field lookups are quicker than they would be if symbols were just strings -- besides, when symbolic field reference is constant the whole field index lookup can be done at compile time.
Try doing that in Python, or Perl, or Tcl. You just can't, and that's why Lisp is (can be) much faster than many other HLLs. You can start with Lisp code that is as dynamic as it would be in those languages, and later you can optimize it by adding type information or writing better macros, or replacing certain functions with macros.
And then, you can write macros and functions that help you code your program. You can extend the language to meet your needs.
I'm glad to see that there are both agreements and disagreements with the article. I was hoping to spark a certain amount of controversy, by challenging the conventional wisdom of what is considered a "proper" development lifecycle.
Another way to think about these techniques is that they are basically "research" oriented. In research, you often don't know what you are aiming for (if you did, it wouldn't be research). The act of designing requires that you be able to predict the consequences of your actions - but in a research space, that's often not possible. You don't know what the consequences of your actions are. In production systems, we use "design patterns" and "rules of thumb" to help cut down on the combinatorial explosion of possible designs. But in a research space, you have to evolve your own design patterns as you go. Once you get a feel for those consequences (by actually implementing possible designs and trying them out) you can then begin to think about the overall design in a coherent way.
Research isn't a specialized activity that occurs in a lab full of scientists - almost every software project requires some research. As P.J. Plauger said, the only programs worth writing are the ones that you don't quite know how to write - yet.
One should not expect a research prototype to be able to play the role of a production system. However, because small projects are so much more malleable than large ones, it is sometimes possible to re-structure a small project from a research prototype into a production system without re-writing it from scratch (this is usually impossible in a large system). The techniques that I have tried to develop are intended to make that migration as painless as possible.
For example, a lot of people reacted to my rule #4 - "don't make everything an object, at least not at first". My own experience is that bad object decomposition is worse than no object decomposition. The whole strategy of object orientation is based on the notion that you have a clear idea as to how the problem breaks down. But when you are working in an unfamiliar space, that's not always possible - you need to get some practical experience in working with the space before you have an intuitive sense of what the conceptual partitions between objects are. In particular, it's easy to use objects to "model the problem" but there's a big difference between a model of a problem and the solution to that problem. A lot of novice OO designers seem to operate by identifying all of the nouns in the problem space and turning each one into a class. However, real solutions involve a lot of abstract machinery such as timers, queues and threads which have nothing to do with the problem space. And in many cases, the "obvious" nouns aren't the "fundamental" nouns that really need to be worked with. Thus, it's often difficult to come up with a really good decomposition strategy. I have also found that changing your decomposition strategy in mid-stream involves a huge amount of work, because the entire project structure reflects it. Thus, my method is to use object decomposition for those parts of the project where you have a clear idea as to how the data is partitioned, and then use functional code for the rest - until you understand what's going on.
IMO, small projects are defined by their code and size with its limited time of, say, 2 to 4 weeks. Going beyond that number would be considered not a small project anymore.
With regards to size, I think three to seven KLOC would be appropriate, depending on the type of language used. For a high-level language, the number could even be smaller as it will only be dealing with small complexity.
Also, code size is dependent on code reuse and the relationship between them would probably be linear or proportional.
But the ambiguous term I used here was the word complexity. It's definitely hard to define how complex a small project will be until one has seen and finished the project. As complexity will have a very large influence on code size and schedule, it might be easier to define what they are before cutting any code and find any existing code that can be copied over or linked to.
Code configuration is another case since the complexity is already in place. The only question there is how complexity is going to move forward as requirements continue to come in and change the internal structure of the application.