The recent discussions of methods & methodologies for OSS projects prompted me to dust this off...
It's something I was putting together for work, but I haven't quite 'polished' it yet. Any feedback would be appreciated.
I've been thinking about project management methods & coding standards for use with Open Source projects. Most
programmers would agree that some form of management and some standards are necessary. It's the question of which ones
that we have problems with. Traditional 'best practices' need a little tweaking to fit OSS development. So here's my attempt - I would love
feedback & suggestions....
The main problems with traditional methods & methodologies stem from the fact that they were developed to support
scarce resources. A result of this scarcity was that these resources were also expensive. Not only were programmer
time & machine time expensive; distribution was as well. The cost of distribution was not limited to physically moving the
code from one place to another, but often incurred additional expenses, such as technicians to install the code. This made frequent
releases of development code impractical. A side effect was that 'end user' input was often unavailable until the project was nearly
complete. Because software was so expensive to develop, it was paramount that it be maintainable; otherwise there was no hope of
recouping the cost of development. These conditions resulted in strict methods that required in-depth requirements documents up front. Users
were forced to determine all their needs in advance, with little feedback as development progressed. Detailed specs & designs were
produced from the requirements, along with detailed schedules. The entire process was documented heavily, with numerous 'sign-offs'
meant to ensure everyone agreed on what needed to be done and how it was to be accomplished. Once implementation began,
formal coding standards were used to increase code readability and maintainability and to shorten the learning curve for new developers. While
the methods could be 'heavy' in terms of startup cost, the expectation was that they would reduce the overall project cost & improve the
quality of the resulting product.
A few of the benefits of these traditional methods are:
Predictable & Repeatable (at least theoretically :-) )
Project spec. established 'scope' & got everyone on the same page.
Master schedule projected resource requirements & delivery date
Coding Standards aided maintainability
Extensive documentation shortened learning curve for developers & users.
As well as liabilities:
Long development cycles with limited customer input
Doesn't scale well
OSS projects function in a different atmosphere. Resources have become 'cheap' or even free. Programmers are no longer
an expensive commodity; they're a freely available asset. Machines have become fast, cheap & plentiful. Languages and toolkits have
evolved, allowing programmers to produce programs much faster than ever before. Last but not least, the Internet provides a fast
way to tie all the pieces together. It is now possible for people scattered across the globe to collaborate almost as easily as people in the
same room. Additionally, the low cost & ease of distribution the 'net offers (along with standardized hardware & packaging systems)
allow 'customers' to watch a project develop, and actively participate in its development. These changes have shattered the old,
monolithic development model and turned program development into a much faster, more dynamic process.
To take full advantage of this 'distributed development' model, 3 key areas must be addressed:
Communication is the key to any group activity, and is especially important with groups as geographically distributed as
OSS projects tend to be. While the project web site & mailing lists are generally the main methods of communication, don't overlook
other tools. IRC channels, newsletters, or wikis can all be used to increase communication between team members. It is also
important to realize that 'end users' are part of the development team. They function as QA testers, first-line support, even
documentation writers, all rolled into one. The key is to have accurate information easily available to all team members. Documentation
must be easy to find, easy to understand, and up to date. This allows new developers and users to 'come up to speed' rapidly. Note that
the larger a project (in terms of number of participants or code size), the more important documentation becomes!
At the minimum, the web site should contain:
A clear, succinct definition of the project goals
Project plan & implementation Schedule
A listing of each module, along with status & 'owner' information
Online bug tracking
Information on how to obtain the latest code from CM
Regularly produced 'beta' snapshots for more casual users
Project mailing lists & archives
Any additional resources (related projects, wikis, IRC channels, etc.)
Optional: Who we are, Sponsors, Case studies, press clippings, etc.
Note: The project website needs to be updated often - nothing makes a project appear 'dead' more than a web site that hasn't been updated in 6 months...
Project management tends to be a religious issue, so I'll limit my comments here to: Use It!!
It doesn't matter what method you use, as long as the overhead is low (i.e. it shouldn't require more than 5 to 10 minutes per week from participants). A simplified version of Earned Value reporting has worked well for me in the past; however, many examples of PM techniques are available on the web. Two of the most common questions team members are asked are "When will feature FOO be ready?" & "How can I help?" - good PM can help provide realistic answers. The 'PM' should track progress and work with the project lead & module owners to resolve any issues (i.e. locate resources, balance workload, etc.). Current status by module should be posted back to the project web site.
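To make 'simplified Earned Value' concrete, here is a minimal sketch. Everything in it (the module names, hour estimates, and the `earned_value` helper) is hypothetical, invented for illustration - not taken from any particular project:

```python
# A minimal sketch of simplified earned-value tracking.
def earned_value(modules):
    """modules: list of (estimated_hours, fraction_complete) pairs."""
    planned = sum(est for est, _ in modules)            # total budgeted hours
    earned = sum(est * done for est, done in modules)   # hours of work delivered
    return earned, planned, earned / planned

status = [(10, 1.0),   # e.g. parser: done
          (15, 0.5),   # e.g. GUI: half done
          (8, 0.0)]    # e.g. printing: not started

earned, planned, pct = earned_value(status)
print(f"{earned:.1f} of {planned} hours earned ({pct:.0%} complete)")
# -> 17.5 of 33 hours earned (53% complete)
```

Posting a table like this per module on the web site gives realistic answers to "when will FOO be ready?" without demanding more than a few minutes of bookkeeping per week.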
A modular, detailed design and a good 'PM' can be crucial to alleviating two of the most common problems I've seen
in OSS projects: relatively high numbers of design changes, and turnover. Due to the increased 'customer' involvement in OSS projects,
design changes and enhancements are a common occurrence. Team members come and go, depending upon their interest level and
amount of free time. Having project management in place allows the team to judge how these changes will affect the overall project, both
in terms of resources required & implementation timeline.
Good programming practices start with the initial design. Make sure everyone has a clear understanding of the project
goals. Determine a written set of requirements for the project. In short, make sure everyone knows what the team is trying to
accomplish. Once this is done, initial design work can begin. The design will consume a significant amount of time, and it's important
to resist the urge to start coding immediately. A good design is the difference between a solid, general-purpose tool and a 'one-off' that
only works during certain phases of the moon. For OSS projects, a modular design with 'staged' or 'incremental' rollout is usually the
best bet. A modular design is particularly applicable to OSS, as it allows a project to 'evolve' while maximizing code re-use. This can be a
simple 'top-down' program design, or a more extreme 'plug-in' architecture that allows virtually complete customization at run time using
re-usable components (for instance, GIMP). 'Staged rollout' simply means releasing a program in
'stages', with successively more features implemented. This allows users to test, and become familiar with, a program before it's fully
implemented. In turn, feedback can be given much earlier in the project life cycle. This differs slightly from the traditional 'release early,
release often', as stages are slightly more formal. Stages are presumed not to break existing features, nor to damage data - in short,
stages are usable 'mini-programs' or 'lite' versions of the finished project. These techniques can be used together and are complementary:
stages roughly correspond to a #.X release, while 'bugfix' releases correspond to a #.#.Y release.
This brings up an important point: You must design for reuse; it doesn't just happen!!
In order for code to be reused, it must satisfy a number of criteria. It must be easily understood. It must provide a specific, well-documented function. It must be easily relocated (i.e. code that relies upon numerous global variables or external 'state' is a P.I.T.A. to reuse). In short, modules must be small, well documented, and free from 'side effects' - a 'black box' that can be picked up and moved wherever it's needed. Physically writing out 'pseudo code' for the design can be a great aid in determining potential candidates for a module. Refactor often.
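As a toy illustration of that 'black box' criterion (both functions and the tax-rate example are invented for this sketch, not drawn from any real module):

```python
# Hard to reuse: the result silently depends on hidden global state.
_current_rate = 0.05

def taxed_total_global(price):
    return price * (1 + _current_rate)   # changes if anyone pokes _current_rate

# Easy to reuse: everything the function needs arrives as a parameter,
# so it can be picked up and moved anywhere - same inputs, same output.
def taxed_total(price, rate):
    return price * (1 + rate)
```

The second form is also trivially testable in isolation, which pays off once regression tests enter the picture.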
At this stage, it may be helpful to code up 'testbeds' to evaluate various designs, data structures, etc. Resist the urge to 'hack' the
testbed into a finished project without a final design!! Though it may seem time-consuming, a good design will more than pay for
itself with the time saved later and the stability of the finished project.
Once the design is ready, it should be compared to the requirements to make sure it meets the project's goals. Next, the
'critical path' should be determined. The critical path is the smallest set of modules that must be implemented in order to produce a
'usable' system. The critical path modules should be implemented first, with the other modules 'stubbed out'. This method not only
supports 'staged rollout' but allows features the end-users need to be implemented before lower priority features. This is also the
time to start planning test cases, using whatever methods the team is comfortable with.
As an example of 'staged rollout', a project to produce a word processor might roll out the following:
stage 1: full top-level gui interface, simple 'edit box' functions, and load/save functions.
Most of the gui would do nothing - it would be tied to 'stubbed out' functions. However, it allows users to 'test drive' the gui, checking
the 'feel'. It also provides an opportunity for users to provide suggestions for enhancements, clarify requirements, & report bugs.
stage 2: basic 'print' functionality implemented, with the rest of the 'Print' menu still stubbed out.
Advanced features, such as paper orientation,etc. may not be implemented yet.
Successive stages would activate more of the gui - search & replace, spell checking, etc.
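A sketch of what stage 1 might look like in code (all of the names here - `Document`, `insert`, `print_document` - are invented for the illustration; a real stage 1 would hang these functions off the gui):

```python
class Document:
    """Stage 1: simple 'edit box' functions work; printing is stubbed out."""

    def __init__(self, text=""):
        self.text = text

    def insert(self, pos, s):
        """A working stage-1 editing function."""
        self.text = self.text[:pos] + s + self.text[pos:]

    def print_document(self):
        """Tied to the 'Print' menu entry, but deferred to stage 2."""
        raise NotImplementedError("printing is scheduled for stage 2")

doc = Document("hello world")
doc.insert(5, ",")          # editing works now
print(doc.text)             # -> hello, world
```

Users can 'test drive' the interface and report bugs immediately, while the stub makes the unfinished parts explicit instead of silently broken.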
The finished design & critical path must be fully documented, along with test cases. This information, along with a list of modules,
projected completion dates, and module 'owners' (with contact info.) should be posted on the website. This information will not only guide
existing team members, but will allow new developers to join the team without having to read & understand all of the source code.
Make sure everyone is designing the same program! USE the requirements.
Avoid side effects in modules wherever possible !!
This can be the single greatest aid to reuse & maintainability!
(In short, a module should only rely on its parameters. However, more info is available on the web.)
As a rule of thumb, a single module should not be more than 10-15 hours of work. (If it is, break it into sub-modules.)
Test the Failure Modes!!
All programs get bad input at some point, and machines suffer hardware failures.
Make sure testing covers the 'worst case' scenarios.
Design Changes Happen
Live with it, plan for it, use the requirements & project goal to make sure changes make sense.
Programming standards are a must! This has long been recognized, notably by the venerable FSF & the new kid on the block...
As with PM, the important thing with programming standards is to use them. The initial project team can pick whatever works best for
them, but once decided upon, the conventions need to be followed! The same can be said for commenting & documentation systems.
Find a set that is comfortable, and stick with it! (javadoc, Doxygen, etc.)
Use of a source control system such as cvs, Aegis, or Perforce is a must. Even better is the use of a build management system.
Simple automated builds can help ensure a tree remains consistent, but do little else. A much better solution is regression tests
automated into the build cycle. Regression tests check that new code hasn't broken old functionality, and can be collected by designing a
test case each time a function is completed or a bug fixed. This results in a surprisingly thorough set of tests in a very short time (i.e.
whole classes of bugs, such as handling of invalid parameters, get caught very early in the development cycle). A 'test down' method
works well, and simply means checking the return & error conditions of any modules your module depends on. This can be quite handy -
especially when someone changes the returns of one of your dependencies & doesn't tell you. Don't forget to test the failure modes of
your module as well! How it performs under error conditions is a critical piece of system stability.
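As a small illustration of collecting a regression test around a failure mode (`parse_port` and its bug history are hypothetical, invented for this sketch):

```python
def parse_port(s):
    """Parse a TCP port number, rejecting bad input instead of crashing."""
    if not s or not s.strip().isdigit():
        raise ValueError(f"not a port number: {s!r}")
    port = int(s)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

def test_parse_port():
    # Collected when a (hypothetical) crash-on-empty-input bug was fixed;
    # rerun on every build so the whole class of bug stays fixed.
    assert parse_port("8080") == 8080
    for bad in ("", "   ", "abc", "-1", "70000"):
        try:
            parse_port(bad)
            raise AssertionError(f"accepted bad input {bad!r}")
        except ValueError:
            pass

test_parse_port()
```

A test like this is written once, takes milliseconds per build, and permanently documents both the expected behavior and the failure modes.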
( Out of time - gotta go...below is a list of tools that may be of some use...)
The Mozilla project has a very nice setup of build management tools.
Aegis features configuration mgmt. with built in Regression testing.
Just starting to play with this one... however, it's been in use for years and is supposed to be very stable.
Perforce is free for use in OSS Projects & lacks cvs's more annoying habits.
Perforce also has a collection of OSS tools, such as web browsing the repository. Look here
Forgot to add: If the code is hard to write, then your design is wrong.
With the 'right' design & (more importantly) the 'right' data structures, the code should come together fairly easily. If you're having problems, step back and re-think your design and data structures!!
And, yes there are exceptions to this...but the exceptions just prove the rule :-)
You've written an important article. I think that open source software in general can benefit from better practices. I'm aiming to do that with the LinuxQuality project at http://linuxquality.sunsite.dk/.
There's not a lot there yet but see the articles section - contributions are appreciated, and I'll put something up there that links back to your article here.
developing on the internet doesn't make collaboration with remote developers as easy as when they're in the same room, and it certainly doesn't make good programmers a plentiful resource. The resource which is plentiful online is lurkers, people who "join a project", lending it an artificial sense of near-success when none of them have the necessary experience with the problem domain, motivation to attack it in earnest, CS knowledge to work out the solution, or coding skill to implement the solution once it's found.
The practises you describe, performed in isolation, lead to lots of things called "projects", each of which has an overdocumented heap of "design", and few if any actual programs. cf. the 17,000 projects on sourceforge. I wish they were all real programs, but they're not.
programs are the central issue. programs that are well written and solve an important problem. in a direct, simple, immediate, correct, efficient way. these are by no means simple to come by, there is no formula for finding them, and they require a very delicate balance of available time, skill, inclination and knowledge in the individual (or small group of individuals) who write them. the incremental "let's all join hands and sing" development model works wonders for cleaning up, maintaining, and extending such a program once it's written, but it does not get it written in the first place.
I'm simply trying to point out that while lack of procedure can slow a program's development, abundance of procedure cannot make a program. Your article is about "best practises", but it doesn't even mention the practise of trying to become a better programmer: read code, do exercises, learn from others' code, learn other languages, study other systems, read books about solving problems, about methodology and modelling and factoring, find out what formal models of your problem already exist, learn math, learn complexity theory, examine existing libraries, etc.
I raise sourceforge as an example of this procedure imbalance. Anyone can start a project (good) and the project has massive free infrastructure and management tools (good). Nonetheless, most projects there are little more than vague ideas. If someone is trying to write "my first program 1.0", they need a lot of help learning to program. Doesn't matter if there's a bug tracking system. Doesn't matter if there's a Staged Rollout Plan or a Design Discussion Wiki. The overwhelming need is for skill development, for self-improvement.
Free software is in a unique position to help with this. For a couple dollars, I can have several hundred megs of source on a CD. A couple dollars more and I have a library card. These rich historical resources, plus a few compilers or interpreters on which to exercise, are the "best practise" for most of us. This is what will move us most surely towards producing good programs which solve problems.
I guess the initially hostile tone of my response stemmed from my perception that your article was suggesting that the only thing a programmer needs to be effective is a good management infrastructure. Such a position is unfortunately quite common amongst the snake-oil salespeople crowding the software methodology world. "all you need is to use this here handy organization chart", etc. I don't buy that, but it seems you don't either. Sorry if I misinterpreted what you were saying.
For the latter question, of why "appreciation" of programs is declining and what might be done about it, I'd suggest the culprit is computation being replaced by communication. That is to say, the use of computers primarily as byte stream conduits (web browsers, cell phones, mail servers, etc) has opened up a huge amount of gainful employment in the development of code for hauling data. Humans love to communicate, even if it's totally trivial data they're communicating, and you can make a living off writing code to facilitate it. Unfortunately such code is as inspiring to a programmer's creative side as a garden hose is to a civil engineer. Not at all. Correctness is only mildly important, most algorithms can be reduced to linear time, the system's complexity stems from requested "bell-and-whistle features" rather than anything inherent in the problem.
There remains a highly inspiring (to me, anyway) field of programming in which correctness is still critical, every ounce of speed is necessary, system complexity is inherent in the problem, theoretical concerns are just as real as practical concerns, everything from machine-level instruction timing through to high level algorithm design is open for exploration, modelling the system taxes your creativity, and engineering tradeoffs are everywhere: scientific applications. Bioinformatics, quantum mechanics, molecular modelling, geospatial imaging, fluid dynamics, this sort of stuff seems to have arbitrarily difficult problems waiting for exploration.
Perhaps if the margins in commodity communication infrastructure drop low enough and scientific computation regains a dominant role in the field (say, biotech or quantum), we'll see more elegant work. For the time being, I'm afraid there's still a fair amount of money in the "you've got mail" business :)
I think that some of the ideas in XP fit in very well with open source software engineering. Particularly the parts about test drivers - I am also a big fan of the related Design by Contract stuff by Bertrand Meyer. If you make it easy for someone else to test that their change did not break the system - it is easier for others to contribute and to have a useful community.
OnsiteCustomer might not be that big of an issue though, if you think of "scratch your own itch". :-)
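Design by Contract, as mentioned above, can be approximated in most languages with plain assertions. Here is a minimal sketch (Python has no native contract support, and `normalize` is an invented example): the asserted pre- and postconditions document the module's contract, so a contributed change that breaks it fails loudly.

```python
def normalize(scores):
    """Scale a list of non-negative scores so they sum to 1."""
    # Preconditions: the caller's side of the contract.
    assert scores, "precondition: at least one score"
    assert all(s >= 0 for s in scores), "precondition: scores are non-negative"
    total = sum(scores)
    assert total > 0, "precondition: at least one positive score"

    result = [s / total for s in scores]

    # Postcondition: the implementation's side of the contract.
    assert abs(sum(result) - 1.0) < 1e-9, "postcondition: weights sum to 1"
    return result
```

The contract doubles as a test driver: contributors can rewrite the implementation freely, and any violation is caught the moment the build's tests exercise the function.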
1) Give more emphasis on development tools followed by communication tools. Project management tools are OK for cathedral type of organizations and tools for these kind should be emphasized as well.
2) Provide highlights on how these tools can be integrated together (like a puzzle) in a given environment, say, distributed or single location.
Overall, a very good article.
The talkbacks are excellent too. I wish I had something to say, but...
mechanix said this article was intended to help competent programmers get used to open source. In that case, here's a couple of links: Alan has a talk called "Dear Mr Brooks" which is available in ogg and mp3 format from ftp.linux.org.uk and which is about differences and similarities between "traditional" and free software development methods. And Taj gave a talk about KDE development and history which has a pile of useful "learn from our experiences" mentions in it. He promised to write it up himself properly, and I look forward to it.
2 points: first, about communications software, and second, about the "best practices" being discussed here.
graydon - While I agree that the rabid, crazed commercialization of the Internet has created some pretty crappy software, I do also think that your criticism of communication-oriented programming is misguided. Quality is important in every field. For example, a robust, secure, distributed messaging system would be helping people to communicate "totally trivial" data, but it involves lots of interesting problem areas -- enough that I believe it remains an unsolved problem -- and a chat system is arguably the simplest possible application of communications programming. Even in software where computational problems are not significant at all, the user's experience of the program can be tightly related to the specifics of its implementation, even at the low levels. (Ask anyone who's used a Java applet in Netscape 4.0 for evidence of this.)
mechanix - About "best practices"...
While "design" gets a lot of lip service these days, there seems to be very little consensus as to what constitutes good and bad design. Many of the most useful designs I've seen have arisen from informal discussions between several experienced programmers in the same application domain.
The key word there is experienced -- I would argue that many of these practices are actually a bad idea not only for fledgling programmers, but for any programmer who does not have intimate knowledge of the problem domain in which they will be working. By "has intimate knowledge", I mean "has implemented at least one system that does something similar already". Very good, very experienced programmers who have never worked with, for example, TCP/IP before often have amazing misconceptions about writing robust networked applications. Design discussions/documents in the absence of such experience are worse than pointless -- they give developers a false confidence in the design they come up with, and increase the perceived cost of throwing away bad designs and implementations.
The second point, of course, is that the first iteration of the system will probably need to be thrown away, or at least bits and pieces of it will. This is necessary, and a good development methodology will expedite getting to this point.
In the rare case where you have a team of open-source developers already assembled, at least half of whom have implemented a similar system in the past, the suggested ideas are mostly good ones, except for the section on "re-use". Re-use is a very bad idea. What one should aim for is modularity; each module in a finished program should be able to be used completely independently of the others without being changed. By the time a module is fit to be used externally, any other project which uses it should count it as an external dependency, referring to its documented interface, and not incorporate the code wholesale or change it. If the second project exposes flaws in the design of a module in the first, that module should be kept up to date with changing requirements, or rewritten in such a way that it satisfies the requirements of both, not just forked into two similar-but-different submodules of each project.
Moderation in all things can help, too: "evolving requirements" can mean the same thing as "feature creep". "Hacking the testbed" can mean the same thing as "iterative prototyping". Use careful judgement whenever you hit something that seems suspicious, and don't just follow your standard blindly. The most useful standard is to be flexible in your standardization; the "science" of software development methodologies is a very, very new one, and if you've got an idea that you think will work for your project, try it out and share your results!