'Best Practices' for Open Source ?

Posted 21 Mar 2001 at 21:28 UTC by mechanix

  The recent discussions of methods & methodologies for OSS projects prompted me to dust this off...
It's something I was putting together for work, but I haven't quite 'polished' it yet. Any feedback would be appreciated.


  I've been thinking about project management methods & coding standards for use with Open Source Projects. Most programmers would agree that some form of management and some standards are necessary. It's the 'which ones' part we have problems with. Traditional 'best practices' need a little tweaking to fit with OSS development. So here's my attempt - I would love feedback & suggestions....

  The main problems with traditional methods/methodologies stem from the fact that they were developed to support scarce resources. A result of this scarcity was that these resources were also expensive. Not only were programmer time & machine time expensive, distribution was as well. The cost of distribution was not limited to physically moving the code from one place to another, but often included additional expenses such as technicians to install the code. This made frequent releases of development code impractical. A side effect of this was that 'end user' input was often unavailable until the project was nearly completed. As software was so expensive to develop, it was paramount that it be maintainable; otherwise there was no hope of recouping the cost of development. These conditions resulted in strict methods that required in-depth requirements docs up front. Users were forced to determine all their needs in advance, with little feedback as development progressed. Detailed specs & designs were produced from the requirements, along with detailed schedules. The entire process was documented heavily, with numerous 'sign-offs' meant to ensure everyone agreed on what needed to be done and how it was to be accomplished. Once implementation had begun, formal coding standards were used to increase code readability & maintainability and to shorten the learning curve for new developers. While the methods could be 'heavy' in terms of startup cost, the expectation was that they would reduce the overall project cost & improve the quality of the resultant product.
A few of the benefits of these traditional methods are:
   Predictable & Repeatable (at least theoretically :-) )
   Project spec. established 'scope' & got everyone on the same page.
   Master schedule projected resource requirements & delivery date
   Coding Standards aided maintainability
   Extensive documentation shortened learning curve for developers & users.
As well as liabilities:
   Long development cycles with limited customer input
   Doesn't scale well
   'Heavy' processes

  OSS projects function in a different atmosphere. Resources have become 'cheap' or even free. Programmers are no longer an expensive commodity; they're a freely available asset. Machines have become fast, cheap & plentiful. Languages and toolkits have evolved, allowing programmers to produce programs much faster than ever before. Last but not least, the Internet provides a fast, easy way to tie all the pieces together. It is now possible for people scattered across the globe to collaborate almost as easily as people in the same room. Additionally, the low cost & ease of distribution the 'net offers (along with standardized hardware & packaging systems) allows 'customers' to watch a project develop, and actively participate in its development. These changes have shattered the old, monolithic development model and turned program development into a much faster, more dynamic process.
To take full advantage of this 'distributed development' model, 3 key areas must be addressed:
   Communication
   Project Management
   Programming Practices


Communication

  Communication is the key to any group activity, and is especially important with groups as geographically distributed as OSS projects tend to be. While the project web site & mailing lists are generally the main methods of communication, don't overlook other tools. IRC channels, newsletters, or wikis can all be used to increase communication between team members. It is also important to realize that 'end-users' are part of the development team. They function as QA testers, first line support, even documentation writers all rolled into one. The key is to have accurate information, easily available to all team members. Documentation must be easy to find, easy to understand, and up to date. This allows new developers and users to 'come up to speed' rapidly. Note that the larger a project (in terms of number of participants or code size), the more important documentation becomes!
At the minimum, the web site should contain:
   A clear, succinct definition of the project goals
   Design documentation
   User documentation
   Project plan & implementation Schedule
   A listing of each module, along with status & 'owner' information
   Online bug tracking
   Project News
   Information on how to obtain the latest code from CM
   Regularly produced 'beta' snapshots for more casual users
   Project mailing lists & archives
   Any additional resources (related projects, wikis, IRC channels, etc)
   Optional: Who we are, Sponsors, Case studies, press clippings, etc.
Note: The project website needs to be updated often - nothing makes a project appear 'dead' more than a web site that hasn't been updated in 6 months...


Project Management

  Project management tends to be a religious issue, so I'll limit my comments here to: Use It!!
It doesn't matter what method you use, as long as the overhead is low (i.e. it shouldn't require more than 5 to 10 minutes per week from participants). A simplified version of Earned Value reporting has worked well for me in the past; however, many examples of PM techniques are available on the web. Two of the most common questions team members are asked are "When will feature FOO be ready?" & "How can I help?" - good PM can help provide realistic answers. The 'PM' should track progress and work with the project lead & module owners to resolve any issues (e.g. locate resources, balance work load, etc.). Current status by module should be posted back to the project web site.
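
A minimal sketch of what a simplified Earned Value report might compute. The article doesn't spell out the exact method used, so this assumes the usual textbook definitions: planned value is the budgeted effort scheduled to be done by now, earned value is the budgeted effort actually completed, and schedule variance is the difference. The module names, owners, and hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    owner: str
    budget_hours: float      # estimated effort for the whole module
    planned_fraction: float  # share of the work scheduled to be done by now (0.0-1.0)
    done_fraction: float     # share of the work actually finished (0.0-1.0)


def status_report(modules):
    """Return (planned value, earned value, schedule variance) in hours."""
    planned = sum(m.budget_hours * m.planned_fraction for m in modules)
    earned = sum(m.budget_hours * m.done_fraction for m in modules)
    return planned, earned, earned - planned


# Hypothetical modules and numbers, for illustration only.
modules = [
    Module("parser", "alice", budget_hours=10, planned_fraction=1.0, done_fraction=1.0),
    Module("storage", "bob", budget_hours=12, planned_fraction=0.5, done_fraction=0.25),
    Module("gui", "carol", budget_hours=8, planned_fraction=0.0, done_fraction=0.0),
]

planned, earned, variance = status_report(modules)
print(f"planned {planned:.1f}h, earned {earned:.1f}h, variance {variance:+.1f}h")
# prints: planned 16.0h, earned 13.0h, variance -3.0h
```

A negative variance means the project is behind plan. Each module owner only has to report one number (the fraction done), which keeps the weekly overhead within a few minutes.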

  A modular, detailed design and a good 'PM' can be crucial to alleviating two of the most common problems I've encountered in OSS projects: relatively high numbers of design changes, and turnover. Due to the increased 'customer' involvement in OSS projects, design changes and enhancements are a common occurrence. Team members come and go, depending upon their interest level and amount of free time. Having project management in place allows the team to judge how these changes will affect the overall project, both in terms of resources required & implementation timeline.


Programming Practices

  Good programming practices start with the initial design. Make sure everyone has a clear understanding of the project goals. Determine a written set of requirements for the project. In short, make sure everyone knows what the team is trying to accomplish. Once this is done, initial design work can begin. The design will consume a significant amount of time, and it's important to resist the urge to start coding immediately. A good design is the difference between a solid, general purpose tool, and a 'one-off' that only works during certain phases of the moon. For OSS projects a modular design with 'staged' or 'incremental' rollout is usually the best bet. A modular design is particularly applicable to OSS, as it allows a project to 'evolve' while maximizing code re-use. This can be a simple 'top-down' program design, or a more extreme 'plug-in' architecture that allows virtually complete customization at run time using re-usable components (for instance, GIMP). 'Staged rollout' simply means releasing a program in 'stages' with successively more features implemented. This allows users to test, and become familiar with, a program before it's fully implemented. In turn, feedback can be given much earlier in the project life cycle. This differs slightly from the traditional 'release early, release often', as stages are slightly more formal. Stages are presumed not to break existing features, nor to damage data - in short, stages are usable 'mini-programs' or 'lite' versions of the finished project. These techniques can be used together and are complementary. Stages roughly correspond to a #.X release, while 'bugfix' releases correspond to a #.#.Y release.

  This brings up an important point: You must design for reuse, it doesn't just happen !!
In order for code to be reused, it must satisfy a number of criteria. It must be easily understood. It must provide a specific, well-documented function. It must be easily relocated (i.e. code that relies upon numerous global vars. or external 'state' is a P.I.T.A. to reuse). In short, modules must be small, well documented, and free from 'side-effects' - a 'blackbox' that can be picked up and moved wherever it's needed. Physically writing out 'pseudo code' for the design can be a great aid in determining potential candidates for a module. Refactor Often.
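
To make the 'side-effects' point concrete, here is a small sketch in Python (the article itself is language-neutral, and the function names are made up for illustration). The first function leans on a global and is awkward to move elsewhere; the second relies only on its parameter.

```python
# Hard to relocate: depends on, and silently modifies, external state.
_word_total = 0


def count_words_global(text):
    global _word_total
    _word_total += len(text.split())  # hidden side effect on the global
    return _word_total


# Easy to relocate: the result depends only on the parameter.
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# The side-effect-free version gives the same answer however often it is called.
assert count_words("release early") == count_words("release early") == 2

# The global version does not: the second call returns a different value.
first = count_words_global("release early")
second = count_words_global("release early")
assert first != second
```

The second function can be copied into another project, or tested in isolation, without bringing any surrounding state along with it.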

At this stage, it may be helpful to code up 'testbeds' to evaluate various designs, data structures, etc. Resist the urge to 'hack' the testbed into a finished project, without a final design !! Though it may seem time consuming, a good design will more than pay for itself with the time saved later and the stability of the finished project.

  Once the design is ready, it should be compared to the requirements to make sure it meets the project's goals. Next, the 'critical path' should be determined. The critical path is the smallest set of modules that must be implemented in order to produce a 'usable' system. The critical path modules should be implemented first, with the other modules 'stubbed out'. This method not only supports 'staged rollout' but allows features the end-users need to be implemented before lower priority features. This is also the time to start planning test cases, using whatever methods the team is comfortable with.

As an example of 'staged rollout', a project to produce a word processor might roll out the following:
  stage 1: full top-level gui interface, simple 'edit box' functions, and load/save functions.
     Most of the gui would do nothing - it would be tied to 'stubbed out' functions. However, it allows users to 'test drive' the gui, checking
     the 'feel'. It also provides an opportunity for users to provide suggestions for enhancements, clarify requirements, & report bugs.
  stage 2: 'Print' menu wired up, with basic 'print' functionality.
     Advanced features, such as paper orientation, etc., may not be implemented yet.
  Successive stages would activate more of the gui - search & replace, spell checking, etc.
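
As a rough sketch of what 'stubbed out' could look like for stage 1 of the word processor above: load/save are on the critical path and do real work, while the remaining menu actions exist but only report that they are not available yet. The class and function names are hypothetical, and a real project would hang these off an actual gui toolkit.

```python
class NotYetImplemented(Exception):
    """Raised by features that are stubbed out in the current stage."""


class WordProcessor:
    def __init__(self):
        self.text = ""

    # Stage 1: on the critical path, so implemented for real.
    def load(self, path):
        with open(path, encoding="utf-8") as handle:
            self.text = handle.read()

    def save(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(self.text)

    # Later stages: the menu entry exists, the function is only a stub.
    def print_document(self):
        raise NotYetImplemented("printing is planned for stage 2")

    def spell_check(self):
        raise NotYetImplemented("spell checking is planned for a later stage")


def run_menu_action(action):
    """Call a menu action; report stubbed features instead of crashing."""
    try:
        action()
        return "ok"
    except NotYetImplemented as exc:
        return f"not available yet: {exc}"


editor = WordProcessor()
print(run_menu_action(editor.print_document))
# prints: not available yet: printing is planned for stage 2
```

Because the stubs fail in a controlled way, users can still 'test drive' every menu without the stage breaking existing features or damaging data.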

The finished design & critical path must be fully documented, along with test cases. This information, along with a list of modules, projected completion dates, and module 'owners' (with contact info.) should be posted on the website. This information will not only guide existing team members, but will allow new developers to join the team without having to read & understand all of the source code.
Design Notes:
   Make sure everyone is designing the same program! USE the requirements.
   Avoid side effects in modules wherever possible !!
     This can be the single greatest aid to reuse & maintainability!
     (In short, a module should rely only on its parameters. More info. is available on the web.)
  As a rule of thumb, a single module should not be more than 10-15 hours work. (If it is, break it into sub-modules)
  Test the Failure Modes!!
     All programs get bad input at some point, or machines suffer hardware failures.
     Make sure testing covers the 'worst case' scenarios.
  Design Changes Happen
     Live with it, plan for it, use the requirements & project goal to make sure changes make sense.
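
A small illustration of 'Test the Failure Modes', using Python's standard unittest module. The parse_port function is a made-up example, not something from a particular project; the point is that the bad inputs get at least as much test coverage as the good one.

```python
import unittest


def parse_port(value):
    """Convert user input to a TCP port number, rejecting bad input."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {value!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortFailureModes(unittest.TestCase):
    def test_good_input(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_bad_input_is_rejected(self):
        # Empty, non-numeric, missing, and out-of-range values must all fail cleanly.
        for bad in ["", "http", None, "0", "70000", "-1"]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)
```

Saved in a file, these tests can be run with `python -m unittest`.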

Programming standards are a must! This has long been recognized, notably by the venerable FSF & the new kid on the block, Java. As with PM, the important thing with programming standards is to use them. The initial project team can pick whatever works best for them, but once decided upon the conventions need to be followed! The same can be said for commenting & documentation systems. Find a set that is comfortable, and stick with them ! (javadoc, Doxygen, etc)

  Use of a source control system such as cvs, aegis, or perforce is a must. Even better is use of a build management system. Simple automated builds can help ensure a tree remains consistent, but do little else. A much better solution is regression tests automated into the build cycle. Regression tests check that new code hasn't broken old functionality, and can be collected by designing a test case each time a function is completed or a bug fixed. This results in a surprisingly thorough set of tests in a very short time (i.e. whole classes of bugs, such as handling of invalid parameters, get caught very early in the development cycle). A 'test down' method works well, and simply means checking the return & error conditions of any modules your module depends on. This can be quite handy - especially when someone changes the returns of one of your dependencies & doesn't tell you. Don't forget to test the failure modes of your module as well! How it performs under error conditions is a critical piece of system stability.
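
One possible shape for 'test down' plus regression tests, again sketched with Python's unittest. Everything here is hypothetical: find_user stands in for a dependency owned by someone else, and greeting is 'your module'. The first test class pins down the returns your code relies on, so a silent change in the dependency shows up as a failed test rather than a mystery bug.

```python
import unittest


# A dependency owned by someone else (hypothetical).
def find_user(users, name):
    """Return the record for name, or None if there is no such user."""
    return users.get(name)


# Our module, which relies on find_user returning None on a miss.
def greeting(users, name):
    record = find_user(users, name)
    if record is None:
        return "Hello, stranger"
    return f"Hello, {record['full_name']}"


USERS = {"alice": {"full_name": "Alice Example"}}


class TestDown(unittest.TestCase):
    """Pin down the behaviour of the dependency that greeting() relies on."""

    def test_find_user_returns_record(self):
        self.assertEqual(find_user(USERS, "alice"), {"full_name": "Alice Example"})

    def test_find_user_returns_none_on_miss(self):
        self.assertIsNone(find_user(USERS, "nobody"))


class GreetingRegression(unittest.TestCase):
    """One test added per finished function or fixed bug."""

    def test_known_user(self):
        self.assertEqual(greeting(USERS, "alice"), "Hello, Alice Example")

    def test_unknown_user_does_not_crash(self):
        # Failure mode: an unknown name must not raise.
        self.assertEqual(greeting(USERS, "nobody"), "Hello, stranger")
```

Hooking `python -m unittest` into the automated build is then enough to run the whole collection on every build.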

( Out of time - gotta go...below is a list of tools that may be of some use...)
The Mozilla project has a very nice setup of build management tools.
  Highly recommended
Aegis features configuration mgmt. with built in Regression testing.
  Just starting to play with this one.....However, it's been in use for years, and is supposed to be very stable.
Perforce is free for use in OSS Projects & lacks cvs's more annoying habits.
  Perforce also has a collection of OSS tools, such as web browsing the repository. Look here


If it's hard you're doing it wrong !! :-P, posted 21 Mar 2001 at 22:24 UTC by mechanix » (Journeyer)

Forgot to add: If the code is hard to write, then your design is wrong.

With the 'right' design & (more importantly) 'right' data structures, the code should come together fairly easily. If you're having problems - step back and re-think your design/data structures !!

And, yes there are exceptions to this...but the exceptions just prove the rule :-)

Site for Good OSS Software Practices, posted 22 Mar 2001 at 03:13 UTC by goingware » (Master)

You've written an important article. I think that open source software in general can benefit from better practices. I'm aiming to do that with the LinuxQuality project at http://linuxquality.sunsite.dk/.

There's not a lot there yet but see the articles section - contributions are appreciated, and I'll put something up there that links back to your article here.

strange premises, posted 22 Mar 2001 at 16:40 UTC by graydon » (Master)

developing on the internet doesn't make collaboration with remote developers as easy as when they're in the same room, and it certainly doesn't make good programmers a plentiful resource. The resource which is plentiful online is lurkers, people who "join a project", lending it an artificial sense of near-success when none of them have the necessary experience with the problem domain, motivation to attack it in earnest, CS knowledge to work out the solution, or coding skill to implement the solution once it's found.

The practises you describe, performed in isolation, lead to lots of things called "projects", each of which has an overdocumented heap of "design", and few if any actual programs. cf. the 17,000 projects on sourceforge. I wish they were all real programs, but they're not.

programs are the central issue. programs that are well written and solve an important problem. in a direct, simple, immediate, correct, efficient way. these are by no means simple to come by, there is no formula for finding them, and they require a very delicate balance of available time, skill, inclination and knowledge in the individual (or small group of individuals) who write them. the incremental "let's all join hands and sing" development model works wonders for cleaning up, maintaining, and extending such a program once it's written, but it does not get it written in the first place.

Strange Premises ?, posted 22 Mar 2001 at 17:38 UTC by mechanix » (Journeyer)


graydon:
  I plead guilty to a bit of poetic license with the "as easily as people in the same room" comment. However, I do feel programmers are more plentiful than they have ever been. I've been a coder for 20+ years, and there are definitely more of us now than when I started.
  I'm a bit puzzled by the rest of your statements. Yes, I agree that finding good developers for a project is difficult. As you pointed out, lack of Domain Expertise can be a crippling factor. And I would hazard a guess that in most projects a 'core' of ~5 programmers do 90% of the coding. But what does that have to do with the article? The article doesn't attempt to address finding good coders.
  If you're suggesting that design & process are useless overhead and progress is only made 'by one guy hacking all night' - I would strongly disagree. Can you provide some further detail to support that argument?

growth, posted 22 Mar 2001 at 18:18 UTC by graydon » (Master)

I'm simply trying to point out that while lack of procedure can slow a program's development, abundance of procedure cannot make a program. Your article is about "best practises", but it doesn't even mention the practise of trying to become a better programmer: read code, do exercises, learn from others' code, learn other languages, study other systems, read books about solving problems, about methodology and modelling and factoring, find out what formal models of your problem already exist, learn math, learn complexity theory, examine existing libraries, etc.

I raise sourceforge as an example of this procedure imbalance. Anyone can start a project (good) and the project has massive free infrastructure and management tools (good). Nonetheless, most projects there are little more than vague ideas. If someone is trying to write "my first program 1.0", they need a lot of help learning to program. Doesn't matter if there's a bug tracking system. Doesn't matter if there's a Staged Rollout Plan or a Design Discussion Wiki. The overwhelming need is for skill development, for self-improvement.

Free software is in a unique position to help with this. For a couple dollars, I can have several hundred megs of source on a CD. A couple dollars more and I have a library card. These rich historical resources, plus a few compilers or interpreters on which to exercise, are the "best practise" for most of us. This is what will move us most surely towards producing good programs which solve problems.

Re: Growth, posted 22 Mar 2001 at 20:36 UTC by mechanix » (Journeyer)


  graydon: I feel we're discussing different ends of the same problem. "Best Practices" as applied to the U.S. I.T. industry generally deals with the best approaches to management & implementation of software projects. In short, they generally address methods & methodologies. You seem to be discussing the education & training of new programmers. While this is an important issue, it is not the focus of this article.

  Basically, this article was intended to help competent programmers adapt to distributed development (it was originally written for professional programmers whose companies are beginning OSS projects). If it fails in that regard, I would love some feedback on how to improve it.

  While this topic definitely deserves its own article, I would like to mention that I agree with your position w.r.t. fledgling programmers. However, as you pointed out - the resources & information to become a better programmer are readily available. I feel the reason they're not being made use of is simply that programming has transitioned from a 'passion' to a 'hobby'. If you look at any other hobby, say cars or gardening, you'll find a few people who live for it, and others who are simply weekend tinkerers. Welcome to the weekend tinkerer age of computers :-)

  Personally, I also feel that ego played a major role in early OSS development. People took a lot of PRIDE in their programs. Programming was considered an art, and an end unto itself - solving the problem was almost secondary to how 'elegantly' you solved it. Having two or three options to solve a given problem was expected to breed 'healthy competition', and the best program (or best programmers) would 'win'. I fear those days are nearly over. Already, 'ease of use' is replacing 'efficiency' as the battle cry of programmers. The focus shifts ever higher up the abstraction chain, and fewer and fewer people actually understand the underlying mechanisms. As an example, look at the java standards. Java has grown from a nice little language to an almost complete software simulation of a computer - processing, audio, video, networking, etc. And an ever increasing number of APIs is provided to 'solve' (standardize) the 'difficult' problems. The cost in performance & hardware requirements is met with cries of "ease of use", "increased productivity", "hardware gets faster/cheaper all the time". This is not meant to slight Java coders, simply to point out that their focus has moved so far up the abstraction chain, that the underlying hardware is of no interest.

  And quite frankly, I haven't a clue how to help the situation...How do you give someone an appreciation for the beauty in a well written piece of code? How do you instill a desire to be a good programmer in someone ? How do you demonstrate that a grasp of basic algorithms & structures is a pre-requisite, in an age where canned libraries exist for almost any need ?

  I'd love to hear any suggestions you have....

improvement, posted 23 Mar 2001 at 01:53 UTC by graydon » (Master)

I guess the initially hostile tone of my response stemmed from my perception that your article was suggesting that the only thing a programmer needs to be effective is a good management infrastructure. Such a position is unfortunately quite common amongst the snake-oil salespeople crowding the software methodology world. "all you need is to use this here handy organization chart", etc. I don't buy that, but it seems you don't either. Sorry if I misinterpreted what you were saying.

For the latter question, of why "appreciation" of programs is declining and what might be done about it, I'd suggest the culprit is computation being replaced by communication. That is to say, the use of computers primarily as byte stream conduits (web browsers, cell phones, mail servers, etc) has opened up a huge amount of gainful employment in the development of code for hauling data. Humans love to communicate, even if it's totally trivial data they're communicating, and you can make a living off writing code to facilitate it. Unfortunately such code is as inspiring to a programmer's creative side as a garden hose is to a civil engineer. Not at all. Correctness is only mildly important, most algorithms can be reduced to linear time, the system's complexity stems from requested "bell-and-whistle features" rather than anything inherent in the problem.

There remains a highly inspiring (to me, anyway) field of programming in which correctness is still critical, every ounce of speed is necessary, system complexity is inherent in the problem, theoretical concerns are just as real as practical concerns, everything from machine-level instruction timing through to high level algorithm design is open for exploration, modelling the system taxes your creativity, and engineering tradeoffs are everywhere: scientific applications. Bioinformatics, quantum mechanics, molecular modelling, geospatial imaging, fluid dynamics, this sort of stuff seems to have arbitrarily difficult problems waiting for exploration.

Perhaps if the margins in commodity communication infrastructure drop low enough and scientific computation regains a dominant role in the field (say, biotech or quantum), we'll see more elegant work. For the time being, I'm afraid there's still a fair amount of money in the "you've got mail" business :)

XP, posted 23 Mar 2001 at 05:07 UTC by cullenfluffyjennings » (Journeyer)

I think that some of the ideas in XP fit in very well with open source software engineering. Particularly the parts about test drivers - I am also a big fan of the related Design by Contract stuff by Bertrand Meyer. If you make it easy for someone else to test that their change did not break the system - it is easier for others to contribute and to have a useful community.

XP in open source, posted 23 Mar 2001 at 16:23 UTC by pphaneuf » (Journeyer)

Some XP stuff might be hard to do in the open source arena. For example, PairProgramming, OnsiteCustomer or FortyHourWeek (many people do this in their spare time).

OnsiteCustomer might not be that big of an issue though, if you think of "scratch your own itch". :-)

On XPLC (when I have the time to work on it), I use a number of XP practices. My #1 favorite is without doubt UnitTests.

Excellent Article, posted 23 Mar 2001 at 23:52 UTC by nymia » (Master)

Suggestions:

1) Give more emphasis on development tools followed by communication tools. Project management tools are OK for cathedral type of organizations and tools for these kind should be emphasized as well.

2) Provide highlights on how these tools can be integrated together (like a puzzle) in a given environment, say, distributed or single location.

Overall, a very good article.

The talkbacks are excellent too. I wish I had something to say, but...

some other links..., posted 24 Mar 2001 at 08:28 UTC by Telsa » (Master)

mechanix said this article was intended to help competent programmers get used to open source. In that case, here's a couple of links: Alan has a talk called "Dear Mr Brooks" which is available in ogg and mp3 format from ftp.linux.org.uk and which is about differences and similarities between "traditional" and free software development methods. And Taj gave a talk about KDE development and history which has a pile of useful "learn from our experiences" mentions in it. He promised to write it up himself properly, and I look forward to it.

Telsa

Re: improvement, posted 11 Apr 2001 at 09:52 UTC by glyph » (Master)

2 points: first, about communications software, and second, about the "best practices" being discussed here.

graydon - While I agree that the rabid, crazed commercialization of the Internet has created some pretty crappy software, I do also think that your criticism of communication-oriented programming is misguided. Quality is important in every field. For example, a robust, secure, distributed messaging system would be helping people to communicate "totally trivial" data, but it involves lots of interesting problem areas -- enough that I believe it remains an unsolved problem -- and a chat system is arguably the simplest possible application of communications programming. Even in software where computational problems are not significant at all, the user's experience of the program can be tightly related to the specifics of its implementation, even at the low levels. (Ask anyone who's used a Java applet in Netscape 4.0 for evidence of this.)

mechanix - About "best practices"...

While "design" gets a lot of lip service these days, there seems to be very little consensus as to what constitutes good and bad design. Many of the most useful designs I've seen have arisen from informal discussions between several experienced programmers in the same application domain.

The key word there is experienced -- I would argue that many of these practices are actually a bad idea not only for fledgling programmers, but for any programmer who does not have intimate knowledge of the problem domain in which they will be working. By "has intimate knowledge", I mean "has implemented at least one system that does something similar already". Very good, very experienced programmers who have never worked with, for example, TCP/IP before often have amazing misconceptions about writing robust networked applications. Design discussions/documents in the absence of such experience are worse than pointless -- they give developers a false confidence in the design they come up with, and increase the perceived cost of throwing away bad designs and implementations.

The second point, of course, is that the first iteration of the system will probably need to be thrown away, or at least bits and pieces of it will. This is necessary, and a good development methodology will expedite getting to this point.

In the rare case where you have a team of open-source developers already assembled, at least half of whom have implemented a similar system in the past, the suggested ideas are mostly good ones, except for the section on "re-use". Re-use is a very bad idea. What one should aim for is modularity; each module in a finished program should be able to be used completely independently of the others without being changed. By the time a module is fit to be used externally, any other project which uses it should count it as an external dependency, referring to its documented interface, and not incorporate the code wholesale or change it. If the second project exposes flaws in the design of a module in the first, that module should be kept up to date with changing requirements, or rewritten in such a way that it satisfies the requirements of both, not just forked into two similar-but-different submodules of each project.

Moderation in all things can help, too: "evolving requirements" can mean the same thing as "feature creep". "Hacking the testbed" can mean the same thing as "iterative prototyping". Use careful judgement whenever you hit something that seems suspicious, and don't just follow your standard blindly. The most useful standard is to be flexible in your standardization; the "science" of software development methodologies is a very, very new one, and if you've got an idea that you think will work for your project, try it out and share your results!
