CPAN Ratings for Perl: Possible Problems
Posted 22 Mar 2004 at 14:43 UTC by scrottie
Just as with this and a thousand other whiny articles, users are now able to post reviews of Perl modules on CPAN. CPAN is the well-known, well-used repository of modules for Perl. Anyone may contribute, and many contributions ultimately become part of Perl or very popular extensions. Dealing with quality and redundancy has been a struggle, but the open environment has let it grow into perhaps the largest library of reusable code known to man. Large code repositories have an interesting set of problems and CPAN has lessons to teach. I argue that openness is critical to success and that it is easy to fall prey to its opposite by accident. This article should be interesting to anyone using high-level languages or anyone interested in code reuse.
Perl's CPAN, the
Comprehensive Perl Archive Network, contains
tens of thousands of modules by thousands of authors.
A dozen windowing toolkits, dozens of database interfaces,
too many network protocols and file formats, interfaces to
other programs and languages, and a gross assortment of oddities
make it a staple of serious Perl programmers. Its size and
usefulness spurred other languages to emulate it, but it is the
philosophy behind it that is responsible for its success.
Anyone may contribute a module. The powers that be elect to
include these modules in indices, but all are searchable.
Redundant modules are tolerated. There are often several modules
that attempt to solve the same problem. Automated testing of
included test suites on various platforms and naming requirements
to be included in an index are the only signs of administrative
input. Those and the removal of anything malicious and the
manual application process to be granted a userid on the system.
Documentation from modules is online, formatted nicely for
viewing, and modules may take advantage of the bug tracking
system.
Many intermediate programmers have gone on to become advanced
programmers from the feedback and suggestions they've gotten from
novices and gurus alike. Writing a module and releasing it to the
world is a growth experience.
This openness is responsible for the explosive growth of the system,
and as a result the half-arsed attempts of intermediate programmers,
not to mention no-longer-maintained code and inferior "me too"
re-implementations, litter the site. Discussions of how to cope
with things done in poor style, long broken, or overly redundant
keep popping up. No single plan seems to fit. If old modules are
expired, then mature, popular, stable code is thrown away.
Some of the best modules haven't changed in years, even though they
may have gone through years of growth and bug fixes before that.
Whether something is redundant or not is subjective and can't
be automatically tested. Some important popular modules are written
in poor style because style has changed over the years, and, again,
style is hard to quantify (in general, quality is hard to quantify).
So a system of ratings was introduced. Users can rate a module
and explain why they do or don't like the module. This is a form
of closedness, as alluring as it sounds. It might be worthwhile
if it solved the problems at hand, but I for one don't think it does,
and I think it causes harm.
1. People write bad reviews as a way of asking for help. Complex
but good modules tend to get bad reviews because people become
frustrated with them. Some things are inherently complex and
even a brilliant object model can't save them. People tend to
voice an opinion when they have a complaint rather than a
compliment - we expect things to work, and we scream when they
don't.
2. Previously, module authors got their feedback privately or
at least tactfully in the form of email or bug reports on the
bug tracker. This feedback helped them grow into better programmers.
Communication was written addressing the author of the software
rather than addressing the public so it read like "you might consider
doing X to avoid Y problem". When phrased as an address to the public
it sounds like a reprimand - "Joe should do X to avoid the Y problem
this module has". This is humiliating and makes CPAN authorship
competitive rather than cooperative.
3. While the feedback is clearly a system of opinion, it is aggregated
into a number of stars that psychologically seems authoritative.
It is damaging for an author to look at the display for his module
and see only one star because a single user gave it a review that
happened to be a bad review. Our first attempts are always lacking
and this encourages people to give up rather than try again.
There are other solutions. Make the existing discussion lists
more prominent and let people off the street chime in with
opinions and encourage module authors to ask for help. Make the
bug tracking system handle feedback as well as bugs in a separate
category. Even making it more statistically pure and requiring all
users to vote on the quality of the module or accepting no
votes would be an improvement, or a trust metric system could
reduce noise associated with random people chiming in.
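To make the aggregation concern concrete, here is a minimal sketch in
Python. It is not how the CPAN ratings system is implemented. It only
contrasts a plain average, where one bad review becomes a one-star
display, with one possible reading of "more statistically pure":
withholding the star figure until a minimum number of votes exists.
The names plain_stars, thresholded_stars, summary, and the MIN_VOTES
value are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical threshold chosen for illustration; not a CPAN Ratings setting.
MIN_VOTES = 5


def plain_stars(votes: Sequence[int]) -> Optional[float]:
    """Plain average of star votes (1-5); None when nobody has voted."""
    return mean(votes) if votes else None


def thresholded_stars(votes: Sequence[int],
                      min_votes: int = MIN_VOTES) -> Optional[float]:
    """Withhold the aggregate until at least min_votes votes exist."""
    if len(votes) < min_votes:
        return None
    return mean(votes)


def summary(votes: Sequence[int]) -> str:
    """Text a listing page might show instead of a bare star count."""
    score = thresholded_stars(votes)
    if score is None:
        return f"{len(votes)} review(s), not enough to rate"
    return f"{score:.1f} stars from {len(votes)} reviews"
```

With a single one-star vote, plain_stars reports one star for the
whole module, while summary reports only that one review exists. Once
five votes are in, both agree on the average.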
Taking a page from Freshmeat and Sourceforge and simply
reporting on vitality, number of contributors, number of open
bug reports, and so forth would let people decide for themselves
whether the module meets their criteria without hurt feelings
or confusingly terse information.
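A report of that kind might look something like the following Python
sketch. It is only an illustration under stated assumptions: the
ModuleStats record, its fields, and vitality_report are hypothetical
names, and the numbers would have to come from wherever the archive
and bug tracker actually keep them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModuleStats:
    """Hypothetical record of plain facts about a distribution."""
    name: str
    last_release: date
    contributors: int
    open_bugs: int


def vitality_report(stats: ModuleStats, today: date) -> str:
    """State the facts and leave the judgement to the reader."""
    days_since_release = (today - stats.last_release).days
    return (
        f"{stats.name}: last release {days_since_release} days ago, "
        f"{stats.contributors} contributor(s), "
        f"{stats.open_bugs} open bug report(s)"
    )


if __name__ == "__main__":
    # Made-up example values, not data about any real module.
    example = ModuleStats("Example::Module", date(2004, 1, 22), 3, 7)
    print(vitality_report(example, today=date(2004, 3, 22)))
```

Nothing here is scored or ranked. The report states figures and each
reader applies his own criteria to them.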
C, Perl, Python, Ruby, Java, and numerous other languages are finding
real strength in code sharing in the form of libraries, objects,
and modules rather than just entire applications. Especially
with server side languages and scripting it is common to
bring on dependencies readily. Coping with code sharing is a
relatively new frontier, one with a lot of lessons still to
be learned and problems to be solved. It is part of a world
where programmers cater primarily to other programmers and
open source projects scale beyond what one core team can do.
It is a sign of a whole culture and commerce rising behind the
scenes, with niches, specialization, channels, and all that.
It is cool and exciting =)
Regards,
-scott
Maybe I'm missing something obvious, but I just went poking around the CPAN site and found no mention of ratings.
I'm dubious about a rating system as well, although I don't see the issue about public "maybe you could do X" comments. I see that in mailing lists, and no one seems to take it badly. Of course, the suggestions are being directed to experienced programmers who presumably don't have fragile egos about their coding. Usually there are many experienced developers on the list, and "you could do X" generates more suggestions, often resulting in a better solution.
The ratings are of "distributions", not "modules". So for example, for Template::Extract it's at the distribution page not the module page
"Taking a page from Freshmeat and Sourceforge and simply reporting on vitality, number of contributors, number of open bug reports, and so forth would let people decide for themselves whether the module meets their criteria"
I think I read that in the Debian packaging system, the number of open bug reports against a package was to a first approximation proportional to the number of people using it, rather than having anything to do with the quality or bugginess of the package.
When I'm assessing suitability of a library or module for my own use, my primary recourse (if it has more than one developer, at least) is to the project mailing lists to see what impression of the development process I get. It's subjective and can't be reduced to a single metric, but works pretty well for all that.
Do we see the same thing happening in the PHP, ASP, JSP, Javascript, etc. worlds? Are there getting to be enough "sets" of code floating around that we need some kind of rating or trust metric system to separate the men from the boys, so to speak? I thought that PHP was going to take over where Perl left off. What's happening in the PHP library world? Sorry, folks -- I can't follow all of this stuff for myself.