Thought the advogato community might be interested in a short report I have written based on sourceforge project data. It gives a general overview of projects housed at sourceforge. Any comments are greatly appreciated.
I know that I'm referenced in this, but despite that, you should have a look at it. There's some very interesting data and information about projects, number of developers, number of administrators, age, maturity and the like, and renster's done some great work on looking at possible correlations. We (the community) need to understand better how to create successful projects and what that takes, and this sort of research is very important as an underpinning.
The majority of SourceForge projects have no source code and will never have any source code. If you could somehow filter out all the projects that have no source code that would help focus your research on actual live projects.
I was trying to figure out which programming languages are most used by open source/software libre hackers, and I didn't want the zillions of dead-on-arrival projects to influence my results, so I just looked at the "top 50 most active" and the "top 100 most downloaded": popularity of programming languages among open source hackers.
Regards,
Zooko
Total Projects: 29,905
Active Projects: 7,205
Consider that:
Active_Projects/100 == 20 / (100 - Percentile_20th_Most_Active_Project)
Then:
Active_Projects == (100 * 20) / (100 - Percentile_20th_Most_Active_Project)
So with a value of 99.7224 for the Audacity project on today's SF home
page, you get:
Active_Projects == 2000 / 0.2776 == 7,205 (rounded)
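For anyone who wants to redo this with another project's numbers, here is a minimal sketch of the same arithmetic in Python. The function name is made up for illustration, and it assumes the percentile shown on the SF page is computed over active projects only, which is the assumption the estimate above already rests on.

```python
def estimate_active_projects(rank, percentile):
    """Estimate the number of active projects from one project's
    activity rank and its activity percentile.

    Rearranges: active / 100 == rank / (100 - percentile)
    """
    return (100 * rank) / (100 - percentile)


# Figures quoted above: the 20th most active project sits at percentile 99.7224.
estimate = estimate_active_projects(20, 99.7224)
print(round(estimate))
```

Plugging in a different rank and percentile pair from the same day's listing should land on roughly the same total if that assumption holds.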
While I love what renster has done, the critique that SourceForge by its nature (a nature that I think is a fine and wonderful thing) hosts more great ideas than code by raw count is a valid one. I think it would be interesting, for example, to plot admins who worked on projects with code separately from those who worked on projects with just good ideas.
There is a *lot* of good stuff in the numbers in renster's report. Now it remains for us to tease out just what the numbers are saying.
There are certainly some more numbers to play around with. I was looking for a way to graphically present a network map to show relationships among projects, i.e. how closely projects are related based on the number of shared admins and/or developers. It isn't really in my area of interest and I didn't want to stray too far, but it would certainly provide some more information. Unfortunately I don't have data on all the developers. I collected information on project admins only. Another extension would be to track project founders and the number of projects they are involved in over time.
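As a rough illustration of the shared-admin idea, the edge weights for such a network map could be computed along these lines. This is only a sketch: the function name, the project names and the admin names are all invented for the example, and it is not taken from the report or its data.

```python
from collections import Counter
from itertools import combinations


def shared_admin_counts(project_admins):
    """Return a Counter mapping (project_a, project_b) pairs to the
    number of admins the two projects have in common."""
    # Invert the mapping: admin -> set of projects that admin runs.
    admin_projects = {}
    for project, admins in project_admins.items():
        for admin in admins:
            admin_projects.setdefault(admin, set()).add(project)

    # Every pair of projects sharing an admin gets one more unit of weight.
    edges = Counter()
    for projects in admin_projects.values():
        for a, b in combinations(sorted(projects), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample data, not real SourceForge projects.
sample = {
    "proj_a": {"alice", "bob"},
    "proj_b": {"bob", "carol"},
    "proj_c": {"alice", "bob"},
}
edges = shared_admin_counts(sample)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting pair weights could then be handed to whatever graph drawing tool is at hand; the same approach would work for developers if that data were available.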
I have other data on all-time statistics for projects. I think the next stage is to try to work with some more fine-grained data: statistics by month over the life of each project. Then it will be possible to track activity and get some feel for the projects and where they are heading (growing, declining, etc.). The trouble with some of the summary data is that it is incorrect. Some of the numbers don't add up, and you can see discussion about this issue on the sourceforge forums. It will be necessary, for the sake of ensuring data integrity, to go through the bug tracking information and follow things like bug submission and bug resolution to get a better understanding of project 'performance'. I think this would require some cooperation from sourceforge staff and I'm not sure about their stance on the whole research thing as yet (hint, hint to any sourceforge staff reading this).
I am currently looking at user comments from slashdot (and perhaps advogato, when I get around to gathering them) that mention open source software development, and coding them (in the content-analysis sense) to get a handle on the nature of open source software development and the factors that are mentioned as being associated with success and failure. It's a big coding task, but I hope it will provide a good starting list of factors to send out to experts and others, who can then provide me with more feedback, details and direction.
Thanks for your comments so far.
Here's his abstract:
The nexus of open source development appears to have shifted to Europe
over the last ten years. This paper explains why this trend undermines
cultural arguments about "hacker ethics" and "post-scarcity" gift
economies. It suggests that classical economic theory offers a more
succinct explanation for the peculiar international distribution of open
source development: hacking rises and falls inversely to its opportunity
cost. This finding throws doubt on the Schumpeterian assumption that the
efficiency of industrial systems can be measured without reference to
the social institutions that bind them.