24 Sep 2013 jas   » (Master)

BLURB: Software repository metadata convention

As a maintainer of several software packages I often find myself copying text snippets from the README file into different places (savannah, github, freecode, emails, etc). Recently I had a need to generate a list of software packages that included every project’s name, brief summary, license and URL. I could have generated that list manually by copying the text from every project’s README and COPYING files. However then I would have to maintain my list manually to keep it in sync with all the projects. This easily leads to stale information, so I thought a better approach would be to put the information I needed into each projects’ source code version control system. The advantage is that the manual work to extract the information may be automated by a script, since the data is in a usable format. I’ll explain here how my solution works.

To be able to find the information using only the URL to the repository, I needed a filename convention. The filename I chose was BLURB; for the etymology, see Wikipedia’s page about Blurb. The data format in the file is similar to normal email headers.

An example illustrate the principles well. Below is the BLURB file for one of my projects.

Author: Yubico
Basename: libyubikey
Homepage: http://opensource.yubico.com/yubico-c/
License: BSD-2-Clause
Name: Yubico C low-level Library
Project: yubico-c
Summary: C library for manipulating Yubico YubiKey One-Time Passwords (OTPs)

The format is simple: UTF-8 text with each line starting with a header followed by a colon (“:”), some whitespace, and some content. If a line starts with whitespace, it is a continuation of the previous line’s content (trim leading whitespace). The following table describes the fields that I use. I may update this blog post in the future with new fields or improved explanations (for reference, current date is 2013-09-24).

Header Meaning
Project Short identifier for the project (e.g., ‘gcc’, ‘emacs’)
Name Official name of the project in English
Summary Brief one-line summary of the project’s purpose in English
Author Origin of the project
Homepage URL to the project’s website
License License keyword, preferrably using one of the SPDX license identifiers
Basename The tarball basename, if different from the project name

Finally some reflection of the solution. After quick design, I thought that I couldn’t be the first one with this problem, and I tried to find other similar efforts. I haven’t been able to find any standardization effort that have the following properties:

  • Stores the information inside each upstream project’s own source code repository
  • Provides a filename convention so that it is possible to find the data with only the source code repository link
  • Encode data in a format that is easy to extract using simple command line tools
  • Not encode information about releases (i.e., what happened in a particular version)

The related efforts that I found were SPDX which at first look seemed to offer what I wanted. However on closer examination it failed to deliver on all the requirements above, and appeared to have somewhat different goals. However I found the SPDX license list useful and refer to it. Another effort is Eric S. Raymond’s freecode-submit and shipper but its primary focus is to encode information about each release. The design of the BLURB file is clearly influenced by these tools. Another influence has been Debian’s specification for machine-readable copyright information. The Free Software Foundation’s list of software projects seemed like another candidate, but it doesn’t suggest any way to store the information in the upstream project itself.

flattr this!

Syndicated 2013-09-24 21:03:45 from Simon Josefsson's blog

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!