Archive for December, 2007

DOAP + OpenId + PURL = Single OSS Metadata Namespace?

Thursday, December 6th, 2007

When writing code to search for the DOAP of a particular OSS project, you need to understand that the homepage and old-homepage elements are unique identifiers. Edd Dumbill came up with this idea when designing the schema. I get it; it makes sense. Tools can figure out the homepage for a project, but it’d be easier for humans to search for DOAP by a short project name. That isn’t possible for a number of reasons, though. The biggest obstacle is that the distributed nature of DOAP prevents enforcing unique project names.
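To make the homepage-as-identifier idea concrete, here is a minimal Python sketch that indexes DOAP documents by their `doap:homepage` and `doap:old-homepage` URLs. It assumes the DOAP is RDF/XML and uses the `rdf:resource` attribute form. The URL normalization rule and the helper names (`normalize`, `project_ids`, `build_index`) are illustrative assumptions, not part of the DOAP schema or any existing tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

DOAP = "{http://usefulinc.com/ns/doap#}"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def normalize(url):
    """Assumed normalization: lowercase scheme and host, drop trailing slashes and fragments."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def project_ids(doap_xml):
    """Collect the homepage and old-homepage URLs that DOAP treats as unique keys."""
    root = ET.fromstring(doap_xml)
    ids = set()
    for tag in ("homepage", "old-homepage"):
        for elem in root.iter(DOAP + tag):
            url = elem.get(RDF_RESOURCE)
            if url:
                ids.add(normalize(url))
    return ids


def build_index(records):
    """records maps DOAP document URL -> RDF/XML text; returns homepage -> DOAP URL.

    The first record seen for a homepage wins (a hypothetical policy)."""
    index = {}
    for doap_url, xml_text in records.items():
        for key in project_ids(xml_text):
            index.setdefault(key, doap_url)
    return index
```

Note that `build_index` has no way to know which of several copies is the authoritative one. It simply keeps whichever it sees first, which is the weakness the rest of this article is about.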

We have doap:store, which offers a web browser form for SPARQL queries, but I want an easy way to find a single DOAP record, known to be authored or authorized by a member of a particular OSS project, without using a web form.

DOAP moves. There can be multiple copies of DOAP for the same project floating around in triple stores, but which is the one true DOAP the author keeps up to date? And if you’re familiar with RDF, you know that having DOAP with bnodes in a triple store can prevent us from knowing the URL the DOAP came from.

An idea I came up with is similar to what Karl Fogel describes in an article proposing a Galactic Project Registry. I got in touch with Karl a few months back to see if he’d made any progress, but he hadn’t worked on it since the article was written in Sept. 2006. The basic idea Karl had was to have one up-to-date, canonical ‘namespace’ for all of OSS using DOAP.

There are no details in the wiki on how this would be accomplished, but I think I’ve come up with a way to implement this without immediately needing the backing of all the major ‘forges’ and software package indexes he lists (Google Code, SourceForge, Freshmeat etc.) to get started. It also differs in scope from what Karl had in mind, I think, but I’ll cover that in an upcoming article.

I’ve started work on doapurl.org, which works exactly like a PURL resolver: it simply performs HTTP redirects. Where doapurl.org differs from traditional PURL resolvers is in who is allowed to create and edit these PURLs and in how the PURLs are categorized.
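Since a PURL resolver only answers requests with HTTP redirects, its core fits in a few lines. Below is a minimal sketch using Python's standard WSGI tools. The in-memory `PURLS` table, the example DOAP URL and the choice of a 302 (temporary) redirect are assumptions for illustration, not doapurl.org's actual server code.

```python
from wsgiref.simple_server import make_server

# Hypothetical in-memory PURL table: "/category/name" -> current DOAP URL.
PURLS = {
    "/www-clients/firefox": "http://example.org/doap/firefox.rdf",
}


def purl_app(environ, start_response):
    """Minimal WSGI PURL resolver: redirect known paths, 404 everything else."""
    path = environ.get("PATH_INFO", "").rstrip("/")
    target = PURLS.get(path)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"No PURL registered for this path\n"]
    # 302 Found: the PURL stays fixed while the DOAP file it points to may move.
    start_response("302 Found", [("Location", target),
                                 ("Content-Type", "text/plain; charset=utf-8")])
    return [(target + "\n").encode("utf-8")]


def serve(port=8000):
    """Serve the resolver on localhost for manual testing (blocks until interrupted)."""
    with make_server("localhost", port, purl_app) as httpd:
        httpd.serve_forever()
```

After calling `serve()`, a request for http://localhost:8000/www-clients/firefox should come back as a 302 whose `Location` header is the registered DOAP URL. Keeping the table separate from the handler means the same data could also be exported for other servers.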

There is a one-level structure of categories, which makes remembering and searching for DOAP easy. Unlike on a traditional PURL resolver, only members of an Open Source project may create and edit a PURL on doapurl.org for their own project. By using OpenId delegation from a known URL namespace that members of a particular project can write to, we can determine that a person from the project is eligible to create and edit a PURL pointing to a DOAP record, without requiring humans to manually research and approve anything. This will only work for well-known places we know project members can write to, such as SourceForge and BerliOS, but that covers a vast number of hosted projects and will relieve people of much of the burden of verifying DOAP authorship.
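One way to picture the OpenId check: a project member puts an OpenID 1.x delegation page (with `openid.server` and `openid.delegate` link tags) in the project's web space, then uses that page's URL as their OpenID. The sketch below shows only the two checks a server could make before the login itself: mapping the claimed URL to a project, and reading the delegation links. It assumes project web space lives at hostnames like `<project>.sourceforge.net` and `<project>.berlios.de`. The actual OpenID authentication is left to a real OpenID library, and all names here are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical patterns for per-project web space that only project members
# can write to. Real forges may use other hostnames; these are assumptions.
PROJECT_HOSTS = {
    "sourceforge": re.compile(r"^(?P<project>[a-z0-9-]+)\.sourceforge\.net$"),
    "berlios": re.compile(r"^(?P<project>[a-z0-9-]+)\.berlios\.de$"),
}


class OpenIdLinkParser(HTMLParser):
    """Collects hrefs of OpenID 1.x openid.server / openid.delegate <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        for rel in (attr.get("rel") or "").lower().split():
            if rel in ("openid.server", "openid.delegate") and href:
                self.links.setdefault(rel, href)


def project_for_claimed_id(claimed_url):
    """Map a claimed OpenID URL to (forge, project), or None if it is not project web space."""
    host = urlparse(claimed_url).hostname or ""
    for forge, pattern in PROJECT_HOSTS.items():
        match = pattern.match(host)
        if match:
            return forge, match.group("project")
    return None


def delegation_links(html):
    """Return the (openid.server, openid.delegate) hrefs found in a delegation page."""
    parser = OpenIdLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links.get("openid.server"), parser.links.get("openid.delegate")
```

If the claimed URL maps to a project and the OpenID login through the delegated identity then succeeds, the user has shown they can write to that project's web space. No human needs to review the request.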

The goal of doapurl.org is simply to create an index of authoritative DOAP URLs. This collection of URLs will let others create searchable package indexes, or find metadata for Open Source projects quickly and easily by using a unique name or a category and name pair. Redundant PURL servers would be easy to set up: if they run Apache, they only need to sync an .htaccess file containing all the redirects.
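Generating that .htaccess file from the PURL table is straightforward. This sketch emits one Apache mod_alias `RedirectMatch` rule per PURL, anchored with `^` and `$` so that `/www-clients/firefox` does not also catch longer paths. It assumes category and project names are restricted to lowercase letters, digits and hyphens. That naming rule and the function names are assumptions; the article does not specify the file's format.

```python
import re

# Assumed naming rule for categories and project names.
SEGMENT_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def redirect_rule(category, name, doap_url):
    """Build one mod_alias RedirectMatch rule sending /category/name to doap_url."""
    for segment in (category, name):
        if not SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"invalid PURL segment: {segment!r}")
    # Whitespace would split the directive, and '$' could be read as a backreference.
    if not doap_url.startswith(("http://", "https://")) or re.search(r"[\s$]", doap_url):
        raise ValueError(f"unsupported DOAP URL: {doap_url!r}")
    return f"RedirectMatch 302 ^/{category}/{name}$ {doap_url}"


def build_htaccess(purls):
    """purls maps (category, name) -> DOAP URL; rules are sorted for stable diffs."""
    rules = [redirect_rule(cat, name, url) for (cat, name), url in sorted(purls.items())]
    return "\n".join(rules) + "\n"
```

Mirrors could fetch the generated file on a schedule and drop it into their document root. The server must allow `FileInfo` overrides for `RedirectMatch` to take effect in .htaccess, and sorting the rules keeps diffs between syncs small.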

Having a permanent URL for your DOAP will also make it easier for existing package indexes and release notification systems such as SourceForge and Freshmeat to take full advantage of DOAP and its distributed nature. Create the DOAP for a new project once, then use that DOAP to import the project into Freshmeat and similar services.

After getting doapurl.org functional, I’ll use doapspace.org to import the PURLs, creating a categorized, browsable package index. It will be the job of doapspace.org and other DOAP-based package indexes to periodically spider the DOAP the PURLs point to. Using a ping service like PingTheSemanticWeb will make this very easy.

I have a command-line client called ‘doapfiend’ which should easily demonstrate all of this. Say you want to know the URL of the bug tracker for Firefox:

doapfiend -f bug-database www-clients/firefox

doapfiend resolves the PURL at http://doapurl.org/www-clients/firefox, fetches the DOAP from the URL it redirects to, parses the DOAP and displays the bug tracker URL.
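The fetch-and-parse step can be sketched with Python's standard library. This is not doapfiend's code. It assumes the DOAP is RDF/XML and that `doap:bug-database` uses the `rdf:resource` attribute form; other serializations are not handled. `urllib.request.urlopen` follows the PURL's HTTP redirect on its own, so the client never needs to know where the DOAP file currently lives.

```python
import urllib.request
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def field_urls(doap_xml, field):
    """Return the rdf:resource URLs of every doap:<field> element in RDF/XML DOAP."""
    root = ET.fromstring(doap_xml)
    urls = []
    for elem in root.iter(f"{{{DOAP_NS}}}{field}"):
        resource = elem.get(f"{{{RDF_NS}}}resource")
        if resource:
            urls.append(resource)
    return urls


def fetch_doap(purl, timeout=30):
    """Fetch DOAP via its PURL; urlopen follows the HTTP redirect to the real file."""
    with urllib.request.urlopen(purl, timeout=timeout) as response:
        return response.read()
```

For example, field_urls(fetch_doap("http://doapurl.org/www-clients/firefox"), "bug-database") would return the bug tracker URL, assuming that PURL is registered and resolves to RDF/XML DOAP.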
If you don’t know which category firefox belongs in, you can omit it:

doapfiend -f bug-database firefox

The doapfiend client would contact doapspace.org, which would search for projects with ‘firefox’ as the last part of the PURL. If there is only one, doapfiend fetches and parses its DOAP and displays the bug database URL. If more than one project is named ‘firefox’, it lists the category of each.
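The name lookup doapspace.org would perform can be sketched as a match on the last path segment of each PURL. This assumes PURL paths of the form `category/name`. The function names are hypothetical, and the server side of the search is not specified in the article.

```python
from typing import Iterable, List


def find_projects(purl_paths: Iterable[str], query: str) -> List[str]:
    """Return the 'category/name' PURL paths matching query.

    A query containing '/' must match a path exactly; a bare name matches
    every path whose last segment equals it."""
    paths = set(purl_paths)
    if "/" in query:
        return [query] if query in paths else []
    return sorted(p for p in paths if p.rsplit("/", 1)[-1] == query)


def describe(matches: List[str]) -> str:
    """Summarize the lookup the way the client would act on it."""
    if not matches:
        return "no project found"
    if len(matches) == 1:
        return f"single match: {matches[0]}"
    categories = ", ".join(m.split("/", 1)[0] for m in matches)
    return f"ambiguous name; categories: {categories}"
```

With the paths www-clients/firefox and a hypothetical games/firefox, find_projects(paths, "firefox") returns both, and describe reports the categories instead of guessing which project was meant.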

I’m still working on the server code for doapurl.org, but I hope to have it ready for testing by people already familiar with DOAP by Dec. 9, 2007, and to announce it for wider testing by Jan. 1, 2008.

In follow-up articles I’ll go into more detail on:

  • Why separate DOAP PURL server and package indexes?
  • How to use OpenId delegation to verify DOAP ownership
  • The server software being used
  • Involving the OSS community in running and owning doapurl.org and its data.