DOAP + OpenId + PURL = Single OSS Metadata Namespace?
December 6th, 2007
When writing code to search for the DOAP of a particular OSS project, you need to understand that the homepage and old-homepage elements are the unique identifiers. Edd Dumbill came up with this idea when designing the schema. I get it, it makes sense. Tools can figure out the homepage for a project, but it’d be easier for humans to search for DOAP by a short project name. That isn’t possible for a number of reasons, though; the biggest obstacle is that the distributed nature of DOAP prevents enforcing unique project names.
We have doap:store, which offers a web browser form for SPARQL queries, but I want an easy way to find a single DOAP record, known to be authored or authorized by a member of a particular OSS project, without using a web form.
DOAP moves. There can be multiple copies of DOAP for the same project floating around in triple stores, but which is the one true DOAP the author keeps up-to-date? And if you’re familiar with RDF, you know that having DOAP in a triple store with bnodes can prevent us from knowing the URL of the DOAP.
An idea I came up with is similar to what Karl Fogel describes in an article about an idea for a Galactic Project Registry. I got in contact with Karl a few months back to see if he’d made any progress, but he hadn’t worked on it since the article was written in Sept. 2006. The basic idea Karl had was to have one up-to-date, canonical ‘namespace’ for all of OSS using DOAP.
There are no details in the wiki on how this would be accomplished, but I think I’ve come up with a way to implement this without immediately needing the backing of all the major ‘forges’ and software package indexes he lists (Google Code, SourceForge, Freshmeat etc.) to get started. It also differs in scope from what Karl had in mind, I think, but I’ll cover that in an upcoming article.
I’ve started work on doapurl.org, which functionally works exactly like a PURL resolver. That is, it simply performs HTTP redirects. Who is allowed to create and edit these PURLs and the way the PURLs are categorized is where doapurl.org differs from traditional PURL resolvers.
There is a one-level structure of categories, which makes remembering and searching for DOAP easy. Unlike a traditional PURL resolver, only members of an Open Source project may create and edit a PURL on doapurl.org for their own project. Using OpenId delegation from a known URL namespace that members of a particular project can write to, we can determine that a person from the project is eligible to create and edit a PURL pointing to a DOAP record without requiring humans to manually research and approve anything. This will only work for well-known places we know people can write to, such as SourceForge, berlios, etc., but that covers a vast number of hosted projects and will remove much of the burden of having people verify DOAP authorship.
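As a rough illustration of the delegation check, here is a minimal Python sketch. It is not the doapurl.org server code. It assumes that project web space lives at hosts like myproject.sourceforge.net or myproject.berlios.de, that the project page carries the OpenID 1.x `openid.server` and `openid.delegate` link tags, and the function names are made up for this example. It only covers the two local steps (mapping the claimed URL to a project and reading the delegation links); the actual OpenID authentication against the provider is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: host suffixes of forges where only project members can publish pages.
KNOWN_FORGE_SUFFIXES = (".sourceforge.net", ".berlios.de")


def project_for(claimed_id):
    """Return the project name if the claimed URL is in a known forge namespace."""
    host = urlparse(claimed_id).hostname or ""
    for suffix in KNOWN_FORGE_SUFFIXES:
        if host.endswith(suffix):
            name = host[: -len(suffix)]
            # Accept only a single label such as "myproject".
            if name and "." not in name:
                return name
    return None


class DelegateFinder(HTMLParser):
    """Collect OpenID 1.x delegation <link> tags from a page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel may hold several space-separated values.
        for rel in (attrs.get("rel") or "").lower().split():
            if rel in ("openid.server", "openid.delegate") and href:
                self.links[rel] = href


def delegation(page_html):
    """Return a dict with the openid.server / openid.delegate URLs found."""
    finder = DelegateFinder()
    finder.feed(page_html)
    return finder.links
```

A person who can place those link tags on the project's own page, and then authenticates as the delegated identity, has shown write access to the project's namespace, which is the property the PURL server cares about.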
The goal of doapurl.org is simply to create an index of authoritative DOAP URLs. Using this collection of URLs will allow others to create searchable package indexes or find metadata for Open Source projects quickly and easily by using a unique name or a category and name pair. Redundant PURL servers would be a breeze to have in place: if they use Apache, they only need to sync an .htaccess file containing all the redirects.
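To make the .htaccess idea concrete, here is a small Python sketch that renders a table of PURLs as Apache mod_alias Redirect directives. The function name, the choice of a 302 status, and the target URLs are assumptions for illustration only; the real server may store and emit its redirects differently.

```python
def htaccess_lines(purls):
    """Render one mod_alias Redirect directive per PURL, sorted by path.

    purls maps a 'category/name' path to the URL of the DOAP record.
    """
    lines = []
    for path, target in sorted(purls.items()):
        lines.append(f"Redirect 302 /{path} {target}")
    return "\n".join(lines) + "\n"


# Hypothetical data: the target URLs are placeholders, not real DOAP locations.
purls = {
    "www-clients/firefox": "http://example.org/firefox/doap.rdf",
    "dev-python/doapfiend": "http://example.org/doapfiend/doap.rdf",
}

print(htaccess_lines(purls), end="")
```

A mirror would then only need to fetch the generated file and drop it into its document root.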
Having a permanent URL for your DOAP will also make it easier for existing package indexes and release notification systems such as SourceForge and Freshmeat to take full advantage of the benefits of DOAP and its distributed nature. Create your DOAP for a new project, use that DOAP to import it into Freshmeat etc.
After getting doapurl.org functional, I’ll use doapspace.org to import the PURLs, creating a categorized, browsable package index. It will be the job of doapspace.org and other DOAP-based package indexes to periodically spider DOAP. Using a ping service like PingTheSemanticWeb will make this very easy.
I have a command-line client called ‘doapfiend’ which should easily demonstrate all of this. Say you want to know the URL of the bug tracker for firefox:
doapfiend -f bug-database www-clients/firefox
doapfiend simply fetches the DOAP from the URL that the PURL at http://doapurl.org/www-clients/firefox resolves to, parses the DOAP, and displays the bug tracker URL.
If you don’t know the category firefox belongs in, you can omit it:
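The fetch-and-parse step can be sketched in a few lines of Python. This is not doapfiend's actual code: the function names and the sample DOAP record are invented for illustration, and the parser assumes plain RDF/XML with the standard DOAP namespace rather than using a full RDF library.

```python
import urllib.request
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical DOAP record used instead of a live fetch.
SAMPLE_DOAP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <Project>
    <name>Example</name>
    <homepage rdf:resource="http://example.org/"/>
    <bug-database rdf:resource="http://example.org/bugs"/>
  </Project>
</rdf:RDF>
"""


def fetch_doap(purl):
    """Fetch the DOAP behind a PURL; urlopen follows the HTTP redirect."""
    with urllib.request.urlopen(purl) as response:
        return response.read().decode("utf-8")


def doap_field(doap_xml, field):
    """Return all values of a DOAP property such as 'bug-database'."""
    root = ET.fromstring(doap_xml)
    values = []
    for elem in root.iter(f"{{{DOAP_NS}}}{field}"):
        resource = elem.get(f"{{{RDF_NS}}}resource")
        # URL-valued properties use rdf:resource; literals use element text.
        values.append(resource if resource is not None else (elem.text or "").strip())
    return values


print(doap_field(SAMPLE_DOAP, "bug-database"))
```

Against a live PURL, the same lookup would be `doap_field(fetch_doap(url), "bug-database")`.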
doapfiend -f bug-database firefox
The doapfiend client would contact doapspace.org, which would search for projects with ‘firefox’ as the last part of the PURL. If there is only one, it will fetch the DOAP and parse it, displaying the bug database URL. If there is more than one project named ‘firefox’, it will show the categories for each.
I’m still working on the server code for doapurl.org, but I hope to have it ready for testing by people already familiar with DOAP by Dec. 9, 2007, and to announce it for wider testing by Jan. 1, 2008.
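The name-only lookup boils down to matching on the last path segment of each PURL. Here is a minimal Python sketch of that logic; the function names and the list of PURL paths are hypothetical, and a real index would query its own store rather than a list in memory.

```python
def matching_purls(purl_paths, name):
    """Return every 'category/name' path whose last segment equals name."""
    return sorted(p for p in purl_paths if p.rsplit("/", 1)[-1] == name)


def categories(matches):
    """Return the category part of each matching path."""
    return [p.rsplit("/", 1)[0] for p in matches]


# Hypothetical index contents.
paths = ["www-clients/firefox", "themes/firefox", "dev-python/doapfiend"]

matches = matching_purls(paths, "firefox")
if len(matches) == 1:
    print("fetch DOAP for", matches[0])
else:
    print("ambiguous, categories:", categories(matches))
```

With a single match the client can go straight to fetching the DOAP; otherwise it has enough information to ask the user which category they meant.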
In follow-up articles I’ll go into more detail on:
- Why separate DOAP PURL server and package indexes?
- How to use OpenId delegation to verify DOAP ownership
- The server software being used
- Involving the OSS community in running and owning doapurl.org and its data.
Hi Rob,
The continuous update activity on doapspace.org corresponds to a recent project update or release event, which is a result of crawling freely available meta-data of various sources (mostly Sourceforge) and translating to (or updating existing) DOAP, correct?
I haven’t come across DOAP on doapspace.org that was generated from a Freshmeat hosted project yet, but I’m assuming it is being pulled in as well.
Have you had to deal with the situation where the same project has meta-data on multiple sources, e.g. SF and FM? How did you resolve these instances?
Are you using FLOSSmole (http://ossmole.sourceforge.net/)?
And, finally, a question on PTSW - doapspace.org is pinging it when it updates or creates DOAP.
Are you also doing the reverse, querying it for updates related to DOAP that might be maintained elsewhere (i.e. by the real project owner)?
And is doapurl.org functional?
Sorry for all the questions but I’m still putting together all of the pieces and moving parts, and perhaps other visitors have had the same questions.
This is a valuable service you have created here and I hope others realize its potential.