OAI coordination

From: Ray Plante <rplante-at-poplar.ncsa.uiuc.edu>
Date: Tue, 25 Nov 2003 16:17:21 -0600 (CST)


Hello Harvesters and Harvestees,

We are beginning to see some OAI interfaces coming (back) on-line, so we will need to coordinate on a few items regarding how we use them. As you may know, there are several places where we, as a community, have some latitude in presenting our metadata through this interface. Nevertheless, we need a certain amount of uniformity.

Here is a list of recommendations on things we need to agree on. My preferences in each case are not as important as the fact that we need to agree, so I welcome your alternative suggestions.

  1. Metadata format name for VOResource metadata: ivo_vor

    This was Sebastien's suggestion. The exact choice is not too important, but we should reduce the likelihood that the name might be used by another community.

2. "root" element for VOResource metadata; that is, the child of the <oai:metadata> element:

    I recommend we use <VOResource>, for the following reasons:

  1. The <VOResource> element is constrained by the schema to contain only 1 Resource. In contrast, <VODescription> allows multiple resources, so validation could not catch a record that wrongly contains more than one.
  2. The alternative of allowing <Resource> or one of its sub-classes (e.g. <Organisation>, <Service>, etc.) will likely complicate the handling of the data on the harvester's end if several possible elements are allowed at this level (depending on how the harvester is implemented).
       Fixing a single root element may not be a big deal in the short
       term, but in the long term it will make it easier for a harvester
       to decide if it can handle the record.  In general, any
       application must answer the following questions:
         *  is the XML instance valid (for the schemas I know/care about)?
         *  is the root element what I need/expect it to be?

       The second question is easier to answer if there is only one 
       possible root element to check for.  
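The second question can be sketched in a few lines of Python. This is only an illustration, not part of any existing harvester: the function names, the sample record, and the placeholder namespace urn:example:voresource are all made up for the example, and the check compares local element names only. It does not perform schema validation (the first question), which would need a separate validating parser.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response elements.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def metadata_root(record_xml):
    """Return the local name of the single child of <oai:metadata>.

    Returns None if the record has no <metadata> element or if that
    element does not contain exactly one child element.
    """
    record = ET.fromstring(record_xml)
    metadata = record.find(f"{{{OAI_NS}}}metadata")
    if metadata is None:
        return None
    children = list(metadata)
    if len(children) != 1:
        return None
    return local_name(children[0].tag)


def can_handle(record_xml, expected="VOResource"):
    """True if the record's metadata root is the one element we expect."""
    return metadata_root(record_xml) == expected


# Hypothetical record; the VOResource namespace here is a placeholder.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>ivo://example.org/registry</identifier></header>
  <metadata>
    <VOResource xmlns="urn:example:voresource"><Resource/></VOResource>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(can_handle(SAMPLE_RECORD))
```

With only one permitted root element, the harvester needs a single comparison instead of a list of acceptable Resource sub-classes.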

3. The form of the OAI identifier; i.e. the value of <oai:identifier>.

   I would like to see us use our IVOA identifiers (in their URI forms) here. Otherwise, we will find ourselves having to keep track of two identifiers.

   This might seem like a no-brainer, but several of us (including us at NCSA!) are using the OAI interface script from Virginia Tech, which creates its own OAI identifiers based on the local XML file name.

   Ramon has placed a modified version of this script at http://nvo.ncsa.uiuc.edu/VO/software/XMLFileDP_vo.pm that is meant to serve as a drop-in replacement for XMLFileDP.pm. (Replace your old XMLFileDP.pm in the perl library directory used by your oai.pl script.) This version will override the default OAI identifiers with the IVOA ones found in the corresponding VOResource files. It also has the added benefit of supporting deleted records.
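   For illustration only, the override idea can be sketched in Python as follows. This is not a port of XMLFileDP_vo.pm and does not reflect how that script is written. It assumes the VOResource file carries the identifier as <AuthorityID> and <ResourceKey> elements and that the URI form is ivo://AuthorityID/ResourceKey; the function names, the sample document, and the fallback value are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_text(root, name):
    """Return the text of the first element with this local name, or None."""
    for element in root.iter():
        if local_name(element.tag) == name and element.text and element.text.strip():
            return element.text.strip()
    return None


def ivoa_uri(authority_id, resource_key=None):
    """Build the URI form of an IVOA identifier (assumed layout)."""
    if resource_key:
        return f"ivo://{authority_id}/{resource_key}"
    return f"ivo://{authority_id}"


def oai_identifier(voresource_xml, fallback):
    """Prefer the IVOA identifier found in the file; else keep the fallback.

    'fallback' stands in for whatever identifier the data provider
    would otherwise generate, e.g. one based on the local file name.
    """
    root = ET.fromstring(voresource_xml)
    authority_id = find_text(root, "AuthorityID")
    if authority_id is None:
        return fallback
    return ivoa_uri(authority_id, find_text(root, "ResourceKey"))


# Hypothetical VOResource file; the namespace is a placeholder.
SAMPLE_FILE = """<VOResource xmlns="urn:example:voresource">
  <Resource>
    <Identifier>
      <AuthorityID>example.org</AuthorityID>
      <ResourceKey>registry</ResourceKey>
    </Identifier>
  </Resource>
</VOResource>"""

if __name__ == "__main__":
    print(oai_identifier(SAMPLE_FILE, fallback="local-file-id"))
```

   Using the identifier already present in the record means the harvester and the publisher refer to each record by the same name.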

Let me know if any of the above needs more clarification. Ramon and I will be happy to work with anyone needing additional help with the OAI interface.

There's also another item that comes to mind that is independent of OAI: every publishing registry (i.e. registry that can be harvested from) should export:

No two registries can claim to manage the same AuthorityID. While this may not remain true in the future, for now, this is the way we track resource records back to their origin (as discussed by Alex).
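A harvester could check this rule with something like the following sketch. It assumes the harvester has already collected, by whatever means, the list of AuthorityIDs each registry claims to manage; the function name and the sample registry names are hypothetical.

```python
from collections import defaultdict


def conflicting_authorities(claims):
    """Find AuthorityIDs claimed by more than one registry.

    'claims' maps a registry name to the AuthorityIDs it says it manages.
    Returns a dict mapping each contested AuthorityID to the sorted list
    of registries that claim it.
    """
    owners = defaultdict(set)
    for registry, authority_ids in claims.items():
        for authority_id in authority_ids:
            owners[authority_id].add(registry)
    return {
        authority_id: sorted(registries)
        for authority_id, registries in owners.items()
        if len(registries) > 1
    }


# Hypothetical claims gathered from two registries.
SAMPLE_CLAIMS = {
    "registry-a": ["a.example.org", "shared.example.org"],
    "registry-b": ["b.example.org", "shared.example.org"],
}

if __name__ == "__main__":
    print(conflicting_authorities(SAMPLE_CLAIMS))
```

An empty result means every AuthorityID traces back to exactly one registry.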

We need a way to register our harvesting interfaces as well. We can either:

  1. create a new standard service type, and define the appropriate metadata
  2. add additional metadata to Registry.

I prefer the latter, but we should go with whichever is easier. Thoughts?

cheers,
Ray

Received on 2003-11-25Z23:19:06