Hi WGers,
I've been conversing with many of you regarding the general issue referred to as fine-grained/rich vs. coarse-grained registries. One current manifestation of it is the issue of whether descriptions of table columns should appear in the registry. We see use cases emerging that want to use this information for discovery and planning, but handling this information in the registry raises costly curation issues. I would like to propose a solution to this issue that I believe can serve as a model for handling other reputed fine-grained information. This solution will ultimately call for a standard format for describing a set of tables.
First, I recognize that before we can agree on whether to put fine-grained information into the registry, we need a common understanding of what qualifies as "fine-grained". I have some ideas on this that I will be presenting next week in the RWG session.
The use cases that are driving table metadata into the registry are:
One major reason that placing the column metadata in the registry is attractive is that the registry is an existing system for collecting the information and provides a common way to access it. One current problem with our existing catalog services (Cone Search, SIA, OpenSkyNode, and SSA) is that each has a slightly different way of presenting this information. Thus, in practice, it is difficult to mine this information--you need 4 different methods. For data collections that are described independently of any service that accesses them, there is no standard way of getting this information other than having it in the registry.
I would like to propose we define a standard format for describing a set of tables and all their columns that can be served by a single, static URL. With this, we can:
Implementation considerations:
o More than one URL could be associated with a resource. Thus, if
a service or collection serves many tables, their descriptions
could be distributed over several documents of manageable size.
o While the information is packaged into individual documents,
a service can generate this information on the fly as necessary.
(For example, if TAP were to define separate "getTables" and
"getColumns" methods, the information could be aggregated via
internal calls to these methods.)
o For existing "standards"--Cone Search, SIA, OpenSkyNode, (and if
necessary, SSA)--we could devise trivial HTTP GET services that
convert on-the-fly calls to their respective metadata methods
into the standard format. These services could be provided by
registries.
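To make the on-the-fly aggregation idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: get_tables() and get_columns() are hypothetical stand-ins for whatever metadata methods a service might expose (TAP defines no such methods today), and the "tableset"/"table"/"column" element names are placeholders, not a proposed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for a service's internal metadata calls.
def get_tables():
    return [{"name": "photoobj", "description": "Photometric objects"}]

def get_columns(table_name):
    columns = {
        "photoobj": [
            {"name": "ra", "unit": "deg", "description": "Right ascension"},
            {"name": "dec", "unit": "deg", "description": "Declination"},
        ]
    }
    return columns[table_name]

def build_table_description():
    """Aggregate the per-table and per-column calls into one document.

    The element names used here are placeholders, not a proposed standard.
    """
    root = ET.Element("tableset")
    for t in get_tables():
        table = ET.SubElement(root, "table", name=t["name"])
        ET.SubElement(table, "description").text = t["description"]
        for c in get_columns(t["name"]):
            col = ET.SubElement(table, "column", name=c["name"], unit=c["unit"])
            ET.SubElement(col, "description").text = c["description"]
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # A service would return this string in response to a GET on the static URL.
    print(build_table_description())
```

The point of the sketch is only that the client sees one document at one URL, regardless of how many internal calls the service makes to assemble it.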
The advantages are:
The pressure for supporting the above use cases is large, so we need something quickly. I would strongly recommend a v1.0 that is simple and based on existing formats. I think either of two such options would work fine:
o a profile on VOTable
o the Catalog description model currently in the VODataService
extension schema used in the registry
(http://www.ivoa.net/xml/VODataService/v1.0).
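As an illustration of the first option, here is a rough sketch of how a client might mine column metadata from a VOTable that carries only FIELD descriptions and no data. The sample document, the table and column names, and the list_columns() helper are all made up for the example, and the XML namespace is left off to keep the parsing short; an actual profile would have to pin those details down.

```python
import xml.etree.ElementTree as ET

# A made-up, data-less VOTable used purely as a table description.
# The namespace declaration is omitted for brevity (an assumption).
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="photoobj">
      <DESCRIPTION>Photometric objects</DESCRIPTION>
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra">
        <DESCRIPTION>Right ascension</DESCRIPTION>
      </FIELD>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def list_columns(votable_xml):
    """Return (table, column, unit, ucd) tuples for every FIELD found."""
    root = ET.fromstring(votable_xml)
    rows = []
    for table in root.iter("TABLE"):
        for field in table.findall("FIELD"):
            rows.append((table.get("name"), field.get("name"),
                         field.get("unit"), field.get("ucd")))
    return rows

if __name__ == "__main__":
    for row in list_columns(SAMPLE):
        print(row)
```

With a single format like this, the same few lines would work against any service or collection, instead of the 4 different methods needed now.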
I also want to point out the Source Catalog Data Model, which some of you may be familiar with. Because of its emphasis on the astronomical semantics more than table & catalog structure, it's probably not a good candidate for the format itself. However, it would be a good model for annotating a table description via utypes.
The point is just to support what people are already doing with the registry. If we want to add more to the format (or even totally replace it), I recommend we save it for a subsequent version.
So the general pattern for "fine-grained" information would be to have VOResource records point to this information, which is primarily managed at the provider's site. Another area where I would like to explore this idea is in using detailed coverage information to aid in discovery. We currently have a place in the VODataService schema (an extension of VOResource) to point to a detailed footprint service. We would need to add a place to point to a table description. Thus, there is a critical time issue for putting this into place.
I invite your comments, and I will raise this in Beijing during RegWG2.
cheers,
Ray
Received on 2007-05-08Z08:02:23