I guess it is time again to explain the OWLViper tool, to make a clear use case for vocabularies and vocabulary extensions and vocabulary translations. Brian Thomas will be presenting a poster at ADASS on this. Even though the tool is not quite ready for launch, it still provides a concrete example to think about the SV.
We have a tool that reads in OWL ontologies and creates a menu of objects to choose from. You can grab AstroObjects and place them on the canvas. Then you can choose properties of these objects (like "hasMeasurement RotationalVelocity" or "hasPart Halo" or "hasStar Cepheid") and add them inside the object's box. Then you can constrain values by min and max, or by "contains string". The tool can then query for these objects or, if it already has data, it can go on to perform operations on the data. But let's focus on query.
The best situation (from the application's point of view) is for all datacenters to have exactly the same ontology and to be able to respond to requests for OWL subclasses. That is, the query, in the format of an OWL class with the user's restrictions, is sent and the datacenters return Individuals that belong to the restricted class. A datacenter "simply" has to ingest all of its data into an OWL database, and off-the-shelf reasoners can be used to respond to queries.
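To make the idea of "a query is a restricted class" concrete, here is a minimal Python sketch. It is not OWLViper code and it is not a reasoner: the names Restriction, RestrictedClass and the sample individuals are all hypothetical, and each individual lists its classes explicitly instead of having them inferred.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Restriction:
    """Hypothetical min/max or contains-string constraint on one property."""
    prop: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    contains: Optional[str] = None

    def accepts(self, value) -> bool:
        # An individual without the property cannot satisfy the restriction.
        if value is None:
            return False
        if self.contains is not None and self.contains not in str(value):
            return False
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


@dataclass
class RestrictedClass:
    """A base class plus the user's restrictions (the 'query')."""
    base_class: str
    restrictions: list = field(default_factory=list)

    def members(self, individuals):
        # Keep individuals of the base class that pass every restriction.
        return [
            ind for ind in individuals
            if self.base_class in ind["types"]
            and all(r.accepts(ind.get(r.prop)) for r in self.restrictions)
        ]


# Made-up sample data, standing in for a datacenter's holdings.
individuals = [
    {"id": "g1", "types": {"AstroObject", "Galaxy"}, "RotationalVelocity": 220.0},
    {"id": "g2", "types": {"AstroObject", "Galaxy"}, "RotationalVelocity": 450.0},
    {"id": "s1", "types": {"AstroObject", "Star"}, "RotationalVelocity": 150.0},
]

query = RestrictedClass(
    "Galaxy",
    [Restriction("RotationalVelocity", min_value=100.0, max_value=300.0)],
)
matching_ids = [ind["id"] for ind in query.members(individuals)]
```

A real datacenter would hand the restricted class to an OWL reasoner, which would also work out class membership from the ontology instead of relying on an explicit "types" list.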
It would be nice if we all used the same standard vocabulary, but that may not be the case. What if each datacenter has its own ontology? You could use the tool, one datacenter at a time, by loading in the ontology of each datacenter, forming the query in that vocabulary and sending it just to the one site. But this would be laborious. So, someone has to make "translations" from the tool's vocabulary to each datacenter's. This just means using owl:equivalentClass and rdfs:subClassOf. This can be more complicated than it sounds because it may require forming complex classes with owl:unionOf and owl:intersectionOf. The tool is told which namespace to use for which datacenter.
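A rough Python sketch of what such a translation table amounts to, under stated assumptions: the datacenter name "dcA", the prefixes and every term below are invented for illustration. A one-element list plays the role of owl:equivalentClass, and a longer list plays the role of a class built with owl:unionOf.

```python
# Hypothetical translation table: datacenter -> tool term -> datacenter terms.
TRANSLATIONS = {
    "dcA": {
        # Stand-in for owl:equivalentClass (one-to-one).
        "tool:Galaxy": ["dcA:Gal"],
        # Stand-in for an owl:unionOf of two datacenter classes.
        "tool:VariableStar": ["dcA:Cepheid", "dcA:RRLyr"],
    },
}


def translate(term, datacenter):
    """Return the datacenter terms whose union corresponds to the tool term."""
    try:
        return TRANSLATIONS[datacenter][term]
    except KeyError:
        raise LookupError(
            f"no translation of {term!r} for datacenter {datacenter!r}"
        ) from None


def translate_query(terms, datacenter):
    """Translate every tool term used in a query for one datacenter."""
    return {term: translate(term, datacenter) for term in terms}
```

In practice the table would be read from the OWL statements themselves; the dictionary only shows the shape of the information the tool needs per namespace.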
Next. What to do about datacenters that are not quite so advanced and don't use OWL? They may use an XML Schema to describe their data. Then hopefully they can respond to an XQuery. One can fairly easily convert OWL into XQuery; the only hard part is again having a mapping between the terms. One way to proceed is to have software that can automatically transform between OWL and XQuery (say, an XSLT), which works as long as the vocabularies are consistent. Then convert the Schema to OWL and provide the "translations" as above. The tool can then form OWL queries in the vocabulary of the datacenter, and the "standard" OWL-to-XQuery tool finishes the job.
For datacenters that use ADQL, it is quite similar to the XQuery case. The relational database schema is similar to (indeed, can be autotranslated into) an XML schema. And the OWL-to-ADQL transformation is easy. By the way, it is always easy to translate down, as long as you accept that some complex queries will have no equivalent in the simpler language.
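As an illustration of translating down, here is a small Python sketch that turns min/max/contains constraints into an ADQL-style SELECT. The function name to_adql, the tuple format and the table and column names are assumptions made for this example; table and column names are taken as trusted, already-translated identifiers. Anything beyond the three simple constraint kinds is rejected, which is the "no equivalent in the simpler language" case.

```python
def to_adql(table, constraints):
    """Build an ADQL-style query from (column, kind, value) constraints.

    kind is one of "min", "max" or "contains" (hypothetical convention).
    """
    clauses = []
    for column, kind, value in constraints:
        if kind == "min":
            clauses.append(f"{column} >= {float(value)}")
        elif kind == "max":
            clauses.append(f"{column} <= {float(value)}")
        elif kind == "contains":
            # Double single quotes so the value stays inside the string literal.
            escaped = str(value).replace("'", "''")
            clauses.append(f"{column} LIKE '%{escaped}%'")
        else:
            # More complex restrictions have no counterpart in this sketch.
            raise NotImplementedError(f"cannot express {kind!r} in ADQL here")
    where = " AND ".join(clauses)
    return f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
```

For example, to_adql("galaxies", [("vrot", "min", 100), ("vrot", "max", 300)]) yields a query with two range conditions joined by AND.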
Some conclusions. It probably is not a good idea to invent another language for mapping between vocabularies since two exist: SKOS and OWL. It probably is not a good idea to use SKOS since a) it is not a recommendation, b) work is going on to make it compatible with OWL, c) OWL is more powerful and is either the language that we will want to have in the long run or is an ancestor of the language that we will want, and d) OWL is a W3C Recommendation with real, existing, working, fairly stable tools (editors and visualizers and even wizards) and has support from a number of important scientific fields with $$$$.
One can format an SV of Classes (tokens?) as ASCII text, using indents to imply subclassing. Protege's Subclass Wizard will read it in and create OWL instantly. One can format an SV with SKOS, and a couple of substitution commands in vi will transform it to OWL. So, I don't care which of these is used at first. However, with OWL it is easier to visualize and modify and check for consistency, and one can get on with adding properties (mostly connecting the Classes with the Measurements that are pertinent to them and the allowed ranges and datatypes).
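To show how little is needed to go from an indented list to subclass statements, here is a Python sketch. It is a stand-in for the idea, not a description of what the Protege wizard does internally. The function names, the two-space indent, the "sv" prefix and the class names are assumptions, and the Turtle prefix declarations are left out.

```python
def parse_indented(text, indent=2):
    """Return (class, parent) pairs from text where indentation implies subclassing."""
    pairs = []
    stack = []  # stack[d] is the most recent class seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent
        # Drop classes at this depth or deeper; what remains are the ancestors.
        del stack[depth:]
        if len(stack) < depth:
            raise ValueError(f"indentation skips a level at {line!r}")
        name = stripped.strip()
        pairs.append((name, stack[-1] if stack else None))
        stack.append(name)
    return pairs


def to_turtle(pairs, prefix="sv"):
    """Emit Turtle-style class and rdfs:subClassOf statements (no @prefix lines)."""
    lines = []
    for name, parent in pairs:
        lines.append(f"{prefix}:{name} a owl:Class .")
        if parent is not None:
            lines.append(f"{prefix}:{name} rdfs:subClassOf {prefix}:{parent} .")
    return "\n".join(lines)


# Made-up vocabulary fragment.
example = "AstroObject\n  Star\n    Cepheid\n  Galaxy\n"
pairs = parse_indented(example)
```

Once the classes exist in this form, properties and allowed ranges can be added to them in an OWL editor.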
One last note. I don't believe it is useful to have a vocabulary with gamma, ray, burst and then say that you have everything you need to form the concept gamma_ray_burst. Adding some colons and semicolons into it will not help. We need the full term gamma_ray_burst and explicit machine-readable statements about it: subClassOf Explosion, hasTimeSeries, TimeSeries hasDuration D and hasValue > J janskies. On the other hand, I am not advocating terms like gamma_ray_burstHasTimeSeriesHasDuration. I am advocating sticking close to natural language. We speak it because it has been proven to almost work.
Ed
Andrea Preite Martinez wrote:
>
>
>> sub-set officially. You can't build, from scratch, a list of words that
>> describes everything everyone is doing at the moment, let alone in the
>> future. It is fundamentally not possible to build a canonical and final
>> list of "stuff" in a subject like ours, which deals with broad topics
>> and changes on a fairly rapid time scale. Subjects that have done this
>> face more bounded problem sets.
>> If you're going to go and build a framework where we can fit our own
>> words and collaboratively create and manage a vocabulary to annotate
>> and categorize our content then I vote YES. If what is being voted on
>> here is going off and building a big list of words, I vote NO.
>>
>> Al.