[Fwd: Re: TAP information schema]

From: Keith Noddle <ktn-at-star.le.ac.uk>
Date: Wed, 03 Oct 2007 16:26:08 +0100

Keith.

-- 
Keith Noddle                    Phone:  +44 (0)116 223 1894
AstroGrid Project manager       Fax:    +44 (0)116 252 3311
Dept of Physics & Astronomy     Mobile: +44 (0)7721 926 461
University of Leicester         Email:  ktn-at-star.le.ac.uk
Leicester, UK   LE1 7RH         Web:    http://www.astrogrid.org

attached mail follows:


Doug, I really cannot understand why TAP should require an information schema. On one hand, you propose a simple table access, which any data provider can easily implement. On the other hand, you insist to implement a complex set of tables describing potentially a single table. This looks out of scope to me -- are we defining a relational database implementation, or a table access protocol ??? At least in the first implementation of TAP, I strongly suggest to remove any requirement for an information schema, and stick to simple methods to retrieve the resources (which already exist in the Registry) and a method to list the contents of a resource (i.e. list the fields/groups/ parameters of the tables making up one resource). For future versions of TAP, it could be possible to look more extensively what are the metadata which are really required, on the basis of actual services making extensive use of TAP: it could well turn out that the implementation of an information schema is NOT the best approach to get the schema of tabular services. And really, if VOTable is already criticized for being too far away from the "XML spirit", embedding the schema of data in a set of VOTables which have to be joined should look even more ugly ! As an illustration, the extract of the Hipparcos catalog is presented, either as an empty VOTable (which contains all metadata), or as an information set (which does not contain all elements -- the <VALUES> for instance which specifies that the column may be empty is not included) I. Example of an empty VOTable ------------------------------ <RESOURCE ID="yCat_1239" name="I/239"> <DESCRIPTION>The Hipparcos and Tycho Catalogues (ESA 1997)</DESCRIPTION> <COOSYS ID="J2000" system="eq_FK5" equinox="J2000" epoch="2000.0" /> <COOSYS ID="J2000_1991.250" system="eq_FK5" equinox="J2000" epoch="1991.250"/> <TABLE ID="I_239_hip_main" name="I/239/hip_main"> <DESCRIPTION>The Hipparcos Main Catalogue</DESCRIPTION> <FIELD name="_RAJ2000" ucd="pos.eq.ra" ref="J2000" datatype="double" width="10" precision="6" unit="deg">
<DESCRIPTION>Right ascension (FK5) Equinox=J2000.0 Epoch=J2000.000, proper motions taken into account </DESCRIPTION>
</FIELD> <FIELD name="_DEJ2000" ucd="pos.eq.dec" ref="J2000" datatype="double" width="10" precision="6" unit="deg">
<DESCRIPTION>Declination (FK5) Equinox=J2000.0 Epoch=J2000.000, proper motions taken into account </DESCRIPTION>
</FIELD> <FIELD name="HIP" ucd="meta.id;meta.main" datatype="int" width="6">
<DESCRIPTION>Identifier (HIP number) (H1)</DESCRIPTION>
</FIELD> <FIELD name="Vmag" ucd="phot.mag;em.opt.V" datatype="float" width="5" precision="2" unit="mag">
<DESCRIPTION>? Magnitude in Johnson V (H5)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="RA(ICRS)" ucd="pos.eq.ra;meta.main" ref="J2000_1991.250" datatype="double" width="12" precision="8" unit="deg">
<DESCRIPTION>alpha, degrees (ICRS, Epoch=J1991.25) (H8)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="DE(ICRS)" ucd="pos.eq.dec;meta.main" ref="J2000_1991.250" datatype="double" width="12" precision="8" unit="deg">
<DESCRIPTION>delta, degrees (ICRS, Epoch=J1991.25) (H9)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="Plx" ucd="pos.parallax.trig" datatype="float" width="7" precision="2" unit="mas">
<DESCRIPTION>? Trigonometric parallax (H11)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="pmRA" ucd="pos.pm;pos.eq.ra" ref="J2000_1991.250" datatype="double" width="8" precision="2" unit="mas/yr">
<DESCRIPTION>Proper motion mu_alpha.cos(delta), ICRS(H12) (for J1991.25 epoch)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="pmDE" ucd="pos.pm;pos.eq.dec" ref="J2000_1991.250" datatype="double" width="8" precision="2" unit="mas/yr">
<DESCRIPTION>? Proper motion mu_delta, ICRS (H13) (for J1991.25 epoch)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="e_Plx" ucd="stat.error" datatype="float" width="6" precision="2" unit="mas">
<DESCRIPTION>? Standard error in Plx (H16)</DESCRIPTION>
</FIELD> <FIELD name="B-V" ucd="phot.color;em.opt.B;em.opt.V" datatype="float" width="6" precision="3" unit="mag">
<DESCRIPTION>? Johnson B-V colour (H37)</DESCRIPTION>
<VALUES null="" />
</FIELD> <FIELD name="Notes" ucd="meta.note" datatype="char">
<DESCRIPTION>Existence of notes (H70)</DESCRIPTION>
</FIELD> </TABLE> </RESOURCE> ================================================================================ 2. Example of an information schema output ---------------------------------------- (notice that this schema is incomplete: references frames are not quoted, and the existence of "null" values is not specified). <RESOURCE name=information_schema"> <TABLE name="TABLES"> <DESCRIPTION>Table description</DESCRIPTION> <FIELD name="name" datatype="char" arraysize="20*"> <DESCRIPTION>Table name</DESCRIPTION> </FIELD> <FIELD name="description" datatype="char" arraysize="255*"> <DESCRIPTION>Table description</DESCRIPTION> </FIELD> <DATA> <TABLEDATA> <TR><TD>hip_main</TD><TD>The Hipparcos Catalog (ESO, 1997)</TD></TR> </TABLEDATA></DATA> </TABLE> <TABLE name="COLUMNS"> <DESCRIPTION>Column Description</DESCRIPTION> <FIELD name="name" datatype="char" arraysize="20*">
<DESCRIPTION>Column label</DESCRIPTION>
</FIELD> <FIELD name="ucd" datatype="char" arraysize="60*">
<DESCRIPTION>UCD1+ representation</DESCRIPTION>
</FIELD> <FIELD name="type" datatype="char" arraysize="20*">
<DESCRIPTION>data-type </DESCRIPTION>
</FIELD> <FIELD name="width" datatype="int" width="10">
<DESCRIPTION>length for strings/blobs</DESCRIPTION>
</FIELD> <FIELD name="unit" datatype="char" arraysize="32*">
<DESCRIPTION>Units of the data (controlled)</DESCRIPTION>
</FIELD> <FIELD name="description" datatype="char" arraysize="255*">
<DESCRIPTION>Detailed explanation of the column</DESCRIPTION>
</FIELD> <DATA> <TABLEDATA> <TR><TD>_RAJ2000</TD><TD>pos.eq.ra</TD><TD>double</TD><TD>12</TD><TD>deg</TD><TD>Right ascension (FK5) Equinox=J2000.0 Epoch=J2000.000, proper motions taken into account</TD></TR> <TR><TD>_DEJ2000</TD><TD>pos.eq.dec</TD><TD>double</TD><TD>12</TD><TD>deg</TD><TD>Declination (FK5) Equinox=J2000.0 Epoch=J2000.000, proper motions taken into account</TD></TR> <TR><TD>HIP</TD><TD>meta.id;meta.main</TD><TD>int</TD><TD>6</TD><TD></TD><TD>Identifier (HIP number) (H1)</TD></TR> <TR><TD>Vmag</TD><TD>phot.mag;em.opt.V</TD><TD>float</TD><TD>6</TD><TD>mag</TD><TD>? Magnitude in Johnson V (H5)</TD></TR> <TR><TD>RA(ICRS)</TD><TD>pos.eq.ra;meta.main</TD><TD>double</TD><TD>12</TD><TD>deg</TD><TD>alpha, degrees (ICRS, Epoch=J1991.25) (H8)</TD></TR> <TR><TD>DE(ICRS)</TD><TD>pos.eq.dec;meta.main</TD><TD>double</TD><TD>12</TD><TD>deg</TD><TD>delta, degrees (ICRS, Epoch=J1991.25) (H9)</TD></TR> <TR><TD>Plx</TD><TD>pos.parallax.trig</TD><TD>float</TD><TD>7</TD><TD>mas</TD><TD>? Trigonometric parallax (H11)</TD></TR> <TR><TD>pmRA</TD><TD>pos.pm;pos.eq.ra</TD><TD>float</TD><TD>9</TD><TD>mas/yr</TD><TD>Proper motion mu_alpha.cos(delta), ICRS(H12) (for J1991.25 epoch)</TD></TR> <TR><TD>pmDE</TD><TD>pos.pm;pos.eq.dec</TD><TD>float</TD><TD>9</TD><TD>mas/yr</TD><TD>? Proper motion mu_delta, ICRS (H13) (for J1991.25 epoch)</TD></TR> <TR><TD>e_Plx</TD><TD>stat.error</TD><TD>float</TD><TD>5</TD><TD>mas</TD><TD>? Standard error in Plx (H16)</TD></TR> <TR><TD>B-V</TD><TD>phot.color;em.opt.B;em.opt.V</TD><TD>float</TD><TD>6</TD><TD>mag</TD><TD>? Johnson B-V colour (H37)</TD></TR> <TR><TD>Notes</TD><TD>meta.note</TD><TD>char</TD><TD>1</TD><TD></TD><TD>Existence of notes (H70)</TD></TR> </TABLEDATA></DATA> </TABLE> </RESOURCE> ================================================================================ Francois Ochsenbein ------ Observatoire Astronomique de Strasbourg 11, rue de l'Universite F-67000 STRASBOURG Phone: +33-(0)390 24 24 29 Email: francois-at-astro.u-strasbg.fr (France) Fax: +33-(0)390 24 24 32 ================================================================================ > > >Hi Guys - > >A first cut at a minimal core TAP information schema is attached. >This was a little more involved than I expected due to the need to >support the registry table model and the VOTable model as well as a >conventional DBMS, but it still all fits on one page. VOTable support >is required for example if we want to use TAP/ADQL with the DAL query >response (as Francois pointed out). > > - Doug > >================================================================================ > >Minimal TAP Schema >------------------- > >Our use cases for TAP include queries against a relational database as >well as queries against a VOTable, for example the query response from >a DAL service. We want to be able to able to generate Table-related >metadata for the registry in a standard form, as has been proposed for >a VODataService. Hence our information schema needs to merge aspects >of the table views of VOTable, VODataService, and a conventional RDBMS. > >The following follows the concept and approach of the information >schema but the content is based primarily on a merger of the VOTable >and VODataService metadata, with a little information schema thrown in >as well. All of the following are tables and are potentially queryable. >It is assumed here that a TAP service accesses a single database >containing an arbitrary number of tables. Generalization to multiple >databases per service would be easy to add but is not addressed here. > > > TABLES Lists all tables in DB (Mandatory) > > Id Unique Table ID > Name Table name (Required) > Type Table type (base, view) (Required) > Role How table is used, e.g., "out" > Ucd Can a table actually have a UCD? > Utype UTYPE can apply to a table > Description Short one line table description > > > COLUMNS List all columns in DB (Mandatory) > > Id Unique Column ID > TableName Table to which column belongs (Required) > Name Column name (Required) > Datatype Datatype (as in VOTable + "string") (Required) > Arraysize Arraysize (as in VOTable) > ColumnType Column type (as in VOTable Field, Param) > Std Standard column (boolean) > Value Value if ColumnType=Param (fixed value) > Ucd UCD for column > Utype UTYPE for column > Unit UNIT for column > Description Brief one-line description of column > > > GROUPS Lists all column groups in DB (Optional) > > Id Unique Group ID > TableName Table to which group belongs (Required) > Name Group name (Required) > Ref Column or group ID (Required) > RefType Reference type (column or group) (Required) > Utype UTYPE for group > > >It is not clear whether we really need the GROUPS table at the level >of TAP and ADQL, but I included it to help evaluate how Group might >be handled with a relational interface. The question is whether we >need to be able to deal with Group at this level, or whether it can be >reserved to software which operates directly upon a VOTable (see also >[4] below). The optional nature of GROUP illustrates the extensibility >of the information schema approach. > > >Notes > > 1. TABLES describes both baseTables and views. The role could > be used with views to identify standard views of astronomical > catalogs, if defined by the service, e.g. like VERB in cone search > (perhaps "narrow" and "wide" where wide may be less than the full > table and the base table is the full table, for 3 options total). > > 2. COLUMNS describes both real table columns (as in a DBMS or > with VOTable FIELD) and optionally, fixed properties of a table > (like VOTable PARAM). > > 3. GROUPS is for representing the VOTable GROUP construct. This is > not used for DBMS tables and is not needed to understand most > VOTables as the UTYPE will usually (but not always) identify the > elements of a GROUP. (This may be needed if we want to process > VOTables, e.g., from a DAL query response, via a relational > interface like TAP/ADQL). > > 4. The GROUPS table is not fully normalized (there are multiple > records with the same ID, Name, etc. with Ref and RefType > varying), however this simplfied form is probably adequate for > our purposes and does not require a join to use (a SimpleQuery > would be adequate for access). > > If we were willing to have a restriction that a given > field/param/group can only belong to one parent group, then this > could be done more cleanly by adding a GroupId field to Columns, > with Group described by a single record in Groups. This would > still permit hierarchically structured Group relationships. > > In a typical use of Group we would want to identify all the items > belonging to a Group with a given Utype. This would be done by > querying the GROUP table with a given TableName and Utype. This > would list the items of zero, one or more groups, each identified > by a unique Group ID. The Ref/RefType fields identify the members > of the group. If one of these is another Group, the reference is > recursive. > > 5. If a generic DBMS is exposed, most of the VOTable-like metadata > (Ucd, Utype, GROUPS, etc.) would be omitted or nil valued. > The minimum metdata should be directly computable from the > DBMS information schema (or other system tables), with only > minor conversions in the service code. In other words it is > not necessary to create or modify DBMS tables for the minimal > required metadata. > > 6. The Registry view of a table is directly computable by subsetting > the information specified here. > > 7. The concept of multiple databases per service may be needed > for some DBMSes. It could also be useful in TAP to represent > a VOSpace logically as a named database, or to represent a HTTP > POST containing uploaded table data as a named database (useful > in queries). The information schema itself could be considered > a database (in RDB terms a catalog or schema). >

Received on 2007-10-03Z18:20:27