On Fri, 16 May 2003, Tony Linde wrote:
> I've posted a document on the IVOA wiki which summarises my understanding
> of the UCD discussion in the plenary session yesterday. Perhaps some
> people could peruse that and provide feedback on this mailing list about
> any errors or differences of opinion on the implications that I see. I
> hope it helps others with any of their own misunderstandings.
>
> http://www.ivoa.net/twiki/bin/view/IVOA/TonyOnUCDs
Here are my comments on Tony's document:
> My understanding of UCDs
>
> The way this came up was my question in the plenary UCD session about
> how we can identify columns within a table uniquely. Basically, the
> answer was that UCDs will not solve this problem and are not intended
> to do so. This page will summarise what I now understand as the purpose
> of UCDs and some of the implications of this.
>
> UCDs as Data Types
>
> Comment was made that UCDs can be considered as data types, so a column
> in a table has a data type of, POS_EQ_RA, say. I assume that the
> reasons for having UCDs as data types are to allow:
>
> * operations on columns: comparison, addition, subtraction,
> multiplication, etc plus specific astronomical operations
> * conversion between data types: eg converting between equitorial and
> galactic coordinates
As their name says, UCDs are meant to describe the content of a column unambiguously.
Within any given table, column names do not repeat (except in rare cases where the authors want to repeat a column for easier reading).
The problem arose when column names became degenerate across different tables/catalogues. The most common scenario in astronomy was a column called "Mag", meant to represent the brightness of a celestial object in some photometric system. The problem was (and is) that "Mag" was overloaded, and simply comparing two columns labelled "Mag" was asking for trouble. UCDs were introduced to break this degeneracy.
For example, column (catalog=c1 name=Mag unit=mag UCD=PHOT_JHN_V) has nothing to do with column (catalog=c2 name=Mag unit=mag UCD=PHOT_STR_B).
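To make the "Mag" example concrete, here is a minimal Python sketch. The `Column` record and the `same_quantity` function are hypothetical illustrations, not part of any agreed interface; the sketch simply treats two columns as holding the same quantity only when their UCDs agree, whatever their names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor: catalogue, name, unit and UCD."""
    catalog: str
    name: str
    unit: str
    ucd: str

def same_quantity(a: Column, b: Column) -> bool:
    # The column name is ignored on purpose: "Mag" is overloaded across
    # catalogues. Only the UCD says which physical quantity is stored.
    return a.ucd == b.ucd

v = Column(catalog="c1", name="Mag", unit="mag", ucd="PHOT_JHN_V")
b = Column(catalog="c2", name="Mag", unit="mag", ucd="PHOT_STR_B")
print(v.name == b.name)     # True: the names collide
print(same_quantity(v, b))  # False: the UCDs break the degeneracy
```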
> Do we thus need (or already have) some hierarchical structure of the
> UCDs based on allowable operations? In normal data types, we have
> numerical types, subdivided by integral and floating point, subdivided
> by storage size etc; one can add all numerical types but (generally)
> cannot add a number and a string (without pre-defining what such an
> addition will do).
Although numerical types can be added while strings (or a string and a number) cannot, not every pair of numbers should be addable. One should not be allowed to add a velocity measured in km/s to a right ascension in equatorial coordinates (an angle).
Whatever mechanism is used to combine tables should be provided with this kind of knowledge.
> Aligned to that: should we define the operations that can be performed
> on the individual data types (UCDs), the rules for those operations
> given specific types, and the type resulting from such operations.
Yes and no. Thinking of all combinations is unrealistic; besides, it is only with addition and subtraction that problems arise. No one should prevent you from taking ratios of columns or multiplying them. As Peter Quinn pointed out in the plenary, some decisions should always be the responsibility of the astronomer.
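A rough Python sketch of the kind of knowledge a table-combination mechanism might carry. Everything here is hypothetical: in particular, it assumes, purely for illustration, that the first `_`-separated word of a UCD (`PHOT`, `POS`, ...) names the quantity family. Products and ratios are always allowed, as argued above; sums and differences require the same family and the same unit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor (name, unit, UCD)."""
    name: str
    unit: str
    ucd: str

def family(ucd: str) -> str:
    # Illustrative assumption: the first "_"-separated word names the
    # quantity family, e.g. "PHOT" for PHOT_JHN_V, "POS" for POS_EQ_RA.
    return ucd.split("_")[0]

def can_combine(op: str, a: Column, b: Column) -> bool:
    """Hypothetical rule deciding whether `a op b` should be allowed."""
    if op in ("*", "/"):
        # Products and ratios are left to the astronomer's judgement.
        return True
    if op in ("+", "-"):
        # Sums and differences need the same kind of quantity in the
        # same unit; no unit conversion is attempted in this sketch.
        return family(a.ucd) == family(b.ucd) and a.unit == b.unit
    raise ValueError(f"unknown operator: {op!r}")

bmag = Column("Bmag", "mag", "PHOT_JHN_B")
vmag = Column("Vmag", "mag", "PHOT_JHN_V")
ra = Column("RA", "deg", "POS_EQ_RA")
print(can_combine("-", bmag, vmag))  # True: a B-V colour
print(can_combine("+", ra, vmag))    # False: an angle plus a magnitude
print(can_combine("/", ra, vmag))    # True: left to the astronomer
```

The family heuristic is deliberately coarse (it would still accept RA + Dec in degrees), which illustrates why enumerating every legal combination is unrealistic.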
> UCDs as Keywords
>
> In this context, the UCDs is part of the metadata for a table. It
> indicates the type of data held in a table, so having POS_EQ_RA
> identified with a table says that this table includes positional data
> in equitorial coordinates. That said, maybe the UCD for the table
> should include POS_EQ instead (since it is unlikely that it'll have RA
> without DEC).
Unlikely, but not impossible: I've seen tables in which that's the case. UCDs were meant to be attached to columns; talking about a table's UCD is, IMHO, a confusion. A table can have a set of UCDs attached to it (a list), which may be shorter than the number of columns if some columns represent the same physical quantity and therefore DO have the same UCD.
> So the idea of being able to query which resources have POS_EQ* makes sense.
Sure it does. Not all catalogues contain RA-Dec.
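A minimal Python sketch of the "set of UCDs per table" idea and of a POS_EQ* style query against it. The function names and the sample table are hypothetical; modifiers such as `,MAIN` are stripped before building the set, which is an assumption about how such a set would be stored.

```python
def base_ucd(ucd: str) -> str:
    # Drop modifiers such as ",MAIN": "POS_EQ_RA,MAIN" -> "POS_EQ_RA".
    return ucd.split(",")[0]

def table_ucds(column_ucds: list[str]) -> set[str]:
    # Columns measuring the same quantity share a UCD, so the set
    # can be shorter than the list of columns.
    return {base_ucd(u) for u in column_ucds}

def has_ucd_prefix(ucds: set[str], prefix: str) -> bool:
    # A prefix match implements queries such as "POS_EQ*".
    return any(u.startswith(prefix) for u in ucds)

# Hypothetical table: two V-band columns (e.g. two epochs) share one UCD.
cols = ["ID,MAIN", "POS_EQ_RA,MAIN", "POS_EQ_DEC,MAIN", "PHOT_JHN_V", "PHOT_JHN_V"]
ucds = table_ucds(cols)
print(len(cols), len(ucds))              # 5 4
print(has_ucd_prefix(ucds, "POS_EQ"))    # True
print(has_ucd_prefix(ucds, "REDSHIFT"))  # False
```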
> UCDs as Pointers into Data Model
>
> This was a very interesting comment, that UCDs can be seen as a pointer
> into the data model (DM). How this might be implemented and how
> feasible it is is still open. I guess there are two potential problem
> areas:
>
> * a UCD refers to multiple DM points (classes, objects or whatever
> they are called) this is likely but does indicate areas in which the
> UCDs are not the lowest level of metadata
>
> * one DM point is referred to by several UCDs
> if this occurs, it would indicate that the DM requires further analysis
Yes and no. Yes if we are talking about the so-called "core-UCDs", which are the UCDs that can be attached to a column. No if we develop the concept of an "alias-UCD", which represents a list of UCDs designed for discovery purposes. I'd say that each DM point should have one "core-UCD" and zero or more "alias-UCDs".
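One way to picture the core/alias split is the Python sketch below. The data-model point names and the alias `ALIAS_OPTICAL_MAG` are invented for illustration; only the rule itself (one core-UCD per DM point, any number of aliases used for discovery) comes from the proposal above.

```python
# Hypothetical data-model table: each DM point has exactly one core-UCD
# (the one attached to columns) and zero or more alias-UCDs (used for
# discovery). The DM point names and ALIAS_OPTICAL_MAG are invented.
DM_POINTS = {
    "Photometry.JohnsonV": {"core": "PHOT_JHN_V", "aliases": ["ALIAS_OPTICAL_MAG"]},
    "Photometry.JohnsonB": {"core": "PHOT_JHN_B", "aliases": ["ALIAS_OPTICAL_MAG"]},
    "Position.RA": {"core": "POS_EQ_RA", "aliases": []},
}

def cores_are_unique() -> bool:
    """No two DM points may share a core-UCD."""
    cores = [p["core"] for p in DM_POINTS.values()]
    return len(cores) == len(set(cores))

def expand(ucd: str) -> set[str]:
    """Return the core-UCDs that a discovery UCD stands for."""
    cores = {p["core"] for p in DM_POINTS.values() if ucd in p["aliases"]}
    # Anything that is not a known alias is treated as a core-UCD itself.
    return cores or {ucd}

print(cores_are_unique())                   # True
print(sorted(expand("ALIAS_OPTICAL_MAG")))  # ['PHOT_JHN_B', 'PHOT_JHN_V']
print(expand("POS_EQ_RA"))                  # {'POS_EQ_RA'}
```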
> I suspect that, as the DM expands and covers more areas of astronomy,
> we will need a more efficient version of UCDs that accurately maps to
> the DM; the current '_' separated textual names will have limited
> extensibility (even with the additional modifiers agreed at this
> meeting).
Most likely.
> Unique Column Identification
>
> Given that we cannot use UCDs as unique column identifiers, how do we do this?
> It seems that the only possible unique identifier for a column in a
> table is the resourceID of the table (from the Registry) plus the
> columnName (for explanation of resourceID, see the discussion on this
> in the Registry mailing list:
> http://www.ivoa.net/forum/registry/0091.htm and related messages).
Absolutely.
Within a table, the column name IS a unique column identifier. The pair catalogueName.columnName is unique in 99.9% of cases, and for the future one can require that columnName be unique within a table (most DBMSs won't be happy if you give two columns the same name). If we ever converge on assigning unique IDs to catalogues, then catalogueUniqueID.columnName is unique. Adding a UCD to this structure, e.g. catalogueUniqueID.columnName.columnUCD, would be redundant.
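A short Python sketch of the catalogueUniqueID.columnName scheme, enforcing the "column names unique within a table" requirement at construction time. The function name `column_ids` and the identifier `CAT-0001` are hypothetical placeholders; no real catalogue ID format is implied.

```python
def column_ids(catalogue_id: str, column_names: list[str]) -> list[str]:
    """Build catalogueUniqueID.columnName identifiers for one table.

    Raises ValueError if a column name repeats, since the pair would then
    no longer be unique (most DBMSs reject such a table anyway).
    """
    seen: set[str] = set()
    ids = []
    for name in column_names:
        if name in seen:
            raise ValueError(f"column name {name!r} is not unique in {catalogue_id!r}")
        seen.add(name)
        ids.append(f"{catalogue_id}.{name}")
    return ids

print(column_ids("CAT-0001", ["RA", "Dec", "Vmag"]))
# ['CAT-0001.RA', 'CAT-0001.Dec', 'CAT-0001.Vmag']
```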
> So, to summarise the discussion from the plenary session, a query can
> be sent to a table with either UCDs or column names or a mixture of
> both. If a UCD is included in a query, the data source can resolve this
> if there is only one column with that UCD or there are multiple columns
> but one has the modifier MAIN attached to only one of the column UCDs.
> Otherwise the query will fail.
Let me include here another element that comes into play before a query is submitted to the resource handling the catalogues: the registry. Any registry will contain metadata about the services listed, i.e., the catalogues. The registry would know (among other things):
 - catalogueName (possibly catalogueUniqueID)
 - catalogueTitle
 - catalogueKeywords
 - catalogueAuthor
 - number of columns
 - number of records
 - column names, UCDs and units, e.g.

    ---
    name   ID,MAIN
    RA     POS_EQ_RA,MAIN    h:m:s
    Dec    POS_EQ_DEC,MAIN   d:m:s
    ....
    Vmag   PHOT_JHN_V        mag
    Bmag   PHOT_JHN_B        mag
    z      REDSHIFT
    ---

IMHO, a query can be formulated accurately for any given resource after consulting the registry. What we want to extract from any given catalogue should be decided at the registry level, so the resource handling that catalogue has no decisions to make about the query it receives. Better still, a query should not even be submitted if we know ahead of time that it will fail.
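Combining the registry's column listing with the MAIN rule from Tony's summary, a pre-submission check might look like the Python sketch below. The `REGISTRY` dictionary, the catalogue name `catalogueA` and the functions `resolve` and `plan_query` are hypothetical; the point is only that every requested UCD is resolved to a column name at the registry, and a query that cannot be resolved is never sent.

```python
from typing import Optional

# Hypothetical registry entry mirroring the column listing:
# (column name, UCD with optional modifiers, unit).
REGISTRY = {
    "catalogueA": [
        ("name", "ID,MAIN", ""),
        ("RA", "POS_EQ_RA,MAIN", "h:m:s"),
        ("Dec", "POS_EQ_DEC,MAIN", "d:m:s"),
        ("Vmag", "PHOT_JHN_V", "mag"),
        ("Bmag", "PHOT_JHN_B", "mag"),
        ("z", "REDSHIFT", ""),
    ],
}

def resolve(columns, ucd: str) -> Optional[str]:
    """Map a requested UCD to one column name, or None if absent/ambiguous."""
    matches = []
    for name, full_ucd, _unit in columns:
        base, *modifiers = full_ucd.split(",")
        if base == ucd:
            matches.append((name, modifiers))
    if len(matches) == 1:
        return matches[0][0]
    # Several candidates: accept only if exactly one carries MAIN.
    main = [name for name, modifiers in matches if "MAIN" in modifiers]
    return main[0] if len(main) == 1 else None

def plan_query(catalogue: str, ucds: list[str]) -> Optional[list[str]]:
    """Resolve every UCD at the registry; None means: do not submit."""
    columns = REGISTRY.get(catalogue, [])
    names = [resolve(columns, u) for u in ucds]
    return None if None in names else names

print(plan_query("catalogueA", ["POS_EQ_RA", "POS_EQ_DEC", "REDSHIFT"]))
# ['RA', 'Dec', 'z']
print(plan_query("catalogueA", ["PHOT_STR_B"]))
# None
```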
> Conclusion
>
> I hope people will provide feedback on the mailing list to these
> comments. I reiterate that they are only my understanding of what was
> said and my belief of the implications.
Another use of UCDs would be to discover resources (catalogues) which contain certain QUANTITIES.

Scenario 1

If I formulate a query along the lines of

    select catalogueName where UCDs include REDSHIFT POS_EQ_RA,MAIN POS_EQ_DEC,MAIN PHOT_JHN_V from registryXX

I should get back a list of catalogues which contain that information (probably together with the column names, units and UCDs which make each catalogue satisfy the request). Titles and keywords are short handles and do not always tell everything a catalogue contains. The astronomer should know what to do with this information; perhaps s/he will pick a few of these catalogues and submit a query to them.

Scenario 2

    select ucd;ID,MAIN ucd;POS_EQ_RA,MAIN ucd;POS_EQ_DEC,MAIN ucd;PHOT_JHN_V ucd;REDSHIFT
    from [catalogueName where UCDs include REDSHIFT PHOT_JHN_V POS_EQ_RA,MAIN POS_EQ_DEC,MAIN from registryXX]
    where ucd;REDSHIFT > 2.5

The user doesn't know ahead of time whether any catalogue exists that satisfies the query, but if one does, s/he would like to print the equivalent of Name, RA, Dec, Vmag and z. This query should first find the list of catalogues which contain all those UCDs, together with the names of their respective columns. Once this is known, individual queries should be sent to each catalogue, requesting the columns which correspond to the UCDs the user asked for and which satisfy redshift > 2.5. Note that I've used a very loose notation above on purpose.
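The two stages of Scenario 2 can be sketched in Python as follows. This is an illustration under stated assumptions only: the registry contents, catalogue names and column names are invented, modifiers such as MAIN are omitted for brevity, and the SQL-like strings stand in for whatever query language a resource actually accepts.

```python
# Hypothetical registry: catalogue -> {UCD: column name}.
REGISTRY = {
    "catA": {"ID": "name", "POS_EQ_RA": "RA", "POS_EQ_DEC": "Dec",
             "PHOT_JHN_V": "Vmag", "REDSHIFT": "z"},
    "catB": {"ID": "Seq", "POS_EQ_RA": "RAJ2000", "POS_EQ_DEC": "DEJ2000",
             "PHOT_JHN_V": "V", "REDSHIFT": "zsp"},
    "catC": {"ID": "name", "POS_EQ_RA": "ra", "POS_EQ_DEC": "dec"},
}

def find_catalogues(required: list[str]) -> list[str]:
    """Stage 1 (as in Scenario 1): catalogues holding every requested UCD."""
    return sorted(cat for cat, cols in REGISTRY.items()
                  if set(required).issubset(cols))

def build_queries(select_ucds: list[str], where_ucd: str, threshold: float) -> dict[str, str]:
    """Stage 2: one concrete query per catalogue, UCDs replaced by column names."""
    needed = list(select_ucds) + [where_ucd]
    queries = {}
    for cat in find_catalogues(needed):
        cols = REGISTRY[cat]
        select = ", ".join(cols[u] for u in select_ucds)
        queries[cat] = f"SELECT {select} FROM {cat} WHERE {cols[where_ucd]} > {threshold}"
    return queries

wanted = ["ID", "POS_EQ_RA", "POS_EQ_DEC", "PHOT_JHN_V", "REDSHIFT"]
for query in build_queries(wanted, "REDSHIFT", 2.5).values():
    print(query)
# SELECT name, RA, Dec, Vmag, z FROM catA WHERE z > 2.5
# SELECT Seq, RAJ2000, DEJ2000, V, zsp FROM catB WHERE zsp > 2.5
```

Catalogues lacking any requested UCD (catC here) never receive a query, in line with the point that a query known to fail should not be submitted.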
Cheers,
Patricio
---
Patricio F. Ortiz
pfo-at-star.le.ac.uk
AstroGrid project
Department of Physics and Astronomy
University of Leicester
LE1 7RH, UK
Tel: +44 (0)116 252 2015

Received on 2003-05-16Z12:50:11