Having just glanced at Alberto's mail in response to Keith, I have realised that my attempt to post some suggestions has not appeared, sorry. This may answer some of Alberto's points regarding a controlled vocabulary, separating layers of description, etc.
The material below can also be found at
http://wiki.astrogrid.org/bin/view/Astrogrid/DataServiceSchema with links
which help to explain it.
It is intended to describe the science content of the *summary* of astronomical datasets held by the registry, not the list of UCDs or evaluations of the contents of the dataset.
Tentative amendments to Keith's RegistrySchema
AMSR comments and changes marked *
Hanisch et al. is the most recent (presently v.6) version of the document
also referred to as
(not)Bob's.
Mostly this is as per Hanisch et al. but some changes - e.g.
- spatial is incorrect for sky coverage/position/resolution, should be
angular
- some things added
SEE NOTES AT END
Schema: http://www.w3.org/2001/XMLSchema
include: schemaLocation="serviceLocation.xsd"
elements:
CONTENT
"content" string (see elements following)*
"facility" string
"instrument" string
"format" string (VOTable, ascii, FITS etc?)*
"briefsummary" string
"tablenrows" integer (Number of rows in table)*
"tablencols" integer (Number of columns in table)*
"tablesize" decimal (bytes - size of table excl.
linked nDim data)*
"ndimdatasetsizemin" decimal (bytes)*
"ndimdatasetsizemax" decimal (bytes)*
"nndimdatasets" integer (number of nDim data sets)*
"type" string (archive, survey, catalogue,
bibliography,
journal, library, outreach,
education,
eporesource, integrated,
nameresolver)
"subjectkeyword" string (Galaxies, Milky Way, Nebulae,
Planets,
Solar system, Stars)*
Received on 2003-04-30Z14:39:50
--- COVERAGE
"coverage" string (see elements following)*
"wavelengthrange" string (gammaray, xray, xuv, uv, optical,
ir, mmwave, radio)
"wavelengthshort" decimal (metres)
"wavelengthlong" decimal (metres)
"ramin" decimal (degrees)
"ramax" decimal (degrees)
"decmin" decimal (degrees)
"decmax" decimal (degrees)
"sensitivity" decimal (Jansky? also allow Magnitude?
eV?)
"startdate" decimal (JD) or (YYYY.DD) or date
(CCYY-MM-DD)*
"enddate" decimal (JD) or (YYYY.DD) or date
(CCYY-MM-DD)*
"angularfraction" decimal (dimensionless fraction)
"spectralfraction" decimal (dimensionless fraction)
"temporalfraction" decimal (dimensionless fraction)
"sourcedensity" decimal (counts per square degree)
--- RESOLUTION
"resolution" string (see elements following)*
"angularresolution" decimal (degrees? arcsec?)
"spectralresolution" decimal (dimensionless fraction)
"temporalresolution" decimal (sec)
--- DATAQUALITY
"dataquality" string (see elements following)*
"astrometryerror" decimal (degrees? arcsec?)
"photometryerror" decimal (Jy? Magnitudes? eV?)
"timingerror" decimal (sec)
- - - - - - - - - - NOTES

Suggested standard units/conventions: see http://www.iau.org/IAU/Activities/nomenclature/units.html
(This is just for the ResourceMetadata; for DataSets generally the wider conventions of CDS can be used.)

I have suggested units; in some cases I suggest alternatives where the conversion may be tricky or where being totally consistent might lead to very small/large numbers (e.g. degrees for angular position, but arcsec for error is more usual). However, I would prefer to be consistent: the first unit before the ? is preferred. Approximate conversions suffice to answer 'Is this catalogue any use?' with 'maybe/no'.

Hanisch et al. use decimal years for date but this is non-standard? Convention for leap years not well-known.

This has implications for the user query; for the very first iterations we may have to force the user to use standard units, but very soon we should be able to interconvert Jy/Mag/?x-ray units? and wavelength/freq/eV units etc. For Resource metadata selection this does not have to be precise.

Should the units be added to the schema?

Data types and null values

Is it simplest if every element occurs at least once, and we use null values as suggested in the VOTable documentation (http://cdsweb.u-strasbg.fr/doc/VOTable/votable-1-0.htx)? E.g. use NaN for decimals with no value and NULL for strings? Alternatively, if we need to use the null value to sort by (e.g. to (de)prioritise DataSets lacking the relevant ResourceMetadata entry), XML Schema allows INF and -INF (for float and double, though not for decimal).

CONTENT
"subjectkeywords" (new element)
One or more keywords taken from the dataset header, e.g. a subset of the third column on the Vizier catalogue selection page. See http://adc.gsfc.nasa.gov/adc/adc_keyword_index.html and http://vizier.u-strasbg.fr/doc/ADCkwds.htx
We should add from the ADC list or the Vizier simplification as required, sparingly.
Note:
- planetary nebulae are Nebulae, not Planets
- Galaxies means external galaxies, not the Milky Way
Is there anything equivalent for Solar/STP?
In Hanisch et al. 'subject' is included in curation metadata, but I feel it fits better in content. However I don't really mind; this and some other things listed under CONTENT below should maybe be in CURATION?
"type" means FITS, ASCII etc?
I am using 'table' to mean data which could be searched directly in a database or be converted to VOTable, e.g. a list of sources and properties. Other 'nDim' data which require special viewers/extraction software, e.g. FITS, will always? have an associated table describing them, e.g. a list of pointings and other observational details. I think that we can cover whether nDim data are images, spectra etc. by whether elements like
"decmin" or "spectralresolution" have meaningful values, or the null
value.

COVERAGE

"angularfraction" (a fraction) is for datasets containing images or imageable data; "sourcedensity" (sources/deg^2) is for datasets containing lists of sources with positions. Note that the total fractional coverage is different from the resolution. In future iterations AstroGrid want to go for indexing/matrix representation rather than the shapes suggested by Hanisch et al.?

RESOLUTION

Note that it is easier to express spectral resolution as (finest channel width)/(central value), e.g. delta-lambda/lambda, as this avoids unit problems, but this cannot be done in a universal way for other sorts of resolution. We should use the best value in the data for now, and later include algorithms to allow for e.g. angular resolution as a function of frequency for multi-frequency data.

DATA QUALITY

Things like angularresolution and sensitivity will initially probably be given as the best value of all errors (systematic and random) correctly combined. However, in some data sets these may cover a wide range. E.g. astrometry error can depend on sensitivity and resolution; in observing logs resolution may be frequency-dependent. Ultimately we should be able to express these things as functions which are evaluated depending on other bits of the data set or even the query. E.g. the MERLIN archive covers frequencies from 0.408 to 22 GHz and has a best resolution of 0."008, but this is at 22 GHz; the resolution at 5 GHz is 0."050, and if you want higher resolution you have to go to e.g. the EVN archive.

UCDs

Should there also be an element to link to the UCDs for the dataset?

Many of my changes are probably incorrect XML, sorry, but I hope the intention is clear.
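
To make the RESOLUTION and DATA QUALITY notes above a little more concrete, here is a rough Python sketch of the two ideas: spectral resolution as a dimensionless ratio, and angular resolution looked up per frequency instead of as a single best value. The table holds only the two MERLIN figures quoted above; the function names and the nearest-frequency lookup are my own assumptions, not a proposal for how the registry should do it.

```python
import math

# (frequency in GHz, angular resolution in arcsec): the two MERLIN values
# quoted in the notes; a real archive would supply its own table.
MERLIN_RESOLUTION = [(5.0, 0.050), (22.0, 0.008)]


def spectral_resolution(channel_width, central_value):
    """Dimensionless resolution, e.g. delta-lambda/lambda.

    Both arguments must be in the same unit, so the unit cancels.
    """
    return channel_width / central_value


def best_resolution(table):
    """The single 'best value' that would go into "angularresolution" for now."""
    return min(res for _, res in table)


def resolution_at(table, freq_ghz):
    """Resolution at the tabulated frequency nearest to freq_ghz.

    'Nearest' is measured in log-frequency; this is a hypothetical choice
    standing in for whatever function the archive would really provide.
    """
    nearest = min(table, key=lambda row: abs(math.log(row[0] / freq_ghz)))
    return nearest[1]
```

With this table, the registry entry would advertise 0."008 as the best value, but a query at 5 GHz would be told 0."050, which is the distinction the MERLIN example is meant to show.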