Re: VO and ADEC identifiers

From: Alberto Accomazzi <aaccomazzi-at-cfa.harvard.edu>
Date: Thu, 18 Sep 2003 17:24:41 -0400


I completely agree with Patrick's comments about the ambiguity and overuse of the term "dataset." However, in the context of this thread (verification of datasets published in the literature), one can generally assume that the word "dataset" refers to a rather fine-grained instance of astronomical observations (i.e. a single FITS file + ancillary data rather than a survey). In that sense, it is unreasonable (at least in my mind) to assume that each and every dataset will have an entry in the registry. Even if you consider bigger sets of observations, each survey paper may refer to its own custom-made collection (obtained according to some criteria), and it's unreasonable to think that each of these will be entered in the registry.

Just to add some more perspective (but hopefully not confusion) to the topic: if one considers the ADS as an archive of bibliographic datasets, there is no reason not to think of a single record (bibcode) as a datum that can be verified and linked to. So we could presumably define an entry in the registry corresponding to ADS as an archive and its bibliographic datasets as "data collections." It would also make sense to register a verification service that can be used by other data centers to create and maintain bibcode links (right now this is performed using customized tools). However, it would be insane to consider adding all of its 3.2M bibcodes (now seen as data identifiers) to the registry.

So I guess my point is we should not assume that the registry contains *everything* that we may want to obtain metadata about. We can however assume that it contains entries for all the services that can be used to get to this metadata.
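
As a rough illustration of that split, here is a minimal Python sketch. It is only a toy under stated assumptions: the class names, the "authority/local-id" identifier layout, and the placeholder holdings are all hypothetical and do not correspond to any existing registry interface or ADS tool. The registry holds one entry per archive, and each entry carries a verification callback standing in for a remote service; the individual identifiers never enter the registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RegistryEntry:
    """One registered archive together with its verification service."""
    authority: str                 # hypothetical short name for the archive
    description: str
    verify: Callable[[str], bool]  # stand-in for a remote verification service


class Registry:
    """Holds entries for archives and services, never for individual records."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.authority] = entry

    def __len__(self) -> int:
        return len(self._entries)

    def verify(self, identifier: str) -> bool:
        # Assumed identifier layout: "<authority>/<local id>".
        authority, sep, local_id = identifier.partition("/")
        entry = self._entries.get(authority)
        if not sep or entry is None:
            return False
        # Delegate to the archive's own service; the registry itself
        # knows nothing about the archive's holdings.
        return entry.verify(local_id)


# Placeholder holdings standing in for an archive's records
# (made-up strings, not real bibcodes).
_HOLDINGS = {"record-0001", "record-0002"}


def verify_record(local_id: str) -> bool:
    """Toy verification service: is this local id held by the archive?"""
    return local_id in _HOLDINGS


registry = Registry()
registry.register(RegistryEntry("ads", "bibliographic archive", verify_record))
```

With this layout, registry.verify("ads/record-0001") is answered by the archive's service, while the registry still contains a single entry no matter how many records the archive holds.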

Patrick Dowler wrote:

> On Thursday 18 September 2003 12:36, Tony Linde wrote:
>
>>If there is a single service which sits in front of a collection of
>>datasets, each of which is a table within a database, how does a query
>>sent to the service work? Does it query every dataset with the same
>>criteria?
>>
>>Are all the datasets just blocks within a single table so that a query
>>is effectively on the collection as a whole and the data returned can
>>be from many datasets?
>>
>>If a user queries the registry looking for a service which can provide
>>data of some description, how is the collection of datasets described
>>under a single service? - ie does the metadata (coverage, content etc)
>>embrace all the datasets as if they all existed in a single table?
>>
>>Sorry if this is AstroInformatics 101 :)
> 
> 
> "dataset" is a heavily (over-)used word. To some people it means one or more
> related files from a telescope or archive (1+ images). To another, the whole 
> SDSS source catalog is a dataset (ie. many RDB tables).  There are cases 
> where a "dataset" is a set of images, spectra, and a source catalog to go 
> with it. I think the confusion comes from the fact that "data" is (over-)used
> to mean both the observational data (images, spectra, time series, etc) and
> the derived or extracted information (source catalogs, for example). 
> 
> Whether the use of data and dataset is over-use can be argued until the end of 
> time. It certainly is a vague concept in practice and in my experience even 
> individuals tend to use it loosely and differently (which doesn't help :-).
> 


-- 

****************************************************************************
Alberto Accomazzi
NASA Astrophysics Data System                     http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics      http://cfa-www.harvard.edu
60 Garden Street, MS 31, Cambridge, MA 02138 USA
****************************************************************************
Received on 2003-09-18Z23:40:45