Re: Fitting HEASARC tables into the UCD framework.

From: Andrea Preite Martinez <andrea.preitemartinez-at-rm.iasf.cnr.it>
Date: Fri, 27 Jan 2006 12:10:52 +0100


Tom, Michael,

First of all, thank you for your report on the assignment of UCDs to HEASARC's tables.
It is very useful, both for me (as author of the scripts behind the ucd-builder) and for other potential users.
If you don't mind, I'll post your file on the twiki page of the IVOA UCD working group (http://www.ivoa.net/twiki/bin/view/IVOA/IvoaUCD ) as an example of an application of the UCD tools.

I was thinking of organizing a session at the next IVOA InterOp meeting in May on applications of the UCD tools to real cases and on feedback from users. I hope you'll be there to present your work.

Comments:

First of all, let me say that the public tools provided on the CDS page were not meant for massive use like yours. Indeed, you soon realised that the builder is only an interactive tool and can only be used effectively on a one-by-one basis.
At CDS I am confronted with a task similar to yours, but the situation is worse: about 40,000 tables and 150,000 columns to assign UCDs to (just to update the old ucd1 to the new ucd1+), and then a steady workload of about 1,000 columns to assign per month. In order to do the job in a reasonable time, I built, around the basic scripts (find-ucd-word: fw, and build-ucd-from-words: b-ucd, both used by the public builder), a command-line assignment tool that accepts as input a file describing each table (a suitable version of the VIZIER read-me file). Now, in about 100 s I can assign UCDs to more than 40,000 columns. But the real problem is not only the run time (in any case you also have to count the time needed to check the results!). The assignment tool can use more information than the free-text description to assign the UCD: the column name (most of them are standard names, more explicit than user descriptions!!) or the units. A short description of the tool follows at the end.

The tools are continuously upgraded and improved (see the builder page at http://vizier.u-strasbg.fr/UCD/cgi-bin/descr2ucd, which shows the date of the last update), using the feedback from my work on Vizier tables and from the log of the builder on the CDS page.
Thanks to your work I'll have additional feedback to work on!!



> assign1p -h

assign1p = assign ucd1p-words from list of [key]words and build UCD1+ from word(s).
If present, a suggested old UCD1 is default-translated.

USE: assign1p [options] [<] input-file.tsv

tab-separated input-file fields:

       0: Cat/Tab   (Table name)
       1: Data Type (I,F,A)
       2: Col_Name  (Title of the column)
       3: Col_units (no units= ---)
       4: Col_description (Free text)
       5: UCD1/ucd1+ || notes
tab-separated output fields:
       [nn]0,1,2,ucd,3,4

Options:

-h[elp] : this help
-d : print revised description used by FindWord after the application of
     syntax/semantic rules (def=print input descr.)
-l : list all words with matching score > 5 (def=only top
     P/QECV/S scores)
-k : print elements of association-tables containing input
     description [key]words (def=no)
-r : do not apply syntax/semantic rules to list of keywords (def=yes)
-s : print some statistics at the end (def=no)
-t : find also the traditional UCD.
     Generates a duplicate output line with old UCD (def=no)
-u : do not force use of suggested ucd (def=yes)
-v : verbose output, sets also -l -u (def=no)
-n : prefix output with record-number (def=no)
-nn : start at record 'nn' in input-file (def=0)

Required: readtab.pl, f-word.pl, bucd.pl Files: U1Pdescr_w, U1-U1P.defaults

The result of the assign1p procedure is flagged in field 1, as

      (Type,flag), where flag can be:
      nn = best score shown (if only description is used),
       _ = ColumnName used,
       % = Units used,
      _% = CN+U used,
       f = used suggested ucd,
      f1 = used suggested ucd1,
       ? = unable to assign,
       ! = forced note.
=====================================================================
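
As an illustration of the input format listed in the help text above, here is a minimal Python sketch that writes one tab-separated input record and looks up the meaning of a result flag. It is not part of the CDS tools: the helper names (make_row, describe_flag) and the example table and column are hypothetical, and the exact way the flag is embedded in output field 1 is not specified above, so the sketch only decodes a flag string that has already been extracted.

```python
# Sketch only: helper names and sample values are assumptions, not CDS code.

NO_UNITS = "---"               # help text: "no units= ---"
DATA_TYPES = ("I", "F", "A")   # help text: "Data Type (I,F,A)"

# Flag meanings copied from the help text.
FLAG_MEANINGS = {
    "_": "ColumnName used",
    "%": "Units used",
    "_%": "ColumnName and Units used",
    "f": "used suggested ucd",
    "f1": "used suggested ucd1",
    "?": "unable to assign",
    "!": "forced note",
}


def make_row(table, data_type, name, units, description, ucd_or_notes=""):
    """Build one input line with the six fields (0..5) in help-text order."""
    if data_type not in DATA_TYPES:
        raise ValueError("data type must be one of I, F, A")
    fields = [table, data_type, name, units or NO_UNITS, description, ucd_or_notes]
    # Collapse tabs and newlines inside a field so they cannot break the format.
    return "\t".join(" ".join(str(field).split()) for field in fields)


def describe_flag(flag):
    """Return the help-text meaning of a result flag."""
    if flag.isdigit():
        # "nn" in the help text: a numeric best score.
        return "best score shown (only description used)"
    return FLAG_MEANINGS.get(flag, "unknown flag")


if __name__ == "__main__":
    # Hypothetical table and column, for illustration only.
    print(make_row("demo/table1", "F", "RA", "deg", "Right Ascension (J2000)"))
    print(make_row("demo/table1", "A", "Name", None, "Source name"))
    print(describe_flag("_%"))
```

Lines built this way could be written to a file and passed to the tool as in the USE line of the help text (assign1p [options] [<] input-file.tsv).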

Regards,
Andrea


Andrea Preite Martinez                  andrea.preitemartinez-at-rm.iasf.cnr.it
Istituto di Astrofisica Spaziale        Tel.:+39.06.4993.4641
Area di Ricerca di Tor Vergata          Fax.:+39.06.2066.0188
Via del Fosso del Cavaliere 100         Cell:+39.339.3817355
00133 Roma                              CDS :+33.3.90242473
==============================================================================
Received on 2006-01-27Z11:11:32