Re: Issues relating to simple data and metadata queries

From: Patrick Dowler <patrick.dowler-at-nrc-cnrc.gc.ca>
Date: Mon, 30 Apr 2007 09:40:55 -0800

I read through all the relevant emails over the weekend and would like to make the following observations:

My thought right now is that TAP is somewhat different from other DAL services in the following ways. DAL services have an implicit (or explicit) astronomical data model; at one level TAP has a table/column model and (with joins) more or less the full relational (RDB) model. Once database, table, and column metadata is discovered, TAP then exposes a custom data model (astronomical) via UCDs and/or utypes or otherwise. Thus, TAP does have at least two levels of data model implicitly included.

Further, nothing above says what the instances (rows in tables) are; for that level of understanding one needs to declare that a table or TAP service contains instances of a specific data model. This is what other DAL services do (SIA, SSA, etc) -- they declare what kind of things are described/stored inside. This is also what Pedro's group was trying to do by making the Source Catalog Data Model and having a skynode declare that it held instances of that model. This is kind of a "coherence" that goes beyond just using utypes and UCDs for specific columns -- it is the shift from a custom data model to a standard data model.

OK, I am already getting well past the reasonable email limit (need to scroll to read these two paragraphs... without more detail, what do people think of this conceptual picture? Am I crazy?

Pat

On Saturday 28 April 2007 13:12, Doug Tody wrote:
> Database and Table Metadata
> ---------------------------
>
> I think everyone here is probably already familiar with the information
> schema: it allows the standard SQL mechanism to be used to query
> database metadata as well as table data. Hence to get a description
> of the columns in table A, we can do something like SELECT * FROM
> INFORMATION_SCHEMA.columns WHERE TABLE_NAME="A". Similarly, one
> can query for a list of tables, views, stored procedures and their
> parameters, privileges, etc.
>
> The Registry description of tabular resources provides a much simpler
> but similar mechanism, providing a way to describe a catalog of tables,
> or the columns of a table. The column metadata provided is similar
> to what VOTable defines for a table field.
>
>
> Scope of Metadata
> -----------------
>
> If we consider the types of sophisticated queries which will be made
> possible with ADQL, it is not clear how much metadata is needed to
> plan and compose complex queries.
>
> ISSUE 1: To compose ADQL/SQL queries using TAP, do clients need
> information comparable to the SQL information schema (plus some
> additional astronomy-specific metadata), or is a simple list of
> the tables and columns managed by a TAP service (such as provided
> by the registry resource model) enough? (Views for example, could
> be useful to describe what is in the condensed TAP verbose levels,
> and might be useful for viewing subsets of very large tables).
>
> A related question is what the information provided by the registry
> is to be used for. As Ray noted in his mail yesterday, there are
> use cases which require sufficient information in the registry to
> enable discovery and planning of operations which use table services.
> One would like to have enough information stored in the registry to
> enable service discovery and perhaps some high level query planning
> (e.g., what type of information to extract and whether a given service
> provides it), before querying the individual services.
>
> ISSUE 2: What information needs to be stored in (or otherwise available
> from) the registry, to support global TAP service discovery and
> perhaps the first phase of query planning?
>
> ISSUE 3 (Related to #1-2): Is more database/table metadata than that
> required for #2 (discovery), needed to support advanced ADQL
> queries against a single service?
>
>
> Metadata Content
> ----------------
>
> The next concern is database and table metadata content as provided at
> the service level (independently of what is required at the registry
> level; whether these can be the same is TBD). This can be used for
> a variety of purposes, e.g., data discovery and selection, query
> planning, and detailed queries against a single table or set of tables.
>
> ISSUE 4: What COLUMN metadata is required? Is a VOTable-like set of
> attributes sufficient to describe each column? Is any additional
> SQL information schema metadata needed?
>
> ISSUE 5: What TABLE metadata is required? This can include
> table-specific metadata (table name, number of rows and columns,
> type or class of table, etc.), dataset metadata (DataID, Curation,
> Characterization, etc.), access metadata (e.g., to retrieve the
> entire table), and possibly other things. We use generic dataset
> metadata to describe other types of data in the VO; how useful
> would it be to provide this for table datasets as well?
>
>
> Form of Interface
> -----------------
>
> The core TAP functions are to be able to query table data and table
> metadata. There are at least two possible approaches to this: a
> uniform interface for both, as for the SQL information schema (i.e.,
> database metadata is represented as any other table), or separate
> interfaces, possibly with different inputs and outputs.
>
> Option 1: Information Schema Approach
>
> In the uniform interface approach the same mechanism is used
> to query both data tables and metadata tables. The information
> schema is used to give metadata tables predefined names, e.g.,
> "INFORMATION_SCHEMA.tables".
>
> For example (shortening this to just "SCHEMA" for brevity and ignoring
> protocol issues such as HTTP formatting, GET/POST usage, the details
> of MIME types, etc.), something like the following, could be a simple
> data query:
>
> queryData(query="SELECT * FROM A WHERE rms > 3.0", format="csv")
> queryData(table="A", rms="3.0/")
>
> Using the same interface, something like the following could provide a
> simple metadata query mechanism:
>
> queryData(table="SCHEMA.tables")
> queryData(table="SCHEMA.columns", tableName="A")
> queryData(table="SCHEMA.views", tableName="A")
>
> In the SQL information schema approach, a ADQL/SQL query against
> the information schema would also be possible, but may or may not
> be advisable in the TAP interface so a parameter-based example is
> given instead.
>
> Option 2: Separate Interface for Data and Metadata
>
> In this case we provide a separate interface to query or retrieve
> table data and metadata. For example,
>
> getTables()
> getColumns(table="A")
>
> Even using a different operation with a different set of paramters, this
> approach could be made very similar to option 1, if the output semantics
> were the same in both cases.
>
> ISSUE 6: Either of the above approaches would work. Option #1 can
> potentially provide access to a greater range of information
> via a uniform interface, with most of the client and server
> implementation being the same. Option #2 can provide slightly
> simpler queries, and allows the acceptable parameter set to be
> constrained for each type of metadata query, but requires a new
> service operation for each type of metadata. Which approach do
> you prefer, and why? Or, suggest something different.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
Received on 2007-04-30Z19:35:30