Dear Tony,
I would distinguish two quite different aspects of the workflow you
described (both, by the way, covered in my Astrovirtel use cases,
remember?):
1.- Data Discovery
> have an agent (portal or workflow or client software or ...) go determine
> which databases met those criteria (using the metadata in the registry)
2.- Query submission
> and submit the query to them (without reference back to the user)
Those two things are very different in astronomers' minds.
Data Discovery (the pure Registry part) is the fundamental tool for making
astronomers aware of the potential of data collections and/or tools that
are new to them.
Yes, it happens. Especially, but not exclusively, to astronomers who are
strongly biased towards a particular spectral domain and need to discover
data in other spectral domains.
In this case, a query just to discover which data or catalogues might help
them can very well be resolved by using UCDs. This would be an asset!
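
To make the discovery step concrete, here is a minimal Python sketch. It
is only an illustration under stated assumptions: the in-memory REGISTRY
list, the record layout, the collection names and the discover() helper
are all hypothetical, and the UCD strings are merely examples. A real
registry has its own schema and query interface.

```python
# Hypothetical stand-in for registry metadata: each data collection
# advertises the set of UCDs carried by its columns. Collection names
# and UCD strings are illustrative only.
REGISTRY = [
    {"name": "xray_survey",
     "ucds": {"pos.eq.ra", "pos.eq.dec", "phot.flux;em.X-ray"}},
    {"name": "optical_catalogue",
     "ucds": {"pos.eq.ra", "pos.eq.dec", "phot.mag;em.opt.V"}},
    {"name": "radio_catalogue",
     "ucds": {"pos.eq.ra", "pos.eq.dec", "phot.flux.density;em.radio"}},
]


def discover(registry, required_ucds):
    """Return the names of collections that advertise every required UCD."""
    required = set(required_ucds)
    # Subset test: a collection qualifies only if it covers all requested UCDs.
    return [rec["name"] for rec in registry if required <= rec["ucds"]]


# An optical astronomer looking for X-ray fluxes with positions.
candidates = discover(REGISTRY, ["pos.eq.ra", "pos.eq.dec", "phot.flux;em.X-ray"])
print(candidates)
```

Note that the result is only a list of candidates for the astronomer to
inspect; nothing here submits a query to them.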
Nevertheless, no astronomer would want to blindly query all of the
resulting collections without a proper inspection. They have specific
science cases in mind, and the specificity of their requirements should
not be underestimated; it is very difficult, if not impossible, to
"UCDify" that knowledge, so to speak ...
... unless the science case is very unspecific, AND unless all data
collections are described (maybe I should say Characterised here) very
well, but in that case we would need a very detailed data collection
data model.
Only once astronomers have discovered, inspected and understood the
quality of the newly discovered data collections can they think of
querying them.
And here comes the problem: in general, UCDs are not unique within a
given data collection (several columns can carry the same UCD). Hence, it
is impossible to simply go ahead with a UCD-based query.
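
A small Python sketch of why non-unique UCDs block a purely UCD-based
query. Everything in it is assumed for illustration: the COLUMNS list,
the column names and the resolve() helper are hypothetical and do not
describe any real collection or any ADQL or SkyNode behaviour.

```python
from collections import defaultdict

# Hypothetical column metadata for a single collection: (column name, UCD).
# Two different magnitude columns share the same UCD.
COLUMNS = [
    ("ra", "pos.eq.ra"),
    ("dec", "pos.eq.dec"),
    ("mag_aper", "phot.mag;em.opt.V"),
    ("mag_psf", "phot.mag;em.opt.V"),
]


def columns_by_ucd(columns):
    """Group column names by the UCD they carry."""
    index = defaultdict(list)
    for name, ucd in columns:
        index[ucd].append(name)
    return dict(index)


def resolve(columns, ucd):
    """Return the one column carrying `ucd`; raise if there is none or several."""
    matches = columns_by_ucd(columns).get(ucd, [])
    if len(matches) != 1:
        # Zero or multiple candidates: the UCD alone cannot decide, so a
        # human (or extra metadata) has to pick the column.
        raise LookupError(f"UCD {ucd!r} matches {len(matches)} columns: {matches}")
    return matches[0]
```

Here resolve(COLUMNS, "pos.eq.ra") gives a single column, while
resolve(COLUMNS, "phot.mag;em.opt.V") raises, because only someone who
knows the science case can say whether the aperture or the PSF magnitude
is the one wanted.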
Hence, I'm very much with Bob regarding point 2 (at least until we have
a detailed data model for data collections, in which case utypes and not
UCDs would be used anyway), but I agree very much with you about point 1.
Ciao,
Alberto
On May 27, 2005, at 21:23, Tony Linde wrote:
> This is an issue which I raised during the discussions of ADQL at Kyoto. I
> don't think I explained myself very well, nor understood Bob's response
> properly. I've since talked to the AstroGrid Science Advisory Group (AG-SAG)
> which met last week and (I think) I understand it more, so let's try again.
>
>
> I initially asked that we discuss whether we need to be able to formulate
> queries in ADQL using UCDs rather than just column names. The reason for
> this was to enable one query to be formulated and sent to more than one
> database, without the user having to reframe the query with a new set of
> column names each time.
>
> I did say (or meant to say) that the astronomer might want to formulate the
> query in generic terms (UCDs are all we have now - data models in future)
> then specify criteria for database selection (presumably including whether
> they include those UCDs), and have an agent (portal or workflow or client
> software or ...) go determine which databases met those criteria (using the
> metadata in the registry) and submit the query to them (without reference
> back to the user), returning the results along with metadata explaining
> which databases were chosen and why (or, in the case of workflow, simply
> logging the metadata while using the results in downstream apps).
>
> Bob shot down this idea, saying that astronomers would never want to submit
> queries to unknown databases. The AG-SAG, however, agreed that this, though
> not a crucial and immediate requirement, would be required in the future in
> some form. I think I explained better at that meeting about being able to
> set up front the criteria against which the databases are chosen, including
> quality criteria (perhaps using the validationLevel / validatingOrganisation
> metadata added to the registry during the interop).
>
> Does this now work for you, Bob? Have I explained it better?
>
>
> As I see it then, ADQL needs to support the construction of:
>
> 1. a single _conceptual_ query, restated manually by the user (to name known
> columns) for each selected database;
> 2. a single query stated in generic terms and sent to multiple known and
> pre-selected databases;
> 3. a single query stated in generic terms and sent to an unknown set of
> databases selected on basis of user-specified criteria (as part of
> workflow): user only knows which databases by inspecting metadata returned
> from workflow.
>
> Although it is not the job of VOQL (though in the context of SkyNode, maybe
> it is) to specify how queries are constructed, we could imagine the
> following ways in which a portal or query-builder could operate:
>
> A. user selects databases; query builder only allows query to be built using
> common UCDs; if any UCD has multiple instances in a database user is asked
> to resolve which column name is meant.
>
> B. user enters query using UCDs; query builder allows user to select from
> list of databases which support the UCDs used; again, if any UCD has
> multiple instances in a database user is asked to resolve which column name
> is meant.
>
> C. user enters query using UCDs; user enters criteria for selecting
> databases (based on registry metadata for SkyNodes); queries are
> automatically sent to databases which use their own methods of resolving
> multiply-allocated UCDs, and the metadata returned with results reflects
> this reasoning.
>
> I daresay there are other ways of approaching the issue.
>
> I would also say that in the above cases (A and B anyway) the portal or
> query-builder could simply translate the single UCD-based query into
> separate ones using column names before sending them.
>
>
> But at the moment there is a conflict between the SkyNode and ADQL specs. In
> SkyNode, we have the AcceptsUCDs flag which indicates whether a node can
> accept queries formulated using UCDs, but there is no way in ADQL of
> indicating that a given term in the query is a UCD rather than a column name
> (or is there?).
>
> Cheers,
> Tony.
>
> --
> Tony Linde
> Phone: +44 (0)116 223 1292 Mobile: +44 (0)7753 603356
> Fax: +44 (0)116 252 3311 Email: Tony.Linde-at-leicester.ac.uk
> Skype: callto:tonylinde (home) callto:tonylinde2 (work)
> Post: Department of Physics & Astronomy, University of Leicester
> Leicester, UK LE1 7RH
> Web: http://www.star.le.ac.uk/~ael
>
> Project Manager, EuroVO VOTech http://eurovotech.org
> Programme Manager, AstroGrid http://www.astrogrid.org
> Co-Director, Leicester e-Science Centre http://www.e-science.le.ac.uk/
>
Alberto Micol
ST-ECF HST Archive Scientist
Received on 2005-05-29Z15:18:54