Hi Tony. Just a few comments on your message...
In working with Ed Shaya and Brian Thomas on this voql (VO Query Language), I believe the driving concept is that a voql captures a single user query to the VO. It is not intended to capture the entire research process (query, retrieve, analyze, refine the query, resubmit, re-analyze, etc.); that is, it is not intended to cover the extreme workflow case you mentioned, in which the whole workflow is treated as one 'query'. That's why we still need humans in the loop.
On the other hand, voql captures far more than a one-step-at-a-time query. For example, a voql (an XML-ized user query) should be able to capture this science query: "give me optical emission line lists for IR-luminous AGN that were X-ray-selected". This query involves multi-wavelength and multi-modal data (catalogs, spectra), so it must be parsed and distributed to the appropriate data centers, and the data may also need to be shipped to some service (e.g., one that generates line lists from optical spectra). Even this relatively simple query is complex and distributed enough that a standard query format (hence XML) is needed. Thus, voql does more than the other extreme you mentioned, where each step addresses only a single service. After seeing the results, the user may choose to refine the query, filter the results, expand the query, or do a million other things; a new voql is then generated for each subsequent query.
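To make "XML-ized user query" concrete, here is a minimal sketch of how the AGN example might be encoded, built with Python's standard xml.etree.ElementTree. The element and attribute names (request, constraint, service, result, field, the $variables, and the emission_line_lister service) are hypothetical illustrations, not Ed's actual schema.

```python
import xml.etree.ElementTree as ET


def build_agn_query() -> ET.Element:
    """Encode the AGN example as a hypothetical voql-like XML tree.

    Element and attribute names are illustrative only; they are not
    taken from the real voql schema.
    """
    request = ET.Element("request")
    # X-ray selection: the object must have an X-ray measurement.
    ET.SubElement(request, "constraint",
                  {"class": "AGN", "property": "xray_flux", "var": "$fx"})
    # IR luminosity: "min" is a placeholder threshold, not a real criterion.
    ET.SubElement(request, "constraint",
                  {"class": "AGN", "property": "ir_luminosity",
                   "var": "$lir", "min": "1e11"})
    # Optical spectra for the objects that satisfy the constraints above.
    ET.SubElement(request, "constraint",
                  {"class": "AGN", "property": "optical_spectrum", "var": "$spec"})
    # Ship the spectra to a (hypothetical) line-list service.
    ET.SubElement(request, "service",
                  {"name": "emission_line_lister", "input": "$spec", "var": "$lines"})
    # The result table has one field holding the returned line lists.
    result = ET.SubElement(request, "result")
    ET.SubElement(result, "field", {"var": "$lines"})
    return request


if __name__ == "__main__":
    print(ET.tostring(build_agn_query(), encoding="unicode"))
```

The point of the sketch is only that one document names several wavebands, data types, and a service, so it has to be split up before any single data center can act on it.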
The workflow (or query plan) is handled at the "integrator" level by what I would call a query manager, which not only dispenses pieces of the voql to the relevant data centers and/or service providers but also integrates and fuses the returning query results.
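A minimal sketch of the query manager's two jobs, dispensing pieces and fusing results, assuming each data center can be treated as a function that returns matching object identifiers. The stub centers and identifiers are hypothetical, and fusion here is a plain intersection on shared IDs; a real integrator would more likely need positional cross-matching.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A data-center stub takes a constraint string and returns matching object ids.
DataCenter = Callable[[str], set[str]]


def xray_center(constraint: str) -> set[str]:
    # Hypothetical stand-in for an X-ray catalog service.
    return {"agn1", "agn2", "agn3"}


def ir_center(constraint: str) -> set[str]:
    # Hypothetical stand-in for an IR catalog service.
    return {"agn2", "agn3", "agn4"}


def query_manager(plan: list[tuple[DataCenter, str]]) -> set[str]:
    """Send every sub-query out concurrently, then keep objects all centers returned."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(center, constraint) for center, constraint in plan]
        results = [f.result() for f in futures]
    # Fusion by intersection: an object survives only if every piece matched it.
    return set.intersection(*results) if results else set()


if __name__ == "__main__":
    print(query_manager([(xray_center, "X-ray selected"), (ir_center, "IR-luminous")]))
```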
VOQL is a standardized language for capturing scientists' queries to the distributed, heterogeneous collections that make up the VO. If all data collections were identical (schema, interfaces, metadata), VOQL would not be needed; XML Query and SQL would suffice. But that is not the world in which our data and information resources and services live.
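The heterogeneity argument can be illustrated with a small sketch: the same logical constraint has to be rewritten per data center because column names differ. The center names, table name, and column names below are all hypothetical, and the string-built SQL is for illustration only (it is not safe for untrusted input).

```python
# Hypothetical per-center column names for the same logical properties.
SCHEMAS = {
    "center_a": {"ra": "RAJ2000", "dec": "DEJ2000", "xray_flux": "FX"},
    "center_b": {"ra": "ra_deg", "dec": "dec_deg", "xray_flux": "flux_0p5_2keV"},
}


def to_local_sql(center: str, table: str, prop: str, lo: float, hi: float) -> str:
    """Translate one generic range constraint into a center-specific SQL string."""
    column = SCHEMAS[center][prop]  # look up this center's name for the property
    return f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"


if __name__ == "__main__":
    for center in SCHEMAS:
        print(to_local_sql(center, "sources", "xray_flux", 1.0, 2.0))
```

If every center shared one schema, the mapping table would be unnecessary and a single SQL or XML Query string would do.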
Ed's initial voql concept is able to capture complex and simple queries. The apparent simplicity of the schema belies the complexity that it can support, and vice versa.
> From mdolensk-at-eso.org Sun Feb 23 11:48 EST 2003
> Reply-To: <ael-at-star.le.ac.uk>
> From: "Tony Linde" <ael-at-star.le.ac.uk>
> To: <voql-at-ivoa.net>
> Subject: RE: a high level language
> Date: Sun, 23 Feb 2003 16:44:11 -0000
>
> > Here is a first cut at a highest level astronomical query language.
>
> This looks good, Ed. I'm not sure I could get my head around it all
> based just on the sample query and schema - I'll wait for 'further
> annotation of the schema' before commenting on too much.
>
> One general issue that perhaps needs to be addressed is the distinction
> between the query and the workflow construction.
>
> I imagine the workflow as a description of all the steps that are
> required to get the user a final result. These steps would include
> queries, object selection, data analysis, re-querying, visualisation etc
> - everything an astronomer does now in the course of getting a result.
>
> At one extreme, each step that the user inserts into the workflow must
> address only one service (eg querying a single data source) and it is up
> to the user to construct data merges, sub-selects based on previous
> queries etc. thus building up the complete workflow.
>
> At the other extreme, the entire workflow could be seen as a 'query', ie
> everything needed to produce the user's result.
>
> In the middle, one could imagine that a single step is defined as
> something which produces a useful (possibly retained) result (though
> not, perhaps, the final result), so could include queries, sub-queries
> etc but all automated and with no saving of intermediate data. Likewise,
> functional steps where data is analysed or visualised would also be
> single steps.
>
> Where do you think the distinction lies with your VOQL?
>
> (Being still a programmer at heart, the first option is certainly the
> easiest from the p.o.v. of building a workflow GUI - and will probably
> be the first to be implemented.)
>
> I would guess that VOQL is *only* a query language, so only acts on data
> sources, though mediated by the Web Service fronting the data centre or
> data source itself. Do we also need a VOWL: VO Workflow Language?
> (Perhaps not: will a workflow need to be exchanged between job control
> services?)
>
> How far do we want to push the VOQL towards a VOWL?
>
> Cheers,
> Tony.
>
> > -----Original Message-----
> > From: Ed Shaya [mailto:edward.j.shaya.1-at-gsfc.nasa.gov]
> > Sent: 21 February 2003 23:31
> > To: voql-at-ivoa.net
> > Cc: Cynthia Cheung
> > Subject: a high level language
> >
> >
> >
> > Here is a first cut at a highest level astronomical query language.
> > This is going to need some introduction so you can begin to see where
> > we are going with this. The goal here is to create an XML language that
> > can capture the scientific spirit of the NVO use cases. This requires
> > using object/property relationships and therefore builds on many of
> > the concepts in AMASE, such as the idea that the key items are
> > astronomical objects and their classes. The scientist should be able
> > to build up a query from a form or GUI. An integrator service then
> > analyses the voql and breaks it down into its atomic parts. The
> > integrator checks the metadata registries to see which atomic parts go
> > to which data centers, and checks the WSDL documents to create queries
> > properly for each data center. I would imagine that XML Query will
> > play a big role here. But XML Query is meant for querying a specific
> > XML database with a well-known schema, and is not appropriate at the
> > distributed level. We can, of course, make voql look quite XML
> > Query-ish. When the responses come back, the integrator applies the
> > logic functions to see which objects have all of the required data.
> > It may send some results out to services, and finally it forms the
> > return tables, applying the requested statistics.
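The routing step described above (break the voql into atomic parts, then ask the metadata registry which data centers can serve each part) might look like the following minimal sketch. The registry contents, center names, and (class, property) pairs are hypothetical, and a real integrator would query registry services rather than a dictionary.

```python
# Hypothetical registry: which data centers can answer which (class, property) pairs.
REGISTRY = {
    ("cluster", "xray_image"): ["rosat_archive"],
    ("cluster", "name"): ["rosat_archive", "cluster_catalog"],
    ("galaxy", "member_of"): ["cluster_catalog"],
}


def plan(atomic_parts):
    """Group atomic (class, property) parts by the data centers registered for them.

    Returns (routing, unresolved): routing maps each center to the parts it
    should receive; unresolved lists parts that no registered center can serve.
    """
    routing = {}
    unresolved = []
    for part in atomic_parts:
        centers = REGISTRY.get(part)
        if not centers:
            unresolved.append(part)  # nothing in the registry covers this part
            continue
        for center in centers:
            routing.setdefault(center, []).append(part)
    return routing, unresolved


if __name__ == "__main__":
    print(plan([("cluster", "xray_image"), ("cluster", "name"), ("galaxy", "color")]))
```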
> >
> > Thus, we need to get our metadata registries into shape so that this
> > ontological type of query can be useful (such as "these galaxies are
> > members of this cluster", "these layers are regions of stars", "FRII
> > is a subclass of radio-galaxy", etc.). And we need to develop an
> > intelligent integrator. But if Web Services are properly used, at
> > least the interactions between the services and the integrator will be
> > quite straightforward. The recent development of WS at CDS and ADECC,
> > and the paper by Brian Thomas and myself, are a good beginning down
> > this road. But now I am starting down the road from the scientists
> > into the VO.
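One reason the registries matter is that a request for a class should also match its subclasses. A minimal sketch of that reasoning, assuming a simple single-parent hierarchy: only "FRII is a subclass of radio-galaxy" comes from the message; the other links are illustrative.

```python
# Illustrative single-parent class hierarchy (child -> parent).
PARENT = {
    "FRI": "radio-galaxy",
    "FRII": "radio-galaxy",
    "radio-galaxy": "galaxy",
}


def is_subclass(cls: str, ancestor: str) -> bool:
    """Walk up parent links; a class counts as a subclass of itself."""
    node = cls
    seen = set()
    while node is not None and node not in seen:  # `seen` guards against cycles
        if node == ancestor:
            return True
        seen.add(node)
        node = PARENT.get(node)
    return False


def expand(cls: str) -> set[str]:
    """All known classes that satisfy a request for `cls` (itself plus descendants)."""
    known = set(PARENT) | set(PARENT.values())
    return {c for c in known if is_subclass(c, cls)}


if __name__ == "__main__":
    print(sorted(expand("radio-galaxy")))
```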
> >
> > A request has any number of "constraints", each of which specifies an
> > object/@class and a number of properties (or properties within a range
> > of values). As objects of this class are found, their values and
> > errors are saved to variables. These variables can be used in the
> > following "constraints". Services may be used along the way; at this
> > time we can only enter an element named "service", which allows
> > arbitrary attributes. Finally, a "result" consists of a table in which
> > we specify which variables fall into each field. We may also compute
> > statistics on a variable and enter the results into that field.
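A minimal sketch of the request semantics just described: constraints are applied in order, each binds the matching values and errors to a variable, and the result table lists the requested variables plus a statistic. Several simplifications are assumptions: all constraints target one class, a later constraint uses earlier variables only by narrowing the candidate objects, the mean stands in for "statistics", and the catalog is made-up in-memory data.

```python
from statistics import mean

# Hypothetical in-memory catalog: object id -> class and {property: (value, error)}.
CATALOG = {
    "A1": {"class": "cluster", "props": {"z": (0.05, 0.001), "flux": (3.2, 0.4)}},
    "A2": {"class": "cluster", "props": {"z": (0.30, 0.002)}},
    "A3": {"class": "cluster", "props": {"z": (0.12, 0.001), "flux": (1.1, 0.2)}},
    "G7": {"class": "galaxy", "props": {"z": (0.02, 0.001)}},
}


def run_request(constraints, fields):
    """Apply constraints in order, binding each to a variable and narrowing the candidates.

    Returns one table row per surviving object (id followed by the requested
    variable values) and the mean of each requested variable.
    """
    candidates = set(CATALOG)
    variables = {}  # variable name -> {object id: (value, error)}
    for c in constraints:
        lo, hi = c.get("range", (float("-inf"), float("inf")))  # no range = any value
        bound = {}
        for oid in candidates:
            entry = CATALOG[oid]
            if entry["class"] != c["class"] or c["property"] not in entry["props"]:
                continue
            value, error = entry["props"][c["property"]]
            if lo <= value <= hi:
                bound[oid] = (value, error)
        variables[c["var"]] = bound
        candidates = set(bound)  # later constraints only see objects that passed this one
    ids = sorted(candidates)
    rows = [[oid] + [variables[f][oid][0] for f in fields] for oid in ids]
    stats = {f: mean(variables[f][oid][0] for oid in ids) for f in fields} if ids else {}
    return rows, stats
```

For example, constraining clusters to 0 <= z <= 0.2 and then requiring a flux keeps A1 and A3, and the statistics row holds the mean of $z and $flux over those two.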
> >
> > The schema does an xs:include of the Coords.xsd that Arnold Rots
> > worked on. You can tell those elements because they begin with a
> > capital letter. I did have to do some liberal editing on Coords to
> > make it work well in this.
> >
> > Further annotation of the schema is coming.
> >
> > Notes on use case 1:
> > Specifically, this request says: constrain results to clusters of
> > galaxies with ROSAT X-ray measurements and images and at least one
> > cataloged galaxy member. It must also have a name, RA, and DE. The
> > galaxies MAY also have a ROSAT X-ray flux (optional properties are in
> > <or> elements). The $variables do not set a range on the required
> > measurement; the variable is set by the database. For variable lists,
> > use @.
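The required/optional distinction in use case 1 can be sketched as follows, assuming a cluster record with a property dictionary and a list of member galaxies. The property names are hypothetical paraphrases of the use case, and the optional flux plays the role of a property inside an <or> element: reported when the database supplies it, never used to reject an object.

```python
# Hypothetical property names paraphrasing use case 1.
REQUIRED_CLUSTER = {"name", "ra", "dec", "rosat_xray", "rosat_image"}
OPTIONAL_GALAXY = {"rosat_xray_flux"}  # would sit inside an <or> element


def accept_cluster(cluster: dict) -> bool:
    """A cluster passes if it has every required property and >= 1 cataloged member."""
    return REQUIRED_CLUSTER.issubset(cluster["props"]) and len(cluster["members"]) >= 1


def member_rows(cluster: dict) -> list[dict]:
    """Report each member galaxy, adding optional properties only when present."""
    rows = []
    for galaxy in cluster["members"]:
        row = {"name": galaxy["name"]}
        for prop in OPTIONAL_GALAXY:
            if prop in galaxy:
                row[prop] = galaxy[prop]  # value comes from the database, like a $variable
        rows.append(row)
    return rows
```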
> >
> > Ed
Received on 2003-02-23Z23:04:05