I just want to reiterate that I think VOQL should be a means for the astronomer to clearly specify what astronomical knowledge he/she wants. It should not require the astronomer to know how data is arranged at each of the data centers, nor all of the steps required. Astronomical data is held in a bewildering assortment of ways. There are object-oriented data centers (IPAC, CDS and AMASE), thousands of catalogs at CDS each in its own eccentric logical arrangement (yes, they are all in a similar numeric format, but each is ontologically unique), object-relational databases, XML databases (GSFC), and log entry observations, and in the future one can expect this only to get more complex. The average astronomer cannot be expected to lay out an optimal workflow properly.
The astronomer thinks "I am interested in objects with such and such properties" and VOQL should allow a description of these constraints, even if they are quite complex or detailed (e.g., red-giants with white dwarf companions and proper motions greater than x and variable by less than y%). Additionally, the astronomer says, "while selecting objects that fit these criteria, keep track of the following other properties of such objects." It is sobering how quickly the detailed workflow for such queries gets beyond what humans can reliably handle. The astronomer doesn't know if it is better to first get index numbers of records that fit the criteria and then later extract additional properties, or to do both at the same time. Are we looking through tables or querying object databases? Are we looking at tables of red-giants or white dwarfs or proper motions or binary stars? Are there tables specifically of variability, or do we need to look at photometry catalogs and look for time variations? How do we do a many-way cross-correlation when the intermediate return has some tables with some properties and other tables with other properties, although some tables have a few of each? Etc.
So now the detractors say, "Well, if you don't know how to do this, how do you expect the machine to know better?" The answer is that computers simply are better at this kind of constrained problem: repetitive task breakdown and assignment. To be sure, it will take us some time to find the appropriate set of algorithms and "workflow" language to automate this. But once done it should evolve quite slowly.
Next the detractor says, "I can break your example down easily. Go to an all-sky binary star catalog and get a list of binaries with red-giant/white dwarf members, then send the list of ra/dec positions to a proper motion resource and have it delete candidates with low proper motions. Then send that list to a variable star resource and have it delete highly variable ones. What is so hard about that? I can set up the entire workflow for that in about 20 minutes."
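The detractor's three-step chain can be sketched in a few lines of Python. This is only a toy illustration: the in-memory tables, the function names and every value in them are invented stand-ins for the remote resources, not any real service or catalog.

```python
# Toy, in-memory stand-ins for three remote resources (all values invented).
BINARY_CATALOG = [
    {"name": "A", "pos": (10.0, -5.0), "members": ("red-giant", "white-dwarf")},
    {"name": "B", "pos": (20.0, 15.0), "members": ("white-dwarf", "red-giant")},
    {"name": "C", "pos": (30.0, 25.0), "members": ("red-giant", "white-dwarf")},
    {"name": "D", "pos": (40.0, 35.0), "members": ("main-sequence", "white-dwarf")},
    {"name": "E", "pos": (50.0, 45.0), "members": ("red-giant", "white-dwarf")},
]
# Proper motion keyed by (ra, dec); object E is deliberately not covered.
PROPER_MOTION = {
    (10.0, -5.0): 120.0,
    (20.0, 15.0): 10.0,
    (30.0, 25.0): 200.0,
    (40.0, 35.0): 300.0,
}
# Variability in percent, keyed by (ra, dec).
VARIABILITY_PCT = {
    (10.0, -5.0): 2.0,
    (20.0, 15.0): 1.0,
    (30.0, 25.0): 30.0,
    (40.0, 35.0): 1.0,
    (50.0, 45.0): 1.0,
}


def select_binaries(catalog, members):
    """Step 1: keep binaries whose two members match the requested types."""
    wanted = set(members)
    return [(e["name"], e["pos"]) for e in catalog if set(e["members"]) == wanted]


def drop_low_proper_motion(candidates, pm_table, min_pm):
    """Step 2: keep candidates with a known proper motion above min_pm."""
    return [(n, p) for n, p in candidates if p in pm_table and pm_table[p] > min_pm]


def drop_highly_variable(candidates, var_table, max_var_pct):
    """Step 3: keep candidates with a known variability below max_var_pct."""
    return [(n, p) for n, p in candidates if p in var_table and var_table[p] < max_var_pct]


def run_workflow(min_pm, max_var_pct):
    """Chain the three steps in the fixed order the detractor proposes."""
    c = select_binaries(BINARY_CATALOG, ("red-giant", "white-dwarf"))
    c = drop_low_proper_motion(c, PROPER_MOTION, min_pm)
    c = drop_highly_variable(c, VARIABILITY_PCT, max_var_pct)
    return [n for n, _ in c]
```

Note that this sketch silently drops any candidate that a later resource does not cover (object E here). That is a choice the workflow author has to make by hand, not something the chain decides for itself.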
First of all, I agree that the user should indeed have access to individual resources in this simple manner. And this may get the user some objects that fulfill the criteria. But what if the user needs a more thorough search? Let's say that it is not easy to see white dwarfs around red-giants, so one is likely to get only a few hits from the general catalogs. Inevitably a more in-depth search is required. There are many useful sources for each of the desired properties, but most do not have all-sky coverage, so coverage maps need to be examined for proper overlap. Now we are talking about matching with SLOAN and 2-MASS and PSS and a few dozen other cataloged data sources, some of them perhaps published in the last few months. The workflow development rapidly grows to a couple of days.
"This will be ruinous," says the conservative. "How is a machine to know when enough is enough? Perhaps it will attempt photometery on all objects on each and every image of the sky ever taken in a desperate attempt to answer your query as thoroghly as technologically possible?" Yes, this is a concern. On the other hand, maybe that is what the user has in mind. For requests that have options that would take more than a few minutes the user should first be sent a high level summary of the possible paths to satisfy the request and estimates of the time required. The user then selects one of these options before any such action. So, the concern is, and always has been, what prevents a single user from tying up vast resources too often? There must be time alotments or financial costs to put limits on what users do.
Admittedly, this level of automated service will not even begin to come together before the final year of the NVO grant. But it is useful to carefully outline and agree on the properties of the (pen)ultimate user interface before beginning to develop a system. The alternative is everyone marching in a different direction because each sees a different end goal.
Back to annotation.
Ed
Kirk Borne wrote:
>Tony: thanks for clarifying distinctions between workflow and query,
>and between data services and functional services. This is in fact
>a distinction that Ed, Brian, and I discussed, but somehow I mangled
>it in my example. It is perhaps appropriate and prudent therefore
>to keep those "functional" workflow actions separate from the VOQL's
>query actions.
>
>- Kirk
>
>
>
>
>>From: "Tony Linde" <ael-at-star.le.ac.uk>
>>To: <voql-at-ivoa.net>
>>Subject: RE: a high level language
>>Date: Mon, 24 Feb 2003 09:42:34 -0000
>>
>>Hi Kirk,
>>
>>Thanks for the reply.
>>
>>
>>
>>>This query involves
>>>multi-wavelength data and multi-modal data (catalogs, spectra), and
>>>thereby the query must be parsed and distributed to
>>>appropriate data centers and maybe the data need to be
>>>shipped to some service (e.g., to generate line lists from
>>>optical spectra).
>>>
>>>
>>This is what I assumed from Ed & Brian's document and why I raised the
>>question. I can see that a *query* language might cover more than a
>>simple single-dataset query, eg selecting from a join of distributed
>>datasets with sub-selects etc. - the sort of thing you can do at the
>>moment using SQL on the more advanced databases (though without the
>>distributed bit).
>>
>>However, when it comes to shipping intermediate data to another service
>>for analysis, reduction etc., I would consider this to be *workflow*,
>>requiring a separate description using a workflow language (as in the
>>commercial world with the recent development of BPEL4WS).
>>
>>
>>
>>>VOQL is a standardized language to capture scientist's
>>>queries to the distributed heterogeneous collections that
>>>comprise the VO.
>>>
>>>
>>There I would agree. But the VO comprises more than data services, it
>>includes functional services such as those to 'generate line lists'.
>>Pushing the results of a query to such services, or using the results of
>>a query in another, later, query amount to workflow construction.
>>
>>There is a danger that, in trying to combine queries and workflow in a
>>single language, we will overcomplicate the matter and reduce the chance
>>of using or extending existing efforts in the development of query and
>>workflow languages.
>>
>>Cheers,
>>Tony.
>>
>>
>>
>>>-----Original Message-----
>>>From: Kirk Borne [mailto:borne-at-rings.gsfc.nasa.gov]
>>>Sent: 23 February 2003 22:01
>>>To: ael-at-star.le.ac.uk
>>>Cc: voql-at-ivoa.net
>>>Subject: Re: a high level language
>>>
>>>...
>>>
>>>
>
>+------------------------------------+-------------------------------------+
>| Dr. Kirk D. Borne | mailto:Kirk.Borne-at-gsfc.nasa.gov |
>| Institute for Science & Technology, Raytheon (IST-at-R) |
>| NASA Goddard Space Flight Center | |
>| Astrophysics Data Facility | Phone: 301-286-0696 |
>| Code 631 | or 301-286-2772:Kathy Starling |
>| Greenbelt, MD 20771 | FAX: 301-286-1771 |
>+------------------------------------+-------------------------------------+
> US Virtual Observatory: http://us-vo.org/
> Staff page: http://rings.gsfc.nasa.gov/~borne/bio_borne_kirk.html
>
>
>
Received on 2003-02-24Z21:09:07