Re: table metadata and the registry

From: Tony Linde <Tony.Linde-at-leicester.ac.uk>
Date: Tue, 08 May 2007 14:21:04 +0100


Hi Ray,

Thanks for that - it does make your proposal clearer. I'll try to make my own argument clearer.

 > to see is that all of the table information be retrievable in one URL.

That is certainly more reasonable. But what I don't understand is why, if we're to have a single method of getting the metadata, it cannot be returned in the format recognised by the registry: VOResource? It would take no more work to format the metadata as VOResource than as some other format, so why not stick with what we have? Why come up with another format?

 > that once you have located the resource, you should go to the service
 > for the table metadata; even in this case, it is one extra call to
 > access the URL that retrieves it directly from the service.

As I said, this is better but, as an application writer, I know I'd want to hit only registries that contain all the information I need to complete a task (e.g., if building a workflow, I'd want the user to be able to construct a query regardless of whether the service is currently up, since it is the workflow execution engine, not the user, that will wait around until the service is available).

So, we would need to ensure that full registries can be distinguished from pointer-only ones and that applications can specify, at installation time, that they expect to be connected to a full registry.

 > You have to give me a little credit here :-)

I certainly do, Ray. I just want to determine whether your proposal can work and, at the moment, I see issues that aren't currently addressed.

 > See the latest VOSI doc from Guy.  The motivation is address the
 > fine-grained metadata issue:  the former provides information intended
 > for the registry and the latter is a fatter record.  So while I see

As is obvious, I'd not read it. And, as I said, I can see no reason for more than one method which returns the full VOResource record.

 > what is the point of making them optional in place when you force the
 > user to provide it anyway in another.

They are only optional because we could not agree on making them mandatory - I am not talking about metadata which is inapplicable to a given service but that which is simply not provided by a service although it is applicable (table and column metadata).

 > The general idea is that over time, we can develop discovery services
 > that leverage more and more information about resources.  Not all of
 > this information need be expressable in a VOResource schema.

But, as I said previously and above, this will lead to registries serving information in ways no other does, and so applications which rely on that information will only be able to work against specific registries.

 > To put it in concrete terms, the AstroGrid registry effectively
 > pressures the NVO registries into supporting fine-grained table
 > ...  First, you
 > encourage your publishers to provide table metadata.  We harvest these
 > records which in turn go out to our users as a result of queries.  We
 > have to then help users make sense of this information.

I don't understand this - the information is the same whether you get it from the registry alone or from the registry plus the service. What is the difference?

 > When there are
 > problems with the information, it reflects poorly on us, not you.

Poor metadata information is a problem we all have to tackle, not just AstroGrid: it is exacerbated by poor registry population applications and has nothing to do with the type of registry.

 > Second, your application in effect encourages our publishers to provide
 > table metadata to our registry if they are to be used in your
 > application, because your application only gets this information from
 > the registry.

This is not going to change, however the metadata is collected, whether by entry into the registry, by VOSI:getWhatever from the service or by the metadata URLs you propose. We will continue to provide full registry information and applications that rely on it: it is the only effective way to build responsive applications.

It is also, I might add, the only way to discover resources based on the additional metadata. If someone wants to discover x-ray catalogs with a given type of information (specified by a ucd), how do they do this from a pointer-only registry, apart from getting every possible x-ray source and querying every one of them?
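
To make the cost concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the Resource and Column classes, the ivo://example identifiers, the metadata URLs and the fetch_columns stand-in are not part of any registry or service interface. It only counts how many service calls each style of registry would need for the same ucd-based search.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    ucd: str


@dataclass
class Resource:
    identifier: str
    waveband: str
    columns: list = field(default_factory=list)  # held only by a "full" registry
    metadata_url: str = ""                       # all a pointer-only registry holds


def discover_full(registry, waveband, ucd):
    """Search registry records directly; no service is contacted."""
    return [r.identifier for r in registry
            if r.waveband == waveband and any(c.ucd == ucd for c in r.columns)]


def discover_pointer_only(registry, waveband, ucd, fetch_columns):
    """The registry can only narrow by waveband; every candidate service
    must then be asked for its column metadata."""
    matches, calls = [], 0
    for r in registry:
        if r.waveband != waveband:
            continue
        calls += 1  # one round trip per candidate service
        if any(c.ucd == ucd for c in fetch_columns(r.metadata_url)):
            matches.append(r.identifier)
    return matches, calls


# Hypothetical column metadata as the services themselves would describe it.
SERVICE_COLUMNS = {
    "http://example.org/a/tables": [Column("flux", "phot.flux;em.X-ray")],
    "http://example.org/b/tables": [Column("ra", "pos.eq.ra")],
    "http://example.org/c/tables": [Column("mag", "phot.mag;em.opt")],
}

WAVEBANDS = {"a": "x-ray", "b": "x-ray", "c": "optical"}

# A full registry stores the columns; a pointer-only one stores just the URL.
full_registry = [
    Resource(f"ivo://example/{key}", band,
             columns=SERVICE_COLUMNS[f"http://example.org/{key}/tables"])
    for key, band in WAVEBANDS.items()
]
pointer_registry = [
    Resource(f"ivo://example/{key}", band,
             metadata_url=f"http://example.org/{key}/tables")
    for key, band in WAVEBANDS.items()
]

full_matches = discover_full(full_registry, "x-ray", "phot.flux;em.X-ray")
pointer_matches, service_calls = discover_pointer_only(
    pointer_registry, "x-ray", "phot.flux;em.X-ray",
    fetch_columns=lambda url: SERVICE_COLUMNS[url],  # stand-in for an HTTP GET
)
```

Both searches find the same single resource, but the pointer-only one makes a call to every x-ray service to get there, and each of those services has to be up at the time.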

I guess if we do follow your proposal it will, over the next couple of years, show what the application developers really want by the number that tie themselves to AG-style registries vs NVO-style registries.

 > We do need to have a common understanding of what qualifies as
 > "fine-grained" information and develop mechanisms of exposing it only
 > when desired.  I don't think we have this, yet, but I will offer my
 > strawman at the meeting.

I'd like to expose more to the list since I won't be at the meeting.

 >> Bottom-line, Ray. ...
 > I hope I have clarified that this is not what I am proposing.

I'm still not so sure. I think that digging into the repercussions of your proposal will show that it is a major change.

At the core, I think the fundamental disagreement is over how the service provides its metadata: either by a standard method, getWhatever, which returns a full VOResource record; or, by a URL which returns some yet-to-be-determined format.

One last query: if the URL only returns the 'extra' metadata, where does the core service metadata come from? The registry only? Does this mean the service provider has to maintain metadata in two locations? Surely one additional benefit of the getWhatever method is that a service provider can update their registry record simply by changing the VOResource record served up by getWhatever?

(Shall go and lie down now... :) )

Cheers,
Tony.

Ray Plante wrote:
> Hi Tony,
>
> On Tue, 8 May 2007, Tony Linde wrote:

>>> I don't understand this.  Anybody who wants the fine-grained
>>> information
>>> can get it by following the URLs.  Anybody who doesn't want this
>>
>> But this is an enormous waste of time. I thought the VO was supposed 
>> to make
>> things better. What you propose will mean that anyone who wants to 
>> provide a
>> general query builder will have to query the registry for resources and,
>> when the user selects a resource, find the service and query it for table
>> information, then, for each table, query the service for column 
>> information:
>> all while the user patiently stares at a spinning hourglass. This is not,
>> IMO, an improvement on existing services.

>
> This is not at all what I am proposing. First of all, what I would like
> to see is that all of the table information be retrievable in one URL.
> (Multiple URLs might be allowed only as a means to enable very large
> collections of tables.)
>
> If you want the AstroGrid registry's search interface to return an
> "expanded" VOResource that includes the table metadata for the benefit
> of your query builder, I think that is fine. Others will probably argue
> that once you have located the resource, you should go to the service
> for the table metadata; even in this case, it is one extra call to
> access the URL that retrieves it directly from the service.
>
> The important thing for registries is the form of the records we share
> through the harvesting interface. If these records simply have pointers
> to table metadata, then those registries that do not wish to manage this
> information don't have to. The AstroGrid registry can pull the table
> metadata via the URL when the record is harvested.
>
> This idea was conceived to fit well into what you are already doing.
> For example, if we choose to use the table model from VOResource as the
> standard format, then it is trivial to pull the table metadata from the
> service and insert it into your internal copy of the VOResource record.
> You have to give me a little credit here :-)
>
>>> (getRegistration() or getMetadata()) to include it in is the service
>>
>> I thought there was only one method to get metadata and it returned the
>> VOResource record. I cannot see the need for more than one such.

>
> See the latest VOSI doc from Guy. The motivation is address the
> fine-grained metadata issue: the former provides information intended
> for the registry and the latter is a fatter record. So while I see the
> reason, I don't think it will accomplish its goal.
>
>>> Furthermore, with no guideline as to what information should go in
>>
>> I would certainly mandate that the full VOResource record be returned 
>> with
>> all the optional bits of that made mandatory.

>
> How does this help the provider? One of the reasons metadata are
> optional is because they won't necessarily apply to all resources. And
> what is the point of making them optional in place when you force the
> user to provide it anyway in another. This is not a recipe for quality
> metadata.
>
>>> is all or nothing.  Not only does the URL solution allow a registry to
>>> choose what fine-grained information it collects, but also it does not
>>> require that that information fit into the VOResource format.
>>
>> Why would the registry care about non-VOResource information? And what 
>> use
>> is a registry which cannot supply the information a calling service
>> requires?

>
> As a discovery service. Some will argue that a client should be getting
> information like table data directly from the service when it plans its
> queries, but I don't want to prevent you from getting it all from your
> registry.
>
> The general idea is that over time, we can develop discovery services
> that leverage more and more information about resources. Not all of
> this information need be expressable in a VOResource schema.
>
>> Do we now have to specify all the levels of metadata that a
>> registry can and cannot supply?

>
> While we may not agree at the moment on the best way to address the
> fine-grained issue, I hope we can at least agree on what the problem is.
>
> To put it in concrete terms, the AstroGrid registry effectively
> pressures the NVO registries into supporting fine-grained table metadata
> needed to support your query builder but which we feel should be handled
> in a different way. This pressure comes in two forms. First, you
> encourage your publishers to provide table metadata. We harvest these
> records which in turn go out to our users as a result of queries. We
> have to then help users make sense of this information. When there are
> problems with the information, it reflects poorly on us, not you.
> Second, your application in effect encourages our publishers to provide
> table metadata to our registry if they are to be used in your
> application, because your application only gets this information from
> the registry.
>
> We need to find a way that allows a registry like AstroGrid to innovate
> and provide new discovery and automated retrieval techniques that do not
> force other registries to follow suit.
>
>> Do we now have to specify all the levels of metadata that a
>> registry can and cannot supply?

>
> We do need to have a common understanding of what qualifies as
> "fine-grained" information and develop mechanisms of exposing it only
> when desired. I don't think we have this, yet, but I will offer my
> strawman at the meeting.
>
>>> metadata.  A simple service (provided by a registry) can translate that
>>> information into a standard format, so off the bat you get good
>>
>> How can the registry do that? None of the catalog services have 
>> *standard*
>> ways of providing metadata: a registry will have to implement separate 
>> code
>> for every potential service unless we specify new standards for these
>> URL-based metadata retrieval methods.

>
> SIA has a *standard* way of getting the table metadata: FORMAT=METADATA.
> A simple service that takes only an SIA base URL as a GET input can
> apply a stylesheet to return this information in a standard format. The
> others have *standard* ways but they are all different. A converter for
> each one provides a single way to get the table metadata from all of them.
>
>> Bottom-line, Ray. I think what you are proposing is a radical change 
>> to the
>> way the VO works. This turns the registry into a simple pointer to 
>> resources
>> and puts the onus on VO applications to do all the searching for 
>> metadata,

>
> I hope I have clarified that this is not what I am proposing.
>
> cheers,
> Ray
-- 
Tony Linde
Phone:  +44 (0)116 223 1292    Mobile: +44 (0)785 298 8840
Fax:    +44 (0)116 252 3311    Email:  Tony.Linde-at-leicester.ac.uk
Post:   Department of Physics & Astronomy,
         University of Leicester
         Leicester, UK   LE1 7RH
Web:    http://www.star.le.ac.uk/~ael

Project Manager, EuroVO VOTech   http://eurovotech.org
Programme Manager, AstroGrid     http://www.astrogrid.org
Received on 2007-05-08Z15:19:44