Re: On clarification for getCapabilities and stageData

From: Doug Tody <dtody-at-nrao.edu>
Date: Thu, 26 Apr 2007 15:50:13 -0600 (MDT)


On Thu, 26 Apr 2007, Doug Tody wrote:
> Another example could be a "google like" global discovery mechanism,
> based on harvesting of more detailed dataset metadata from services,
> similar to the way Google harvests information from Web pages (this
> came up earlier in a different forum - see the message I append below).

Sorry, I appended the wrong message to the previous email, although it turned out to be a relevant one as well. The one on a "google-like" discovery mechanism follows. - Doug

---

>From dtody-at-nrao.edu Thu Apr 12 10:38:34 2007
Date: Thu, 12 Apr 2007 10:36:45 -0600 (MDT)
From: Doug Tody <dtody-at-nrao.edu>
To: NVO Technical Working Group <techwg-at-us-vo.org>
Subject: [nvo-techwg] Registry granularity

Hi All -

Just to follow up, very briefly, on our discussion in the telecon this morning.

What we are discussing is a Google-like capability, where a centralized engine retrieves massive amounts of dataset metadata from data services all over the world (or some large portion of them at least), integrates this into some massive database, and provides intelligent search tools for sophisticated data discovery. I think this would be a wonderful thing to have.

The question is not whether to have such a capability, but where in the system architecture such a capability should live. It is not a DAL service, as data services only provide access to specific data collections, usually at one site per service instance; data is distributed and hence so are the data services. From the perspective of a data service it does not matter whether or where such metadata is cached; the service just provides the information about the data which it manages, and the client does what it wants with it.

Some are now arguing that such capabilities should be built into the resource registry; after all, the registry is the one part of our architecture where information about distributed data collections comes together, and an individual dataset could be considered a type of "resource". However, turning what is now a resource registry into a Google-like advanced discovery service is a major change in the scope of the registry. Describing, indexing, and providing a search capability for all astronomy data collections, services, software components, etc., with automated replication to multiple sites, and standard query interfaces, is already a fairly complex problem.

All I am suggesting is that perhaps an astronomy Google should be a third element of the system. It seems like a potentially quite complex problem to me. Integrating actual dataset metadata on hundreds of millions of datasets into a centralized database, with dynamic updating, some level of quality control, efficient indexing, uniform search capabilities, intelligent ranking heuristics, etc., seems like a fairly complex problem to me. In fact it is hard enough that one might want to experiment with multiple implementations (as with Google vs Yahoo vs Microsoft Live Search etc.) before trying to define such things as standard query interfaces. The query interfaces required for this sort of thing may need to be somewhat different from those for high level resource discovery.

So - the question is not whether to do this, but how to do it, and where the functionality should be provided. It is a system architecture question, and a very important one.

- Doug
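
[Illustrative sketch, not part of the original message.] The separation argued for above, with data services that only describe the data they manage and a separate central engine that harvests and indexes that metadata, might look roughly like the following. All names here (DataService, DiscoveryIndex, dataset_metadata, harvest, search) are hypothetical and do not correspond to any IVOA or NVO interface; the naive keyword index merely stands in for the "intelligent search tools" the message has in mind.

```python
from collections import defaultdict


class DataService:
    """Hypothetical data service: manages one collection at one site."""

    def __init__(self, name, datasets):
        self.name = name
        self._datasets = datasets  # list of metadata dicts

    def dataset_metadata(self):
        # The service only reports metadata for the data it manages;
        # it neither knows nor cares who caches the result.
        return [dict(d, service=self.name) for d in self._datasets]


class DiscoveryIndex:
    """Hypothetical central engine, separate from services and registry."""

    def __init__(self):
        self._records = []
        self._by_term = defaultdict(set)  # term -> record positions

    def harvest(self, service):
        # Pull metadata from one service and fold it into the index.
        for record in service.dataset_metadata():
            pos = len(self._records)
            self._records.append(record)
            for value in record.values():
                for term in str(value).lower().split():
                    self._by_term[term].add(pos)

    def search(self, query):
        # Return records containing every query term (no ranking).
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(
            *(self._by_term.get(t, set()) for t in terms)
        )
        return [self._records[i] for i in sorted(hits)]


services = [
    DataService("site_a", [{"id": "a1", "title": "M31 optical image"}]),
    DataService("site_b", [
        {"id": "b1", "title": "M31 radio spectrum"},
        {"id": "b2", "title": "Crab radio image"},
    ]),
]

index = DiscoveryIndex()
for svc in services:
    index.harvest(svc)

results = index.search("m31 radio")
```

In this sketch the services need no changes to support discovery; dynamic updating, quality control, and ranking, which the message identifies as the hard parts, are deliberately left out.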
Received on 2007-04-26Z23:50:33