Re: Changes to VOSpace specification

From: Paul Harrison <pharriso-at-eso.org>
Date: Mon, 27 Nov 2006 13:04:41 +0100


On 27.11.2006, at 06:18, Dave Morris wrote:

> Dave Morris wrote:
>
>> Matthew Graham wrote:
>>
>>> 3. Decoupled data servers
>>
>>> .....
>>> This is actually the only transfer method which needs a
>>> modification: all the others work fine with decoupled servers.
>>
>> No transfer methods need modification.
>> You can achieve the same effect using a pullToVospace call instead
>> of pushToVospace.
>
> Actually, this is wrong (I was still thinking in terms of
> version-1.0 not version-1.+).
>
> As Paul points out in another email :
> "we cannot brush the asynchronicity of this call under the carpet"
>
> I agree.
> We already have two asynchronous calls in VOSpace-1.0, and no way
> to manage the implied state on the server.
>
> The pushToVospace and pullFromVospace methods both initiate
> transfers that will happen in the future, which implies setting up
> something on the server to handle them.
> But, we don't have any way of referring to new state information
> created on the server.
>
> I wasn't keen on making the other two import and export methods
> asynchronous - until we had a way of referring to, and managing,
> the transfer state.
> Once we have that mechanism in place, then we can go ahead and make
> all of the transfer methods asynchronous.
> As Matthew has highlighted, without it, we are creating state
> information on the server that the client can't reach.
>
> Now that we are opening up discussion about a new version of the
> spec, this might be a good time to bring up a couple of suggestions
> I made in September.
>
> http://wiki.astrogrid.org/bin/view/Astrogrid/VoSpace20060904
>
> Vospace version-1.1 proposal
> Section 2.3 - asynchronous transfers
>
> Paul added some notes when he saw them in September, and since then
> I have re-evaluated some of the ideas in light of his comments.
> So, the details in these documents are already out of date, but (I
> hope) the general idea is still sound.
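To summarize my objections: I do not want the transfer state to appear as ordinary nodes in container listings. Instead, the pending transfers for a node could be referred to via the query part of its URI, e.g.

vos://org.test!vospace/container/node?transfer

and a list of all pending transfers for the space itself could be referred to by such a query on the root node.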
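A minimal sketch of addressing transfers through the `?transfer` query, as in the example URI above. The query keyword and the helper names `transfer_query_uri`, `is_transfer_query` and `node_uri_of` are hypothetical illustrations, not anything defined by the specification.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical query keyword, taken from the example URI; not defined by any spec.
TRANSFER_QUERY = "transfer"


def transfer_query_uri(node_uri: str) -> str:
    """Return the URI that refers to the pending transfers of a node."""
    parts = urlsplit(node_uri)
    if parts.scheme != "vos":
        raise ValueError(f"not a vos:// URI: {node_uri}")
    # Keep authority and path, replace any query with the transfer keyword.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, TRANSFER_QUERY, ""))


def is_transfer_query(uri: str) -> bool:
    """True if the URI refers to transfers rather than to the node itself."""
    return urlsplit(uri).query == TRANSFER_QUERY


def node_uri_of(uri: str) -> str:
    """Strip the query part to recover the plain node URI."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(transfer_query_uri("vos://org.test!vospace/container/node"))
# prints vos://org.test!vospace/container/node?transfer
```

Because the transfers are reached only through the query part, they never show up as children when the container itself is listed.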

>
> Basically, we need something (an object) to represent the state of
> a transfer.
> We could create a new set of objects, methods, service WSDL and
> schema to handle the new status object(s).
>
> However, we already have client and server components for querying
> and modifying objects (nodes) on the VOSpace server.
> In which case, can we represent the transfer state as a node, with
> child nodes for each of the protocol options ?
> This would enable us to query and manage the state of a transfer
> without having to invent a completely new set of objects and
> service API.
>
> So, when I said this in my previous email :
> >
> > No transfer methods need modification.
> > You can achieve the same effect using a pullToVospace call instead of pushToVospace.
> >
> I was wrong, they do need modification to support asynchronous
> transfers properly.
> All four of the import and export methods need to return something
> that refers to the state information created on the server.
>
> If the status is represented as a VOSpace node, then the import
> and export methods could either return a simple "vos://..."
> identifier of the status node, or the full status node element.
>
> So where at the moment we have :
>
> import response
> <!-- The updated node -->
> <node uri="vos://.....">
> .....
> </node>
> <!-- Transfer details -->
> <transfer>
> <view ..../>
> <protocols>
> .....
> </protocols>
> </transfer>
>
> This would change to :
>
> import response
> <!-- The updated node -->
> <node uri="vos://.....">
> .....
> </node>
> <!-- The transfer status node -->
> <node uri="vos://.....">
> .....
> </node>
>
> In effect, replacing the current transfer details node in the
> response with a status node.
> We would still be returning all the same information, but in a
> different wrapper.
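As a rough illustration of what a client would do with the proposed response, here is a sketch that extracts the two node URIs. It assumes the two `<node>` elements arrive inside a hypothetical `<importResponse>` wrapper with no XML namespace, which is a simplification of whatever the real SOAP message would carry. The status URI in the example is also made up.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def split_import_response(xml_text: str) -> Tuple[str, str]:
    """Return (updated node URI, transfer status node URI).

    Assumes a hypothetical <importResponse> wrapper holding exactly two
    <node> elements: the updated data node first, the status node second.
    """
    root = ET.fromstring(xml_text.strip())
    nodes = root.findall("node")
    if len(nodes) != 2:
        raise ValueError(f"expected 2 <node> elements, found {len(nodes)}")
    updated, status = nodes
    return updated.get("uri"), status.get("uri")


example = """
<importResponse>
  <node uri="vos://org.test!vospace/container/data.fits"/>
  <node uri="vos://org.test!vospace/transfers/t1"/>
</importResponse>
"""
print(split_import_response(example))
```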
>
> The new status node would contain the same information as the
> current transfer details, including the target view (as a property
> of the transfer node) and the list of the protocol options (as
> child nodes of the transfer). However, representing the information
> as nodes in the VOSpace service means that it remains persistent
> after the end of the initial SOAP call. This gives us something
> that the client and server can use to refer to the state
> information later on.
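A minimal sketch of the status node layout described above, assuming a simple in-memory node model. The `Node` class, the property names `target`, `view` and `state`, and the `option-N` child URIs are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a VOSpace node."""
    uri: str
    properties: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def make_status_node(status_uri: str, target_uri: str, view: str,
                     offers: List[Tuple[str, str]]) -> Node:
    """Build a transfer status node: the target view is a property,
    and each (protocol, endpoint) offer becomes a child node."""
    options = [
        Node(uri=f"{status_uri}/option-{i}",
             properties={"protocol": protocol, "endpoint": endpoint})
        for i, (protocol, endpoint) in enumerate(offers)
    ]
    return Node(uri=status_uri,
                properties={"target": target_uri, "view": view, "state": "pending"},
                children=options)
```

Because the result is an ordinary node tree, the existing list and query calls could walk it without any new service API, which is the point being made here.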
>
> The client can use the "vos://....." URI of a status node to update
> the state, either by manipulating the status node properties, or by
> using a new set of methods specifically for updating transfers,
> e.g. complete(), fail() and cancel().
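One way to picture the complete(), fail() and cancel() methods is a small state machine behind each status node URI. This is a sketch under the assumption that a transfer starts pending and that the three end states are final. The class and state names are hypothetical.

```python
from enum import Enum


class TransferState(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class TransferStatus:
    """Hypothetical server-side record behind one status node URI."""

    def __init__(self, status_uri: str) -> None:
        self.status_uri = status_uri
        self.state = TransferState.PENDING

    def _finish(self, new_state: TransferState) -> None:
        # Only a pending transfer can change state; end states are final.
        if self.state is not TransferState.PENDING:
            raise RuntimeError(f"{self.status_uri} is already {self.state.value}")
        self.state = new_state

    def complete(self) -> None:
        self._finish(TransferState.COMPLETED)

    def fail(self) -> None:
        self._finish(TransferState.FAILED)

    def cancel(self) -> None:
        self._finish(TransferState.CANCELLED)

    def is_done(self) -> bool:
        """What a client or third party would poll to see if it has finished."""
        return self.state is not TransferState.PENDING
```

The same `is_done()` check also covers transfers that finish without any callback, where the status node is only ever polled.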
>
> This part of the specification wouldn't mandate _what_ the client
> should have to do with the status node once it has been given it.
> It just gives the client and server a common way of referring to
> the status of that particular transfer.
>
> As Matthew described, some protocols may complete without requiring
> a notification callback from the client, e.g. an HTTP PUT to a
> servlet within the VOSpace service. In which case, the status node
> just provides the client, or a 3rd party, with a way of checking if
> the transfer has been completed yet.
>
> Other protocols will require some form of callback.
> In Matthew's example, if the protocol involves a put to a gFTP
> server followed by an 'adoption' step where the VOSpace server
> updates its metadata to include the uploaded file, then the client
> may have to tell the VOSpace server when the data is ready.
>
> The client could use the "vos://...." URI of the transfer status in
> the callback, to tell the server which transfer (and protocol
> option) it is talking about. We need to remember that the VOSpace
> server may have offered more than one protocol option for the
> transfer, so the client needs to tell it which option has been
> completed, to enable the server to collect the data from the right
> place and cancel the others.
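A sketch of the server side of such a callback, assuming the client sends back both the status URI and the URI of the protocol option it actually used. The data structures, the state strings and the `handle_callback` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProtocolOption:
    endpoint: str            # where the data was (or will be) placed
    state: str = "offered"   # hypothetical states: offered / used / cancelled


@dataclass
class PendingTransfer:
    options: Dict[str, ProtocolOption] = field(default_factory=dict)


def handle_callback(transfers: Dict[str, PendingTransfer],
                    status_uri: str, option_uri: str) -> str:
    """Mark the option the client used, cancel the rest, and return the
    endpoint the server should collect the data from."""
    transfer = transfers.get(status_uri)
    if transfer is None:
        raise KeyError(f"unknown transfer {status_uri}")
    if option_uri not in transfer.options:
        raise KeyError(f"{option_uri} was not offered for {status_uri}")
    for uri, option in transfer.options.items():
        option.state = "used" if uri == option_uri else "cancelled"
    return transfer.options[option_uri].endpoint
```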
>
> The details of what the callback means, and what the server does
> with it, would be specific to the implementation of the protocol.
> If the VOSpace and gFTP server are acting as one entity, then the
> VOSpace server may leave the data within the gFTP server file
> system, and just update the node metadata.
> On the other hand, if the gFTP server is acting as a staging post,
> then the VOSpace server may collect the data from the back end and
> move it to another location within its own file system.
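The two cases above could look roughly like this on the server, assuming uploads land as local files and the node metadata is a simple mapping from node URI to file location. Everything here (`finish_upload`, `staging_destination`, the metadata dict) is a hypothetical illustration, not part of any VOSpace implementation.

```python
import shutil
from pathlib import Path
from typing import Dict
from urllib.parse import urlsplit


def staging_destination(storage_root: Path, node_uri: str) -> Path:
    """Map a node URI onto a file under the service's own storage area."""
    return storage_root / urlsplit(node_uri).path.lstrip("/")


def finish_upload(metadata: Dict[str, str], node_uri: str, uploaded: Path,
                  storage_root: Path, staging: bool) -> Path:
    """Hypothetical server-side step run after the client's callback."""
    if staging:
        # gFTP server is a staging post: move the data into VOSpace storage.
        final = staging_destination(storage_root, node_uri)
        final.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(uploaded), str(final))
    else:
        # VOSpace and gFTP act as one entity: leave the file where it is.
        final = uploaded
    # In both cases the node metadata is updated to point at the data.
    metadata[node_uri] = str(final)
    return final
```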
>
> In summary :
>
> Matthew has highlighted the fact that we already need a callback
> mechanism for some of the existing import and export protocols.

And this does have a bearing on the delivery of a 1.0 standard, if we are to promise backward compatibility...
>
> Whatever callback mechanism we adopt, it will need some way to
> refer to the persistent state within the VOSpace server, that
> represents the state of the transfer and the individual protocol
> options within it. Representing these as VOSpace nodes means that
> we can use the existing "vos://..." URI scheme to refer to them,
> and the existing API to list, query and modify them.

It might just be a question of terminology, but, as I said, I do not like the idea that they are just "ordinary nodes" that appear in the container listings. However, if they are accessible via the query part of the URL, I am more amenable. And although the existing API is OK for querying and listing control objects, I am not so sure it is suitable for modifying them. After all, this whole discussion has flared up because of the complexities of data upload within VOSpace. Whilst these complexities are acceptable for uploading data objects (so that we can take advantage of the special qualities of existing data transfer protocols), a specialized API might be more suitable for modifying control objects.

>
> Once we have a standard way of referring to the persistent state of
> a transfer, then my previous email about making the details of the
> callback specific to the protocol might make sense. Without it, the
> client has no way of telling the server which transfer and protocol
> option it is talking about.

I think that we should try to extract as much common protocol behaviour as possible. As soon as a protocol is not completely described by the transfer URL, we get into complications that would be better avoided in order to maintain interoperability. We need to use as many of the common characteristics of a protocol as possible before layering non-standard protocol behaviours on top of externally defined protocols.

Received on 2006-11-27Z13:05:04