[Fwd: Re: Changes to VOSpace specification]

From: Dave Morris <dave-at-ast.cam.ac.uk>
Date: Fri, 24 Nov 2006 16:47:57 +0000


I'm forwarding these messages to the vospace list for discussion. There are a number of interested parties on the list who may not have seen the original messages.

Dave

attached mail follows:


Hi,

>> One of the problems with the first way is that there will be
>> different behaviour depending on whether the space has coupled or
>> decoupled data servers:
>> Coupled servers:
>> A. Client calls pushToVoSpace(<node>, <transfer>) returns <node> and
>> <transfer> - the latter containing details for the data server
>> B. Client transfers data to data server
>> C. Behind the scenes, the data server tells VOSpace that transfer has
>> occurred
>>
>> Decoupled servers:
>> A. Client calls pushToVoSpace(<node>, <transfer>) returns <node> and
>> <transfer> - the latter containing details for the data server
>> B. Client transfers data to data server
>> C. Client notifies VOSpace that transfer has been completed, e.g.
>> transferComplete(<node>).
>>
>> Unless, of course, we make the decoupled server scenario the only way
>> of doing it. We also have to enforce the use of transferComplete
>> otherwise the state of the data transfer is indeterminate.
>>
>> The alternative is that the first part of the process is the user
>> finding out what data servers are available with the getDataServers
>> call I suggest at the end:
>> A. Client gets list of data servers with getDataServers
>> B. Client transfers data to data server
>> C. Client registers data with VOSpace: register(<node>, URI of
>> location) returns the registered <node>
>>
>> This is much more in keeping with the other data discovery methods we
>> already have such as getProtocols. The process also does not leave
>> the space in an indeterminate state.
>
> except as I said before, one of the original use cases was that
> VOSpace was supposed to be managing physical location of data - if the
> client gets to choose where the data are sent first then it breaks
> that use case (though I suppose if the getDataServers call had as its
> argument the intended <node> this could still be done). I am not
> convinced the "breakage scenario" in this last process is better - if
> the client pushes data to a store then fails to register the node -
> the data server gets filled up without the VOSpace knowing about it -
> there are still 3 steps for the client to perform. In fact if the
> getDataServers call has as an argument a Node, then there is really
> little difference between process 2 and process 3 above - all that is
> different is when the VOSpace chooses to actually make the entry in
> its metadata tree.
>
> The last process also makes it more difficult to implement simple
> "coupled" data stores - it implies that we have to have authentication
> on simple http where the client knows an authentication secret in
> advance for instance to stop mass uploading of porn. Simple
> one-time-password implementations cannot work if the client has to
> contact the data server first.
>
> Another point is how decoupled are you talking about - I am presuming
> that the space still has access to the same filesystem as the
> decoupled data servers. If it is more decoupled than that then the
> space is unlikely to be able to control the contents on the data
> server, which will lead to inconsistent states anyway.
Although it sounds good in theory, the use case where VOSpace manages the physical location is actually very impractical, for the reasons I outlined previously (implementing the callback requires access to the source code of third-party data servers). Another popular use case here is that the user already has the data on a data server somewhere and does not want to load it all into VOSpace just to register it so that it is exposed this way. I actually think that both these use cases are far more realistic, although more complicated, than the one where VOSpace manages the physical location.

As for how decoupled: I have complete decoupling in mind.
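
To make the two "breakage scenarios" in this thread concrete, here is a rough Python sketch. It is only a toy model under stated assumptions, not anything from the VOSpace specification: the operation names (pushToVoSpace, transferComplete, getDataServers, register) are the ones proposed in the messages above, while the DataServer and VOSpace classes, the on_stored callback, the pending/orphans helpers and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataServer:
    """Hypothetical data server: stores bytes under a URI."""
    base_uri: str
    files: dict = field(default_factory=dict)
    # Set only for a "coupled" server, which reports back to the space itself.
    on_stored: Optional[Callable[[str], None]] = None

    def put(self, name: str, data: bytes) -> str:
        uri = f"{self.base_uri}/{name}"
        self.files[uri] = data
        if self.on_stored is not None:
            self.on_stored(name)  # step C of the coupled process
        return uri


class VOSpace:
    """Toy metadata tree: node name -> {"state": ..., "uri": ...}."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.nodes = {}

    # Processes 1 and 2: the space chooses the location up front.
    def push_to_vospace(self, node):
        server = self.servers[0]
        uri = f"{server.base_uri}/{node}"
        self.nodes[node] = {"state": "pending", "uri": uri}
        return node, server  # "transfer" details reduced to the server

    def transfer_complete(self, node):
        self.nodes[node]["state"] = "complete"

    # Process 3: the client chooses the location, then registers it.
    def get_data_servers(self):
        return list(self.servers)

    def register(self, node, uri):
        self.nodes[node] = {"state": "complete", "uri": uri}
        return node

    def pending(self):
        """Nodes whose transfer state is indeterminate."""
        return [n for n, e in self.nodes.items() if e["state"] == "pending"]

    def orphans(self):
        """Data sitting on a server that the space knows nothing about."""
        known = {e["uri"] for e in self.nodes.values()}
        return [u for s in self.servers for u in s.files if u not in known]


def demo():
    server = DataServer("http://data.example/store")  # made-up URL
    space = VOSpace([server])

    # Process 2 (decoupled), client never calls transfer_complete.
    node, target = space.push_to_vospace("a.fits")
    target.put(node, b"...")

    # Process 3, client never calls register.
    space.get_data_servers()[0].put("b.fits", b"...")

    return space.pending(), space.orphans()
```

In this model a skipped step C leaves a node stuck in "pending" in process 2, and leaves data on the server that the space cannot see in process 3. Setting on_stored to space.transfer_complete models the coupled case, where the server closes the loop itself.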

    Cheers,

    Matthew

Received on 2006-11-24Z17:48:30