[Fwd: Re: Changes to VOSpace specification]

From: Dave Morris <dave-at-ast.cam.ac.uk>
Date: Fri, 24 Nov 2006 16:33:35 +0000


I'm forwarding these messages to the vospace list for discussion. There are a number of interested parties on the list who may not have seen the original messages.

Dave

attached mail follows:


On 23.11.2006, at 05:16, Matthew Graham wrote:

> Hi,
>
> I've had meetings in the past two weeks with the folks at JHU about
> putting a VOSpace interface onto CasJobs, and with Arun Jagatheesan at
> SDSC about the VOSpace interface with SRB. Predictably, both parties
> raised issues, but there were three that both brought up and I think
> we need to address them:
>
> 1. Properties
> The current scheme is limited to key-value pairs where the value is
> interpreted as a string. A problem with this is that some key-value
> pairs might be intended to represent other datatypes, e.g. a date or a
> float, and without this typing information it is impossible to check
> the validity of the value. It is always possible for a client to add
> this information with an xsi:type attribute, e.g.
>
>   <property uri="ivo://net.ivoa/properties/date"
>       xsi:type="xs:dateTime">2006-11-22T18:50:03Z</property>
>
> but this might not be interpreted properly by the browser. However, if
> we actually add a type attribute then we can cover this:
>
>   <property uri="ivo://net.ivoa/properties/date"
>       type="xs:dateTime">2006-11-22T18:50:03Z</property>
>
> The attribute is optional, and its absence implies that the datatype
> is string. The value of the attribute can be either an XML datatype or
> a reference to an XML schema that describes the data structure, thus
> allowing for more complicated properties such as:
>
>   <property uri="ivo://net.ivoa/properties/color" type="myschema.xsd">
>     <color>
>       <red>123</red>
>       <blue>234</blue>
>       <green>89</green>
>     </color>
>   </property>
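To make the proposed `type` attribute concrete, here is a minimal sketch of how a service might validate a property value against it. It assumes the attribute is literally named `type`, that absence means `xs:string`, and it only knows a handful of XML Schema built-in types; the names `check_property` and `CHECKERS` are hypothetical, and a schema reference such as `myschema.xsd` is reported as "not checkable" rather than validated.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def _parses(fn, text):
    """True if fn(text) succeeds, False if it raises ValueError."""
    try:
        fn(text)
        return True
    except ValueError:
        return False


def _parse_datetime(text):
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# Hypothetical checkers for a few XML Schema built-in types only.
CHECKERS = {
    "xs:string": lambda text: True,
    "xs:int": lambda text: _parses(int, text),
    "xs:float": lambda text: _parses(float, text),
    "xs:dateTime": lambda text: _parses(_parse_datetime, text),
}


def check_property(xml_text):
    """Return (uri, type, valid) for one <property> element.

    valid is None when the type (e.g. a schema reference such as
    "myschema.xsd") is not one this sketch knows how to check.
    """
    elem = ET.fromstring(xml_text)
    ptype = elem.get("type", "xs:string")  # no attribute means string
    value = (elem.text or "").strip()
    checker = CHECKERS.get(ptype)
    return elem.get("uri"), ptype, (checker(value) if checker else None)
```

Note that even in this small sketch the server needs per-type parsing logic, which is the implementation cost the reply below objects to.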

I am not so sure that this is a good idea. More or less the whole point of properties is that the server does not check their validity: they are simply opaque strings, except for the ones that the server knows about, and for those it already knows the data type. I think that this adds an extra level of complexity to the implementation requirements without much benefit, as ultimately only the client is really going to understand your myschema.xsd example, and it could achieve the same thing simply by passing the XML as a string.
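The alternative suggested here, passing the structured value as an opaque string, could look roughly like the following sketch. All function names are hypothetical; the point is that only the client builds and parses the XML, while the server just stores a string, and ElementTree's normal text escaping lets the embedded XML travel inside `<property>` unchanged.

```python
import xml.etree.ElementTree as ET


def server_set_property(props, uri, value):
    # Unknown properties are opaque: the server stores the string unchanged.
    props[uri] = value


def client_encode_color(red, green, blue):
    """Client-side: serialise the structured value to a plain string."""
    color = ET.Element("color")
    for tag, v in (("red", red), ("blue", blue), ("green", green)):
        ET.SubElement(color, tag).text = str(v)
    return ET.tostring(color, encoding="unicode")


def client_decode_color(text):
    """Client-side: only the client interprets the string as XML."""
    return {child.tag: int(child.text) for child in ET.fromstring(text)}


def to_property_xml(uri, value):
    # ElementTree escapes "<", ">" and "&" in text, so the embedded XML
    # travels inside <property> as an ordinary string value.
    prop = ET.Element("property", uri=uri)
    prop.text = value
    return ET.tostring(prop, encoding="unicode")
```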

>
> 2. Views
> This needs to be renamed to what it actually is, i.e. format(s), since
> the current name is universally confusing.

Yes! But I think it is really two concepts. On import it is just a statement of the format of the data; on export it is a request to convert the data to a format. The whole transfer object in the WSDL should reflect this difference by not using the same data structure for requests and for returned information. Sharing one structure is also confusing, IMHO, because the distinction is lost between the statements "this data is in format x" and "convert this data to format y".
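One way to keep the two concepts apart is to give them distinct types, as in this sketch. `FormatStatement`, `ConversionRequest`, `export` and the `ivo://example/format/...` URIs used in testing are all hypothetical; the sketch assumes the service advertises its supported conversions as (from, to) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatStatement:
    """Import side: "this data is in format x"."""
    format_uri: str


@dataclass(frozen=True)
class ConversionRequest:
    """Export side: "convert this data to format y"."""
    target_format_uri: str


def export(stored, request, converters):
    """Return the format the client will receive for a stored object.

    converters is a set of (from_format, to_format) pairs the service supports.
    """
    if request.target_format_uri == stored.format_uri:
        return stored.format_uri  # already in the requested format
    if (stored.format_uri, request.target_format_uri) not in converters:
        raise ValueError(
            f"cannot convert {stored.format_uri} to {request.target_format_uri}"
        )
    return request.target_format_uri
```

Because `export` only accepts a `ConversionRequest`, a statement about existing data can no longer be mistaken for a request to convert it.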

> 3. Decoupled data servers
> Under the current scheme, it is assumed that there is some communication
> channel between the VOSpace and a data server, e.g. a gridftp server, so
> that when a pushTo or pullFrom is completed, the data server can notify
> the VOSpace service that the transfer has completed. This sort of
> activity is particularly necessary when the endpoint is a logical one,
> e.g. a one-time-use URL. This design is fine for the cases where we have
> implemented the data servers ourselves or have access to the source code
> so that we can add the callback; however, what happens when you are
> dealing with an off-the-shelf data server where adding the callback is
> impossible or non-trivial, e.g. the Globus gridftp server?
> One solution is to have the client notify the VOSpace when the
> transaction is complete (since this really is only a problem for the
> asynchronous services), so pushToVoSpace would become:
>
> A. Client calls pushToVoSpace(<node>, <transfer>), which returns <node>
>    and <transfer>, the latter containing details for the data server
> B. Client transfers data to data server
> C. Client notifies VOSpace that the transfer has been completed, e.g.
>    transferComplete(<node>).
>
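The three-step flow with client notification can be sketched in memory as follows. `ToyVOSpace`, `client_push`, and the dictionary standing in for the data server are hypothetical; the `notify=False` path shows the failure mode raised next, where a forgotten notification leaves the node pending indefinitely.

```python
import uuid


class ToyVOSpace:
    """In-memory stand-in for a VOSpace service (hypothetical API)."""

    def __init__(self, data_server):
        self.data_server = data_server  # base URL of an unmodifiable data server
        self.pending = {}               # node -> transfer id awaiting notification
        self.status = {}                # node -> "pending" | "complete"

    def push_to_vospace(self, node):
        # Step A: reserve the node and tell the client where to send the data.
        transfer_id = uuid.uuid4().hex
        self.pending[node] = transfer_id
        self.status[node] = "pending"
        return node, {"endpoint": f"{self.data_server}/{transfer_id}"}

    def transfer_complete(self, node):
        # Step C: the client says the bytes have been sent.
        if self.pending.pop(node, None) is not None:
            self.status[node] = "complete"


def client_push(space, data_server_files, node, data, notify=True):
    node, transfer = space.push_to_vospace(node)    # step A
    data_server_files[transfer["endpoint"]] = data  # step B (stands in for e.g. a gridftp put)
    if notify:
        space.transfer_complete(node)               # step C
    return transfer
```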
> There are a couple of problems with this, however: the client has to
> call the space twice and might forget to make the notification call,
> and it is unclear what happens if the transfer fails or is never done.
> An alternative approach is to do the data transfer first and then
> register the data object with the node, including its physical
> location, so pushToVoSpace becomes:
>
> A. Client transfers data to data server
> B. Client registers data with VOSpace: register(<node>, URI of
>    location), which returns the registered <node>
>
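A rough sketch of the register-after-transfer variant follows. `RegisteringVOSpace` and its `exists` callback are hypothetical; the existence check is an assumption of this sketch (one way a service might refuse to register data that never arrived), not part of the proposal.

```python
class RegisteringVOSpace:
    """Hypothetical VOSpace that only records where data already live."""

    def __init__(self, exists):
        self.exists = exists  # callable: location URI -> bool (assumed check)
        self.nodes = {}       # node -> physical location URI

    def register(self, node, location_uri):
        # Step B: the data are already on the data server (step A), so there
        # is no pending transfer to track; refuse locations that are missing.
        if not self.exists(location_uri):
            raise FileNotFoundError(location_uri)
        self.nodes[node] = location_uri
        return node
```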
> This is actually the only transfer method which needs a modification:
> all the others work fine with decoupled servers. In fact, instead of
> adding an additional operation, we can modify pushToVoSpace either to
> take an additional URI argument, pushToVoSpace(<node>, <transfer>,
> location-uri), or to incorporate the location-uri into the transfer, so
> that if the protocol contains an endpoint then that endpoint is
> interpreted as the physical location of the data object.
>
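The second option, overloading the transfer so that a client-supplied endpoint means "the data are already here", might be handled on the server roughly as below. The `Protocol` and `Transfer` classes, the `allocate` callback, and the protocol URI in the test are hypothetical simplifications, not the WSDL types.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Protocol:
    uri: str                        # protocol identifier (hypothetical values)
    endpoint: Optional[str] = None  # set by the client only in the decoupled case


@dataclass
class Transfer:
    protocols: List[Protocol] = field(default_factory=list)


def push_to_vospace(nodes: Dict[str, str], node: str, transfer: Transfer,
                    allocate: Callable[[str, str], str]) -> Transfer:
    """Hypothetical server-side handling of the overloaded pushToVoSpace."""
    supplied = [p for p in transfer.protocols if p.endpoint]
    if supplied:
        # Decoupled case: the client has already written the data, so the
        # endpoint is taken to be the object's physical location.
        nodes[node] = supplied[0].endpoint
        return transfer
    # Original case: the service chooses where the client should send the data.
    return Transfer([Protocol(p.uri, allocate(node, p.uri)) for p in transfer.protocols])
```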
> One thing that would be useful is another operation to return the
> list of (decoupled) data servers (resources in SRB speak) that the
> VOSpace is using, so I would suggest that we add a getDataServers
> operation.
>
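A getDataServers operation could be as simple as the following sketch. `DataServer`, `VOSpaceWithResources`, and the optional protocol filter are assumptions made for illustration; the proposal only asks for the list of data servers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataServer:
    name: str
    endpoint: str
    protocols: Tuple[str, ...]


class VOSpaceWithResources:
    """Hypothetical service exposing the data servers it delegates to."""

    def __init__(self, servers: List[DataServer]):
        self._servers = list(servers)

    def get_data_servers(self, protocol: Optional[str] = None) -> List[DataServer]:
        # The protocol filter is an assumption of this sketch, not part of the proposal.
        return [s for s in self._servers
                if protocol is None or protocol in s.protocols]
```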

This has always been my point: we cannot brush the asynchronicity of this call under the carpet. However, I think that we have to go with the first of your two options, as I do not see how the client can really know which data server to transfer data to without first contacting the VOSpace to say where in VOSpace it wants to put the data. It is then up to the VOSpace to say which data server to use, as the VOSpace knows the topology. I think that the transferComplete() call would have to be "advisory", in the sense that the VOSpace should keep track of all pending inward transfers and try to determine whether the data have arrived after a given time (e.g. for an ftp service it could do an ls and check whether the size matches the size specified in the original pushToVoSpace). If the VOSpace does receive a transferComplete() call, it can assume that the client believes that the data have arrived safely.
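The "advisory" bookkeeping described above might be sketched like this. `PendingTracker` and its `size_of` callback (standing in for something like an ftp ls that returns `None` when the file is absent) are hypothetical, as is the fixed timeout; a client notification simply triggers an immediate check instead of being trusted outright.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PendingTransfer:
    endpoint: str
    expected_size: int
    started: float
    client_reported: bool = False


class PendingTracker:
    def __init__(self, size_of: Callable[[str], Optional[int]], timeout: float):
        self.size_of = size_of  # e.g. wraps an ftp "ls"; None if file absent
        self.timeout = timeout  # seconds before a pending transfer is declared failed
        self.pending: Dict[str, PendingTransfer] = {}
        self.status: Dict[str, str] = {}

    def start(self, node, endpoint, expected_size, now=None):
        started = time.time() if now is None else now
        self.pending[node] = PendingTransfer(endpoint, expected_size, started)
        self.status[node] = "pending"

    def transfer_complete(self, node):
        # Advisory only: note the client's belief, then verify it ourselves.
        if node in self.pending:
            self.pending[node].client_reported = True
        return self.check(node)

    def check(self, node, now=None):
        now = time.time() if now is None else now
        p = self.pending.get(node)
        if p is None:
            return self.status.get(node)
        if self.size_of(p.endpoint) == p.expected_size:
            self.status[node] = "complete"
            del self.pending[node]
        elif now - p.started > self.timeout:
            self.status[node] = "failed"
            del self.pending[node]
        return self.status[node]
```

In practice `check` would be called periodically for every pending node, so a forgotten transferComplete() only delays detection rather than leaving the node in limbo.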

Cheers,

Paul.

Received on 2006-11-24Z17:53:42