I'm forwarding these messages to the vospace list for discussion.
There are a number of interested parties on the list who may not have
seen the original messages.
Dave
attached mail follows:
Hi,
Paul Harrison wrote:
>> 1. Properties
>> The current scheme is limited to key-value pairs where the value is
>> interpreted as a string. A problem with this is that some key-value pairs
>> might be intended to represent other datatypes, e.g. a date or a float,
>> and without this typing information, it is impossible to check the
>> validity of the value. It is always possible for a client to add this
>> information with an xsi:type attribute, e.g.
>> <property uri="ivo://net.ivoa/properties/date"
>> xsi:type="xs:dateTime">2006-11-22T18:50:03Z</property>
>> but this might not be interpreted properly by the browser. However, if
>> we actually add a type attribute then we can cover this:
>> <property uri="ivo://net.ivoa/properties/date"
>> type="xs:dateTime">2006-11-22T18:50:03Z</property>.
>> The attribute is optional and non-inclusion implies that the datatype is
>> string. The value of the attribute can either be an XML datatype or a
>> reference to an XML schema that describes the data structure thus
>> allowing for more complicated properties such as:
>> <property uri="ivo://net.ivoa/properties/color" type="myschema.xsd">
>> <color>
>> <red>123</red>
>> <blue>234</blue>
>> <green>89</green>
>> </color>
>> </property>
>
> I am not so sure that this is a good idea - more or less the whole
> point of properties is that the server does not check the validity of
> them - they are simply opaque strings except for the ones that the
> server knows about, and then it already knows the data type. I think
> that this is adding an extra level of complexity to implementation
> requirements, without too much benefit, as it is ultimately only the
> client that is really going to understand your myschema.xsd example,
> and it could achieve the same simply by passing the xml as a string.
A savvy user can always impose this themselves with xsi:type, but I would
rather we had an explicit mechanism for it. In fact, it would be an
interesting exercise to see what effect putting xsi:type on the property
element has.
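As a rough sketch of that exercise (Python, illustrative only): the
function name check_property and the small validator table below are my
own assumptions, not part of any VOSpace specification. It reads the
declared type from either a plain type attribute or xsi:type, defaults to
string, and treats anything it does not recognise (such as a schema
reference) as opaque.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def _check_datetime(text):
    # Only the UTC "Z" form used in the example is accepted here (assumption).
    datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


# Hypothetical validators for a handful of XML Schema simple types.
VALIDATORS = {
    "xs:string": lambda text: None,
    "xs:float": float,
    "xs:dateTime": _check_datetime,
}


def check_property(xml_text):
    """Return (uri, declared_type, is_valid) for one <property> element."""
    elem = ET.fromstring(xml_text)
    # Prefer the proposed "type" attribute, fall back to xsi:type,
    # and default to string when neither is present.
    declared = elem.get("type") or elem.get("{%s}type" % XSI) or "xs:string"
    validator = VALIDATORS.get(declared)
    if validator is None:
        # Unknown type (e.g. a schema reference): leave the value opaque.
        return elem.get("uri"), declared, True
    try:
        validator((elem.text or "").strip())
    except ValueError:
        return elem.get("uri"), declared, False
    return elem.get("uri"), declared, True
```

For example, check_property can be called on the dateTime property quoted
above; a value that does not parse as the declared type comes back with
is_valid set to False, while an untyped property is always accepted.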
>>
>> 2. Views
>> This needs to be renamed to what it actually is, i.e. format(s), since
>> the current name is universally confusing.
>
> yes! - but I think it is really two concepts - on import it is just a
> statement of the format of the data - on export it is a request to
> convert the data to a format. The whole transfer object in the WSDL
> should reflect this difference by not using the same data structure
> for requests and returned information, as this is also confusing IMHO
> as the distinction is lost between the statements
> "this data is in format x"
> "convert this data to format y"
I think that having format on its own on input, and contained within
accepts and provides on output, is fine, and so did the various people;
it was only the naming of view that was at issue.
>
>
>> 3. Decoupled data servers
>> Under the current scheme, it is assumed that there is some communication
>> channel between the VOSpace and a data server, e.g. a gridftp server, so
>> that when a pushTo or pullFrom is completed, the data server can notify
>> the VOSpace service that the transfer has completed. This sort of
>> activity is particularly necessary when the endpoint is a logical one,
>> e.g. a one-time-use URL. This design is fine for the cases where we have
>> implemented the data servers ourselves or have access to the source code
>> so that we can add the callback; however, what happens when you are
>> dealing with an off-the-shelf data server where this is not the case or
>> non-trivial, e.g. the Globus gridftp server?
>> One solution is to have the client notify the VOSpace when the
>> transaction is complete (since this really is only a problem for the
>> asynchronous services) so pushToVoSpace would become:
>>
>> A. Client calls pushToVoSpace(<node>, <transfer>) returns <node> and
>> <transfer> - the latter containing details for the data server
>> B. Client transfers data to data server
>> C. Client notifies VOSpace that transfer has been completed, e.g.
>> transferComplete(<node>).
>>
>> There are a couple of problems with this, however: the client has to
>> call the space twice and might forget to do the notification call, and
>> it is unclear what happens if the transfer fails or is never done.
>> An alternate approach is to do the data transfer first of all and then
>> register the data object with the node including its physical
>> location so pushToVoSpace becomes:
>>
>> A. Client transfers data to data server
>> B. Client registers data with VOSpace: register(<node>, URI of location)
>> returns the registered <node>
>>
>> This is actually the only transfer method which needs a modification:
>> all the others work fine with decoupled servers. In fact, instead of
>> adding an additional operation, we can modify pushToVoSpace either to
>> have an additional URI argument: pushToVoSpace(<node>, <transfer>,
>> location-uri) or we could just incorporate the location-uri into the
>> transfer so that if the protocol contains an endpoint then that
>> endpoint is interpreted to be the physical location of the data object.
>>
>> One thing that would be useful is another operation to return the
>> list of (decoupled) data servers (resources in SRB speak) that the
>> VOSpace is using so I would suggest that we add a getDataServers
>> operation.
>>
>
> this has always been my point that we cannot brush the asynchronicity
> of this call under the carpet - however, I think that we have to go
> with the first of your two options, as I do not see how the client
> can really know which data server to transfer data to without
> contacting the VOSpace first to say where in VOSpace they want to put
> the data - it is then up to the VOSpace to say which data server to
> use as the VOSpace knows the topology. I think that the
> transferComplete() call would have to be "advisory", in the sense that
> the VOSpace should keep track of all pending inward transfers and try
> to determine if the data have arrived after a given time (e.g. for an
> ftp service it could do an ls and see if the size is the same as the
> size that was specified in the original pushToVoSpace) - if the
> VOSpace does receive a transferComplete() call, it can assume that
> the client believes that the data have arrived safely.
One of the problems with the first way is that there will be different
behaviour depending on whether the space has coupled or decoupled data
servers:
Coupled servers:
A. Client calls pushToVoSpace(<node>, <transfer>) returns <node> and
<transfer> - the latter containing details for the data server
B. Client transfers data to data server
C. Behind the scenes, the data server tells VOSpace that transfer has
occurred
Decoupled servers:
A. Client calls pushToVoSpace(<node>, <transfer>) returns <node> and
<transfer> - the latter containing details for the data server
B. Client transfers data to data server
C. Client notifies VOSpace that transfer has been completed, e.g.
transferComplete(<node>).
Unless, of course, we make the decoupled server scenario the only way of
doing it. We also have to enforce the use of transferComplete; otherwise
the state of the data transfer is indeterminate.
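To make the indeterminate state concrete, here is a toy Python model of a
space tracking inward transfers to a decoupled server. Everything in it is
hypothetical: the class name, the method names and in particular the
timeout (which is not part of the proposal) are only there to show that,
without transferComplete, the space cannot tell a slow transfer from an
abandoned one.

```python
import time


class PendingTransfers:
    """Toy model of a VOSpace tracking pushes to a decoupled data server."""

    def __init__(self, timeout_s=3600.0):
        self.timeout_s = timeout_s  # assumed grace period, not in the proposal
        self._pending = {}          # node -> time pushToVoSpace was called
        self._complete = set()

    def push_to_vospace(self, node, now=None):
        # Step A: the client announces the transfer; the node is now pending.
        self._pending[node] = time.time() if now is None else now

    def transfer_complete(self, node):
        # Step C: the client tells the space that the data have arrived.
        if node not in self._pending:
            raise KeyError("no pending transfer for %r" % (node,))
        del self._pending[node]
        self._complete.add(node)

    def state(self, node, now=None):
        if node in self._complete:
            return "complete"
        if node not in self._pending:
            return "unknown"
        now = time.time() if now is None else now
        # Without a notification the space can only guess from elapsed time.
        if now - self._pending[node] > self.timeout_s:
            return "expired"
        return "pending"
```

A node for which transfer_complete is never called stays "pending" until
the assumed timeout passes and is then reported as "expired", whether or
not the data actually reached the data server.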
The alternative is that the first part of the process is the user finding
out what data servers are available with the getDataServers call I suggest
at the end:
A. Client gets list of data servers with getDataServers
B. Client transfers data to data server
C. Client registers data with VOSpace: register(<node>, URI of location)
returns the registered <node>
This is much more in keeping with the other data discovery methods we
already have such as getProtocols. The process also does not leave the
space in an indeterminate state.
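A minimal Python sketch of the register-based flow, under stated
assumptions: the class, the example server URI and the check that the
location lies on a known data server are all mine, purely for
illustration. The point is that the space only learns about the node once
the data are already in place.

```python
class VOSpace:
    """Toy space whose data servers are decoupled from it."""

    def __init__(self, data_servers):
        self._data_servers = list(data_servers)
        self._nodes = {}  # node identifier -> physical location URI

    def get_data_servers(self):
        # Step A: discovery call, analogous in spirit to getProtocols.
        return list(self._data_servers)

    def register(self, node, location_uri):
        # Step C: called only after the client has transferred the data.
        # Rejecting locations on unknown servers is an assumption of
        # this sketch, not something the proposal requires.
        if not any(location_uri.startswith(s) for s in self._data_servers):
            raise ValueError("location is not on a known data server")
        self._nodes[node] = location_uri
        return node

    def location_of(self, node):
        return self._nodes[node]


# Hypothetical usage; the server URI and node name are made up.
space = VOSpace(["gsiftp://data1.example.org/"])
server = space.get_data_servers()[0]
# Step B (the actual transfer to `server`) happens outside the space.
registered = space.register("vos://example/mydata", server + "files/mydata.fits")
```

Because register is a single call made after the transfer, there is no
window in which the space holds a node whose data may or may not exist.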
Cheers,
Matthew
Received on 2006-11-24Z17:53:33