Logical storage units in VOSpace 1.1

Arun Jagatheesan arun at sdsc.edu
Sun Aug 19 23:55:51 PDT 2007


Hi, sorry for the tardiness in replying to this thread...

Having a logical storage namespace as part of VOSpace is a
requirement that goes along with the VOSpace design objective: to hide
as much as possible about the internals from the outside world,
without turning the internals into a black box that is not useful in
real-world situations. Without this, the VOSpace model would not
reflect the real world, and the protocol could not take advantage of
big data centers.

We are not interested in exposing the hardware or hardware details to
users (it seems the example might have led to some confusion). We
want to model the real world as it is, but make it logical and allow
late binding. We are just trying to associate space (as in storage
space) with a logical identifier. VOSpace 1.0 already has a logical
data namespace (nodes). The VOSpace concept allows late binding for
the major entities involved in a data transfer between spaces:

- Data name: defined as a logical name and represented as a Node.
- Data transfer: defined by the protocols and represented by the
  "transfer" data type.
- Data storage: just defined as "space" in v1.0 - NO modeling has
  been done to reflect this in the architecture.

The data name and protocol undergo late binding, i.e. the physical
name of the file and the data transfer protocol are not bound until
the client and server decide to commit the transaction (transfer).
But the storage space is left opaque. A VOSpace in a large data
center could have multiple servers - assume there are multiple FTP
or iRODS/SRB servers. As per v1.0, getProtocols() lets either the FTP
or the iRODS protocol be selected. If FTP is used, the client does
not know which physical storage space will be used. This might seem
like an advantage on the surface, as the VOSpace server could pick
any FTP server that is available. However, it restricts the client
from taking advantage of late binding. While the client has the
luxury of late binding on the data transfer protocol, it does not
have the freedom to ask for the "preferred-storage-type",
"preferred-storage-resource" or "less-expensive-storage".

If the VOSpace control protocol gives the client the luxury of
picking and negotiating the data transfer protocol, shouldn't it
also give the client the follow-up luxury, or smartness, of deciding
on the "class of storage" to be used? The client could have a
getResources() call. Rather than providing the physical end-points,
this call would return the identifiers for the logical storage units.
Each data node, apart from providing its logical name, would also
have the identifiers of the logical storage units where the data is
physically located. This allows us to model replicas, replication,
data migration, etc. as part of the data model itself.
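
To make this a bit more concrete, here is a rough Python sketch of how
a getResources() call and nodes carrying storage unit identifiers
might fit together. It is only an illustration under my own
assumptions, not part of any VOSpace specification: the class names
(Node, SpaceSketch), the method names other than the idea of
getResources(), and the "vos://example/..." node URI are all made up
for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Logical data name, as in VOSpace 1.0.
    uri: str
    # Hypothetical addition: logical storage units that hold a copy.
    storage_units: List[str] = field(default_factory=list)


class SpaceSketch:
    """Toy in-memory stand-in for a VOSpace service (hypothetical)."""

    def __init__(self, resources: List[str]) -> None:
        self._resources = list(resources)
        self._nodes: Dict[str, Node] = {}

    def get_resources(self) -> List[str]:
        # Returns logical identifiers only, never physical end-points.
        return list(self._resources)

    def put(self, uri: str, storage_unit: str) -> Node:
        # Record that the data for this node lives on the given unit.
        if storage_unit not in self._resources:
            raise ValueError(f"unknown storage unit: {storage_unit}")
        node = self._nodes.setdefault(uri, Node(uri))
        if storage_unit not in node.storage_units:
            node.storage_units.append(storage_unit)
        return node

    def replicate(self, uri: str, target_unit: str) -> Node:
        # A replica is just one more storage unit on an existing node.
        if uri not in self._nodes:
            raise KeyError(uri)
        return self.put(uri, target_unit)


space = SpaceSketch(["sdsc-tape", "manchester-disk", "sdsc-gpfs"])
space.put("vos://example/data/image1", "sdsc-tape")
node = space.replicate("vos://example/data/image1", "sdsc-gpfs")
print(node.storage_units)
```

The point of the sketch is that replication and migration show up as
changes to the list of logical storage units on a node, while the
physical location of each unit stays hidden behind its identifier.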

We don't use the physical identifier or end-point of the storage
(like an IP address, e.g. 132.0.0.1) - instead we provide logical
identifiers for these storage units, such as "sdsc-tape",
"manchester-disk", "sdsc-gpfs". These are mostly human-readable names
that could also help an end-user. A storage unit could have
additional attributes (optional resource properties) to help
applications decide on a storage unit to use.
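
As a rough illustration of how an application might use such
optional properties, here is a small Python sketch. Again, this is
only an assumption on my part: the property names ("media", "cost")
and their values, as well as the StorageUnit class and pick_unit()
helper, are invented for the example and are not defined anywhere in
VOSpace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageUnit:
    # Logical, human-readable identifier such as "sdsc-tape".
    name: str
    # Optional resource properties (hypothetical keys and values).
    properties: Dict[str, str] = field(default_factory=dict)


def pick_unit(units: List[StorageUnit], **wanted: str) -> Optional[StorageUnit]:
    """Return the first unit whose properties match every wanted value."""
    for unit in units:
        if all(unit.properties.get(k) == v for k, v in wanted.items()):
            return unit
    return None


UNITS = [
    StorageUnit("sdsc-tape", {"media": "tape", "cost": "low"}),
    StorageUnit("manchester-disk", {"media": "disk", "cost": "medium"}),
    StorageUnit("sdsc-gpfs", {"media": "disk", "cost": "high"}),
]

choice = pick_unit(UNITS, media="disk")
print(choice.name if choice else "no matching storage unit")
```

The application only ever sees the logical name and its properties;
which machine actually serves "manchester-disk" is still decided by
the VOSpace.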

Cheers,
Arun





On Aug 15, 2007, at 6:06 AM, Paul Harrison wrote:

> Hi,
>
> I think that data replication is an important functionality of  
> VOSpace, but I think that introducing the concept of logical  
> storage units in this fashion into the "public" api might not be  
> very easy to use in practice without knowledge of the underlying  
> storage system, and additionally is contrary to one of the aims of  
> the VOSpace design of trying to hide as much as possible about the  
> internals from the outside world.
>
> The use case that you describe could also be handled in a more easy- 
> to-reason-about way by having "move to fast storage" and "move to  
> slow storage" functions in the api, or having similar hints in the  
> various get api calls. Perhaps a compromise using a similar api to  
> the one you suggest, is that the "hardware units" are generic  
> classes of unit rather than each vospace defining its own set of  
> proprietary hardware units. VOSpaces that want to, simply map the  
> generic classes onto specific internal hardware units  
> transparently.  The VOSpace then hides all the details of exactly  
> where items are stored.
>
> Paul Harrison
>
> On 13.08.2007, at 19:32, Matthew Graham wrote:
>
>> Hi,
>>
>> A request has come from our friends at SDSC to include references to  
>> the actual storage units that data is being deposited on. The use  
>> case is data replication so, for example, I want to move/copy a  
>> data object from a slow tape archive to an ultrafast disk but both  
>> hardware units are within the same VOSpace or I want to retrieve a  
>> data object from the ultrafast disk copy and not the slow tape one.
>>
>> I think that we can incorporate this easily into our existing data  
>> model. We will refer to hardware units as logical storage units  
>> with the implication that they are identified via a logical  
>> identifier (URI) that is set by the particular VOSpace  
>> implementation. To get the list of available storage units from a  
>> VOSpace, we will need a method: getLogicalStorageUnits() which  
>> will return a list of URIs. These URIs may be resolvable to a  
>> description of the storage unit.
>>
>> The logical storage unit identifier will be an optional argument  
>> in the <transfer> entity so that as part of the data transfer  
>> negotiation, the user can specify a list of storage units that  
>> they want the data transferred to/from. The identifier will also  
>> be an optional argument in the <node> entity so that specific  
>> hardware can be targeted in moving and copying data.
>>
>> Comments, suggestions, etc.
>>
>>    Cheers,
>>
>>    Matthew


