Logical storage units in VOSpace 1.1
arun at sdsc.edu
Sun Aug 19 23:55:51 PDT 2007
Hi, sorry for the tardiness in replying to this thread...
Having a logical storage namespace as part of VOSpace is a
requirement that goes along with the VOSpace design objective - to hide
as much as possible about the internals from the outside world -
without turning the internals into a black box that is not useful in
real-world situations. Without this, the model of VOSpace would not
reflect the real world, and the protocol would not be useful for taking
advantage of big data centers.
We are not interested in exposing the hardware or hardware details to
users (it seems the example might have led to some confusion). We
want to model the real world as it is - but make it logical and allow
late binding. We are just trying to associate space (as in storage
space) with a logical identifier. VOSpace 1.0 already has a logical
data namespace (nodes). The VOSpace concept allows late binding for
the major entities involved in a data transfer between spaces:
- Data name: defined as a logical name and represented as a Node.
- Data transfer: defines the protocols; "transfer" is the data type
that we use.
- Data storage: just defined as space in v1.0 - NO modeling has
been done to reflect this in the architecture.
The data name and protocol undergo late binding, i.e., the physical
name of the file and the data transfer protocol are not bound until
the client and server decide to commit the transaction (transfer).
But the storage space is left opaque. The VOSpace in a large data
center could have multiple servers - assume there are multiple FTP
or iRODS/SRB servers. As per v1.0, the protocol allows getProtocols()
to define either the FTP or iRODS protocol to be used. If FTP were
used, the client would not know which physical storage space to use.
This might seem like an advantage on the surface, as the VOSpace
server could choose any FTP server that is available. However, it
restricts the client from taking advantage of late binding. While the
client has the luxury of late binding on the data-transfer protocol,
it does not have the freedom to ask for the "preferred-storage-type",
"preferred-storage-resource", or "less-expensive-storage".
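To make the asymmetry concrete, here is a minimal client-side sketch of the late binding that v1.0 already gives the client for the transfer protocol. The class and method names are illustrative assumptions for this discussion, not the normative service bindings:

```python
# Sketch of VOSpace 1.0-style late binding on the transfer protocol.
# Names here (VOSpace10Client, commit_transfer, ...) are assumptions.

class VOSpace10Client:
    def __init__(self, server_protocols):
        # Transfer protocols the server advertises, e.g. FTP and iRODS.
        self.server_protocols = server_protocols

    def get_protocols(self):
        """Analogue of getProtocols(): list the server's transfer protocols."""
        return list(self.server_protocols)

    def commit_transfer(self, client_preferences):
        """The protocol is bound only here, when the transfer is committed."""
        for proto in client_preferences:
            if proto in self.server_protocols:
                return proto  # late binding of the transfer protocol
        raise RuntimeError("no mutually supported protocol")


client = VOSpace10Client(server_protocols=["FTP", "iRODS"])
chosen = client.commit_transfer(["iRODS", "FTP"])  # client picks iRODS
# Note the asymmetry: there is no equivalent call for storage, so the
# client cannot express a storage preference and the server's choice of
# physical storage space stays opaque.
```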
When the VOSpace control protocol wanted to give the client the
luxury of picking and negotiating the data transfer protocol,
shouldn't it give the client the follow-up luxury, or smartness, of
deciding on the "class of storage" to be used? The client could have a
getResources() call. Rather than providing the physical end-points,
this call would return the identifiers of the logical storage units.
Each data node, apart from providing its logical name, would also have
the identifiers of the logical storage units where the data is
physically located. This allows us to model replicas, replication,
data migration, etc., as part of the data model itself.
We don't use the physical identifier or end-point of the storage (like
an IP address, e.g. 126.96.36.199) - instead we provide logical
identifiers for these storage units such as "sdsc-tape",
"manchester-disk", "sdsc-gpfs". These are mostly human-readable names
that could also help an end-user - a unit could have additional
attributes (optional resource properties) to help applications decide
which storage unit to use.
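As a sketch of what this could look like to a client, here is a toy model of the proposed getResources() call. Every name below (the method names, "sdsc-tape", "sdsc-gpfs", the property keys) is an assumption made for discussion, not part of any agreed VOSpace specification:

```python
# Toy model of the proposed getResources() call: the service exposes
# logical storage units, never physical end-points. All names here are
# illustrative assumptions.

class LogicalStorageUnit:
    def __init__(self, identifier, properties=None):
        self.identifier = identifier        # human-readable logical name
        self.properties = properties or {}  # optional resource properties


class VOSpaceService:
    def __init__(self, units):
        self._units = {u.identifier: u for u in units}

    def get_resources(self):
        """Return logical identifiers only - no IP addresses or URLs."""
        return sorted(self._units)

    def choose_unit(self, **wanted):
        """Let a client ask for, e.g., less expensive or faster storage."""
        for unit in self._units.values():
            if all(unit.properties.get(k) == v for k, v in wanted.items()):
                return unit.identifier
        return None


service = VOSpaceService([
    LogicalStorageUnit("sdsc-tape", {"class": "tape", "cost": "low"}),
    LogicalStorageUnit("sdsc-gpfs", {"class": "disk", "cost": "high"}),
])
units = service.get_resources()          # logical names only
cheap = service.choose_unit(cost="low")  # client's storage preference
```

The optional resource properties are what let an application, rather than the server alone, bind the "class of storage" at transfer time.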
On Aug 15, 2007, at 6:06 AM, Paul Harrison wrote:
> I think that data replication is an important piece of VOSpace
> functionality, but introducing the concept of logical storage units
> into the "public" API in this fashion might not be very easy to use
> in practice without knowledge of the underlying storage system, and
> additionally it is contrary to one of the aims of the VOSpace
> design: trying to hide as much as possible about the internals from
> the outside world.
> The use case that you describe could also be handled in a more
> easy-to-reason-about way by having "move to fast storage" and "move
> to slow storage" functions in the API, or by having similar hints in
> the various get API calls. Perhaps a compromise, using an API
> similar to the one you suggest, is that the "hardware units" are
> generic classes of unit rather than each VOSpace defining its own
> set of proprietary hardware units. VOSpaces that want to simply map
> the generic classes onto specific internal hardware units
> transparently. The VOSpace then hides all the details of exactly
> where items are stored.
> Paul Harrison
> On 13.08.2007, at 19:32, Matthew Graham wrote:
>> A request has come from our friends at SDSC to include references
>> to the actual storage units that data is being deposited on. The use
>> case is data replication so, for example, I want to move/copy a
>> data object from a slow tape archive to an ultrafast disk, where
>> both hardware units are within the same VOSpace, or I want to
>> retrieve a data object from the ultrafast disk copy and not the
>> slow tape one.
>> I think that we can incorporate this easily into our existing data
>> model. We will refer to hardware units as logical storage units
>> with the implication that they are identified via a logical
>> identifier (URI) that is set by the particular VOSpace
>> implementation. To get the list of available storage units from a
>> VOSpace, we will need a method: getLogicalStorageUnits() which
>> will return a list of URIs. These URIs may be resolvable to a
>> description of the storage unit.
>> The logical storage unit identifier will be an optional argument
>> in the <transfer> entity so that as part of the data transfer
>> negotiation, the user can specify a list of storage units that
>> they want the data transferred to/from. The identifier will also
>> be an optional argument in the <node> entity so that specific
>> hardware can be targeted in moving and copying data.
>> Comments, suggestions, etc.