Discussion:
[openstack-dev] LTFS integration with OpenStack Swift for scenarios like Data Archival as a Service
Sachin Goswami
2014-11-14 06:19:08 UTC
Permalink
Hi,

In OpenStack Swift, the XFS file system is integrated, which provides a
maximum file system size of 8 exbibytes minus one byte (2^63 - 1 bytes).
We are studying the use of LTFS with OpenStack Swift for scenarios like
Data Archival as a Service.

Has integration of LTFS with Swift been considered before? If so, could
you please share your findings?
Will integration of LTFS with Swift fit into the existing Swift architecture?

We would like to hear your opinion on this, along with the pros and cons.
LTFS Link:
http://en.wikipedia.org/wiki/Linear_Tape_File_System


Best Regards
Sachin Goswami
Samuel Merritt
2014-11-14 18:06:53 UTC
Permalink
Post by Sachin Goswami
In OpenStack Swift, the XFS file system is integrated, which provides a
maximum file system size of 8 exbibytes minus one byte (2^63 - 1 bytes).
Not exactly. The Swift storage nodes keep their data on POSIX
filesystems with support for extended attributes. While XFS filesystems
are typically used, XFS is not required.
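
To illustrate the dependency, here is a minimal sketch of the kind of
xattr usage involved. It mirrors the idea, not Swift's exact on-disk
format, and it needs a filesystem with user xattr support:

import os
import pickle

def write_object(path, body, metadata):
    # The object body is just a regular file on the storage node.
    with open(path, 'wb') as f:
        f.write(body)
    # The metadata lives in an extended attribute alongside it, which
    # is why xattr support is the real requirement (XFS has it, but so
    # do ext4, btrfs and others).
    os.setxattr(path, 'user.swift.metadata', pickle.dumps(metadata))

def read_metadata(path):
    return pickle.loads(os.getxattr(path, 'user.swift.metadata'))

write_object('./obj.data', b'hello', {'Content-Type': 'text/plain'})
print(read_metadata('./obj.data'))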
Post by Sachin Goswami
We are studying the use of LTFS with OpenStack Swift for scenarios
like *Data Archival as a Service*.
Has integration of LTFS with Swift been considered before? If so,
could you please share your findings? Will integration of LTFS with
Swift fit into the existing Swift architecture?
Assuming it's POSIX enough and supports extended attributes, a tape
filesystem on a spinning disk might technically work, but I don't see it
performing well at all.

If you're talking about using actual tapes for data storage, I can't
imagine that working out for you. Most clients aren't prepared to wait
multiple minutes for HTTP responses while a tape laboriously spins back
and forth, so they'll just time out.
Tim Bell
2014-11-14 19:43:51 UTC
Permalink
There were some discussions over the past years.

I raised the question of Swift tape support in my keynote in Boston in 2011 (http://www.slideshare.net/noggin143/cern-user-story) but there was limited interest. LTFS makes it more likely, but we should not underestimate the challenges. Ensuring bulk recall/migration (mounting tapes takes minutes), inventory catalogs to find the right tape, and robotics (multiple interfaces for asking that a tape be mounted) means it is not just a POSIX support question.

There was a blog post in 2012 regarding a Glacier competitor (http://www.buildcloudstorage.com/2012/08/cold-storage-using-openstack-swift-vs.html) but I don't think things have progressed much beyond that.

It would need to be tiered (i.e. migrate whole collections rather than files) and a local catalog would be needed to map containers to tapes. Timeouts would be an issue since we are often waiting hours for recall (to ensure that multiple recalls for the same tape are grouped).

It is not an unsolvable problem, but it is not just a 'use LTFS' answer.

Tim

Christian Schwede
2014-11-17 13:36:16 UTC
Permalink
Post by Tim Bell
It would need to be tiered (i.e. migrate whole collections rather than
files) and a local catalog would be needed to map containers to tapes.
Timeouts would be an issue since we are often waiting hours for recall
(to ensure that multiple recalls for the same tape are grouped).
It is not an unsolvable problem, but it is not just a 'use LTFS' answer.
There were some ad-hoc discussions during the last summit about using
Swift (the API) to access data that is stored on tape. At the same time
we talked about possible data migrations from one storage policy to
another, and this might be an option to think about.

Something like this:

1. Data is stored in a container with a Storage Policy (SP) that defines
a time-based migration to some other place.
2. After some time, the data is migrated to tape, and only some stubs
(zero-byte objects) are left on disk.
3. If a client requests such an object, the client gets an error stating
that the object is temporarily not available (unfortunately there is no
well-suited HTTP response code for this yet).
4. At this point the object is scheduled to be restored from tape.
5. Finally the object is read from tape and stored on disk again; it
will be deleted from disk again after some time.

With this approach only minor modifications are required inside Swift,
for example sending a notification to an external consumer to migrate
data back and forth, and handling requests for the empty stub files.
The migration itself should be done by an external worker that works
with existing solutions from tape vendors.
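
To make the stub handling concrete, here is a minimal sketch as plain
WSGI middleware. The stub-detection rule, the recall queue, and the
choice of 503 plus Retry-After for the "temporarily not available"
answer are all assumptions for illustration, not an agreed design:

import queue

recall_queue = queue.Queue()  # stands in for a real notification bus

class StubRecallMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ['REQUEST_METHOD'] == 'GET':
            # Check whether the object is only a zero-byte stub whose
            # data has been migrated to tape.
            if self._is_stub(environ['PATH_INFO']):
                # Schedule the restore; an external worker drains this
                # queue and talks to the tape system.
                recall_queue.put(environ['PATH_INFO'])
                start_response('503 Service Unavailable',
                               [('Retry-After', '3600'),
                                ('Content-Type', 'text/plain')])
                return [b'Object is archived; restore scheduled.\n']
        return self.app(environ, start_response)

    def _is_stub(self, path):
        # Placeholder: a real implementation would inspect the object's
        # metadata (e.g. a marker set when the data was migrated).
        return path.endswith('.archived')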

Just an idea, but it might be worth investigating further (more and
more people seem to be interested in this, especially from the science
community).

Christian
Tim Bell
2014-11-17 18:43:57 UTC
Permalink
Post by Christian Schwede
[...]
Just an idea, but it might be worth investigating further.
This sounds somewhat like DMAPI (http://en.wikipedia.org/wiki/DMAPI); there may be some concepts from that which would help to construct an API definition for the driver.

If you work on the basis that a container is either online or offline, you would need a basic data store which tells you which robot/tape holds that container, and some method for handling containers spanning multiple tapes or multiple containers on a tape.

The semantics for adding a new object to a container would also need to be defined for different scenarios (such as the container already being offline, or being archived/recalled).

Other operations needed would be to move a container to new media (such as when recycling old tapes), initialising tape media as being empty but defined in the system, handling container deletion on offline media (with associated garbage collection), validation of an offline tape, ...
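
As a rough illustration of the local catalog this implies (the schema
and state names below are invented for the example, not a proposal):

import sqlite3

conn = sqlite3.connect('tape_catalog.db')
conn.executescript("""
CREATE TABLE IF NOT EXISTS tapes (
    tape_id   TEXT PRIMARY KEY,
    robot     TEXT NOT NULL,   -- which library/robot holds the tape
    state     TEXT NOT NULL DEFAULT 'empty'  -- empty/active/recycling
);
CREATE TABLE IF NOT EXISTS container_locations (
    container TEXT NOT NULL,
    tape_id   TEXT NOT NULL REFERENCES tapes(tape_id),
    segment   INTEGER NOT NULL DEFAULT 0,  -- containers may span tapes
    state     TEXT NOT NULL,  -- online/offline/archiving/recalling
    PRIMARY KEY (container, tape_id, segment)
);
""")

def tapes_for_container(container):
    # Which tapes (and robots) must be mounted to recall this container?
    return conn.execute(
        "SELECT t.tape_id, t.robot FROM container_locations c "
        "JOIN tapes t ON t.tape_id = c.tape_id "
        "WHERE c.container = ? ORDER BY c.segment", (container,)).fetchall()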

Certainly not impossible, and there is lots of prior art in the various HSM systems such as HPSS or CERN's CASTOR.

Tim
Tim Bell
2014-11-17 18:59:51 UTC
Permalink
Post by Tim Bell
[...]
Certainly not impossible, and there is lots of prior art in the various
HSM systems such as HPSS or CERN's CASTOR.
One additional thought would be to use an open-source archive/retrieve solution such as Bacula (http://blog.bacula.org/), which has many of the above features.
Tim
Clint Byrum
2014-11-14 22:18:25 UTC
Permalink
Post by Samuel Merritt
[...]
If you're talking about using actual tapes for data storage, I can't
imagine that working out for you. Most clients aren't prepared to wait
multiple minutes for HTTP responses while a tape laboriously spins back
and forth, so they'll just time out.
Agreed. I think you'd need a separate API for freezing and thawing
data, similar to the way Glacier works. However, my understanding of
Glacier is that it is simply a massive bank of cheap disks which are
largely kept powered off until either a ton of requests for data on a
single disk arrive or a certain amount of time has passed. The benefit
of this is that no intermediary storage is required. The disks are
either online, and you can read your data, or offline, and you have to
wait.
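
A hypothetical client-side view of such a freeze/thaw API might look
like the following sketch; the endpoint, header, status code, and
polling behaviour are all invented for illustration and match neither
Swift's nor Glacier's actual APIs:

import time
import requests

BASE = 'https://swift.example.com/v1/AUTH_acct/archive-container'

def thaw(obj):
    # Ask the cluster to bring the object back online.
    requests.post('%s/%s' % (BASE, obj),
                  headers={'X-Archive-Action': 'thaw'})

def get_when_thawed(obj, poll=60):
    # Poll until the data is online; a real client would cap retries.
    while True:
        resp = requests.get('%s/%s' % (BASE, obj))
        if resp.status_code != 503:
            return resp.content  # online: read it directly
        time.sleep(poll)         # offline: wait, as with powered-off disks

thaw('yearly-backup.tar')
data = get_when_thawed('yearly-backup.tar')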