

Storage and backup




LinuxVTL is an implementation of a VTL in Linux. Not really an appliance, but software which behaves like a real VTL. Don't expect the performance of a ProtecTIER or a FalconStor, as the changelog shows this is really a work in progress: the log for one of the latest changes shows fixes for silent data corruption.

Data deduplication isn't included, but you can of course use FUSE-ZFS with dedup turned on. Also, if you're running LinuxVTL on a separate server, you know you must either have HBA cards or use iSCSI. Check out the homepage for useful tips, like setting up an OpenSolaris client, or the settings for your favorite backup software.

Eaten by a robot


We have a giant SL8500 tape library at work, which regularly needs to be fed. I once had the dull task of feeding the beast 600 tape media through the front-end CAP, at 14 media at a time, which took me about 4 hours. Yesterday, we had a new batch of 300 tapes which had to be entered. Luckily, the StorageTek technicians were present, doing a firmware upgrade of the library. As they opened the library for easy access to some vital parts, I took the chance to enter the machine and put the tapes in their slots from the inside.

An SL8500 is a U-shaped library, which can be entered from the top of the 'U'. Along the long sides, tapes reside on both walls, and on the curved side of the U, the tape drives are mounted. So I accessed the library through a narrow corridor of 2-meter-tall walls, completely filled with tapes containing the precious electronic data of my enterprise. These corridors sure are narrow, so I guess one of the requirements for being a StorageTek admin is not having a Burger King subscription. I only hoped that the technicians hadn't forgotten to deactivate the handbots, because an encounter with an iron gripper moving at 5 meters/second would definitely send me on a one-way trip to the cemetery.

The picture doesn't do the library justice: in this dark corridor, lit only by some faint LEDs, and with my crappy phone camera, only the lowest 50 centimeters are visible (I placed the camera on the floor). Imagine this being 2 meters high and 15 meters long, and you have an idea what we're talking about.

I finished in an hour, and once the STK engineers had finished their labor, I let the library rescan its new inventory, which it completed within 3 minutes (I love StorageTek hardware). 450 new terabytes ready to use. Happy munching, Optapemus Prime!

TSM 6.1 : Proof of Technology


I got invited by IBM to attend their Proof of Technology session about TSM 6.1. It was an interesting opportunity to delve deeper into the new functionalities of TSM 6.1, and to chat with other TSM customers about their experiences. First, the 'good' news: I wasn't the only one who had problems with the installation on AIX. Apparently only on AIX 6 were some installations flawless; on the AIX 5 platform, no one had succeeded yet in installing TSM 6.1.

The session was introduced with an explanation of TSM FastBack, which is basically a CDP solution for remote branches, aimed at Windows installations. It captures data changes at the block level, with minimal impact on the system, and also provides a near-instant restore capability that enables applications to be up and running within minutes after data recovery is initiated.

TSM 6.1 then: to my big surprise, we were invited to a hands-on session with an upgrade from 5.5 to 6.1. It all went faster than expected, so I used the remaining time to study data deduplication in TSM. First of all, data dedup can only be applied to sequential FILE device classes. It uses a SHA-1 hashing algorithm, which gives an infinitesimal probability of a hash collision (a 40PB archive gives a 0.5*10^-28 chance of a collision). Creating or upgrading your storage pool to a dedup-based one is performed with this command:

update stgpool filepool deduplicate=yes

As soon as you execute this command, an extra process is started:

 Process  Process Description   Status
--------  --------------------  -------------------------------------------------
     283  Identify Duplicates   Storage Pool FILEPOOL, Volume /tsmpool2/00006664.
                                BFS, Files Processed: 2000, Duplicate
                                Extents Found: 344, Duplicate Bytes Found:
                                3,238,123 Current Physical File (bytes):
                                Status: Processing

This process starts identifying duplicate data blocks, and goes idle when nothing is left to process, so you'll always see at least one of these processes running per deduplicated storage pool. This process is the first phase of data deduplication and, while running, can use lots of CPU and memory resources: existing FILE volumes are opened and read (I/O intensive), data is fingerprinted into chunks (CPU intensive) and a SHA digest is calculated on each chunk (CPU intensive). To avoid false positives, the chunk size and another quick digest are also checked. Common chunks are replaced with pointers to the location of the common data (DB updates, lock contention). The second phase of data dedup is the effective removal of the spurious data blocks, by a reclamation process.
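The identify phase described above can be sketched in a few lines of Python. This is only a toy model: it uses fixed-size chunks for simplicity (TSM fingerprints data into variable-size chunks), and all names here are illustrative, not TSM internals.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; TSM uses variable-size fingerprinted chunks


def identify_duplicates(volume_data):
    """Scan a volume's bytes; replace chunks already seen with pointers.

    Returns (entries, dup_bytes): entries is a list of ('data', chunk) for
    first occurrences and ('ref', digest) for duplicates, dup_bytes counts
    the duplicate bytes found (like the process output above reports).
    """
    index = {}            # digest -> (offset of first occurrence, chunk length)
    entries, dup_bytes = [], 0
    for off in range(0, len(volume_data), CHUNK_SIZE):
        chunk = volume_data[off:off + CHUNK_SIZE]
        digest = hashlib.sha1(chunk).hexdigest()   # the CPU-intensive part
        # guard against a hash collision by also comparing the chunk length,
        # analogous to TSM's extra size / quick-digest check
        if digest in index and index[digest][1] == len(chunk):
            entries.append(('ref', digest))        # pointer to the common data
            dup_bytes += len(chunk)
        else:
            index[digest] = (off, len(chunk))
            entries.append(('data', chunk))
    return entries, dup_bytes
```

Feeding it two identical 4KB chunks followed by a different one yields one 'ref' entry and 4096 duplicate bytes; the actual removal of the duplicate data would then be the job of the reclamation phase.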

There was also a presentation about the reporting features introduced in TSM 6.1, which I had already studied. The reporting is based on a combination of ISC, ITM, TEP and DB2, and needs a whopping 2.2GB download. The reporting feature I looked into was only the TEP part, but apparently there's also some reporting baked into ISC. The provided reports are pretty basic, but IBM is planning on creating a script library for all TSM customers. You can also extend your TSM-based reporting with BIRT, an Eclipse plugin. Customers who now rely on TSM Operational Reporting might be interested to know that this product will be discontinued with TSM 6.1...

IBM looks determined to push Sun and HP customers to Linux on the mainframe. At work, we compared the performance of TSM on AIX and on zLinux. IBM seemed *very* curious about our figures from these tests, and an IBM representative asked me lots of questions about them. At the time of the tests I was quite surprised to see TSM perform so well on zLinux, although it still performed about 18% worse than the AIX version. It outperformed TSM on Solaris by a whopping 300%...

SSD myths dispelled - sort of


Many eeePC installation guides recommend some precautions when formatting the internal SSD drives with ext3. Most articles warn that the continuous writes of journaled file systems or swap spaces might eventually trash the drive. I too chose a setup with ext2 and no swap partition on my netbook.

Robert Penz tries to dispel some of the prejudices around SSD drives, stating that with 2 million write cycles at 50MB/sec you'd still get a life span of 20 years.
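That 20-year figure is easy to reproduce with back-of-the-envelope arithmetic. The 16GB capacity below is my own assumption (roughly an eeePC-class drive; Penz's exact figure may differ), and it presumes perfect wear-leveling spreading writes evenly over all cells:

```python
capacity_bytes = 16e9   # assumed drive size, not from the article
erase_cycles = 2e6      # write cycles per cell, per the article
write_rate = 50e6       # bytes/second of continuous writing

# with perfect wear-leveling, total writable data = capacity * cycles
total_writable = capacity_bytes * erase_cycles
lifetime_years = total_writable / write_rate / (365 * 24 * 3600)
print(round(lifetime_years, 1))  # → 20.3
```

Note how sensitive this is to the cycle count: at the 10,000 cycles typical of cheap MLC flash, the same arithmetic gives well under a year of continuous writing, which is why the drive quality caveat below matters.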

An interesting read, but I don't buy all his arguments - he makes some good remarks, though I think the truth lies somewhere in the middle:
- first of all, Robert probably takes his figures from enterprise SSD disks, which are of a different quality level than the regular SSD drives found in netbooks.
- I really would like to see results from writing the same block over and over again; a life span of 20 years is imho impossible under those conditions.
- misconfigurations can fill up logfiles pretty quickly, so yes, netbooks can experience heavy writes too.

Does it make such a difference overall? Hell, SSD drives are pretty fast, so fscks are too. My Linux boxes are pretty stable; I still haven't seen my netbook crash and force an fsck, despite some moments of heavy usage or sudden battery drains. If I were forced to reinstall my eeePC, I would still go for the same setup. Maybe a swap partition would be handy, but an additional 1GB of disk space out of 20GB is a nice tradeoff for some memory gain.

A quick look at TSM 6.1


TSM 6.1 was released some weeks ago, but I hadn't had any chance yet to test it out. Many things have changed in IBM's backup flagship, so time for a quick glance. First the bad news:

  • TSM isn't available on 32-bit Linux, and at this time only for 64-bit SLES. No word yet on whether RHEL will be supported in the future.
  • the TSM downloads are hefty, ranging from 2 to 5 GB! So be prepared for some long download times. Many of the packages have been replaced by installation suites. TSM also contains a full-blown DB2 9.5, and the reporting software comes with ITM. Lots of software integration with different IBM solutions, so your procurement division might have lots of fun figuring out the licensing of these software clusters.
  • All installations are performed by software installation wizards, which behave stupidly. I tried the installation on AIX, but it kept failing on AIX prereqs, like the AIX Technology Level and some APARs. The reporting software behaves in the same moronic way. In despair I turned to a Windows 2003 VM in VMware, where I had more luck installing everything.

More info later on TSM itself, but I noticed that data dedup is present, though in the form of a separate housekeeping job. You might want to do some extensive testing in your TSM 6.1 environment to see whether the performance load of the dedup job doesn't level out the performance gains it brings to the other housekeeping jobs.

TSM now also comes with default reporting, based on ITM. Reporting alone is a 2.2GB download, containing ISC, ITM and DB2. You can monitor & report on other TSM instances, even 5.5 versions. The reports are pretty basic, but if you're used to TSM Reporter, this might be a nice addition. Basic reporting covers stuff like backup jobs, schedules, database size and distributions of the number of objects inspected vs backed up. Still no real competitor to professional TSM reporting suites like EMC Data Protection Advisor or Aptare Backup Manager.

