
TSM 6.1 : Proof of Technology

I got invited by IBM to attend their Proof of Technology session about TSM 6.1. It was an interesting opportunity to delve deeper into the new functionality of TSM 6.1, and to chat with other TSM customers about their experiences. First, the 'good' news: I wasn't the only one who had problems with the installation on AIX. Some installations on AIX 6 went flawlessly, but on the AIX 5 platform no one has succeeded in installing TSM 6.1 yet.
The session opened with an explanation of TSM FastBack, which is basically a CDP (continuous data protection) solution for remote branch offices, aimed at Windows installations. It captures data changes at the block level with minimal impact on the system, and it also provides a near-instant restore capability that gets applications up and running within minutes after a data recovery is initiated.
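To make the block-level capture idea a bit more concrete, here is a toy sketch of my own (not FastBack code, and all names are made up): a change journal that records which blocks of a device were overwritten, so a restore only has to replay the changed blocks onto a base image.

# Toy illustration of block-level change tracking; not FastBack code.
# Writes are assumed to be block-aligned for simplicity.
BLOCK = 4096  # assumed block size

class ChangeJournal:
    def __init__(self):
        self.changed = {}                 # block number -> latest contents

    def capture_write(self, offset, data):
        """Record a write; imagine this hooked into the device's write path."""
        for i in range(0, len(data), BLOCK):
            block_no = (offset + i) // BLOCK
            self.changed[block_no] = data[i:i + BLOCK]

    def restore_into(self, device):
        """Replay only the changed blocks -- this is what makes restores fast."""
        for block_no, contents in sorted(self.changed.items()):
            device.seek(block_no * BLOCK)
            device.write(contents)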
Then on to TSM 6.1: to my big surprise, we were invited to a hands-on session with an upgrade from 5.5 to 6.1. It all went faster than expected, so I used the remaining time to study data deduplication in TSM. First of all, data dedup can only be applied to sequential FILE device classes. It uses the SHA-1 hashing algorithm, which gives an infinitesimal probability of a hash collision (a 40 PB archive gives a 0.5*10^-28 chance of a collision). Creating or upgrading your storage pool to a deduplicated one is done with this command:

update stgpool filepool deduplicate=yes
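As a sanity check on that collision figure, a quick birthday-bound calculation lands in the same ballpark. The 1 MB average chunk size below is my own assumption (TSM's chunks are variable-sized), so treat this as an order-of-magnitude check only:

# Back-of-the-envelope birthday bound for a SHA-1 collision in a
# 40 PB deduplicated archive; the 1 MB average chunk size is an
# assumption, so only the order of magnitude is meaningful.
archive_bytes = 40 * 10**15            # 40 PB
avg_chunk_bytes = 10**6                # assumed average chunk size: 1 MB
n = archive_bytes // avg_chunk_bytes   # number of chunks hashed
p = n * (n - 1) / 2**161               # ~ n^2/2 pairs over 2^160 possible digests
print(f"{n:.1e} chunks -> collision probability ~ {p:.1e}")
# prints: 4.0e+10 chunks -> collision probability ~ 5.5e-28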

As soon as you execute the UPDATE STGPOOL command, an extra process is started:

 Process     Process Description      Status
  Number
--------     --------------------     -------------------------------------------------
     283     Identify Duplicates      Storage Pool FILEPOOL, Volume /tsmpool2/00006664.
                                       BFS, Files Processed: 2000, Duplicate
                                       Extents Found: 344, Duplicate Bytes Found:
                                       3,238,123. Current Physical File (bytes):
                                       2,626,676,296.
                                       Status: Processing
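Once that process has chewed through the pool, you can check the actual savings on the storage pool itself. If I remember correctly, the detailed query output contains a 'Duplicate Data Not Stored' field showing the amount of reclaimed space; I'm quoting the field name from memory, so verify it on your own server:

query stgpool filepool format=detailed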

The 'Identify Duplicates' process hunts for duplicate data blocks and goes idle when nothing is left to process, so you'll always see at least one of these processes running per deduplicated storage pool. This process is the first phase of data deduplication and, while running, can use lots of CPU and memory resources: existing FILE volumes are opened and read (I/O intensive), the data is fingerprinted into chunks (CPU intensive) and a SHA digest is calculated on each chunk (CPU intensive). To avoid false positives, the chunk size and a second, quick digest are also checked. Common chunks are replaced with pointers to the location of the common data (DB updates, lock contention). The second phase of data dedup is the actual removal of the redundant data blocks, done by a reclamation process.
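Translated into a toy sketch (my own simplification, not TSM code): fixed-size chunks stand in for TSM's fingerprint-based variable-size chunking, and CRC32 plays the role of the quick second digest that guards against hash collisions.

import hashlib
import zlib

CHUNK = 256 * 1024      # assumed chunk size; real TSM chunks are variable-sized

seen = {}               # SHA-1 digest -> (chunk length, crc32, location)

def identify_duplicates(volume, data):
    """Return the number of duplicate extents found in one FILE volume."""
    duplicates = 0
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        key = hashlib.sha1(chunk).digest()        # the expensive digest
        entry = seen.get(key)
        # guard against false positives: also compare the chunk size and
        # a cheap second digest before declaring the chunk a duplicate
        if entry and entry[0] == len(chunk) and entry[1] == zlib.crc32(chunk):
            duplicates += 1                       # would store a pointer here
        else:
            seen[key] = (len(chunk), zlib.crc32(chunk), (volume, off))
    return duplicates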
There was also a presentation about the reporting features introduced in TSM 6.1, which I had already studied. The reporting is based on a combination of ISC, ITM, TEP and DB2, and requires a whopping 2.2 GB download. The reporting feature I looked into was only the TEP part, but apparently there's also some reporting baked into ISC. The provided reports are pretty basic, but IBM is planning to create a script library for all TSM customers. You can also extend your TSM-based reporting with BIRT, an Eclipse plugin. Customers who currently rely on TSM Operational Reporting might be interested to know that this product will be discontinued with TSM 6.1...
IBM looks determined to push Sun and HP customers towards Linux on the mainframe. At work, we compared the performance of TSM on AIX and on zLinux. IBM seemed *very* curious about our figures, and an IBM representative asked me lots of questions about them. At the time of the tests I was quite surprised to see TSM perform so well on zLinux: it came in only about 18% slower than the AIX version, and it still outperformed TSM on Solaris by a whopping 300%...