Data throughput: Comparing Oranges to Oranges
One of the things that maddens me about backup vendors (both hardware and software) is the confusion they engender with the many ways of referring to data throughput speeds.
I used to work at a university, and we had a many-month-long (OK, almost many-year-long! Anyone who has ever worked at a university knows what I am talking about!) process of evaluating backup solutions.
Every vendor used a slightly different notation to denote throughput of the devices.
In the end, I came up with my own system. When I think of data, I think of it in buckets, usually sized in Giga Bytes. (Although I guess Tera Byte sized buckets are going to be more common soon enough.) What I mean by that is, if I need to back up a server that has 750 Giga Bytes of data, the only number I care about is: how long will that take?
Well, an LTO-3 tape drive (Linear Tape-Open, more commonly known as Ultrium, a standard that IBM, HP, and Quantum seem to support) can in theory do 80 MB/s. That is Mega Bytes a second.
OK, so how many hours will it take to back up my 750 GB server? I need the calculator for that one, and with this long bid process, I needed a way to think about the numbers in meaningful terms.
For me, the only number that is meaningful is GB/hr, that is, Giga Bytes an hour. This makes a great deal of sense to me: if an LTO-3 drive can do about 280 GB/hr, then for 750 GB I can easily do the math in my head and get about three hours.
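If you would rather not do even that much in your head, the same division fits in a few lines of Python. This is only a sketch: the function name `backup_hours` is my own invention, and it assumes the drive actually sustains its quoted rate for the whole job, which real backups rarely manage.

```python
def backup_hours(data_gb, rate_gb_per_hr):
    """Hours needed to move data_gb at a sustained rate of rate_gb_per_hr."""
    if rate_gb_per_hr <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hr


# The 750 GB server on a drive doing about 280 GB/hr.
print(f"{backup_hours(750, 280):.1f} hours")  # prints "2.7 hours"
```

That 2.7 is the "about three hours" from above.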
But the vendors seem addicted to using other speed notations, usually MB/s, Mega bytes a second, which is mostly useless to me.
Coming up with conversion factors is easy if you remember the old standbys from high school physics. (At least they taught this at my high school. I am from Canada, so who knows what they teach in the US.) We used to call this Dimensions Analysis, since you look at the dimensions and play around with those values to do the conversions. I have no idea what its real name is. (My high school physics teacher was a great guy, and I learned a lot, but I found out later in university that many of the things he taught us were 'unique' and special to his view of physics. Regardless, it still works!)
Mega Bytes / second is our starting point, as this is what most vendors will quote. We need to multiply this by some fraction that resolves down to 1 (the value on top of the fraction is the same as the value on the bottom, so the fraction equals one), but leaves us with different units, and hopefully the ones we want.
Mega Bytes       1 Giga Byte       1 Giga Byte
----------  X  ---------------  =  ------------
  Second       1000 Mega Bytes     1000 seconds
Take Mega Bytes per second and multiply by our conversion fraction of 1 Giga Byte over 1000 Mega Bytes, which equals one because 1 Giga Byte is 1000 Mega Bytes. Now, that should really be 1024 if you count in binary units, but wait a moment and you will see why I round.
The Mega bytes on top left cancel the Mega bytes on the bottom right, and you are left with Giga Bytes/1000 seconds.
Now to get rid of the seconds and replace them with hours.
1 Giga Byte      3600 seconds     3600 Giga Bytes
------------  X  ------------  =  ---------------
1000 seconds        1 hour          1000 hours
Here we use a fraction that says 3600 seconds divided by 1 hour equals one. When we multiply our current value by this new fraction, the seconds on the bottom left cancel the seconds on the top right.
Now we simplify that to 3.6 (3600/1000) to get GB/hr.
Therefore, take the Mega Bytes per second (80 MB/s in the case of LTO-3), multiply by the conversion factor of 3.6, and you get Giga Bytes per hour. So 80 MB/s becomes a more useful 288 GB/hr.
Now, the reason I skipped the 1024 and used 1000 instead is that it adds an error of only about 2.4%, but makes the conversion factor 3.6, which is pretty easy to use in my head on the fly. The more correct number, 3.515625 (3600/1024), is not so useful to me. Probably 3.5 could work, but then I am rounding again, and that just adds error in the other direction.
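The two fractions above can be written out as a tiny Python function, which is a handy way to check the arithmetic. The names (`SECONDS_PER_HOUR`, `MB_PER_GB`, `mb_per_s_to_gb_per_hr`) are mine, picked only for illustration, and it uses the rounded 1000 MB per GB.

```python
SECONDS_PER_HOUR = 3600
MB_PER_GB = 1000  # rounded on purpose; see the text for why not 1024


def mb_per_s_to_gb_per_hr(mb_per_s):
    """Convert a quoted MB/s figure to GB/hr: (MB/s) * (3600 s/hr) / (1000 MB/GB)."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_GB


print(mb_per_s_to_gb_per_hr(80))  # LTO-3 at 80 MB/s -> 288.0
```

Keeping the two constants separate, instead of hard-coding 3.6, mirrors the two cancelling fractions and makes it obvious which number to change if you want binary units.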
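To see how much the rounding really costs, here is a small Python comparison of the two factors. It is just a sketch with variable names I made up, and it assumes the vendor's 80 MB/s figure is taken at face value either way.

```python
decimal_factor = 3600 / 1000  # 1 GB treated as 1000 MB
binary_factor = 3600 / 1024   # 1 GB treated as 1024 MB

# How far the easy factor overshoots the binary one, in percent.
error_pct = (decimal_factor / binary_factor - 1) * 100

print(decimal_factor)             # 3.6
print(binary_factor)              # 3.515625
print(f"{error_pct:.1f}%")        # 2.4%
print(80 * binary_factor)         # 281.25 GB/hr for LTO-3 using 1024
```

So for an 80 MB/s drive the two answers are 288 and 281.25 GB/hr, a gap that is small next to the difference between a vendor's quoted speed and what you see in practice.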
You can use this to get other useful conversion factors. I saw someone quote Mega Bits per second (which is even less useful to me than Mega Bytes per second, when it comes to backups at least). Multiply that figure by our 3.6, so that the units are now Giga Bits per hour, and then say:
Giga Bits     1 byte     Giga Bytes
---------  X  ------  =  ----------
  hour        8 bits      8 hours
Or more simply, multiply the value in Giga Bits per hour by 1/8 (or divide by 8, same thing) and you get Giga Bytes per hour. Put the two together and 3.6 times 1/8 is 0.45.
So 100 Mega bits per second is about 45 Giga bytes per hour. Much more useful.
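The same chain of fractions for network-style numbers, as a minimal Python sketch. The function name `mbit_per_s_to_gb_per_hr` is hypothetical, and it again uses the rounded 1000 rather than 1024.

```python
def mbit_per_s_to_gb_per_hr(mbit_per_s):
    """Convert Mbit/s to GB/hr: times 3600 s/hr, over 1000 Mbit/Gbit, over 8 bits/byte."""
    return mbit_per_s * 3600 / 1000 / 8


print(mbit_per_s_to_gb_per_hr(100))   # 45.0
print(mbit_per_s_to_gb_per_hr(1000))  # 450.0, a gigabit link at its theoretical best
```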