Server consolidation is a key element of cutting down IT costs. Novell products make it easy to assess your data center for consolidation and optimization. Geoffrey Carman explains how they used PlateSpin Recon and PlateSpin Migrate to help a client consolidate and migrate their data center to a new location.
When a data center starts off with a clean slate and fresh design, life is easy, and everything is easy to understand. You start off with a beautiful room, with perfectly run cables, labeled, color coded, and pretty. Heck, you probably blocked out space, so that web servers are all in one set of racks, and all the application servers are together. The SAN is over here, the network switches over there, sorted by function and whatnot. You probably even tried to be forward-looking and left lots of empty space in each area for future growth.
If you have ever seen a data center in this state, it is truly a thing of beauty. The university I used to work at went through this process at least twice in my tenure there. I came in to a decades-old data center that used to hold mainframes and mid-range boxes. Those had been removed and outsourced by the time I got there, but bits and pieces remained. We reused all sorts of odd things, and there were holes in unexpected places where equipment had been removed, but they did a good job of cleaning the space up.
Then we moved half the machines upstairs, into a really nice machine room, although it, too, was an old mainframe-style room, so we did not completely refresh it. We used that room for an additional year as they redid the downstairs one, then we shuffled everything back down there.
But once that big new downstairs machine room was built, man, it was good looking! However, it did not stay that pretty for long! Soon enough the rules for what goes where, the color coding of the networking cables, the cable runs, etc., slowly began to be violated.
You know how it is: a guy needs to fix some issue. He does not have the right pretty part, or room right where he needs it right now. But we have to get this fixed now! It's an important production system! Hurry, hurry, hurry! We'll do it the fast way, which is wrong for the room, and we will clean up later. Well, does later ever really come? Does anyone ever actually schedule another outage to come back and properly clean up? In my personal experience, pretty rarely.
Within a year or two it was still looking pretty good, but there were all sorts of exceptions all over the room. There were close to 1000 servers floating around in a fairly large space, so you can imagine how that happens.
Last time I visited, it looked like a scene from ET! Due to temperature problems, the hot and cold aisles had to be isolated with plastic curtains, and it sure was not very pretty any more. The cable patterns were pretty obviously slightly off, as was to be expected, and I was shown servers that were in the wrong sections.
This is normal in the life of any machine room. At this kind of size, a machine room is a living beast. Consider replacement schedules: assume you replace your hardware every three to five years. Add the consolidation of services as time goes on, plus new services and servers being brought in, and the room changes, merges, and splits, literally like a living creature.
It is an almost impossible task to keep track of everything! In fact, after the move the guy in charge had to set up a searchable database just to find servers in the room, and it quickly fell out of date, no matter how hard he tried to keep it current.
That machine room had almost no VMs in it when I was involved with it. Virtual machines rock! They have seriously changed everything in the server and services world. There are few existing services that can really use a modern eight-core server with 32 GB of RAM to its max. It is amazingly cheap to buy such a monstrous box (compared to just a few years ago; thank you, Moore's Law!), yet so hard to really use it to its fullest. Eight cores and many gigs of RAM is basically entry level for server-class hardware these days. The high end is so high end, it is scary at times!
With VMware, open source Xen, Citrix XenServer, KVM, Microsoft's Hyper-V, or any other VM product, you start getting the ability to do some really amazing things!
We have one client that has ten 8-core servers in a XenServer cluster running well over 100 server instances. These are all 2U servers, so in 20U of rack space they have over 100 servers running. Think about that from a power, heating, and cooling perspective, and it is a total win in every dimension.
As you can imagine, if it is hard to keep track of servers when there is a one-to-one correlation in the physical world to a server, imagine what happens when you start adding in virtual machines, and now there is no correlation at all that can be seen physically.
Even better, with modern VM software, you can move running VMs from one host in a cluster to another cluster node. From an administrative perspective this is priceless. If you need to upgrade a box, or fix an issue, just move the running VMs to other nodes, take the problem hardware out of the cluster, fix it, and when you bring it back online, move some VMs back onto it.
Thus a physical walk-through of a data center is no longer enough on its own these days. It is still helpful, and it can identify physical hardware, but it does not do much to give you a real picture of what is running in the data center.
The good news is that Novell has a great product for dealing with issues like this. The acquisition of PlateSpin was a really good call in this space, as PlateSpin brings a suite of tools to the table that are darn useful.
In the first place there is PlateSpin Recon which is the perfect tool for this issue.
Even better, Novell has a one-time license model, where you can buy PlateSpin Recon licenses just to manage a single data center move, instead of for continuous monitoring. This is one of the better marketing decisions on licensing I have seen in a while. If you can 'sucker' them into using the product for real, then soon after the move is over you will start getting asked, "Can I see the current version of the reports we saw during the move?" To do that, you need to pay for a full license. At that point, the people asking for the information (CIOs and high-level managers) are usually willing and able to come up with the money for the full license.
PlateSpin Recon has a couple of great features that make this so very useful in a data center. First off, it can go out and discover many of the servers running in your data center. Once it finds them all, it starts to collect usage and utilization data from them.
Once you have a week or more of utilization data, you can get a good feel for which physical or virtual servers are over- or under-utilized.
Thus you can find all the various servers, physical and virtual, that you need to account for, and see how they are being used.
Next up, PlateSpin Recon can suggest a migration plan, to an all-virtual environment. Thus you can specify a base server for your VM servers (in a cluster or not) and it will generate a plan showing which physical and virtual servers should be moved, and how they should be laid out in VM servers, in order to maximize utilization and work most efficiently.
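To give a feel for what a consolidation plan involves, here is a minimal sketch of one classic packing heuristic, first-fit decreasing. This is only an illustration, not how PlateSpin Recon builds its plans; the function name, the server names, and the memory figures are all made up for the example.

```python
def plan_consolidation(workloads, host_capacity):
    """Pack workloads onto identical hosts using first-fit decreasing.

    workloads: dict of hypothetical server name -> peak demand (e.g. GB of RAM)
    host_capacity: capacity of each VM host, in the same unit
    Returns a list of hosts, each a list of the server names placed on it.
    """
    hosts = []  # each entry: {"free": remaining capacity, "guests": [names]}
    # Place the largest workloads first; they are the hardest to fit.
    for name, demand in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        if demand > host_capacity:
            raise ValueError(f"{name} does not fit on a single host")
        for host in hosts:
            if host["free"] >= demand:
                host["free"] -= demand
                host["guests"].append(name)
                break
        else:
            # No existing host had room, so start a new one.
            hosts.append({"free": host_capacity - demand, "guests": [name]})
    return [host["guests"] for host in hosts]


# Illustrative numbers only (peak GB of RAM per server, 16 GB hosts).
workloads = {"web1": 6, "web2": 5, "db1": 12, "mail": 4, "file": 3, "build": 10}
plan = plan_consolidation(workloads, host_capacity=16)
for number, guests in enumerate(plan, start=1):
    print(f"host {number}: {', '.join(guests)}")
```

With these numbers, six servers fit on three hosts. A real plan has to juggle CPU, memory, disk, and network at the same time, which is exactly why it is nice to have a tool do it for you.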
As you can imagine, you will have lots of servers that are very busy during the day, and probably others that are only busy at night. Well these make complementary pairings to keep together on the same VM servers, as you could over-provision a bit, knowing that your peak times do not overlap.
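A quick sketch shows why complementary pairings matter. The load profiles below are invented for illustration (percent of one host's CPU in four six-hour blocks), and the helper name is hypothetical. What counts is the peak of the combined load, not the sum of the individual peaks.

```python
def combined_peak(*profiles):
    """Return the highest total load seen in any single time block."""
    return max(sum(block) for block in zip(*profiles))


# Hypothetical profiles: night, morning, afternoon, evening (% of one host).
day_app = [10, 70, 80, 20]      # busy during office hours
night_batch = [75, 10, 10, 30]  # busy overnight
other_day = [15, 65, 75, 25]    # another daytime workload

complementary = combined_peak(day_app, night_batch)
overlapping = combined_peak(day_app, other_day)
print(f"day + night: peak {complementary}%")
print(f"day + day:   peak {overlapping}%")
```

The day and night pair peaks at 90% of a host even though the individual peaks add up to 155%, so the two can share a box. The two daytime workloads peak together at 155% and cannot.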
Once you have this plan generated, you can go ahead and implement it.
Moving all these servers around can be a real pain by hand. That's where PlateSpin Migrate comes in. This tool is great for moving servers around: P2V (Physical to Virtual), V2P (Virtual to Physical), V2V (Virtual to Virtual), and P2P (Physical to Physical). Not only that, but it can even take the plan generated by PlateSpin Recon and implement it for you!
When you have to move a data center around, these two tools together are worth their weight in gold!
Moving a client's data center
We have a client that wants to move its data center from a big expensive building, to a hosted data center way out in the suburbs. They look at it simply. The cost of the rent for the floor the data center is taking up is quite high. As long as the move costs less than rent for a year, it is worth spending that much money for the move.
Along the way it makes sense to consolidate some hardware, ditch some old end-of-life or near end-of-life boxes, and virtualize what makes sense. After all, if you are paying by the rack in the new data center, why move old junk hardware? This is a great time to clean up some of the junk in the system.
But there is a bigger problem first. What on earth is actually in that data center?
Well, we went in with them and they had a spreadsheet they were working from, which showed well over a hundred servers. We brought in PlateSpin Recon and let it run, and it found an extra fifty servers they did not have tracking information for. Of course, each server belonged to someone, but for some reason the owners had not mentioned it to the person maintaining the spreadsheet, as is often the case.
Right off the bat, this was a huge win, as a move that ignored these fifty servers would have crippled their enterprise after the move. Or at least would have been pretty painful when they were missed.
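The comparison itself is simple once you have both lists: it is a set difference between what is tracked and what was discovered. Here is a minimal sketch, assuming you have exported host names from the spreadsheet and from a discovery scan. The function and the host names are hypothetical and have nothing to do with PlateSpin Recon's own export formats.

```python
def normalize(names):
    """Lower-case and trim host names so 'DB01 ' and 'db01' compare equal."""
    return {name.strip().lower() for name in names}


def reconcile(tracked, discovered):
    """Compare a tracking list against the hosts a discovery scan found."""
    tracked, discovered = normalize(tracked), normalize(discovered)
    return {
        # Running in the data center, but nobody told the spreadsheet owner.
        "untracked": sorted(discovered - tracked),
        # Listed in the spreadsheet, but the scan did not see them.
        "missing": sorted(tracked - discovered),
    }


# Made-up host names for illustration.
spreadsheet = ["web01", "DB01 ", "mail01"]
scan = ["web01", "db01", "build07", "test-vm3"]
report = reconcile(spreadsheet, scan)
print(report)
```

The "untracked" list is the one that bites you during a move. The "missing" list is worth chasing too, since it may be hardware that is powered off or already retired.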
In this case a bigger problem, for which we had no good solution, was the Fibre Channel mapping and zoning for all the hardware. They had two SANs in the old data center and were consolidating down to one newer SAN in the new data center, leaving one behind, and moving one with them.
Getting the network and SAN fibre set up in the new data center was a bigger job than almost anything else.
But in this context, the migration plan from PlateSpin Recon was helpful, as you could use it to show where the SAN needed to connect, and which LUNs would need to be zoned to each location. This left us with a useful map to work from, if nothing else.
That still left a lot of work to do, but boy was it helpful.
Even if you are not doing a data center move, just having regular stats on machine usage is really useful. Monitoring tools like HP OpenView, WhatsUp, Big Brother, and Nagios usually track and show the individual usage of servers, but do not offer a nice way to contemplate or model how they might be better combined.
This is some of the special sauce that PlateSpin Recon brings to the table.
For a high-level manager (CIO, etc.) this is truly gold. He can see how much of his hardware is really being utilized, and how much is being wasted. With the big concerns these days about power and cooling resources, running 100 servers on 10 physical boxes instead of 100 physical boxes is such a clear win that it seems almost crazy not to move to VMs.