Tag Archives: Server Virtualization

Countdown to July 14, 2015

In case you haven’t heard, Microsoft will end support for Windows Server 2003 on July 14, 2015. A quick glance at the calendar will confirm that this is now less than a year away. So this is your friendly reminder that if you are still running 2003 servers in production, and you haven’t yet begun planning how you’re going to replace them, you darn well better start soon. Here are a few questions to get you started:

  • Are those 2003 servers already virtualized, or do you still have physical servers that will need to be retired/replaced?
  • If you have physical 2003 servers, do you have a virtualized infrastructure that you can use for their replacements? (If not, this is a great opportunity to virtualize. If so, do you have enough available capacity on your virtualization hosts? How about storage capacity on your SAN?)
  • Can the application workloads on those 2003 servers be moved to 2008 or 2012 servers? If not, what are your options for upgrading those applications to something that will run on a later server version?
  • What impact will all this have on your 2015 budget? Have you already budgeted for this? If not, do you still have time to get this into your next budget?
  • Would it make more sense from a budget perspective to move those application workloads to the cloud instead of purchasing server upgrades? (Maybe a monthly operating expense will be easier to deal with than the capital expenditure of purchasing the upgrades.)

According to Microsoft, there are more than 9 million 2003 servers still in production worldwide…and the clock is ticking. How many of the 9 million are yours?

Some Straight Talk about VDI-in-a-Box

Update: The advent of solid-state drives allows you to eliminate IOPS as a potential bottleneck. The calculations below are based on 15K SAS drives that support roughly 175 IOPS each. A typical 200 GB SSD will support tens of thousands of IOPS. On the other hand, although SSD prices are coming down, they’re still rather pricey. Replacing the eight 146 GB, 15K SAS drives in the example below with eight 200 GB SSDs, and loading the server up with enough RAM to support more virtual desktops, will push its price to nearly $20,000. So the primary point of this post still stands: while VDI-in-a-Box is a great product, and can be competitive with physical PCs when the entire lifecycle cost is compared, you’re just not going to see significant savings in the capital expense of ViaB vs. physical PCs. That doesn’t mean you shouldn’t consider it. It just means that you need to validate what it’s really going to cost in your environment.

Original Post (April, 2012):
There is a lot of buzz about Citrix VDI-in-a-Box (“ViaB”), and rightly so: it’s a great product, and much simpler to install and easier to scale than a full-blown XenDesktop deployment. You don’t need a SAN, you don’t need special broker servers, and you don’t need a separate license server or a SQL Server to hold configuration data. Unfortunately, some of the buzz - particularly the cost comparisons that show a $3,000 - $4,000 server supporting 30 or more virtual desktops - is misleading. So let’s talk seriously about the right way to deploy ViaB. For this exercise, I’m going to assume we need 50 virtual desktops. Once we’ve worked through this, you should be able to duplicate the exercise for any number you want.

First of all, I’m going to assume that we are building a system that will support Windows 7 virtual desktops - because I can’t see any valid reason why someone would invest in a virtual desktop infrastructure that couldn’t support Windows 7. There are two important data points that follow from this: (1) We should allow at least 1.5 GB of RAM per virtual PC, and preferably 2 GB. (2) We should design for an average of about 15 IOPS per Windows 7 virtual PC, because, depending on the user, a Windows 7 desktop will generate 10 - 20 IOPS. Let’s tackle the IOPS issue first.

Thanks to Dan Feller of Citrix, we know how to calculate the “functional IOPS” of a given disk subsystem. Here are the significant factors that go into that formula:

  • A desktop operating system - unlike a server operating system - generates an I/O mix of roughly 80% writes and 20% reads.
  • A 15K SAS drive will support approximately 175 IOPS. The total “raw IOPS” of a disk array built from 15K SAS drives is simply 175 x the number of drives in the array.
  • A RAID 10 array, which probably offers the best balance of performance and reliability, has a “write penalty” of 2.

With that in mind, the formula is:

Functional IOPS = ((Total Raw IOPS x Write %) / RAID Write Penalty) + (Total Raw IOPS x Read %)

If we put eight 15K SAS drives into a RAID 10 array, the formula becomes:

Raw IOPS = 175 x 8 = 1,400

Functional IOPS = ((1,400 x 0.8) / 2) + (1,400 x 0.2) = 560 + 280 = 840

If we are assuming an average of 15 IOPS per Win7 virtual PC, this suggests that the array in question will support roughly 56 virtual PCs. So this array should be able to comfortably support our 50 Win7 virtual PCs, unless all 50 are assigned to power users.
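To make that arithmetic easy to re-run for your own drive counts and workload, here is a minimal Python sketch of the calculation above. The 175 IOPS/drive, 80/20 write/read mix, RAID 10 write penalty of 2, and 15 IOPS/desktop figures are the planning assumptions from this post, not universal constants.

# Functional IOPS sketch (Python). The default figures are this post's
# planning assumptions; adjust them for your own hardware and workload.

def functional_iops(drives, iops_per_drive=175, write_pct=0.80, raid_write_penalty=2):
    raw = drives * iops_per_drive
    return (raw * write_pct) / raid_write_penalty + raw * (1 - write_pct)

array_iops = functional_iops(drives=8)      # eight 15K SAS drives in RAID 10
desktops = array_iops / 15                  # ~15 IOPS per Windows 7 desktop
print(f"Functional IOPS: {array_iops:.0f}, desktops supported: {desktops:.0f}")
# -> Functional IOPS: 840, desktops supported: 56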

That’s all well and good, but we haven’t talked yet about how much actual storage space this array needs. That depends on the size of our Win7 master image, how many different master images we’re going to be using, and whether we can use “linked clones” for VDI provisioning - in which case each virtual PC will consume an average of 15% of the size of the master - or whether we’re permanently assigning desktops to users, in which case each virtual PC will consume 100% of the size of the master. For the sake of this exercise, let’s assume we’re using linked clones, and that we have three different master images, each of which is 20 GB in size. According to Citrix best practice, we need to reserve 120 GB for our master images (2 x master image size x number of master images). We then need to reserve 3 GB per virtual PC (15% of 20 GB), which totals another 150 GB. The ViaB virtual appliance will require 70 GB. We also need room for the hypervisor itself (unless we’re provisioning another set of disks just for that) and for swap files, transient activity, etc., so let’s throw in another 150 GB. That’s 490 GB minimum. So we need to use, at a minimum, 146 GB drives in our array, which would give us 584 GB of usable space in our RAID 10 array.
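If you want to re-run the storage math for your own image counts and sizes, here is a minimal Python sketch of the sizing above. The 2 x master reserve, 15% linked-clone delta, 70 GB appliance, and 150 GB hypervisor/swap allowance are simply the figures used in this post.

# Storage sizing sketch (Python), using the reserve rules quoted above.

def viab_storage_gb(masters, master_size_gb, desktops,
                    clone_pct=0.15, appliance_gb=70, hypervisor_swap_gb=150):
    master_reserve = 2 * master_size_gb * masters          # reserved for master images
    clone_space = desktops * master_size_gb * clone_pct    # linked-clone deltas
    return master_reserve + clone_space + appliance_gb + hypervisor_swap_gb

print(viab_storage_gb(masters=3, master_size_gb=20, desktops=50))   # -> 490.0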

How about RAM? If we allow 1.5 GB per Win7 desktop, then 50 virtual desktops will consume 75 GB. We need at least 1 GB for the ViaB appliance, at least 1 GB for the hypervisor, plus some overhead for server operations, so let’s just call it 96 GB.

We can handle 6 to 10 virtual desktops per CPU core - more if the cores are hyper-threaded - so we’re probably OK with a dual-proc, quad-core server.
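Here is the same sort of quick sanity check for RAM and CPU, again in Python, using the 1.5 GB/desktop and 6 - 10 desktops/core rules of thumb from above (planning assumptions, not hard limits):

# Host RAM and core sizing sketch (Python).

desktops = 50
ram_gb = desktops * 1.5 + 1 + 1          # desktops + ViaB appliance + hypervisor
cores_low = desktops / 10                # optimistic end: 10 desktops per core
cores_high = desktops / 6                # conservative end: 6 desktops per core
print(f"RAM: ~{ram_gb:.0f} GB (spec 96 GB for headroom)")
print(f"Cores: {cores_low:.0f} to {cores_high:.1f} (8 hyper-threaded cores is a reasonable fit)")
# -> RAM: ~77 GB; cores: 5 to 8.3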

Now, I don’t know about you, but if I’m going to put 50 users onto a single server, I’m going to want some redundancy. I will at least want hot-plug redundant power supplies and hot-plug disk drives. Ideally, I would provision “N+1” redundancy, i.e., I would have one more server in my ViaB array than I need to support my users. I’m also going to want a remote access card, and probably an uplift on the manufacturer’s warranty so that if the server breaks, the manufacturer will come on site and fix it.

By now, you’ve probably figured out that we are not talking about a $4,000 server here. I priced out a Dell R710 - using their public-facing configuration and quoting tool - with the following configuration, and it came out to roughly $11,000:

  • Two Intel E5640 quad-core, hyper-threaded processors, 2.66 GHz
  • 96 GB RAM
  • Eight 146 GB, 15K SAS drives
  • PERC H700 controller with 512 MB cache
  • Redundant hot-plug power supplies
  • iDRAC Enterprise remote access card
  • Warranty uplift to 3-year, 24×7, 4-hour-response, on-site coverage

(NOTE: This is a point-in-time price, and hardware prices are subject to change at any time.)

The ViaB licenses themselves will cost you $195 each. Be careful of the comparisons that show the price as $160 each. ViaB is unique among Citrix products in that the base cost of the license does not include the first year of Subscription Advantage - yet the purchase of that first year is required (although you don’t necessarily have to renew it in future years). That adds $35 each to the cost of the licenses.

Finally, if you don’t have Microsoft Software Assurance on your PC desktops - and my experience is that most SMBs do not - you need to factor in a Microsoft Virtual Desktop Access (VDA) license for every user. This license is only available as an annual subscription, and will cost you approximately $100/year.

So, your up-front acquisition cost for the system we’ve been discussing looks like this:

  • Dell R710 server - $11,000
  • 50 ViaB licenses @ $195 - $9,750
  • 50 Microsoft VDA licenses @ $100 - $5,000

Total acquisition cost: $25,750, or $515/user. Not bad.

But wait - if we’re going to compare this to the cost of buying new PCs, shouldn’t we look at the cost of ViaB over the same period of time that we would expect those new PCs to last? If we assume, as many companies do, that a PC has a useful life of about 3 years, then we should actually factor in another two years of VDA licenses, and two years of Subscription Advantage renewal for the ViaB licenses. That pushes the 3-year cost of the ViaB licenses to $13,250, and the cost of the VDA licenses to $15,000. So the total 3-year cost of our solution is $39,250, or $785/user.

If you want N+1 redundancy, you’re going to need to buy a second server. That would push the cost to $50,250, or $1,005/user.
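To keep the license and hardware arithmetic in one place, here is a minimal Python cost sketch using the point-in-time prices quoted in this post ($11,000 per server, $195 ViaB license including $35/year Subscription Advantage, $100/year VDA per user); it also reproduces the 150-desktop scale-up discussed below. Treat it as a worksheet, not a quote.

# Acquisition and multi-year cost sketch (Python). Prices are the
# point-in-time figures from this post and will vary.

def viab_cost(users, servers, years=3, server_price=11_000,
              viab_license=195, sa_renewal=35, vda_per_year=100):
    hardware = servers * server_price
    viab = users * (viab_license + sa_renewal * (years - 1))  # year-1 SA is in the $195
    vda = users * vda_per_year * years
    total = hardware + viab + vda
    return total, round(total / users, 2)

print(viab_cost(users=50, servers=1, years=1))   # -> (25750, 515.0)   up-front
print(viab_cost(users=50, servers=1))            # -> (39250, 785.0)   3-year
print(viab_cost(users=50, servers=2))            # -> (50250, 1005.0)  3-year, N+1
print(viab_cost(users=150, servers=4))           # -> (128750, 858.33) 150 desktops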

What conclusions can we draw from all this? Well, first, that VDI-in-a-Box is not going to be significantly less expensive than buying new PCs if you actually do it right. However, it is competitive with the price of new PCs, which is worth noting. As long as the price is comparable - which it is - we can then start talking about the business advantages of VDI, such as being able to remotely access your virtual desktop from anywhere, with just about any device, including iPads and Android tablets, and about the ongoing management advantages of having a single point of control over multiple desktops.

Also, as you scale up the environment, the incremental cost of the extra server required for N+1 redundancy gets spread over more and more users, and becomes less significant. For example, if we’re building an infrastructure that will support 150 virtual desktops, we would need four servers. Total 3-year cost: $128,750, or $858.33/user for a robust, highly redundant virtual desktop infrastructure. In my opinion, that’s a pretty compelling price point, and you won’t be able to hit it with a 150-user XenDesktop deployment, because of the other server and storage infrastructure components that you need to build a complete solution. On the other hand, XenDesktop does include more functionality, such as the rights to use XenApp for virtual application delivery, the ability to stream a desktop OS to a blade PC or a desktop PC, the rights to use XenClient for client-side virtualization, etc.

But if all you want is a VDI solution, ViaB is, in my opinion, the obvious choice. It’s clear that Citrix wants to position VDI-in-a-Box as the preferred VDI solution for SMBs, meaning anyone with 250 or fewer users…and there’s no reason why ViaB can’t scale much larger than that.

For more information on ViaB, check out this video from Citrix TV, then head on over to the Citrix TV site to view the entire ViaB series.

**** EDIT April 12, 2012 ****
You may already be aware of this, but Dell has announced a ViaB appliance that comes pre-configured, with both XenServer and the ViaB virtual appliance already installed. Oddly enough, even though Moose Logic is a Dell partner, I couldn’t get Dell to tell me what one would cost. Their answer was that I should call back when I had a specific customer need, and they would work up a specific configuration and quote it. I considered calling back with a fictitious customer requirement, but decided that I didn’t want to know badly enough to play that game.

They did, however, tell me what the basic server configuration was - and it was very close to the configuration I’ve outlined above: two X5675 processors, 96 GB of RAM, eight 146 GB drives in a RAID 10 array, a PERC H700 array controller (I don’t know how much cache, though), and an iDRAC Enterprise remote access card. I do not know whether it has redundant power supplies (although I would certainly hope so), nor exactly what warranty is included…perhaps that option is left up to the customer.

That gave me at least enough information to run a sanity check on the configuration. The array would provide 960 functional IOPS, which should be adequate for an 80-user system - which is how the appliance is advertised - depending, of course, on the percentage of power users. Also, the array should provide enough storage to handle the needs of most SMBs, unless they have an unusually large number of images to maintain.

One of my Citrix contacts recently told me that the Dell appliance was priced at $440/desktop for an 80 concurrent user configuration, which is very much in line with the cost per user in the post above, considering that $100 of my $515/user number was for the first year of Microsoft VDA licenses, which, to my knowledge, are not included with the Dell appliance.

XenServer Host Is In Emergency Mode

It’s 8 pm on a Sunday evening, and I get a panicked call from a customer because he cannot connect to his XenServer hosts via the XenCenter management tool. However, as near as he could tell, all of the hosted virtual machines were up and running and in a healthy state. He had also tried to point XenCenter at another member of the XenServer pool, but was unsuccessful.

So what happened and how do you fix it?

This situation can happen for several reasons, but generally it happens when there are only two servers in the XenServer pool and the pool master suddenly fails. In essence, the surviving server (let’s just call it the “slave”) can no longer see its peer, the pool master, so it assumes it has been stranded and goes into emergency mode to protect its own VMs. There are other ways this can happen (an incorrectly configured pool with HA turned on, for example), but this is the most common cause that I have personally experienced.

Depending upon the situation, you may not be able to ping the master server because it is actually down, or you may be able to ping it but it is in an inconsistent, “locked up” state such that it cannot answer requests. If you are able to connect to the console of the master server, either directly with a monitor, keyboard, and mouse (the old-fashioned way) or through a remote management interface (DRAC, iLO, ILOM, etc.), the server may appear to be running, but you may not be able to do anything with it.

At this point you may be thinking, “This is no big deal - just reboot the machine and it will be fine.” If you are lucky that may actually solve the problem, but in many cases it will not. What you might see is that after the master reboots you will be able to connect to the master but you will not see the slave. Or it may be that your master is truly broken and you are not able to simply reboot it due to a system or hardware failure. But, of course, you’ve still got to get your pool online and working again regardless.

During this period of time, if you try to use a tool such as PuTTY to connect to the slave via its management interface, you may not be able to connect to it either. If you try to ping the slave on the management interface, you may not get any replies. But if you connect to the console of the slave (again, either the physical console or via a remote management interface), you will probably see that the machine is running - yet if you look at xsconsole, it will appear that the management interface is gone, because there will be no IP address showing. By now you’ll probably be scratching your head, because the strange thing is that all the VMs are still running.

So at this point your master appears to be down, or at least impaired, you’ve got no management interface on the slave, your pool is broken and you cannot manage the VMs. So what do you do?

Well, if this happens to you and your VMs are still up and running, the first thing you should do is take a deep breath, because more than likely it is not as bad as you might think. XenServer is a robust platform, and if the infrastructure is built correctly (and I’m going to quote a customer here), “you can really slam the things around and they still work”.

After you take a deep breath and let it out slowly, go to the console of the slave server, access the command line, and start by typing:

xe host-is-in-emergency-mode

If the server returns an answer of “True” then you’ve confirmed that the server has gone into emergency mode in order to protect itself and the VMs running on it. (If the server returns an answer of “False” then you can stop reading, because the rest of this post isn’t going to help you.)

Assuming you receive the answer of “True”, the slave server is in emergency mode because it cannot see a master - either because the master is actually down, or because the management interfaces are not working. Therefore, the next step is to promote the slave to master to get it out of emergency mode. We do this by typing:

xe pool-emergency-transition-to-master

At this point the slave server should take over as the pool master and the management interface should be available again. Now if you type the xe host-is-in-emergency-mode command again you should get an answer of “False”.

Now, open XenCenter again. It will first try to connect to the server that was the master, but after it times out it will then attempt to connect to the new master server. Be patient, because eventually it will connect (it may take several seconds), and you will again see your pool and be able to manage your VMs. If some of the VMs are down because they were on the server that failed, you’ll be able to start them on the remaining server (assuming you have shared backend storage and sufficient processor and memory resources).

Now what about the master if it has totally failed? What do I do after I’ve fixed, say, a hardware problem in order to return it to my pool?

If the following two conditions are true:

  1. You are using shared storage so that your VMs are not stored on the XenServer local drives, and
  2. You have built your XenServers with HBAs (Fibre Channel or iSCSI) rather than using the Open-iSCSI software initiator, which means the connectivity information for your backend SAN is stored within the HBA,

…then it may be much simpler and quicker just to reload the XenServer operating system. (If you do not have shared backend storage, which means your VMs are on local storage, DO NOT DO THIS). I can rebuild my XenServers from scratch in about 20 - 30 minutes and have them back in the pool and running.

If either of those two conditions is not true then, depending upon your situation, recovery may be significantly more difficult. It could be as simple as resetting your Open-iSCSI settings and connecting back to your SAN (still easy, but it takes more time to accomplish), or it could be as painful as rebuilding your VMs because you lost your server drives. (OUCH!)

Real-world example: I recently had a NIC fail on the motherboard of my master server. Of course, since the NIC was on the motherboard, the whole motherboard had to be replaced, which significantly changed the hardware configuration of that server.

In this case, when I brought that XenServer back online, it still had all the information about the old NICs showing in XenCenter, plus it had all the new NICs from the new hardware. Yes, I could have used some PIF forget commands to remove the NICs that no longer existed and reconfigure everything, but that would have taken me a bit of time to straighten out. Since I had iSCSI HBAs attached to a DataCore SAN (great product, by the way) for shared storage, all I did was reload XenServer on that machine, modify the multipath-enabled.conf file (that is a different blog topic for another day), and rejoin the server to the pool. Because the HBAs already had all the iSCSI information saved in the card, the storage automatically reconnected all the LUNs, the network interfaces took the configuration of the pool, and I was back online and running in less than 30 minutes.

After you repair the machine that failed and get it back online, you may want it to once again be the master server. To do this type:

xe host-list

You will get a list of available servers with their UUIDs. Record the UUID of the server that you want to designate as the new master, and then type:

xe pool-designate-new-master host-uuid=[the uuid of the host you want]

After you type this your pool will again disappear from XenCenter, but after about 20 – 30 seconds (be patient) it will reappear with the new server as the master. Your pool should now be healthy, and you should again be able to manage servers as normal.

Does “Shared Nothing” Migration Mean the Death of the SAN?

You’ve probably heard that Hyper-V in Windows Server 2012 supports what Microsoft is calling “Shared Nothing” live migration. You can see a demo of it here, in a video that was posted on a TechNet blog back in July.

Now don’t get me wrong - the ability to live migrate a running VM from one virtualization host to another across the network, with no shared storage behind it, is pretty cool. But if you read through the blog post, you’ll also see that it took 8 minutes and 40 seconds to migrate a 16 GB VM. (And I don’t know about you, but many of our customers have VMs that are substantially larger than that!) On the other hand, it took only 11 seconds to live migrate that same VM, running on the same hardware, when it was in a cluster with shared storage.

So I will submit that the answer to the question posed in the title of this post is “No” - clearly, having shared storage behind your virtualization hosts brings a level of resilience and agility far beyond what Shared Nothing migration offers. Still, for an SMB that has a small virtualization infrastructure with only two or three hosts and no shared storage, it’s a significant improvement over what they’ve historically had to go through to move a VM from one host to another. That has typically meant shutting the VM down, exporting it to a storage repository that can be accessed by the other host (e.g., an external USB or network-attached hard drive), importing it into the other host’s local storage, and then booting it back up - a process that can easily take an hour or more, during which time the VM is unavailable.

So Shared Nothing migration is pretty cool, but, as Rob Waggoner writes in the TechNet post linked above, don’t throw your SANs out just yet.

SAN Tips - Storage Repository Design

Back with another Moose Logic video for your viewing pleasure. In this installment, our own Steve Parlee, Moose Logic’s Director of Engineering, talks about SAN storage repository design concepts, and the effects your design choices have on things like snapshots, disk usage, and overall performance. In the process, you’ll also learn what we consider to be “best practice,” and some of the reasons why. As always, your comments will be appreciated. Enjoy!