Tag Archives: XenServer

XenServer Host Is In Emergency Mode

It’s 8 pm on a Sunday evening, and I get a panicked call from a customer who cannot connect to his XenServer™ hosts via the XenCenter™ management tool. As near as he can tell, all of the hosted virtual machines are up and running in a healthy state. He has tried pointing the XenCenter management tool at another member of the XenServer pool, without success.

So what happened and how do you fix it?

This situation can happen for several reasons but generally it happens when there are only two servers in the XenServer pool, and the pool master suddenly fails. In essence, what happens is the surviving server (let’s just call it the “slave”) can no longer see its peer, the pool master, so it assumes it has been stranded and goes into emergency mode to protect its own VMs. There are other ways this can happen (an incorrectly configured pool with HA turned on for example), but this is the most common reason that I have personally experienced.

Depending upon the situation, you may not be able to ping the master server because it is actually down, or you may be able to ping the server but it is in an inconsistent, “locked up” state in which it cannot answer requests. If you are able to connect to the console of the master server, either directly with a monitor, keyboard, and mouse (the old-fashioned way) or through a remote management interface (DRAC, iLO, ILOM, etc.), the server may appear to be running, but you may not be able to do anything with it.

At this point you may be thinking, “This is no big deal - just reboot the machine and it will be fine.” If you are lucky that may actually solve the problem, but in many cases it will not. What you might see is that after the master reboots you will be able to connect to the master but you will not see the slave. Or it may be that your master is truly broken and you are not able to simply reboot it due to a system or hardware failure. But, of course, you’ve still got to get your pool online and working again regardless.

During this time, if you try to use a tool such as PuTTY to connect to the slave via its management interface, you may not be able to connect to it either, and pings to the management interface may go unanswered. But if you connect to the console of the slave (again, either the physical console or via a remote management interface), you will probably see that the machine is running. If you look at xsconsole, though, it will appear that the management interface is gone, because there will be no IP address showing. By now you’ll probably be scratching your head, because the strange thing is that all the VMs are running.

So at this point your master appears to be down, or at least impaired, you’ve got no management interface on the slave, your pool is broken and you cannot manage the VMs. So what do you do?

Well, if this happens to you and your VMs are still up and running the first thing you should do is take a deep breath, because more than likely it is not as bad as you might think. XenServer is a robust platform and if the infrastructure is built correctly (and I’m going to quote a customer), “you can really slam the things around and they still work”.

After you take a deep breath and let it out slowly, from the console of the slave server, you will need to access the command line and start by typing:

xe host-is-in-emergency-mode

If the server returns an answer of “True” then you’ve confirmed that the server has gone into emergency mode in order to protect itself and the VMs running on it. (If the server returns an answer of “False” then you can stop reading, because the rest of this post isn’t going to help you.)

Assuming you receive the answer of “True”, the slave server is in emergency mode because it cannot see a master, either because the master is actually down or because one or more management interfaces are not working. Therefore, the next step is to promote the slave to master to get it out of emergency mode. We do this by typing:

xe pool-emergency-transition-to-master

At this point the slave server should take over as the pool master and the management interface should be available again. Now if you type the xe host-is-in-emergency-mode command again you should get an answer of “False”.

Now, open XenCenter again. It will first try to connect to the server that was the master, but after it times out it will then attempt to connect to the new master server. Be patient, because eventually it will connect (it may take several seconds) and you will again see your pool and be able to manage your VMs. If some of the VMs are down because they were on the server that failed, you’ll be able to start them on the remaining server (assuming you have shared backend storage and sufficient processor and memory resources).
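If you would rather script the check-then-promote sequence than type it by hand, here is a minimal Python sketch of the steps above. It uses only the two xe commands already shown; the function names `run_xe` and `recover_stranded_slave` are made up for illustration, and the answer is compared case-insensitively so the sketch does not depend on the exact capitalization that xe prints. It assumes it is run on the console of the stranded slave.

```python
import subprocess
from typing import Callable


def run_xe(*args: str) -> str:
    """Run an xe command on the local host and return its trimmed output."""
    result = subprocess.run(
        ["xe", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def recover_stranded_slave(xe: Callable[..., str] = run_xe) -> str:
    """Promote this host to master only if it reports emergency mode.

    `xe` is injectable so the logic can be exercised without a real pool.
    """
    # Step 1: confirm the diagnosis before changing anything.
    if xe("host-is-in-emergency-mode").lower() != "true":
        return "not in emergency mode; nothing done"

    # Step 2: promote this host so its management interface comes back.
    xe("pool-emergency-transition-to-master")

    # Step 3: ask again; the host should no longer be in emergency mode.
    if xe("host-is-in-emergency-mode").lower() == "false":
        return "promoted to master"
    return "still in emergency mode; investigate further"
```

The important design choice is the guard in step 1: the promotion command is only issued when the host itself confirms it is in emergency mode, which mirrors the “if the answer is False, stop reading” advice above.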

Now what about the master if it has totally failed? What do I do after I’ve fixed, say, a hardware problem in order to return it to my pool?

If the following two conditions are true:

  1. You are using shared storage so that your VMs are not stored on the XenServer local drives, and
  2. You have built your XenServers with HBAs (Fibre Channel or iSCSI) rather than using Open iSCSI, which means the connectivity information to your backend SAN will be stored within the HBA,

…then it may be much simpler and quicker just to reload the XenServer operating system. (If you do not have shared backend storage, which means your VMs are on local storage, DO NOT DO THIS.) I can rebuild my XenServers from scratch in about 20 - 30 minutes and have them back in the pool and running.

If either of those two conditions is not true then, depending upon your situation, recovery may be significantly more difficult. It could be as simple as resetting your Open iSCSI settings and connecting back to your SAN (still easy but takes more time to accomplish) or it could be as painful as rebuilding your VMs because you lost your server drives. (OUCH!)

Real world example: I recently had a NIC fail on the motherboard of my master server. Of course since the NIC was on the motherboard it meant the whole motherboard had to be replaced which significantly modified the hardware configuration for that server.

In this case, when I brought that XenServer back online it still had all the information about the old NICs showing in XenCenter, plus it had all the new NICs from the new hardware. Yes, I could have used some PIF forget commands to remove the NICs that no longer existed and reconfigure everything, but that would have taken me a bit of time to straighten out. Since I had iSCSI HBAs attached to a DataCore SAN (great product, by the way) for shared storage, all I did was reload XenServer on that machine, modify the multipath-enabled.conf file (that is a different blog topic for another day), and rejoin the server to the pool. Because the HBAs already had all the iSCSI information saved in the card, the storage automatically reconnected all the LUNs, the network interfaces took the configuration of the pool, and I was back online and running in less than 30 minutes.

After you repair the machine that failed and get it back online, you may want it to once again be the master server. To do this type:

xe host-list

You will get a list of available servers with their UUIDs. Record the UUID of the server that you want to designate as the new master and then type:

xe pool-designate-new-master host-uuid=[the uuid of the host you want]

After you type this your pool will again disappear from XenCenter, but after about 20 – 30 seconds (be patient) it will reappear with the new server as the master. Your pool should now be healthy, and you should again be able to manage servers as normal.
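Copying a UUID by hand is an easy place to make a typo, so here is a hedged Python sketch that picks the UUID out of `xe host-list` output and builds the designate command for you. It assumes the usual `key ( RO): value` layout of xe list output; the host names and UUIDs in `SAMPLE` are invented placeholders, and the helper names `parse_host_list` and `designate_command` are hypothetical, not part of any Citrix tool.

```python
import re
from typing import Dict, List

# Invented sample in the layout xe list commands normally print.
SAMPLE = """\
uuid ( RO)                : 11111111-2222-3333-4444-555555555555
          name-label ( RW): xs-host-a
    name-description ( RW): Default install of XenServer

uuid ( RO)                : aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
          name-label ( RW): xs-host-b
    name-description ( RW): Default install of XenServer
"""

UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_host_list(output: str) -> Dict[str, str]:
    """Map each host's name-label to its UUID."""
    hosts: Dict[str, str] = {}
    current_uuid = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between records
        key = key.split("(")[0].strip()  # drop the "( RO)" / "( RW)" marker
        value = value.strip()
        if key == "uuid":
            current_uuid = value
        elif key == "name-label" and current_uuid:
            hosts[value] = current_uuid
    return hosts


def designate_command(hosts: Dict[str, str], name_label: str) -> List[str]:
    """Build the pool-designate-new-master command for the named host."""
    uuid = hosts[name_label]
    if not UUID_RE.match(uuid):
        raise ValueError(f"{uuid!r} does not look like a UUID")
    return ["xe", "pool-designate-new-master", f"host-uuid={uuid}"]


print(" ".join(designate_command(parse_host_list(SAMPLE), "xs-host-a")))
```

The sketch only builds and prints the command rather than running it, so you can eyeball the UUID before your pool disappears from XenCenter for those 20 – 30 seconds.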

IntelliCache and the IOPS Problem

If you’ve been following this blog for any length of time, you know that we’ve written extensively about XenDesktop, and spent a lot of time on best practices and problems to avoid. And one of the biggest problems to avoid is poor storage design resulting in poor VDI performance.

In a nutshell, the problem is that a Windows desktop OS uses disk far differently than a Windows server OS. Thanks to the way Windows uses the swap file, disk writes outnumber disk reads by about 2 to 1. You can build your virtual desktop infrastructure on the latest and greatest server hardware, with tons of processing power and insanely huge amounts of RAM, but if all of the disk I/O for all of those virtual desktops is hitting your SAN, you’ve got a scalability problem on your hands.

Provisioning Services (“PVS”) can help to mitigate this in two ways (assuming for sake of argument that you’re provisioning multiple virtual systems from a common, read-only image): First, if you build your Provisioning Servers correctly, you should be able to serve up most of the OS read operations from the Provisioning Server’s own cache memory. Second, you can use the virtualization host’s local disk storage as the required “write cache” - because all of those write operations have to go somewhere while the virtual system is running.

But XenDesktop 5 introduced a new way to provision desktops called “Machine Creation Services” (“MCS”). We wrote about this in the April edition of our Moose Views newsletter, so if you’re not familiar with all the pros and cons of MCS vs. PVS, I’d encourage you to take a brief time out and read that article. Suffice it to say that, despite all the advantages of MCS, the biggest downside of using MCS to provision pooled desktops was that all of the IOPS hit your SAN storage, which limited the scalability of an MCS-provisioned VDI deployment.

But all of that just changed, with the release of XenDesktop 5 Service Pack 1, which was made available for download a week ago (May 13). With SP1, XenDesktop 5 is now able to take advantage of the “IntelliCache” feature that was introduced as part of XenServer v5.6 Service Pack 2. Using MCS with the combination of XenDesktop 5 SP1 and XenServer v5.6 SP2…

  • The first time a virtual desktop is booted on a given XenServer, the boot image is cached on that XenServer’s local storage.
  • Subsequent virtual desktops booted on that same XenServer will boot and run from that locally cached image.
  • You can use the XenServer’s local storage for the write cache as well.

The bottom line is that you can move as much as 90% of the IOPS off of the SAN and onto local XenServer storage, removing nearly all of the scalability limitations from an MCS-provisioned environment.

With most of the IOPS for running VMs taking place on local storage, it’s pretty straightforward to figure out how many VMs you can expect to support on a given virtualization host. Dan Feller’s blog post does a great job of walking through the process of calculating the functional IOPS that your local XenServer storage repository should be able to support, and inferring from that number how many light, normal, or power users you should be able to support as a result.
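To give a feel for the kind of arithmetic involved, here is a rough Python sketch of a commonly used functional-IOPS calculation: writes are divided by the RAID write penalty, reads are not. The 2-to-1 write/read split comes from the discussion above; everything else (eight disks at 150 raw IOPS each, a RAID 10 write penalty of 2, and the per-desktop IOPS figures for light, normal, and power users) is a placeholder assumption, so substitute the numbers for your own hardware and user profiles.

```python
def functional_iops(raw_iops: float, write_fraction: float,
                    raid_write_penalty: int) -> float:
    """Usable IOPS once the RAID write penalty is applied to the writes."""
    read_fraction = 1.0 - write_fraction
    write_part = raw_iops * write_fraction / raid_write_penalty
    read_part = raw_iops * read_fraction
    return write_part + read_part


def desktops_supported(available_iops: float, iops_per_desktop: float) -> int:
    """Whole number of desktops the available IOPS can sustain."""
    return int(available_iops // iops_per_desktop)


# Placeholder hardware: 8 local disks at 150 raw IOPS each, RAID 10.
raw = 8 * 150
usable = functional_iops(raw, write_fraction=2 / 3, raid_write_penalty=2)

# Placeholder per-desktop workloads (assumptions, not measured values).
profiles = {"light": 6, "normal": 12, "power": 24}
for name, per_desktop in profiles.items():
    count = desktops_supported(usable, per_desktop)
    print(f"{name}: about {count} desktops")
```

Because desktop workloads are write-heavy, the RAID level you choose for the local storage repository moves the answer a lot: a higher write penalty shrinks the write half of the formula directly.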

This also means that using XenServer as the hypervisor for your XenDesktop 5 deployment is going to yield a significant performance advantage over any other hypervisor, unless or until the other guys come out with similar local caching features. So, if you’re a VMware shop, my advice is this: Go ahead and virtualize all of the supporting XenDesktop server components on your vSphere infrastructure. Run your XenDesktop 5 VMs on XenServer hosts, and just don’t tell anyone! If you’re asked, just say, “Oh, yeah, these are my XenDesktop host systems - they’re completely separate from our vSphere infrastructure, because we don’t need the (insert favorite vSphere feature) function for these systems.” Your infrastructure will run better, and no one will know but you…

XenServer Tips - HBAs, HA, and HOSTDEVSCAN

In this installment of the Moose Logic Video Series, Steve Parlee, our Director of Engineering, talks about:

  • Why we always use iSCSI HBAs in our Citrix XenServer deployments.
  • The possible risks of using HA in a two-server pool. (NOTE: Initial testing indicates that XenServer v5.6 may not present the same problems in a two-server pool as earlier versions. When we have completed our testing, we will post an update here.)
  • A useful utility for XenServer called “hostdevscan.”

Citrix Continues to Virtualize Appliances

Five or six years ago, when Citrix first announced the Citrix Access Gateway appliance, I remember scratching my head and thinking, “Wait a minute, Citrix is in the software business. Why in the world do they want to start selling hardware, with all of the warranty, repair, and support issues that come along with being a hardware manufacturer?” The answer, of course, was that in order to build out the complete Application Delivery solution they envisioned, they needed components that, at the time, couldn’t be delivered using software alone.

But the world turns, and time moves on, and today Citrix has a world-class virtualization platform that runs on off-the-shelf server hardware that is itself mind-bogglingly powerful compared to what was available five or six years ago. So it makes all the sense in the world for Citrix to turn all of those hardware devices into virtual appliances as quickly as they can.

Yesterday, they formally announced virtualized versions of both the Access Gateway and the Branch Repeater. We’ll get to the virtual Branch Repeater in another post, because we’ll have our hands full in this one just covering the things you need to know about the Access Gateway VPX.

First, you need to know that the Access Gateway VPX is fundamentally a virtualized version of the 2010 CAG Appliance - with some exceptions that we’ll get into in a moment. You can download it and use XenCenter to import it directly into your XenServer environment. The cost is only $995 (compared to $3,500 for the 2010 hardware appliance), with an ongoing Subscription Advantage cost of $129/year. Here’s where it gets cool:

  • It was difficult to come up with a good solution for redundancy and automatic failover with the 2010 appliance. Unless you wanted to put a load-balancer in front of it (and if you’re going to do that, you may as well just buy a NetScaler in the first place), redundancy depended on putting primary and secondary appliance URLs or IP addresses into the CAG client. And that did you no good at all if you were trying to run it in “CSG-replacement mode” just to provide secure Web access to a XenApp farm. But the VPX virtual appliance fully supports XenMotion, XenServer HA, and NIC bonding. So the VPX doesn’t have to be redundant, because the underlying XenServer infrastructure can provide the resilience you need.
  • If you were using a 2010 appliance, and wanted to use “SmartAccess,” you had to stand up a separate “Advanced Access Control” Web server in your protected network. Obviously, that added to the cost and complexity of the solution. The VPX appliance, on the other hand, supports SmartAccess policies directly.

    Edit July 27, 2010: Not sure now where I originally picked up this information, but it is incorrect. An Advanced Access Control Web server is still required to enable SmartAccess policies with the Access Gateway VPX.

NOTE: SmartAccess, in case you’re not familiar with the term, allows you to control, at a very granular level, what applications and information a user can access, and what they can do with that information, based on the access scenario. The same user, presenting the same authentication credentials, might get a totally different level of access depending on whether s/he is connecting from inside the corporate network, from outside the network using a company-owned laptop, from home using a personal PC, or from a hotel business center using a totally untrusted device. For more information on how SmartAccess works and why it’s cool, check out the SmartAccess video on Citrix TV.

  • The VPX appliance fully supports the latest generation of the Citrix Receiver, and works with Dazzle and the Merchandising Server.
  • You no longer need to buy VPN client licenses to run it in “CSG replacement” mode. This is a biggie. Citrix made it clear some time ago that they would not be putting any more development time and effort into enhancing the software “Citrix Secure Gateway.” But the CSG just wouldn’t die, for one simple reason: it’s free. If you own XenApp or XenDesktop licenses with current Subscription Advantage, you’ve got the rights to use the CSG software, and your only cost is a server to run it on…and that’s pretty low in today’s virtual world. Yes, it could be argued that the CAG appliance was somewhat more secure, since it ran on a hardened Linux-derived kernel. But it cost $3,500 plus roughly $100 per concurrent user. Hmmm… CSG, free, CAG appliance, several thousand dollars. That was an easy decision for a lot of users.

    Coincident with the release of the VPX appliance, Citrix is announcing that they’re eliminating the Access Gateway Standard User Licenses. They will no longer be sold as of June 30. Instead, when you buy an Access Gateway (physical or virtual), you get a “platform license” that entitles you to use it to secure access to a XenApp or XenDesktop farm (i.e., what’s generally referred to as “CSG Replacement Mode”) at no additional charge. So now the equation is: CSG, free, but I’ve got to put it on a server, and if it’s a Windows Server, the OS is going to cost me $700 - $800 or so. CAG VPX, $995, but I import it directly into my XenServer infrastructure and don’t have to pay for anything else unless I want the advanced access functionality. Suddenly the value proposition looks a lot more attractive.

  • Speaking of the advanced access functionality, Citrix has made some licensing changes there as well. The Access Gateway Universal licensing model has been reduced from three tiers to two, and the prices have been lowered. So now, if you didn’t purchase the XenApp or XenDesktop Platinum Editions (which include Access Gateway Universal licenses), you can purchase the Access Gateway Universal licenses separately for $100/concurrent user in quantities up to 2,500, and $50/concurrent user for 2,500+ users.

What’s the down side? Well, I’m not sure there is one. The VPX appliance isn’t going to work well as a general-purpose SSL/VPN for thousands of concurrent users, but then neither did the 2010 hardware appliance. So if that’s what you need, or if you need the high-end enterprise features like Global Server Load Balancing to enable transparent failover to a Disaster Recovery site, then we need to have a conversation about NetScalers. But for basic CSG-like functionality, or a SmartAccess deployment for a few hundred concurrent users, the virtual appliance looks pretty darned attractive to me.

For more information on the Access Gateway VPX, including a demo of just how easy it is to import it into your XenServer environment and get it running, check out the Access Gateway VPX video on Citrix TV.

Understanding Microsoft Server Virtualization Rights

So, grasshopper, you have decided to take the plunge and virtualize your server infrastructure. Someone (perhaps us) explained the business benefits of virtualization, you decided that it made sense, and that it’s time to make the move. But do you know how virtualization will affect your Windows Server licensing model?

The first thing you need to know is that Windows Server licenses are assigned to physical hardware, not to server workloads. When you purchase a license, you must “assign” that license to a physical server. How do you do that? Well, in today’s world, there is no formal process for doing that, although if it makes you feel better, you can write it down somewhere.

You may assign more than one license to a physical server, but you may not assign the same license to more than one physical server. You may reassign a license from one physical server to another, but not more frequently than every 90 days, unless the server it was assigned to is being retired due to “permanent hardware failure.”

Sound reasonable so far? Of course it does. Right up until the license model runs head-on into one of the coolest features of virtualization: live motion. Most virtualization platforms, including Microsoft’s Hyper-V R2, allow you to easily move a virtual server from one physical host to another. Great feature, right? But if you do it, you may have just violated your Windows license agreement.

I say “may” because different versions of Windows Server come with different virtualization rights. For example, a Windows Server Standard license can be used to run one physical instance of Windows (and by “physical instance,” I mean Windows is installed directly on the hardware) or one virtual instance of Windows, but not both - unless the physical instance is being used solely to manage the virtual environment.

Let me say that another way: If you buy a single license for Windows Server Standard Edition with Hyper-V, you can install it directly on the hardware without bothering with the Hyper-V role. Or you can install the Hyper-V role, have one virtual Windows Server running on top of Hyper-V, and use the physical instance exclusively to manage the virtual instance. Of course, you haven’t really gained anything by doing that…but you can purchase additional copies of Windows Server Standard, assign them to the same physical host, and run more virtual servers on Hyper-V.

Thinking this scenario through, then, if you currently have a bunch of physical Windows Servers - each licensed with Windows Standard Edition - and you want to virtualize them all, that’s no problem. You can reassign your server licenses to your virtual hosts and be perfectly legal. As long as you don’t move a server from one host to another. But if all you own are Standard Edition licenses, and you move a server from one host to another, you’ve just violated the license agreement - unless you own a “spare” server license that you have “assigned” to the target server (the host you’re moving it to) but that is not being used.

Now, in the scenario I just described, it’s possible that the most cost-effective thing you could do is to just buy a few additional licenses as “spares” rather than re-licensing your entire environment. But let’s move ahead - once we’ve covered the other Windows editions that are available to you, you’ll be better able to decide what makes financial sense for your project.

Windows Server Enterprise Edition comes with expanded virtualization rights. Each Enterprise Edition license gives you the rights to run one physical instance and up to four virtual instances on the physical host to which it is assigned. Once again, if you want to run all four virtual instances, then the physical instance may only be used to manage the virtual environment. If you want to run other services on the physical instance - and that’s actually fairly common in a Hyper-V deployment - then you only get to run three virtual instances. And you may not split the license across multiple physical hosts.

The “estimated retail price” (just the license, no Software Assurance, assuming Open Business pricing) for Windows Enterprise is $2,358, vs. $726 for Windows Standard. So Enterprise is less expensive than four copies of Standard. Therefore, if you need to buy new licenses (perhaps you’re upgrading from Server 2003 to Server 2008 as part of your virtualization project), it may make sense in a small environment to buy a copy of Enterprise Edition for each virtual host, and perhaps supplement it with a few spare copies of Standard Edition. Here’s an example:

Let’s say you have a total of nine physical servers today, and you want to virtualize them on three dual-processor virtualization hosts. (You could probably run them on two hosts, but if one failed, it might be a stretch to run all nine on one host. If you start with three hosts, and one fails, you still have two to carry the load.) You could buy nine new copies of Windows Standard Edition for $6,534, but you’d have no flexibility to use live motion to move things around. On the other hand, you could buy three copies of Enterprise Edition for your three hosts for $7,074, and effectively have one “spare” instance on each host that’s available for moving a virtual machine from one host to another.

Of course, that may not be quite enough if you want to completely unload one of your servers (perhaps to take it off-line for maintenance), because unless you’re prepared to shut down one VM completely, you’re going to need to run five VMs on one of your remaining servers. Since you may not know in advance which server needs to assume the extra VM workload, you could just buy three additional copies of Standard Edition, and assign one to each host. That would push your total license acquisition cost to $9,252, but you would then be licensed for five VMs on each of your hosts.
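If you want to rerun this comparison with your own numbers, the arithmetic is easy to sketch in Python. The prices are the estimated retail prices quoted above; the function names are made up for illustration, and the per-host VM count assumes the physical instance is used only to manage the virtual environment, so each Enterprise license covers four VMs and each Standard license covers one.

```python
STANDARD = 726      # estimated retail price per Standard license
ENTERPRISE = 2358   # estimated retail price per Enterprise license


def license_cost(standard: int = 0, enterprise: int = 0) -> int:
    """Total acquisition cost for a mix of licenses."""
    return standard * STANDARD + enterprise * ENTERPRISE


def vms_licensed_per_host(standard: int = 0, enterprise: int = 0) -> int:
    """VMs one host may run, assuming the physical instance only manages VMs."""
    return standard * 1 + enterprise * 4


options = {
    "9 x Standard": license_cost(standard=9),
    "3 x Enterprise": license_cost(enterprise=3),
    "3 x Enterprise + 3 x Standard": license_cost(standard=3, enterprise=3),
}
for name, cost in options.items():
    print(f"{name}: ${cost:,}")
# 9 x Standard: $6,534
# 3 x Enterprise: $7,074
# 3 x Enterprise + 3 x Standard: $9,252

# One Enterprise plus one Standard license assigned to the same host:
print(vms_licensed_per_host(standard=1, enterprise=1))  # 5
```

The last line is the point of the third option: with one of each license on every host, any surviving host is licensed for the five VMs it may have to carry.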

The ultimate in flexibility is Windows Server Datacenter Edition. Datacenter Edition is licensed per processor socket rather than per physical host, but includes unlimited virtualization rights. You can run as many VMs on your hosts as they’re capable of running, and move them around to your heart’s content. If you just don’t want to worry about what’s running where or whether or not it’s technically legal to move a given VM around, this is the license model to use.

Of course, this is also the most expensive edition of Windows. The estimated retail price for Datacenter Edition is $2,405 per processor socket (regardless of the number of cores per processor). So it would cost $14,430 to license three dual-processor servers with Datacenter Edition. This probably isn’t cost effective if you’re only virtualizing nine servers. However, if you have lots of servers, and many of them are fairly lightly loaded (in terms of processor utilization), the picture could change. If your average consolidation ratio is about four servers per physical processor or higher, Datacenter Edition becomes the most cost-effective license: at exactly four, an Enterprise license ($2,358) and a Datacenter processor license ($2,405) cost nearly the same, and beyond that Datacenter pulls ahead.

In fact, if you’re even close to that 4:1 ratio, you should strongly consider Datacenter Edition, for two reasons:

  1. Windows environments inevitably grow. However many servers you have today, you’re probably going to have more of them a year from now. With Datacenter Edition, you can continue to fire up new servers to the limits of your hardware without having to buy more server licenses.
  2. AMD already has six-core processors. You know the “arms race” between Intel and AMD will continue. So the number of servers per processor that you can reasonably expect to support will continue to increase as the processors themselves become more powerful and contain more cores, and as this happens, Datacenter Edition will look better and better.
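To see where that 4:1 figure comes from, here is a small Python sketch that compares Datacenter Edition with the cheapest mix of Enterprise and Standard licenses on a single dual-processor host, using the estimated retail prices quoted above. It is deliberately simplified: it assumes the physical instance is only used for management, and it ignores the spare licenses you would need to move VMs between hosts, which would tilt things further toward Datacenter. The function names are hypothetical.

```python
import math

STANDARD = 726                 # covers 1 VM
ENTERPRISE = 2358              # covers 4 VMs
DATACENTER_PER_SOCKET = 2405   # unlimited VMs, priced per processor socket


def cheapest_standard_enterprise(vms: int) -> int:
    """Lowest cost to license `vms` VMs on one host with Standard/Enterprise."""
    return min(
        ent * ENTERPRISE + max(0, vms - 4 * ent) * STANDARD
        for ent in range(math.ceil(vms / 4) + 1)
    )


def datacenter_cost(sockets: int) -> int:
    """Datacenter Edition cost for a host with the given socket count."""
    return sockets * DATACENTER_PER_SOCKET


sockets = 2
for vms in range(6, 13):
    other = cheapest_standard_enterprise(vms)
    dc = datacenter_cost(sockets)
    winner = "Datacenter" if dc < other else "Standard/Enterprise"
    print(f"{vms} VMs ({vms / sockets:.1f} per processor): "
          f"${other:,} vs ${dc:,} -> {winner}")
```

On these list prices the crossover sits right around four VMs per processor: at eight VMs on the dual-processor host the two approaches are within about $100 of each other, and from the ninth VM onward Datacenter is the cheaper option.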

Note that everything we’ve discussed holds true if you’re virtualizing on XenServer or VMware rather than on Hyper-V. The only difference is that you won’t be using any of the allowed physical instances of Windows.

If you want to delve deeper into this issue, you can download a copy of the Microsoft Product Use Rights document from their Web site. Happy virtualizing!