Category Archives: High Availability

High Availability and Fault Tolerance Part Two

In my last post on High Availability and Fault Tolerant servers (HA/FT) we talked a little bit about redundant power, meaning you have more than one source of electricity to run your servers. But there are numerous other internal threats that can cause unplanned server outages.

After backup power, the next level of redundancy comes in your servers themselves. Most server-class machines have redundant components such as hard drives and power supplies built right in. This means that right off the shelf, these systems have some level of Fault Tolerance (FT), which can keep applications and data available when a component fails. However, unplanned outages can still occur when a non-redundant component fails, or when multiple components fail at once.

Remember that High Availability means that if a virtual or physical machine goes down, it will automatically restart and come back online. Fault Tolerance means that multiple components can fail with no loss of data and no interruption of application availability.

To take HA/FT to a higher level we can turn to one of several products available on the market. Companies like Vision Solutions (Double Take) provide software that allows you to create a standby server. More sophisticated products from VMware and Stratus allow you to mirror applications and data on identical servers using a concept known as lock-step. Lock-step means that applications and data are processed in real time on two hosts at once. With these products, multiple components or an entire server can fail and your applications continue to be available to users.

With Double Take software from Vision Solutions, IT staff can create a primary and standby server pair in which all of your data is replicated to the standby server in real time. This is a sufficient solution for most small to medium enterprises. However, if the primary server fails, there is still a brief interruption in application availability while the failover to the standby server occurs. In special situations that require the highest levels of High Availability and Fault Tolerance, we turn to solutions from VMware or Stratus. These provide a scenario where multiple components can fail on multiple servers and your application will continue to run.

Determining which approach is right for you is really an economic decision based on the cost of downtime. If you can’t put a dollar value on what it costs your business per hour or per day when a critical application is unavailable, then that application probably isn’t sufficiently critical for you to spend a lot of money on an HA/FT solution. If you do know what that cost is, then, just like buying any other kind of business insurance, you can make a business decision as to how much money you can justify spending to protect against that risk of loss.
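To make that insurance comparison concrete, here is a minimal sketch of the arithmetic. The function names and every dollar and hour figure are hypothetical placeholders chosen for illustration; substitute your own estimates.

```python
def annual_downtime_cost(cost_per_hour: float, outage_hours_per_year: float) -> float:
    """Expected yearly cost of downtime for one application."""
    if cost_per_hour < 0 or outage_hours_per_year < 0:
        raise ValueError("inputs must be non-negative")
    return cost_per_hour * outage_hours_per_year


def justified_annual_spend(cost_per_hour: float,
                           hours_without_ha: float,
                           hours_with_ha: float) -> float:
    """Downtime cost avoided per year: a rough ceiling on what an HA/FT solution is worth."""
    return (annual_downtime_cost(cost_per_hour, hours_without_ha)
            - annual_downtime_cost(cost_per_hour, hours_with_ha))


# Hypothetical example: downtime costs $5,000 per hour, and we expect
# 8 hours of outage per year without protection versus 0.5 hours with it.
avoided = justified_annual_spend(5_000.0, 8.0, 0.5)
print(f"Downtime cost avoided per year: ${avoided:,.2f}")
```

If the yearly cost of a solution is well above the figure this produces, it is probably more protection than that application warrants.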

Is It Time to Upgrade Your DataCore SANsymphony-V?

A few months ago, DataCore released SANsymphony-V 10.0. If you’re running an earlier version of SANsymphony-V, there are several reasons why you might want to start planning your upgrade. There are some great new features in v10, and we’ll get to those in a moment, but you should also bear in mind that DataCore’s support policy is to support the current full release (v10) and the release just previous to the current full release (v9.x). Support for v8.x officially ends on December 31, 2014, and support for v7.x ended last June.

That doesn’t mean DataCore won’t help you if you have a problem with an earlier version. It does mean that their obligation is limited to “best effort” support, and does not extend to bug fixes, software updates, or root-cause analysis of issues you may run into. So, if you’re on anything earlier than v9.x, you really should talk to us about upgrading.

But even if you’re on v9.x, there are some good reasons why you may want to upgrade to 10.0:

  • Scalability has doubled from 16 to a maximum of 32 nodes.
  • Supports high-speed 40/56 GbE iSCSI, 16 Gb Fibre Channel, and iSCSI target NIC teaming.
  • Performance visualization/heat map tools to give you better insight into the behavior of flash and disk storage tiers.
  • New auto-tiering settings to optimize expensive resources like flash cards.
  • Intelligent disk rebalancing to dynamically redistribute the load across available disks within a storage tier.
  • Automated CPU load leveling and flash optimization.
  • Disk pool optimization and self-healing storage - disk contents are automatically restored across the remaining storage in the pool.
  • New self-tuning caching algorithms and optimizations for flash cards and SSDs.
  • Simple configuration wizards to rapidly set up different use cases.

And if that’s not enough, v10 now allows you to provision high-performance virtual SANs that can scale to more than 50 million IOPS and up to 32 Petabytes of capacity across a cluster of 32 servers. Not sure whether a virtual SAN can deliver the performance you need? They’ll give you a free virtual SAN for non-production evaluation use.

Stratus everRun Enterprise Technical Certification Course

This week I am sitting the Stratus everRun Enterprise Technical Certification course, held at Stratus Technologies, headquartered in Maynard, MA. Much to my delight this is an accelerated course where they cram five days of training into three! I have worked with this and similar technologies, originally designed and offered by Marathon Technologies, since the nineties, so it’s like old home week coming here. Last year Stratus purchased Marathon, and ultimately this is good for the evolution of the everRun product line.

This week’s course is designed to train experienced technicians on the newly redesigned everRun Enterprise, and I can say that I am very impressed with the new system. Where the earlier product was built on Citrix XenServer, the new one is built on CentOS Linux 6.5 and uses the KVM hypervisor. I am a big fan of XenServer, but working with this new version of everRun I see the wisdom Stratus has shown in designing this impressive product.

The everRun product line is designed to provide a highly fault tolerant server platform to run workloads that require up to five nines of uptime (less than six minutes of downtime per year). This product is highly regarded as one of the best solutions in the industry for protecting critical applications and data. If you operate computer systems that cannot afford to be down, you owe it to yourself to look at this system. Class wraps up tomorrow and I’ll be dashing to the airport to head home to sunny Seattle!

High Availability and Fault Tolerance Part One

In computing environments we generally accept that there are many conditions that can result in unexpected downtime. Unexpected downtime is any condition that results in not having access to your systems, applications, and data. A power outage that causes your computer to shut down is a simple example of “unexpected downtime.” We work diligently to prevent this condition through various facilities. To prevent downtime due to a power outage we use “Uninterruptible Power Supplies” (UPS), which is a nice name for a battery pack designed to be on standby and “stand in” to keep your systems running until the power comes back online. We can also take the next step and put standby generators in place that will run indefinitely until the power is restored. These safeguards can be regarded as steps to provide “High Availability” (HA) and “Fault Tolerance” (FT) for computing systems.

Over the next few weeks I will be publishing a series of blog posts exploring the difference between HA and FT in various environments and at differing levels of complexity, all the way up to how we use Stratus Technologies everRun MX to provide guaranteed “five nines” of uptime for Windows Server workloads. A guarantee of five nines of uptime means that you should experience no more than about 5.26 minutes of unexpected downtime per year.

We have worked with the everRun technology for over 15 years, and our experience with this product has been that it performs as advertised and solves the uptime issue where many others fail. In addition to protecting your systems from server failures, it can also protect an application across multiple datacenters over distance. Besides its amazing track record, everRun also allows up to 8 vCPU cores, which is an amazing feat that no one else in the industry can offer. Please stay tuned to our blog for more information on this exciting technology.
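The “five nines” downtime figure follows from simple arithmetic: multiply the number of minutes in a year by the fraction of time the system is allowed to be down. The sketch below assumes a 365-day year; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year allowed by a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare three, four, and five nines of uptime.
for availability in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the jump from four nines to five is so demanding.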

Scott’s Book Arrived!

We are pleased to announce that Scott’s books have arrived! ‘The Business Owner’s Essential Guide to I.T.’ is 217 pages packed full of pertinent information.

For those of you who pre-purchased your books, thank you! Your books have already been signed and shipped, so you should receive them shortly. We hope you enjoy them as much as Scott enjoyed writing for you.

If you haven’t purchased your copy yet, you can buy a signed copy from us, and all proceeds will be donated to the WA chapter of Mothers Against Drunk Driving (MADD).