Category Archives: Disaster Recovery

Scott’s Book Arrived!

We are pleased to announce that Scott’s books have arrived! ‘The Business Owner’s Essential Guide to I.T.’ is 217 pages packed full of pertinent information.

For those of you who pre-purchased your books, thank you! Your books have already been signed and shipped; you should receive them shortly, and we hope you enjoy them as much as Scott enjoyed writing them for you.

If you haven’t purchased your copy, click here. Purchase a signed copy from us and all proceeds will be donated to the WA chapter of Mothers Against Drunk Driving (MADD).

My first trip to a DataCenter

Friday, the technology leadership of VirtualQube (and I) descended upon Austin, Texas, to meet with our datacenter vendor. It was a meeting long overdue: we had been doing business together for almost four years, but this was the first face-to-face meeting for the entire team.

Our vendor did their homework and took us out to dinner at Bob’s Steakhouse on Lavaca in downtown Austin the night before. It was a GREAT meal, and we had a blast checking out a number of watering holes in the area. According to our hosts, we apparently stopped the festivities just before entering the “seedy” part of the city. I feel like that was the perfect amount of fun to have, especially since we had a 4-hour meeting starting at 9am the next day.

We started with a tour of the facility. Our vendor is in a CyrusOne Type Four Level II datacenter. For the uninitiated, this means it’s the best of the best. Fully redundant everything, generally with another safety valve or failover in addition. And the majority of these failovers were tested MONTHLY. Whoa, that’s impressive. We even saw the four huge gas-powered generators outside that would support the entire building in case of a loss of electricity. Looking inside them (which we weren’t supposed to be allowed to do) was awe-inspiring. Basically a V-12 design, with a filter on each cylinder due to its size. I didn’t get the specs, and I would have gotten a photo but security showed up right as I grabbed for my phone. Just believe me that the people behind this building had thought of everything that could go wrong.

I broke the rules and took a picture of all the blinking lights. Kinda looks like my home theater, only more expensive (which is tough to do!).

After the tour we talked about ways to work together for the coming years and both teams came away with a list of action items to make our collective futures brighter. And I’m off to get started on one of those projects now!

Why Not Amazon Web Services?

Thinking about moving to Amazon Web Services? Address these 4 Concerns FIRST

We’ve been hearing a lot of debate recently about using Amazon Web Services for all or part of a cloud infrastructure. Many people sing their praises wholeheartedly, and we here at VirtualQube have even explored their offerings to see if there was an opportunity to bend our own cost curve. But there really is a mixed bag of benefits and features. How do you know if the move is right for you? We’ve narrowed it down to 4 concerns you should address in light of your own circumstances before making the move.

1. Business impact

First of all, let’s analyze the business model for AWS. Amazon rents out virtual machines for a reasonable price per desktop. But in order to get their best price, you have to pony up 36 months of service fees in advance to rent the space. If you’re an enterprise with three years of IT budget available, this is a great deal. If not, take a closer look.

The pricing from AWS also assumes that virtual machines will be spun down 40% of the day. If your workforce mostly logs in within an 8:30am – 6:00pm time frame, you will greatly benefit from this pricing. If your employees have much more flexibility in their schedules (due to travel, seasonal workload spikes, or shifting hours for coverage), then you may need to look at another provider.
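
As a rough way to check your own situation against that 40% assumption, here is a minimal Python sketch. The function name and the example schedules are hypothetical; it only compares hours of daily use against the 60% of the day the pricing assumes machines are running, and it says nothing about actual AWS rates.

```python
HOURS_PER_DAY = 24.0
ASSUMED_SPIN_DOWN = 0.40  # share of the day the pricing assumes machines are off


def fits_spin_down_assumption(active_hours_per_day: float,
                              spin_down_fraction: float = ASSUMED_SPIN_DOWN) -> bool:
    """Return True if daily usage fits inside the assumed running window."""
    if not 0 <= active_hours_per_day <= HOURS_PER_DAY:
        raise ValueError("active hours must be between 0 and 24")
    running_window = HOURS_PER_DAY * (1.0 - spin_down_fraction)
    return active_hours_per_day <= running_window


# 8:30am to 6:00pm is 9.5 hours of use.
office_hours = 9.5
# Hypothetical: staff spread across shifts keep desktops busy 18 hours a day.
extended_coverage = 18.0

print(fits_spin_down_assumption(office_hours))
print(fits_spin_down_assumption(extended_coverage))
```

With a 40% spin-down, the assumed running window is a little over 14 hours a day, so a standard office schedule fits comfortably while extended coverage does not.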

AWS also allows for 1GB of data to flow into their cloud for free, and only charges for the outflow of data. This is great for storage if you only use it sparingly, or in the case of a disaster, but can add up quickly if you need to access your data frequently. While this doesn’t seem to be a concern now, as businesses exchange more and larger files, the cost of this pricing model could quickly outweigh the benefits.

2. Operations impact

The operational capabilities of Amazon truly are world-class. However, to achieve scale, offer its best price, and keep its own cost of operations as low as possible, AWS has one set way of operating, and its customers are required to interact with AWS in this one way alone. So AWS may not offer the flexibility that would make it easy for you to add services to your existing operations.

If your firm fits AWS’s standard use case, the transition could be easy. But if you have unique requirements, the friction within your organization could quickly lead to discord, operational changes, and many other business costs while you try to fit the mold AWS promotes.

3. Technology impact

The technology benefit of AWS is really second-to-none. Their infrastructure has the best hardware and capabilities offered by any of the cloud vendors. The efficiencies of scale mean you can get access to best-of-breed hardware faster than you would otherwise. The only caveat is this could give you a false sense of security.

Our approach to business is to think of all the things that CAN go wrong, because many times they eventually do. We coach our customers to prepare for the times when technology will fail. And fail it will. We have consistently seen multi-million dollar technology fail unpredictably, even in hundred million dollar installations. These cases are NOT supposed to happen, and may not happen frequently, but they will happen. And if the failure impacts your business, it doesn’t matter how expensive the underlying technology is. And when the technology does fail, will you be able to get a senior engineer on the phone to immediately address your concerns?

4. Flexibility impact

The ability for AWS to match your business needs during hyper-growth and/or significant volatility could make the business case alone. With AWS’s web interface, your internal technology leader can order additional computing capabilities and they will be ready as soon as you hit “Enter.” The days of placing hardware into a room, hooking up cables, creating and testing images are truly over for all cloud users, and AWS does shorten the timeline for creating these technologies from minutes to seconds. Companies with significant growth who are doubling or tripling in size within a year many years in a row are a perfect match for AWS. No question.

Your final decision…

To sum it all up: AWS works well for you if:

  • You have a scale of operations and support for tens of thousands of users
  • There are three years of IT spend on the balance sheet and it can be invested today
  • You are a typical player in your industry, which fits AWS’s definition of your industry
  • Your IT needs to meet business demands that fluctuate exponentially, immediately, and unpredictably
  • You take advantage of some advanced features for business continuity

For a more in-depth discussion on this topic, check out this: LINK. For an in-depth cost analysis, check out this: LINK. Please note, you will have to be a Citrix Service Partner to access the cost analysis.

 

The Red Cross Wants to Help You with DR Planning

Red Cross Ready Rating Program

A few days ago, I spotted a headline in the local morning paper: “SBA Partners with the Red Cross to Promote Disaster Planning.” We’ve written some posts in the past that dealt with the importance of DR planning, and how to go about it, so this piqued my curiosity enough that I visited the Red Cross “Ready Rating” Web site. I was sufficiently impressed with what I found there that I wanted to share it with you.

Membership in the Ready Rating program is free. All you have to do to become a member is to sign up and take the on-line self-assessment, which will help you determine your current level of preparedness. And I’m talking about overall business preparedness, not just IT preparedness. The assessment rates you on your responses to questions dealing with things like:

  • Have you conducted a “hazard vulnerability assessment,” including identifying appropriate emergency responders (e.g., police, fire, etc.) in your area and, if necessary, obtaining agreements with them?
  • Have you developed a written emergency response plan?
  • Has that plan been communicated to employees, families, clients, media representatives, etc.?
  • Have you developed a “continuity of operations plan”?
  • Have you trained your people on what to do in an emergency?
  • Do you conduct regular drills and exercises?

That last point is more important than you might think. It’s not easy to think clearly when you’re in the middle of an earthquake, or when you’re trying to find the exit when the building is on fire and there’s smoke everywhere. The best way to ensure that everyone does what they’re supposed to do is to drill until the response is automatic. It’s why we had fire drills when we were in elementary school. It’s still effective now that we’re all grown up.

Once you become a member, your membership will automatically renew from year to year, as long as you take the self-assessment annually and can show that your score has improved from the prior year. (Once your score reaches a certain threshold, you’re only required to maintain that level to retain your membership.)

So, why should you be concerned about this? It’s hard to imagine that, after the tsunami in Japan and the flooding and tornadoes here at home, there’s anyone out there who still doesn’t get it. But, just in case, consider these points taken from the “Emergency Fast Facts” document in the members’ area:

  • Only 2 in 10 Americans feel prepared for a catastrophic event.
  • Close to 60% of Americans are wholly unprepared for a disaster of any kind.
  • 54% of Americans don’t prepare because they believe a disaster will not affect them – although 51% of Americans have experienced at least one emergency situation where they lost utilities for at least three days, had to evacuate and could not return home, could not communicate with family members, or had to provide first aid to others.
  • 94% of small business owners believe that a disaster could seriously disrupt their business within the next two years.
  • 15 – 40% of small businesses fail following a natural or man-made disaster.

If you’re not certain how to even get started, they can help there as well. Here’s a screen capture showing a partial list of the resources available in the members’ area:

Member Resources

And speaking of getting started, check this out: Just about everything I’ve ever read about disaster preparedness talks about the importance of having a “72-hour kit” – something that you can quickly grab and take with you that contains everything you need to survive for three days. Well, for those of you who haven’t got the time to scrounge up all of the recommended items and pack them up, you may find the solution at your local Costco. Here’s what I spotted on my most recent trip:

Pre-Packaged 3-day Survival Kit

Yep, it’s a pre-packaged 3-day survival kit. The cost at my local store (in Woodinville, WA, if you’re curious) was $69.95. That, in my opinion, is a pretty good deal.

So, if you haven’t started planning yet, consider this your call to action. Don’t end up as a statistic. You can do this.

High Availability vs. Fault Tolerance

Many times, terms like “High Availability” and “Fault Tolerance” get thrown around as though they were the same thing. In fact, the term “fault tolerant” can mean different things to different people – and much like the terms “portal,” or “cloud,” it’s important to be clear about exactly what someone means by the term “fault tolerant.”

As part of our continuing efforts to guide you through the jargon jungle, we would like to discuss redundancy, fault tolerance, failover, and high availability, and we’d like to add one more term: continuous availability.

Our friends at Marathon Technologies shared the following graphic, which shows how IDC classifies the levels of availability:

Graphic of Availability Levels

The Availability Pyramid



Redundancy is simply a way of saying that you are duplicating critical components in an attempt to eliminate single points of failure. Multiple power supplies, hot-plug disk drive arrays, multi-pathing with additional switches, and even duplicate servers are all part of building redundant systems.

Unfortunately, there are some failures, particularly if we’re talking about server hardware, that can take a system down regardless of how much you’ve tried to make it redundant. You can build a server with redundant hot-plug power supplies and redundant hot-plug disk drives, and still have the system go down if the motherboard fails – not likely, but still possible. And if it does happen, the server is down. That’s why IDC classifies this as “Availability Level 1” (“AL1” on the graphic)…just one level above no protection at all.

The next step up is some kind of failover solution. If a server experiences a catastrophic failure, the workloads are “failed over” to a system that is capable of supporting them. Depending on those workloads, and what kind of failover solution you have, that process can take anywhere from minutes to hours. If you’re at “AL2,” and you’ve replicated your data using, say, SAN replication or some kind of server-to-server replication, it could take a considerable amount of time to actually get things running again. If your servers are virtualized, with multiple virtualization hosts running against a shared storage repository, you may be able to configure your virtualization infrastructure to automatically restart a critical workload on a surviving host if the host it was running on experiences a catastrophic failure – meaning that your critical system is back up and on-line in the amount of time it takes the system to reboot – typically 5 to 10 minutes.

If you’re using clustering technology, your cluster may be able to fail over in a matter of seconds (“AL3” on the graphic). Microsoft server clustering is a classic example of this. Of course, it means that your application has to be cluster-aware, you have to be running Windows Enterprise Edition, and you may have to purchase multiple licenses for your application as well. And managing a cluster is not trivial, particularly when you’ve fixed whatever failed and it’s time to unwind all the stuff that happened when you failed over. And your application was still unavailable during whatever interval of time was required for the cluster to detect the failure and complete the failover process.

You could argue that a failover of 5 minutes or less equals a highly available system, and indeed there are probably many cases where you wouldn’t need anything better than that. But it is not truly fault tolerant. It’s probably not good enough if you are, say, running a security application that’s controlling the smart-card access to secured areas in an airport, or a video surveillance system that is sufficiently critical that you can’t afford to have a 5-minute gap in your video record, or a process control system where a 5-minute halt means you’ve lost the integrity of your work in process and potentially have to discard thousands of dollars’ worth of raw material and lose thousands more in productivity while you clean out your assembly line and restart it.

That brings us to the concept of continuous availability. This is the highest level of availability, and what we consider to be true fault tolerance. Instead of simply failing workloads over, this level allows for continuous processing without disruption of access to those workloads. Since there is no disruption in service there is no data loss, no loss of productivity and no waiting for your systems to restart your workloads.

So all this leads to the question of what your business needs.

Do you have applications that are critical to your organization? If those applications go down how long could you afford to be without access to them? If those applications go down how much data can you afford to lose? 5 minutes? An hour? And, most importantly, what does it cost you if that application is unavailable for a period of time? Do you know, or can you calculate it?
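
If you haven’t calculated that cost before, a back-of-the-envelope version might look like the following Python sketch. Every name and number in it is a made-up placeholder (the revenue, headcount, and hourly cost figures are assumptions, not data from any real business), and it ignores harder-to-measure costs such as damage to your reputation.

```python
def downtime_cost(outage_minutes: float,
                  revenue_per_hour: float,
                  idle_employees: int,
                  loaded_cost_per_employee_hour: float) -> float:
    """Estimate the direct cost of an outage: lost revenue plus idle labor."""
    hours = outage_minutes / 60.0
    lost_revenue = revenue_per_hour * hours
    idle_labor = idle_employees * loaded_cost_per_employee_hour * hours
    return lost_revenue + idle_labor


# Hypothetical figures for illustration only.
for minutes in (5, 60, 480):
    cost = downtime_cost(minutes, revenue_per_hour=2_000.0,
                         idle_employees=25, loaded_cost_per_employee_hour=40.0)
    print(f"{minutes:>4} min outage: ${cost:,.2f}")
```

Running the same numbers for a 5-minute, 1-hour, and full-day outage shows how quickly the gap widens between what a short failover costs and what a long restore costs.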

This is another way to ask what the requirements are for your “RTO” (“Recovery Time Objective” – i.e., how long, when a system goes down, do you have before you must be back up) and “RPO” (“Recovery Point Objective” – i.e., when you do get the system back up, how much data is it OK to have lost in the process). We’ve discussed these concepts in previous posts. These are questions that only you can answer, and the answers are significantly different depending on your business model. If you’re a small business, and your accounting server goes down, and all it means is that you have to wait until tomorrow to enter today’s transactions, it’s a far different situation from a major bank that is processing millions of dollars in credit card transactions.
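
To make the two objectives concrete, here is a small Python sketch that checks candidate solutions against an RTO and an RPO. The class, the function, and the recovery figures are hypothetical illustrations; real recovery times and data loss depend on your workloads and need to be measured, not assumed.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    recovery_minutes: float   # time to get the workload running again
    data_loss_minutes: float  # how far back the recovered data may be


def meets_objectives(solution: Solution, rto_minutes: float, rpo_minutes: float) -> bool:
    """True if the solution recovers fast enough (RTO) and loses little enough data (RPO)."""
    return (solution.recovery_minutes <= rto_minutes
            and solution.data_loss_minutes <= rpo_minutes)


# Hypothetical candidates; the numbers are placeholders, not vendor figures.
candidates = [
    Solution("restore from nightly backup", recovery_minutes=480, data_loss_minutes=1440),
    Solution("automatic restart on a surviving host", recovery_minutes=10, data_loss_minutes=1),
]

# Example objectives: back up within 15 minutes, lose no more than 5 minutes of data.
for s in candidates:
    print(s.name, meets_objectives(s, rto_minutes=15, rpo_minutes=5))
```

Checking both objectives together matters because a solution can recover quickly and still lose too much data, or the other way around.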

If you can satisfy your business needs by deploying one of the lower levels of availability, great! Just don’t settle for an AL1 or even an AL3 solution if what your business truly demands is continuous availability.