
Why VDI Is So Hard and What To Do About It

By Ryan Pehrson | EMC, View, Virtualization

Rapid consumerization, coupled with the availability of powerful, always-connected mobile devices and the capability for anytime, anywhere access to applications and data, is fundamentally transforming the relationship between IT and the end-user community for most of our customers.

IT departments are now faced with the choice to manage an incredible diversity of new devices and access channels, as well as the traditional desktops in the old way, or get out of the device management business and instead deliver IT services to the end-user in a way that aligns with changing expectations. Increasingly, our customers are turning to server-hosted virtual desktop solutions—which provide secure desktop environments accessible from nearly any device—to help simplify the problem. This strategy, coupled with Mobile Device Management tools, helps to enable BYOD and BYOC initiatives, allowing IT to provide a standardized corporate desktop to nearly any device while maintaining control.

However, virtual desktop infrastructure (VDI) projects are not without risk. This seems to be well understood, because it’s been the “year of the virtual desktop” for about four years now (actually, I’ve lost count). But we’ve seen and heard of too many VDI projects that have failed due to an imperfect understanding of the related design considerations or a lack of data-driven, fact-based decision making.

There is really only one reason VDI projects fail: the provided solution fails to meet or exceed end-user expectations. Everything else can be rationalized, for example as an operational expense reduction, capital expense avoidance, or a security improvement. But a CIO who fails to meet end-user expectations will face poor adoption, decreased productivity, or an outright mutiny.

Meeting end-user expectations is intimately related to storage performance. That is to say, end-user expectations have already been set by the performance of the devices users have access to today. That may be a corporate desktop with a relatively slow SATA hard drive or a MacBook Air with an SSD. Both deliver dedicated I/O and consistent application latency. Furthermore, the desktop OS is written with a couple of salient underlying assumptions: that the OS doesn't have to be a "nice neighbor" in terms of access to CPU, memory, or disk, and that foreground processes should get access to any resources available.

Contrast that with what we’re trying to do in a VDI environment. The goal is to cram as many of these resource-hungry little buggers on a server as you can in order to keep your cost per desktop lower than buying and operating new physical desktops.

Now, in the "traditional" VDI architecture, the physical host must access a shared pool of disk across a storage area network, which adds latency. Furthermore, those VDI sessions are little resource piranhas (credit: Atlantis Computing for the piranha metaphor). VDI workloads will chew up as many IOPS as you throw at them with no regard for their neighbors. This is also why many of our customers choose to purchase a separate array for VDI in order to segregate the workload. This way, VDI workloads don't impact the performance of critical server workloads!

But the real trouble is that most VDI environments we’ve evaluated average a whopping 80% random write at an average block size of 4-8K.

So why is this important? In order to meet end-user expectations, we must provide sufficient I/O bandwidth at sufficiently low latency. But most shared storage arrays cannot be sized based on front-end IOPS requirements alone. They must be sized based on backend IOPS, and it's the write portion of the workload that suffers a penalty.

If you’re not a storage administrator, that’s ok. I’ll explain. Due to the way that traditional RAID works, a block of data can be read from any disk on which it resides, whereas for a write to happen, the block of data must be written to one or more disks in order to ensure protection of the data. RAID1, or disk mirroring, suffers a write penalty factor of 2x because the writes have to happen on two disks. RAID5 suffers a write penalty of 4x because for each change to the disk, we must read the data, read the parity information, then write the data and write the parity to complete one operation.

Well, mathematically this all adds up. Let's say we have a 400-desktop environment, with a relatively low 10 IOPS per desktop at 20% read. So the front-end IOPS at steady state would be:

10 IOPS per desktop x 400 Desktops = 4000 IOPS

If I were using 10k SAS drives at an estimated 125 IOPS per drive, I could get that done with an array of 32 SAS drives. Right?

Wrong. Because the workload is write-heavy, the backend IOPS calculation for a RAID5 array looks like this:

(2 read IOPS x 400 desktops) + (8 write IOPS x 400 desktops x 4 RAID5 write penalty) = 800 + 12,800 = 13,600 IOPS

This is because 20% of the 10 IOPS are reads and 80% are writes. So the backend IOPS required here is 13,600. On those 125 IOPS drives, we're now at roughly 110 drives (before hot spares) instead of 32.

But all of the above is still based on the rather silly concept that our users' average IOPS is all we need to size for. Hopefully we've at least assessed the average IOPS per user rather than adopting any of the numerous sizing assumptions in vendor whitepapers, e.g. that Power Users all consume 12-18 IOPS at "steady state". (In fairness, most vendors will tell you that your mileage will vary.)

Most of our users are used to at least 75 IOPS (a single SATA drive) dedicated to their desktop workload. They essentially expect to have far more than 10 IOPS available to them should they need it, such as when they're launching Outlook. If our goal is a user experience on par with physical, sizing to the averages is just not going to cut it. If we stick with this simple sizing methodology, we need to include at least 30% headroom, which puts us at roughly 140 disks on our array for 400 users assuming traditional RAID5. That is far more than we would need based on raw capacity alone.
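To make the arithmetic easy to repeat, here's a minimal sizing sketch in Python. It simply encodes the assumptions from the example above (10 IOPS per desktop, 20% read, a 4x RAID5 write penalty, an estimated 125 IOPS per 10k SAS drive, and 30% headroom); swap in the numbers from your own assessment.

import math

# Sizing assumptions from the example above; replace with assessment data.
DESKTOPS = 400
IOPS_PER_DESKTOP = 10
READ_RATIO = 0.20
RAID5_WRITE_PENALTY = 4
DRIVE_IOPS = 125          # estimated per 10k SAS drive
HEADROOM = 0.30

frontend_iops = DESKTOPS * IOPS_PER_DESKTOP
read_iops = frontend_iops * READ_RATIO
write_iops = frontend_iops * (1 - READ_RATIO)

# Writes are amplified by the RAID5 penalty before they reach the spindles.
backend_iops = read_iops + write_iops * RAID5_WRITE_PENALTY
sized_iops = backend_iops * (1 + HEADROOM)

print(f"Front-end IOPS:         {frontend_iops:,.0f}")    # 4,000
print(f"Back-end IOPS (RAID5):  {backend_iops:,.0f}")     # 13,600
print(f"With 30% headroom:      {sized_iops:,.0f}")       # 17,680
print(f"10k SAS drives needed:  {math.ceil(sized_iops / DRIVE_IOPS)}")  # ~142

Depending on how you round, the drive count lands a couple of disks either side of 140, and that's still before hot spares.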

The fact is that VDI workloads are very "peaky." A single user may average 12-18 IOPS once all applications are open, but opening a single application can consume hundreds or even thousands of IOPS if they're available. So what happens when a user comes into the office, logs in, and starts an application that generates a significant write workload, at the same time as everyone else is doing the same? There's a storm of random reads and writes on your backend, your application latency increases as the storage tries to keep up, and bad things start to happen in the world of IT.
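To put rough numbers on that peakiness, here's a quick back-of-the-envelope comparison. The burst figures are purely hypothetical, chosen only to show the shape of the problem, not assessment data.

# Hypothetical burst figures, purely to illustrate the shape of the problem.
DESKTOPS = 400
STEADY_IOPS = 10           # per desktop once applications are open
LAUNCH_IOPS = 500          # per desktop while an application is launching (assumed)
LAUNCHING_FRACTION = 0.10  # 10% of users launching an application at once (assumed)

steady_state = DESKTOPS * STEADY_IOPS
during_burst = (DESKTOPS * LAUNCHING_FRACTION * LAUNCH_IOPS
                + DESKTOPS * (1 - LAUNCHING_FRACTION) * STEADY_IOPS)

print(f"Steady state: {steady_state:,.0f} IOPS")   # 4,000
print(f"During burst: {during_burst:,.0f} IOPS")   # 23,600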

So What Do We Do About It?

I hope the preceding discussion gives the reader a sense of respect for the problem we’re trying to solve. Now, let’s get to some ways it might be solved cost-effectively.

There are really two ways to succeed here:

1)    Throw a lot of money at the storage problem, sacrifice a goat, and dance in a circle in the pale light of the next full moon [editor's notes: a) IDS does not condone animal sacrifice and b) IDS recommends updating your resume and LinkedIn profile in this case];

2)    Assess, Design, and Deliver Results in a disciplined fashion.

Assess, Don’t Assume

The first step is to Assess. The good news is that we can understand all of the technical factors for VDI success as long as we pay attention to end user as well as administrator experience. And once we have all the data we need, VDI is mostly a math problem.

Making data-driven, fact-based decisions is critical to success. Do not make assumptions if you can avoid doing so. Sizing guidelines outlined in whitepapers, even from the most reputable vendors, are still assumptions if you adopt them without data.

You should always perform an assessment of the current state environment. When we assess the current state from a storage perspective, we are generally looking for at least a few metrics, categorized by a user persona or use case.

  • I/O Requirements (I/O per Second or IOPS)
  • I/O Patterns (Block Size and Read-to-Write Ratio)
  • Throughput
  • Storage Latency
  • Capacity Requirements (GB)
  • Application Usage Profiles

Ideally, this assessment phase involves a large statistical set and runs over a complete business cycle (we recommend at least 30 days). This is important to develop meaningful average and peak numbers.
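As a sketch of what we do with that data once it's collected, the snippet below summarizes per-sample IOPS into an average, a 95th percentile, a peak, and a read-to-write ratio. It assumes a hypothetical CSV export with one row per sample (columns read_iops and write_iops); real assessment tools have their own export formats.

import csv
import statistics

def summarize(path):
    """Summarize a (hypothetical) per-sample IOPS export from an assessment."""
    reads, writes = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            reads.append(float(row["read_iops"]))
            writes.append(float(row["write_iops"]))

    totals = [r + w for r, w in zip(reads, writes)]
    return {
        "avg_iops": statistics.mean(totals),
        "p95_iops": statistics.quantiles(totals, n=20)[-1],  # 95th percentile
        "peak_iops": max(totals),
        "read_ratio": sum(reads) / (sum(reads) + sum(writes)),
    }

# Example: print(summarize("vdi_assessment_samples.csv"))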

Design for Success

There's much more to this than just storage choices, and these steps will depend upon your choice of hypervisor and virtual desktop management software. But since I put our fearless VDI implementers up a pretty big tree earlier with the IOPS and latency discussion, let's resolve some of that.

Given the metrics we've gathered above, we can begin to plan our storage environment. As I pointed out above, this is not as simple as multiplying the number of users by the average I/O. We also cannot size based only on averages; we need at least 30% headroom.

While we calculated above the number of disks we'd need to service the backend IOPS requirements with traditional RAID5, in practice we'd look to improved storage capabilities and approaches to reduce the impact of this random write workload.

Solid State Disks

Obviously, Solid State Disks offer more than 10 times the IOPS per disk of spinning disks, at greatly reduced access times, because there are no moving parts. If we took the 400-desktop calculation above and used a 5,000 IOPS SSD as the basis for our array, we'd need very few drives to service the IOPS.

Promising. But there are both cost and reliability concerns here. The cost per GB on SSDs is much higher, and write endurance on an SSD is finite. (There have been many discussions of MLC, eMLC, and SLC write endurance, so we won't cover that here.)

Auto-Tiering and Caching

Caching technologies can certainly provide many benefits, including reducing the number of spindles needed to service the IOPS requirements and latency reduction.

With read caching, certain "hot" blocks get loaded into an in-memory cache or, more recently, a flash-based tier. When the data is requested, instead of having to seek it on spindles, which can incur tens of milliseconds of latency, the data is available in memory or on a faster tier of storage. So long as the cache is intelligent enough to cache the right blocks, there can be a large benefit for the read portion of the workload. Read caching is a no-brainer. Most storage vendors have options here, and VMware offers a host-based read cache.
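The payoff is easy to see with a simple effective-latency calculation. The latency figures below are round-number assumptions, not measurements from any particular array or cache.

CACHE_LATENCY_MS = 0.2   # read served from DRAM or flash cache (assumed)
DISK_LATENCY_MS = 8.0    # read served from a 10k spindle (assumed)

def effective_read_latency(hit_rate):
    """Average read latency for a given cache hit rate."""
    return hit_rate * CACHE_LATENCY_MS + (1 - hit_rate) * DISK_LATENCY_MS

for hit_rate in (0.2, 0.5, 0.9):
    print(f"{hit_rate:.0%} hit rate -> {effective_read_latency(hit_rate):.2f} ms average read latency")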

But VDI workloads are more write intensive. This is where write buffering comes in.

Most storage vendors have write buffers serviced by DRAM or NVRAM. Basically, the storage system acknowledges the write before the write is sent to disk. If the buffer fills up, though, latency increases as the cache attempts to flush data out to the relatively slow spinning disk.
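Here's a deliberately simplified toy model of that behavior. All of the numbers are assumptions; the point is only that writes get acknowledged at cache speed until the buffer fills faster than the spindles can drain it, and then latency heads toward disk speed.

BUFFER_CAPACITY = 8_000   # writes the buffer can hold (assumed)
DRAIN_RATE = 3_200        # writes/sec the spindles can destage (assumed)
BUFFER_LATENCY_MS = 0.5   # write acknowledged from cache
DISK_LATENCY_MS = 10.0    # write that has to wait on the spindles

def average_write_latency(writes_per_sec, seconds=30):
    """Toy model: average write latency over an interval of sustained writes."""
    buffered = 0
    latencies = []
    for _ in range(seconds):
        buffered = max(0, buffered - DRAIN_RATE)                        # spindles drain the buffer
        overflow = max(0, buffered + writes_per_sec - BUFFER_CAPACITY)  # writes that can't fit
        absorbed = writes_per_sec - overflow                            # writes acked from the buffer
        buffered = min(BUFFER_CAPACITY, buffered + writes_per_sec)
        latencies.append((absorbed * BUFFER_LATENCY_MS + overflow * DISK_LATENCY_MS)
                         / writes_per_sec)
    return sum(latencies) / len(latencies)

print(f"Steady state (3,000 writes/s): {average_write_latency(3_000):.2f} ms")  # buffer keeps up
print(f"Login storm  (9,000 writes/s): {average_write_latency(9_000):.2f} ms")  # buffer overflows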

Enter the current champion in this space, EMC's FAST Cache, which alleviates some concerns around both read and write I/O. In this model, enterprise flash is used to extend the DRAM cache, so if the spindles are too busy to deal with all the I/O, the extended cache is used. The benefits to us: more content in the read cache and more writes in the buffer waiting to be coalesced and sent to disk. Of course, it's rather more complex than that, but you get the idea.

EMC FAST Cache is ideal in applications where there is a lot of small-block random I/O, like VDI environments, and where there's a high degree of access to the same data. Without FAST Cache, the benefit of the DRAM cache alone is about 20%, so 4 out of every 5 I/Os have to be serviced by a slow spinning disk. With FAST Cache enabled, it's possible to reduce the impact of read and write I/O by as much as 90%. That best case assumes the FAST Cache is dedicated to VDI and all of the workloads are largely the same. Don't assume that this means you can leverage your existing mixed-workload array without significant planning.

OK, so if we're using an EMC VNX2 with FAST Cache dedicated only to VDI, we hope to obtain up to a 90% reduction in back-end write I/O. Call me conservative, but I think we'll dial that back a bit for planning purposes and then test it during our pilot phase to see where we land. We calculated 12,800 backend write IOPS earlier for 400 desktops. Let's say we can halve that: 6,400 write IOPS plus the 800 read IOPS puts us at 7,200 total backend IOPS for 400 VDI desktops. Not bad.
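For planning purposes, that adjustment is just one more factor in the sizing math. A quick recalculation, where the 50% write absorption is our conservative planning assumption to be validated during the pilot, not a guaranteed FAST Cache result:

DESKTOPS = 400
READ_IOPS_PER_DESKTOP = 2
WRITE_IOPS_PER_DESKTOP = 8
RAID5_WRITE_PENALTY = 4
WRITE_IO_ABSORBED = 0.50   # planning assumption; the vendor best case is ~90%

backend_reads = DESKTOPS * READ_IOPS_PER_DESKTOP                          # 800
backend_writes = DESKTOPS * WRITE_IOPS_PER_DESKTOP * RAID5_WRITE_PENALTY  # 12,800
backend_writes *= (1 - WRITE_IO_ABSORBED)                                 # 6,400 after the cache

print(f"Backend IOPS hitting spindles: {backend_reads + backend_writes:,.0f}")  # ~7,200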

Hybrid and All-Flash Arrays

IDS has been closely monitoring the hybrid-flash and all-flash array space and has selected solutions from established enterprise vendors like EMC and NetApp as well as best-of-breed newer players like Nimble Storage and Pure Storage.

The truly interesting designs recognize that SSDs should not be used as if they were traditional spinning disks. Instead, these designs optimize the data layout for writes. As such, even though they utilize RAID technology, they do not incur a meaningful write penalty, which means it's generally pretty simple to size the array based on front-end IOPS. This also reduces some of the concern about write endurance on the SSDs. When combined with techniques that coalesce writes and compress and de-duplicate data in-line, these options can be attractive on a cost-per-workload basis even though the cost of flash remains high.

Using a dedicated hybrid or flash-based array would get us to something like a single shelf needed for 400 users. At this point, we’re more sizing for capacity than I/O and latency, a situation that’s more familiar to most datacenter virtualization specialists. But we’re still talking about an approach with a dedicated array at scale.

Host-Based Approaches

A variety of other approaches to solving this problem have sprung up, including the use of host-based SSDs to offload portions of the I/O, expensive flash memory cards providing hundreds of thousands of I/Os per card, and software approaches such as Atlantis Computing's ILIO virtual appliances, which leverage relatively inexpensive system RAM as a low-latency, de-duplicated data store and functionally reduce VDI's impact on existing storage. (Note: IDS is currently testing the Atlantis Computing solution in our Integration Lab.)

Design Conclusion

Using a combination of these technology approaches, it is now possible to provide a VDI user experience that exceeds current user expectations at a cost per workload lower than the acquisition cost of a standard laptop. The server-hosted VDI approach also brings many benefits in terms of operational expense reduction and data security.

Delivering Results

In this article, we’ve covered one design dimension that influences the success of VDI projects, but there’s much more to this than IOPS and latency. A disciplined engineering and delivery methodology is the only way to deliver results reliably for your VDI project. At minimum, IDS recommends testing your VDI environment at scale using tools such as LoginVSI or View Planner as well as piloting your solution with end user champions.

Whether you’re just getting started with your VDI initiative, or you’ve tried and failed before, IDS can help you achieve the outcomes you want to see. Using our vendor-agnostic approach and disciplined methodology, we will help you reduce cost, avoid business risk, and achieve results.

We look forward to helping you.

 

Photo credit: linademartinez via Flickr
