EMC VNX2 Review: New Hardware and a Big Change to Software for Next Generation VNX

February 4, 2014 | EMC, Storage

This review will highlight some of the new changes and features of the next generation VNX.

Overview

The new VNX series—referred to as the VNX2, VNX MCx, and the next generation VNX—comes with a major software update and refreshed hardware. I’ll refer to it as the VNX2.

All the new models (5200, 5400, 5600, 5800, 7600, and 8000) come with an Intel Sandy Bridge processor, more cores, more RAM, and optimization for multiple cores.

Below is a graph of how core utilization might look on the VNX versus the VNX2 models.

The new hardware and software allow the VNX with MCx to achieve up to 1 million IOPS.

[Figure: core utilization across the VNX and VNX2 models]

Active/Active LUNs

If you choose to use traditional RAID Groups, this new feature can potentially improve performance for those LUNs by servicing I/O out of both storage processors at the same time. In its current state this improvement probably won’t mean a lot to many customers, as the focus is shifting to pools. The exciting part is that EMC was actually able to make traditional RAID Group LUNs active/active, so maybe we will see pool LUNs become active/active in the future.


FAST, FAST Cache, and Cache

VNX FAST

FAST works the same as it used to, except that it now operates on 256MB ‘chunks’ instead of 1GB ‘chunks’. This allows for more efficient data placement. For example, if you had a pool with 1TB of SSD and 30TB of SAS/NL-SAS on a VNX, you obviously have a very limited amount of SSD space and you want to make the best use of it. The VNX tiers data at a 1GB chunk level, so if only a fraction of that 1GB, say 200MB, was actually hot, 824MB would be promoted to SSD unnecessarily. On the VNX2, using the 256MB chunk, only 56MB would be promoted unnecessarily. Perfect? Obviously not. Better? Yes. Multiply this example by 10 and you’d have around 8GB of data unnecessarily promoted to SSD on the VNX, but only 560MB on the VNX2.
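To make the arithmetic explicit, here is a quick Python sketch of the example above (the 200MB hot region and the one-hot-region-per-chunk simplification come from the example itself, not from anything EMC publishes):

```python
# Back-of-the-envelope math for the chunk-size example above.
# Assumes each hot region is smaller than a single chunk, so one whole chunk
# gets promoted per hot region (the simplified scenario in the text).

def wasted_promotion_mb(hot_region_mb, chunk_mb, regions=1):
    """MB promoted to SSD beyond the actual hot data."""
    promoted_per_region = -(-hot_region_mb // chunk_mb) * chunk_mb  # round up to whole chunks
    return (promoted_per_region - hot_region_mb) * regions

print(wasted_promotion_mb(200, 1024))              # VNX, 1GB chunks:     824 MB wasted
print(wasted_promotion_mb(200, 256))               # VNX2, 256MB chunks:   56 MB wasted
print(wasted_promotion_mb(200, 1024, regions=10))  # 10 hot regions:     8240 MB (~8GB)
print(wasted_promotion_mb(200, 256, regions=10))   # 10 hot regions:      560 MB
```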

FAST Cache

There are major improvements here as well. Warm-up time has been improved by changing the promotion behavior: while FAST Cache is less than 80% utilized, any read or write will promote the data to FAST Cache. Once it is more than 80% full, it returns to the original behavior of requiring data to be read or written three times before promotion.
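Written as pseudocode, the promotion rule looks something like this (a sketch of the policy as I understand it, not EMC's actual code):

```python
# Rough illustration of the FAST Cache promotion rule described above.
# This models the policy only; it is not EMC's implementation.

PROMOTION_TOUCHES = 3            # original behavior: promote after 3 reads/writes
WARMUP_UTILIZATION_LIMIT = 0.80  # below 80% full, promote on the first touch

def should_promote(touch_count, fast_cache_utilization):
    """Return True if a block that was just read or written should be promoted."""
    if fast_cache_utilization < WARMUP_UTILIZATION_LIMIT:
        return True  # warm-up phase: any read or write promotes
    return touch_count >= PROMOTION_TOUCHES  # steady state: classic 3-touch rule

print(should_promote(touch_count=1, fast_cache_utilization=0.40))  # True  (warm-up)
print(should_promote(touch_count=1, fast_cache_utilization=0.90))  # False (needs 3 touches)
print(should_promote(touch_count=3, fast_cache_utilization=0.90))  # True
```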

Cache

Also on the topic of cache, the read and write cache levels of the storage processors no longer need to be set. The cache now adjusts automatically to whatever the array thinks is best for its workload. This is great news: you no longer need to mess with high and low water marks or with what values the read and write cache should be set to.


Deduplication and Thin LUNs

Another huge change to VNX pools is the ability to do out-of-band, block-based deduplication.

This sounds great; however, it comes with considerations. First, it only works on thin pool LUNs. EMC’s recommendation has always been to avoid thin LUNs for workloads that require low response times and produce a lot of IOPS. With the VNX2 performance improvements, the thin LUN performance penalty may be smaller, but I haven’t seen a comparison between the two to say whether it has improved with the new code. Also, EMC recommends deduplication on block LUNs only for workloads with less than 30% writes that are non-sequential and made up of small random I/Os (smaller than 32KB).
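Those recommendations boil down to a checklist you could run against your workload statistics (a hypothetical helper for illustration, not anything EMC ships):

```python
# A simple screen for the dedup guidance above: mostly reads, non-sequential,
# small random I/O. The thresholds are the ones quoted in the text; the function
# itself is just an illustrative checklist, not an EMC tool.

def dedup_candidate(write_pct, sequential, avg_io_kb):
    """Return True if a LUN's workload roughly fits EMC's dedup recommendations."""
    return write_pct < 30 and not sequential and avg_io_kb < 32

print(dedup_candidate(write_pct=20, sequential=False, avg_io_kb=8))    # True
print(dedup_candidate(write_pct=45, sequential=False, avg_io_kb=8))    # False: too write-heavy
print(dedup_candidate(write_pct=10, sequential=True,  avg_io_kb=256))  # False: sequential, large I/O
```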

The other recommendation is to test it on non-production before enabling it in production. Does that mean you make a copy of your production workload and then simulate production against that copy? I’d say so, as ideally you’d want an exact duplicate of your production environment. So would you buy enough drives for a duplicate pool, with the same mix of drives, to simulate how everything would work? Maybe. Or you could just enable it and hope it works, but that would mean you should have a very good understanding of your workload before enabling it.

However, if you do choose to use deduplication and it doesn’t work out, you can always turn it off and go back to a normal thin LUN. If you want to go back to a thick LUN, you would then need to do a LUN migration.

Also, when using the GUI to create a LUN in a pool, ‘thin’ is now checked by default. If you’re not careful and you don’t want this, you may end up overprovisioning your pool without knowing it. While thin provisioning is not a new feature, enabling it by default is new.

This is not something to take lightly. A lot of people will make LUNs until the free space runs out. With thin LUNs, the free space doesn’t run out until you actually write data to those LUNs, so you can very easily overprovision your pool without knowing it. With a 10TB pool, you could very quickly provision 20TB and not realize it. It becomes a problem when you’ve used up that 10TB, because your hosts think they have 20TB available. Once the pool fills, your hosts still think they can write data even though they can’t, which usually results in the host crashing. So you need to expand the pool before it fills up, which means you need to monitor it closely, but you might not know you need to if you don’t realize you’re making thin LUNs.
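As a rough illustration of the math (the 10TB/20TB figures mirror the example above; the alert threshold is an arbitrary number I picked, not an EMC default):

```python
# Minimal sketch of why default-thin LUNs deserve close monitoring: subscribed
# capacity can quietly exceed the pool.

def pool_status(pool_tb, lun_sizes_tb, consumed_tb, alert_at=0.70):
    subscribed = sum(lun_sizes_tb)
    return {
        "subscribed_tb": subscribed,
        "oversubscription_ratio": subscribed / pool_tb,
        "pool_used_pct": round(consumed_tb / pool_tb * 100, 1),
        "expand_pool_soon": consumed_tb / pool_tb >= alert_at,
    }

# A 10TB pool carved into 20TB of thin LUNs, with 7.5TB actually written:
print(pool_status(pool_tb=10, lun_sizes_tb=[5, 5, 5, 5], consumed_tb=7.5))
# 2.0x oversubscribed and already past the 70% alert line
```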

Hot Spares

The way hot spares work in the VNX with MCx has changed quite a bit. There are no dedicated hot spare drives now: you simply don’t provision all the drives, and any blank drive can become a hot spare. Also, instead of equalizing (the hot spare copying back to the drive that was replaced) like the VNX does, the VNX2 does permanent sparing. When a drive fails, after 5 minutes the data is copied from the failed/failing drive to an empty drive if possible; otherwise it is rebuilt there using parity. The 5-minute delay is new as well, and it allows you to move drives to new enclosures/slots if desired.

Since the drive doesn’t equalize, if you want everything contiguous or laid out in a specific manner you would need to manually move the data back to the replaced drive. This is important, for example, if you have any R10 RAID Groups/pools that you don’t want spread across Bus 0 Enclosure 0 and other enclosures. The vault drives also work a little differently: only the user data is copied to the new drive, so upon vault drive replacement you should definitely move the data back manually (if you use the vault drives for data).
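The sparing flow, as I understand it, amounts to something like this (a sketch of the behavior described above, not EMC's code):

```python
# Sketch of the permanent-sparing flow: a 5-minute grace period (so a pulled
# drive can be relocated), then a copy from the failing drive when it is still
# readable, otherwise a parity rebuild onto an unprovisioned drive.
# Illustrative only.

SPARE_DELAY_MINUTES = 5

def handle_drive_failure(minutes_since_failure, failing_drive_readable, spare_available):
    if minutes_since_failure < SPARE_DELAY_MINUTES:
        return "wait: drive may still be moved to another enclosure/slot"
    if not spare_available:
        return "degraded: no unprovisioned drive available to spare to"
    if failing_drive_readable:
        return "copy data from failing drive to spare (spare becomes permanent)"
    return "rebuild data onto spare from parity (spare becomes permanent)"

print(handle_drive_failure(2, True, True))    # still inside the 5-minute window
print(handle_drive_failure(10, True, True))   # straight copy
print(handle_drive_failure(10, False, True))  # parity rebuild
```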
