
CPU Cost Comparison: Bigger Spend, Better Output?


In a previous blog post, “Spending Money To Make Money: An IT Strategy That Really Works?” I compared the cost of running 6-core, 8-core, and 12-core CPUs across the x86 enterprise. My point back then was that more expensive servers could actually save you money when you look at the TCO. Now that Intel is producing 14-, 16-, and 18-core CPUs, I wanted to go back and see where these machines fit in terms of price and performance.

An Updated CPU Cost Comparison

While these 18-core CPUs are hot-rods featuring 5.69 billion transistors, 45MB of L3 cache, DDR4 RAM support, and 9.6GT/s QPI links, they are very expensive.

Who would argue that buying the most expensive servers is a smart business choice? Actually, I will, with some caveats.

While these CPUs make the top-5 list for VMmark performance, they actually hit #1 when we factor in power and cooling efficiency. So let’s do a high-level ROI analysis that also factors in hypervisor and OS costs. One caveat up front: when I say “most expensive server,” I’m actually talking about a specific CPU line. I like the most expensive Intel E5 CPUs, which are more affordable than the top E7 CPUs. The E7 is the true top-of-the-line CPU and may only be necessary for the absolute most demanding workloads. That said, the E5 tends to follow the consumer market, which arguably moves faster than true enterprise, so the E5 gets newer technology sooner, which is another point in its favor.

Let’s take a look at a VDI requirement based on 400 concurrent Citrix users. If the requirement is 72 physical cores and 1.5TB of RAM, there are a few different ways to satisfy it, each with a different cost and server count (using hp.com “customize and buy” as of November 11th, 2014 for pricing estimates: 32GB DIMMs, 2.3GHz CPUs with redundant power, fans, rail kit, and no hard disks).

CPU Cost Comparison Analysis

Cost comparison table for the different server configurations

While the 6-server option is still the cheapest up front, once we factor in chassis space, power, and cooling, not to mention management overhead, it probably makes sense to purchase and install the larger servers.
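To make the trade-off concrete, here is a minimal sizing sketch in Python. The core counts are among the real E5 options discussed above, but the RAM configurations and per-server prices are placeholder assumptions, not the HP quotes behind the table:

```python
import math

# Requirement from the example above: 400 Citrix users needing
# 72 physical cores and 1.5TB of RAM across the cluster.
REQUIRED_CORES = 72
REQUIRED_RAM_GB = 1536

# Dual-socket servers. The RAM configurations and per-server prices
# are placeholder assumptions for illustration only; pull current
# "customize and buy" quotes for a real comparison.
options = [
    {"cpu": "6-core E5",  "cores_per_cpu": 6,  "ram_gb": 256, "price": 8_000},
    {"cpu": "12-core E5", "cores_per_cpu": 12, "ram_gb": 512, "price": 17_000},
    {"cpu": "18-core E5", "cores_per_cpu": 18, "ram_gb": 768, "price": 28_000},
]

for opt in options:
    cores_per_server = 2 * opt["cores_per_cpu"]            # two sockets per box
    by_cores = math.ceil(REQUIRED_CORES / cores_per_server)
    by_ram = math.ceil(REQUIRED_RAM_GB / opt["ram_gb"])
    servers = max(by_cores, by_ram)                         # must satisfy both constraints
    print(f"{opt['cpu']:>11}: {servers} servers, "
          f"~${servers * opt['price']:,} in hardware")
```

Swap in real quotes and the ranking can shift, but the core-versus-RAM constraint logic stays the same.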

The biggest benefits are longevity and density. A larger server can be repurposed later on and can scale into a different role: a database server, a test/dev environment, software-defined storage, whatever. These new servers will generally stay useful longer than a 6-core server. You might get 4-5 years out of these 18-core CPUs, but it’s unlikely it will still make sense to be running 6-core CPUs that far out.

The Bottom Line

When choosing a server, consider that spending more money on the fastest servers available today should bring many benefits: reduced management overhead, reduced software costs (per-core database licensing aside), reduced power and cooling costs, and a smaller footprint if you are paying for rack space. And you’ll likely get more longevity out of them as well.

Array of Things Plans to Revolutionize Data in Chicago


In a quickly evolving technology industry, we were especially excited to hear about a fascinating project coming to our own backyard. The Array of Things project recently announced plans to revolutionize how citizens live in, and interact with, their cities. In partnership with the City of Chicago, the Array of Things team plans to install a network of interactive, modular sensor boxes that will collect real-time data throughout Chicago.

What Exactly Are Interactive, Modular Sensor Boxes and What Can They Do?

The Array of Things team explains that the first versions of these innovative boxes will be dedicated to collecting information on environmental factors like atmosphere and air quality. The sensors will also have the capability to collect and store information surrounding human activity, but only at a very general level. The sensors will collect data on noise level and the surface temperature of sidewalks and roads, and will also be able to detect the number of wireless networks in a given area. Without actually recording any personal information, the sensors will be able to extrapolate human traffic statistics.

How Does the Array of Things Team Hope to Use the Data?

Members of the Array of Things team explain that while data collection may be more rudimentary in the beginning, they believe the sensors will continue to get more sophisticated, allowing everyone accessing the data to use it in more exciting ways.

They state the following potential uses for data collected by the sensors:

  • Healthy walking route suggestions. Researchers could use air quality, sound, and vibration data to suggest the healthiest and unhealthiest walking times and routes in the city.
  • Targeted winter salt application. The city may choose to use sidewalk and street temperature data to save money and prevent environmental damage by planning targeted salt application based on traffic.
  • Block-by-block weather reports. Weather experts could use atmosphere data to provide real-time, micro-climate weather reports by neighborhood, or even by block.
  • Safe and efficient route suggestions. Data surrounding human activity might be used to find the safest and most efficient routes in the city during different times of the day.
  • Improved traffic light timing. The city could use vibration data and data surrounding human activity to improve traffic light timing and efficiency.

Not only is the Array of Things team predicting some remarkable uses for the data, they also believe that everyone should benefit from the experiment. The data collected by the project will be available to everyone, including residents, software developers, scientists, policymakers, and researchers, so that it gets put to the widest possible use. Data is expected to be published and updated multiple times per minute.

Wait, What About Personal Security?

The data will be available to everyone, meaning security will be an extremely high priority. Due to the nature of the project, the sensors are designed only to collect general information and will not be capable of extracting personal information from people or devices. The entire project including the software and hardware will be heavily regulated and reviewed regularly to make sure standards are met and kept.

The first 50 sensors are planned for installation during the winter of 2014-2015, with an additional eight nodes planned for spring 2015. Potential funding opportunities mean there could be at least 500 additional sensors installed between 2015 and 2017.

Learn more about the Array of Things project.

Image credit to Urban Center for Computation and Data.

Nearline and Enterprise SAS vs. NVMe (PCI express) Storage Connections


Most enterprise storage arrays today have backend enterprise SATA (aka Nearline SAS) and enterprise SAS connections running at 6Gbps or 12Gbps, where the actual disks and spindles connect to the controllers. The benefits of these enterprise-class communication protocols over standard SATA include:

  • Native command queuing.
  • Dual-redundant multipath I/O connections.
  • Plenty of throughput for each individual spindle.

You would think this is plenty of bandwidth, but now that SSDs are replacing HDDs, there is a case to be made for a newer, better technology. Many individual SSDs can push 500MB/sec on their own. It’s not so much that 12Gbps is a bottleneck today, but the future of storage isn’t just NAND flash memory. Technologies like PCM and MRAM will push the amount of data that can be moved in and out of individual drives much further, potentially on the order of 1000x.
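To put rough numbers on that headroom, here is a quick sketch; the ~20% protocol and encoding overhead is an assumed figure for illustration, not a measured one:

```python
# Rough headroom check: how much of a single SAS link does one fast SSD use?
# The ~20% protocol/encoding overhead figure is an assumption for illustration.
def usable_gb_per_sec(link_gbps, overhead=0.20):
    return link_gbps * (1 - overhead) / 8   # bits per second -> bytes per second

ssd_gb_per_sec = 0.5   # a single SSD pushing ~500MB/sec

for link_gbps in (6, 12):
    usable = usable_gb_per_sec(link_gbps)
    print(f"{link_gbps}Gbps SAS: ~{usable:.2f}GB/sec usable; "
          f"a drive only ~{usable / ssd_gb_per_sec:.1f}x faster than "
          f"today's 500MB/sec SSDs would saturate it")
```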

How Can We Improve Existing Flash Performance Outputs?

We might now agree that newer technologies are in order for the long term, but even with the NAND flash in use today, there could be big improvements in performance from looking at flash differently.

For example, most SSDs today have multiple NAND chips on the circuit board. If we read and write to these chips in a more parallel fashion, we can get even faster performance. Take the existing PCI Express-based NAND flash systems out there today, like Fusion-io or OCZ’s RevoDrive. How can these devices achieve higher throughput and lower latency than a 12Gbps SAS connection? For starters, they use the PCI Express bus, which removes some controller latency. Taken a step further, NVMe (Non-Volatile Memory Express) is a newer specification that outperforms AHCI, even when AHCI is carried over PCIe storage connections. See the graphic below from communities.intel.com for the latencies of the two stacks compared.

Intel SSD P3700 Series NVMe Efficiency

Image from communities.intel.com.

What Other Benefits Does NVMe Provide?

Some of the other major benefits of NVMe include:

  • Multiple thread usage.
  • Parallel access.
  • Increase in queue depth.
  • Efficiency improvements.

Let’s look at queue depth specifically. AHCI can do 1 queue with 32 commands per queue. NVMe, on the other hand, can do 64,000 queues with 64,000 commands per queue. Since many SSDs don’t perform well until there’s a lot of demand and a high queue depth, getting the most performance out of an SSD means hitting it with multiple outstanding requests. A 20,000 IOPS drive can often do 80,000-90,000 IOPS with the right queue depth, and newer NAND controller CPUs have more than double the number of channels of a SATA-based SSD (18 instead of 8), as well as more DDR3 RAM used for cache (1GB instead of 128MB or 256MB). So we are starting to see miniature storage-array performance in a single SSD “spindle.”
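One way to see why queue depth matters so much is Little’s Law (throughput is roughly outstanding I/Os divided by latency). This is a simplified model, not a benchmark of any particular drive, and the 50µs latency is an assumption chosen to line up with the 20,000 IOPS figure above:

```python
# Little's Law: IOPS ~= outstanding I/Os / average per-I/O latency.
# The 50-microsecond latency is an assumed figure, chosen so that a queue
# depth of 1 lines up with the ~20,000 IOPS example above.
LATENCY_S = 50e-6

def iops_ceiling(outstanding, latency_s=LATENCY_S):
    return outstanding / latency_s

for qd in (1, 4, 32):
    print(f"queue depth {qd:>2}: ~{iops_ceiling(qd):,.0f} IOPS "
          f"(upper bound if latency holds at 50us)")

# Real drives flatten out once their internal NAND channels are all busy,
# but the protocol itself caps how many I/Os can even be outstanding:
print("AHCI max outstanding I/Os:", 1 * 32)            # 1 queue x 32 commands
print("NVMe max outstanding I/Os:", 64_000 * 64_000)   # 64K queues x 64K commands
```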

One more thing: Intel has a way to bring a PCIe-based SSD into a standard 2.5” form factor with the SFF-8639 connector. This connector is what we will start to see in enterprise systems. Wouldn’t it be nice if this connector could carry either SATA/SAS or PCIe over the same cable?

How Does NVMe Perform in Independent Tests?

In independent tests, these NVMe-based storage drives are able to hit 200,000-750,000 IOPS on 4KB random reads at queue depths of 128-256. The 4KB random write numbers are lower, from 80,000-220,000 IOPS at similar queue depths. Sequential read and write performance of many of these drives can easily exceed 2GB/sec, peaking near 3GB/sec for the largest transfer sizes. Average response time tops out at 935µs, whereas peak latency has a much larger range, from 3ms up to 98ms depending on the model, brand, and queue depth.

Those super-high peak latency numbers are a reminder that IOPS only matter in relation to latency, and it makes sense to choose an SSD that offers performance consistency if the application requires it (such as the 700GB Micron P320h).

What Does NVMe Mean for the Future?

These are strong numbers from a single SSD, but the point of all this analysis is two-fold. On the one hand, NVMe will lift a looming barrier as NL-SAS and SAS connections eventually become a bottleneck for newer flash-based technologies. On the other hand, much as the storage systems of the past decade are being replaced by newer flash-based systems built from the ground up, we have the opportunity to see a new way of reading and writing to flash that yields even greater performance through more parallelism and concurrency. Since existing PCIe-based SSDs are already pushing the limits of SAS, NVMe has a promising future as storage becomes faster and faster.

Flash Is Dead: The Next Storage Revolution Is About CPUs and RAM


Alright, flash is not dead; it’s thriving. People love SSDs in their phones and laptops because they’re so much faster than traditional hard drives. They are faster because they have lower latency, which is to say that they let the computer “wait less.”

SSDs operate in the sub-millisecond to millisecond range, whereas typical mechanical hard drives sit in the 6-10 millisecond range. That’s roughly 10x lower latency, which often translates to about twice as fast in the real world. You don’t often find technologies that are ten times faster than the ones before them. But imagine something one hundred or a thousand times faster than even SSDs.

That’s no problem for supercomputers costing millions of dollars: just use CPUs and RAM, because they operate in the nanosecond realm, and a nanosecond is 1,000 times faster than a microsecond. We’re talking about going from a 10x improvement in performance to 100x or 1000x. Imagine an entire datacenter running in RAM with no disks. Stanford University has made the case, and they are calling it RAMCloud.

“But imagine something one hundred or a thousand times faster than even SSDs.”

Cost aside, let’s try to put these types of potential speed increases into perspective and solve for cost later.

A Scenario To Consider

Let’s assume a typical 3.0GHz processor today in 2014 can perform some basic calculations and move data inside the chip itself in 10 nanoseconds, or 30 clock cycles. Perhaps the human equivalent would be someone asking you to solve a simple math problem in your head: what’s 2+4+3+4? You quickly add up that it’s 13, and it takes you 2 seconds from start to finish.

Now suppose that the CPU has to go back to DRAM for this data, because it doesn’t have the information handy to respond immediately. Going back to DRAM can take an additional 9-13 nanoseconds, even with today’s faster DDR3-1866 RAM. DRAM still runs at 200-300MHz as a base clock speed, even if the bus speed itself is a lot higher. So going back to DRAM can double the time it takes for a CPU to execute a task. In our analogy, it would take a human 4 seconds instead of 2. But twice as slow is nothing compared to how much slower storage is compared to RAM.

Continuing the math problem analogy, suppose the math problem required all of its data to come from fast SSD-backed storage running at 500 microseconds (half a millisecond, or 20x quicker than a mechanical hard drive). Even with this fast storage, the simple math problem would require the computer’s CPU to wait 500,000 nanoseconds for the answer, when it could otherwise have finished tens of thousands of times faster. In human terms, the four-second calculation of adding 2+4+3+4 would take you more than a day to complete.
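Here is a minimal sketch of that “human time” conversion, using the round numbers from the analogy above (the 10ms disk figure is taken from the earlier 6-10 millisecond range):

```python
# Scale device latencies to "human time" using the analogy above:
# 10ns of CPU work corresponds to 2 seconds of mental arithmetic.
HUMAN_SECONDS_PER_NS = 2 / 10

latencies_ns = {
    "CPU math in registers/cache": 10,
    "with an extra trip to DRAM": 10 + 10,   # a ~10ns DRAM round trip (middle of 9-13ns)
    "SSD-backed storage (500us)": 500_000,
    "mechanical disk (10ms)": 10_000_000,
}

for what, ns in latencies_ns.items():
    human_s = ns * HUMAN_SECONDS_PER_NS
    if human_s < 3600:
        print(f"{what:<30} {ns:>12,} ns -> ~{human_s:,.0f} human seconds")
    elif human_s < 86_400:
        print(f"{what:<30} {ns:>12,} ns -> ~{human_s / 3600:,.1f} human hours")
    else:
        print(f"{what:<30} {ns:>12,} ns -> ~{human_s / 86_400:,.1f} human days")
```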

“But twice as slow is nothing compared to how much slower storage is compared to RAM.”

man doing math on chalk board

In reality, you wouldn’t need storage to make such a simple calculation, because you could afford to keep your code small enough to cache in DRAM or in the CPU’s hardware registers. But the problem becomes much more pronounced when you have to fetch real data from storage, which happens all the time in real systems.

Perhaps a more complex math problem illustrates this best. Advanced math problems can require reading a paragraph describing the problem, looking at the textbook for a hint, going over notes from class, and finally scribbling the work down on paper. This process can take minutes per problem.

If we apply a similar storage analogy in computing terms, a five-minute problem that could be solved completely in your head might instead stretch into weeks if we had to work through the human equivalent of a storage system, which is to say, going back to disk and sending data back and forth. And that’s with SSDs. With mechanical drives it would stretch into months. Imagine working on something all winter and completing it just as spring starts. It had better be worthwhile, and I would say that looking for the cure for cancer, predicting tornadoes, or developing automated cars certainly qualify.

Today

So how does this play out in the real world today, in 2014? Companies like Microsoft, EMC, Nimble, Pure Storage, and SAP are all taking advantage of CPUs and RAM to accelerate their storage solutions. Today’s applications and users can wait milliseconds for data because they were built to be used with mechanical hard drives, WAN connections, and mobile phones. So the storage companies are using CPUs and RAM to take in I/O, organize data, compress it, dedupe it, secure it, place it in specific locations, replicate it, and snapshot it, because the CPUs have so much time on their hands that they can afford the nanoseconds to do so. And they are doing it with off-the-shelf Intel CPUs and DRAM.

But the idea of waiting milliseconds will seem absurd in the future. This comfortable approach will change as CPUs and RAM continue to pull away from SSDs. In time, SSDs are going to be much too slow in computing terms, so we are going to see further advancements on the storage front for faster storage and memory technologies.

Things like PCM (Phase Change Memory), spin-transfer torque MRAM, and racetrack memory (also called DWM, Domain-Wall Memory) are in development today. CPU frequencies are not increasing, but parallelism is, so the goal will be to place more RAM and storage closer to the CPU than before and to use more threads and cores to work on the data.

“In time, SSDs are going to be much too slow in computing terms, so we are going to see further advancements on the storage front for faster storage and memory technologies.”

Tomorrow

If you are wondering why CPUs and RAM are the keys to future storage performance, the reason is simple: CPUs and RAM are hundreds of times faster than even the fastest storage systems out there today. Cost can be reduced with compression and deduplication.

And I’m betting that this speed gap will persist for a while longer, at least over the next 3-5 years. Take a look at the Intel and Micron stacked-memory work using TSVs that is headed for the Xeon Phi; it should make its way toward commoditization in a few years. This will augment other advances in memory and storage technologies, driving the discussion from milliseconds of storage latency to microseconds and nanoseconds in the years to come.

Photo credits via Flickr, in order of appearance: Pier-Luc Bergeron; stuartpilbrow.

InfoSight Review: How Nimble Storage Is Turning The OEM Support Model On Its Head


All storage OEMs have some kind of “call home” feature. Upon a hardware or software failure, there is usually an alert sent simultaneously to the OEM and the customer. A support ticket is logged, and either a part or an engineer is dispatched to fulfill the SLA.

Most OEMs also collect performance statistics at weekly intervals and provide either a portal or a reporting mechanism to view historical data, see trends, and so on. Customers can then correlate that data and use it to drive forecasting around the future needs of their IT organization.

How about an easy-to-interpret and read dashboard view vs. a raw data dump to a text file?

What if this data was available in real-time? How would that affect my organization? What if I didn’t have to rely on my internal resources to interpret that data? How about an easy-to-interpret and read dashboard view vs. a raw data dump to a text file? How much time could I return to the business? Can this really boost the productivity of my staff?

The answer is an emphatic yes.

Let me explain why all of the above is so beneficial for today’s enterprise and why it’s such a departure from what I consider the “status quo” of traditional support models found elsewhere.

Let’s face it, IT operators are expected to do more. Time is the most valuable resource. The day of the one-trick pony is drawing to a close. The trend I see with my customers is that they are responsible for more than one platform and are also expected to complete their expanded duties in the same amount of time. This doesn’t leave a lot of time to develop deep expertise in one skill set, let alone two or three.

Nimble InfoSight: The Benefits

Nimble steps in here by providing:

  • Easy-to-read and interpret graphical dashboards that are cloud-based, using a web front end. No Java!
  • Real-time performance monitoring and reporting. A daily summary is a huge value-add here, as most admins only log in to the controllers for care-and-feeding tasks (e.g., provisioning storage).
  • Predictive upgrade modeling based on real-time analytics of performance data.
  • Executive summaries, capacity planning, trending and forecasting. Did I mention that this is a web front end and not bolt-on software to the standard management interface?

The bottom line is that Nimble’s InfoSight is an all-encompassing, holistic reporting and forecasting engine that is a zero-cost add-on to your Nimble storage array.

Most other OEMs charge extra for software that offers the same foundational idea of “reporting” yet doesn’t amount to what I’d call a Web 2.0-caliber solution. I would argue, from a business perspective, that InfoSight offers more overall value to the enterprise from the top-down than increasing the speed of xyz application.

From a business perspective, InfoSight offers more overall value to the enterprise from the top-down than increasing the speed of xyz application.

Although the application of the technology, where it fits best, or why it’s faster has its place, I find it can be a somewhat one-dimensional conversation. I believe overall value should be viewed through a larger lens: how the entire solution benefits your organization as a whole.

Think big. Make your workplace better, faster, stronger! InfoSight keeps you working smarter, not harder.

Photo credit: Phil Hilfiker via Flickr

VNXe 3200: The Future of the VNX?


I’ve been hearing from a lot of people that the VNX will eventually become similar to the VNXe. I didn’t believe EMC would do that until they came out with the VNXe 3200, but now it looks like a real possibility. A quick recap of the history of the VNXe and VNX will help explain why I believe the two are converging into a single platform.

emc vnx

VNX and VNXe History

For the last few years, EMC’s marketing strategy has been selling the concept of a Unified VNX. The rest of us know better: the GUI is unified, but the array really isn’t. Prior to the VNX there were the NS120/480/960, CLARiiON and Celerra models that were “unified”; however, when they were first released, even the GUI wasn’t unified. Later, you could upgrade to newer DART and FLARE code and get Unisphere, which unified the GUIs (the hardware was still separate, though).

Instead of getting a unified array, you could also buy either a block-only or file-only VNX/CX. For a block-only array, Storage Processors serve data via iSCSI/FC/FCoE. On the file side, you have Data Movers that serve data via CIFS/NFS/iSCSI (VNX iSCSI via Data Movers requires an RPQ from EMC to support it, and is also hidden from the GUI).

Why is this history important? Because on all VNXe models prior to the VNXe 3200 release, iSCSI was done via the file/Celerra side. And why does that matter? Because it was, and is, terrible.

Breaking It Down

Here is a breakdown of some of the challenges with previous VNXe models prior to the new release:

  1. First of all, to create an iSCSI LUN on the file side, you would need to first create your RAID Groups and LUNs, then present the LUNs to the file side. Those LUNs would be marked as disk volumes on the file side and put into a file storage pool. After that, you would create a file system, which would stripe or concatenate volumes based on the file AVM (Automatic Volume Management) algorithm. Only then could you create your iSCSI LUN from the file system space. Long story short: there are a lot of layers, and it’s not the best setup for performance.
  2. When replicating iSCSI LUNs via the file side, you would need an additional 150% of the LUN size free on the file system on each side, source and target. To put it in perspective, if you had a 100GB iSCSI LUN, you would need a 250GB file system size on each side—which creates a lot of overhead. (Much less overhead using thin provisioning, but that slows things down.)
  3. iSCSI LUNs are limited to 2TB in size on the file side.
  4. Your only options for replication are host-based replication or Replicator V2; there’s no RecoverPoint, MirrorView, SAN Copy, etc. as there is on the block side. (You can replicate your entire VNX file side with RecoverPoint, but that is a terrible configuration.)
  5. For those reasons and more, I have lacked confidence in the VNXe since the beginning and cringed when having to fix them, since it always seemed there was either a replication or network problem.

The Difference

So why is the VNXe 3200 different? Well, it is different enough that I think it should have been announced as the VNXe 2, or VNXe with MCx, or in some big way like the VNX 2/VNX with MCx was announced.

There are some major differences between the VNXe 3200 and previous models.

  1. Fibre Channel ports are now available
  2. Better use of EFDs
    • FAST Cache can be used
    • Tiered pools can be used
  3. iSCSI now appears to be block based

Note: My only evidence for item 3 is that when you put an iSCSI IP address on an Ethernet adapter, you can no longer use LACP on that port. That would make sense, since there is no LACP for iSCSI on the block side, only on the file side. Also, with FC ports now available (meaning the block side of the VNXe 3200 has clearly been opened up), block iSCSI should be possible too.

vnxe chart

So if I’m right about the iSCSI, that means a few things:

  1. iSCSI replication between pre-VNXe 3200 and VNXe 3200 models won’t be compatible (I asked some EMC product managers and was told they couldn’t comment).
  2. iSCSI LUNs should be replicable between a VNX and a VNXe (depending on whether they put MirrorView into the VNXe; at the very least, you should be able to run a SAN Copy pull session to migrate data off a VNXe onto a VNX).
  3. iSCSI LUNs might be usable with RecoverPoint (depending on whether the VNXe gets an RP splitter; they might allow host-based splitting with a VNXe and iSCSI if no splitter is embedded).

Conclusion

It looks like EMC is taking the VNXe in the right direction, but there are still some unknowns. For now, it seems like a decent unified storage array if you need shared storage and either don’t need to replicate your data or are using host-based replication. I’m hoping that if EMC chooses to do this same hardware unification with the VNX line, they get everything figured out with the VNXe first; it appears they’re taking the steps to do so.

Adventures in cMode Migrations: Part One


On paper, a 7-Mode to Clustered Data ONTAP (“cDOT”) migration can seem fairly straightforward. In this series, I will discuss some scenarios in terms of what can be very easy vs. what can be extremely difficult. (By “difficult” I’m mostly referring to the logistical and replication challenges that arise in large enterprise environments.)

The Easy First!

Tech refresh in one site:

Bob from XYZ Corp is refreshing his 7-Mode system and has decided to take advantage of the seamless scalability, non-disruptive operations, and proven efficiencies of the cDOT platform. Hurray, Bob! Your IT director is going to double your bonus this year because of the new uptime standard you’re going to deliver to the business.

Bob doesn’t use SnapMirror today because he only has one site; he does NDMP dumps to his tape library via Symantec’s Replication Director. Plus 10 points to Bob. Managing snapshot backups without a catalogue can be tricky. Which Daily.0 do I pick? Yikes! Especially if he gets hit by a bus and the new admin has to restore the CEO’s latest PowerPoint file because Jenny in accounting opened up a strange email from a Nigerian prince asking for financial assistance. Bad move, Jenny! Viruses tank productivity.

Anyway …

Bob’s got a pair of shiny new FAS8040s in a switchless cluster, the pride of NetApp’s new mid-range fleet. He’s ready to begin the journey that is cDOT. Bob’s running NFS in his VMware environment, CIFS for his file shares, and about 20 iSCSI LUNs for his SQL DBA. Bob also has 10G switching and servers from one of the big OEMs. So no converged network yet, but he’ll get there with next year’s budget and all of the money he’s going to save the business through the lack of downtime this year! Thanks, cDOT.

Approach

So what’s the plan of attack? After the new system is up and running, from a high level it would look something like this.

1. Analyze the Storage environment

a. Detail volume and LUN sizes (Excel spreadsheets work well for this; a minimal inventory sketch follows this list)
b. Lay out a migration schedule
c. Consult the NetApp Interoperability Matrix to check Fibre Channel switch, HBA firmware, and host operating system compatibility
d. Build the corresponding volumes on the new cDOT system
e. Install the 7-Mode Transition Tool on a Windows 2008 host
f. Use the tool to move all file-based volumes
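As a trivial illustration of step (a), the inventory can be captured programmatically before anything moves; the volume names, sizes, and cutover weeks below are made-up placeholders, not Bob’s actual environment:

```python
import csv

# Hypothetical 7-Mode inventory used to plan the cDOT migration.
# Every name, size, and cutover week here is an illustrative placeholder.
volumes = [
    {"name": "vol_vmware_nfs01", "protocol": "NFS",   "size_gb": 2048, "cutover": "week 1"},
    {"name": "vol_cifs_home",    "protocol": "CIFS",  "size_gb": 4096, "cutover": "week 2"},
    {"name": "lun_sql_data01",   "protocol": "iSCSI", "size_gb": 512,  "cutover": "week 3"},
]

with open("migration_inventory.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(volumes[0].keys()))
    writer.writeheader()
    writer.writerows(volumes)

print(f"Planned: {len(volumes)} objects, "
      f"{sum(v['size_gb'] for v in volumes):,} GB to migrate")
```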

That wasn’t so hard. On paper this scenario may seem almost trivial, and I can assure you it really is this straightforward. Next time, we are going to crank up the difficulty level a bit: we will add in multiple sites and a Solaris environment (or insert any other esoteric block OS; HP-UX, anyone?), as well as the usual NAS-based subjects.

See you next time for Part Two.

Photo credit: thompsonrivers via Flickr

Make Your Analytic Environment More Effective and Efficient With Optimized Infrastructure


Analytics and Big Data are at a crossroads: as organizations run Hadoop MapReduce against larger and larger datasets, that data is getting expensive to store and protect.

The traditional deployment model consists of name node and data node services running on the same hardware as the compute-layer services (job scheduling and execution). Hadoop Distributed File System (HDFS) data is protected at the server and drive level by replication to other nodes through the protocol. The number of copies is a tunable parameter; however, best practice recommends 3 copies.

As systems scale to petabytes and beyond, the storage requirements to sustain 3 copies become astronomical.

Another key feature of HDFS is that the objects that are stored are generally larger blocks. These blocks are written sequentially and are never updated or overwritten. Essentially, they are WORM (Write Once, Read Many) objects until their relevance expires and they are deleted.

One of the tenets of traditional HDFS is that “moving computation is cheaper than moving data.” As a storage guy, I would like to rewrite this tenet as “let’s leverage computation for computation and optimize the data infrastructure to best serve the application’s data requirements.” I know, it’s a bit wordy, but it makes the point. There are a number of storage technologies that have added support for the HDFS protocol.

Isilon

Isilon added HDFS support to its OneFS code base with the 7.0 release. An Isilon cluster can scale from 3 nodes up to 144 nodes. These nodes can be one of 3 tiers:

  1. SAS (Serial Attach SCSI) and SSD (Solid State Drive) based S Nodes for extreme performance
  2. SATA (Serial ATA drive interface) and SSD based X Nodes for typical medium performance workloads
  3. SATA based NL Nodes for Archive level performance

An Isilon cluster can be a combination of these nodes, allowing data to be tiered based on access. The advantage of using Isilon for HDFS is that Isilon provides the data protection, so the HDFS replication parameter can be set to a single copy.

This reduces the amount of storage required to support a Hadoop cluster by nearly half, while improving reliability and simplifying the environment.
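A rough capacity comparison for a 1PB dataset, assuming roughly 20% OneFS protection overhead (an assumption; the real figure depends on cluster size and the protection level chosen):

```python
# Raw capacity needed to hold 1PB of HDFS data.
# The ~20% OneFS protection overhead is an assumed, illustrative figure.
DATASET_PB = 1.0

hdfs_three_copies = DATASET_PB * 3      # traditional dfs.replication = 3
isilon_one_copy = DATASET_PB * 1.2      # single HDFS copy + erasure-coded protection

print(f"Traditional HDFS (3 copies): {hdfs_three_copies:.1f} PB raw")
print(f"Isilon-backed HDFS (1 copy): {isilon_one_copy:.1f} PB raw")
print(f"Raw capacity saved: {(1 - isilon_one_copy / hdfs_three_copies):.0%}")
```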

Cleversafe

In very large environments, or in environments that require geo-dispersal, Cleversafe can be leveraged to provide storage via the HDFS protocol. Like Isilon, Cleversafe uses erasure coding techniques to distribute data across the nodes in its cluster architecture. Cleversafe, however, scales much larger and can be geo-dispersed, as its cluster interconnect uses TCP/IP over Ethernet as opposed to InfiniBand.

IDS has integrated both the Isilon and Cleversafe technologies in our cloud and has the capacity to support customer analytics environments on this infrastructure.

Our customers can efficiently stand up a Hadoop Ecosystem and produce valuable insights without having to purchase and manage a significant investment in infrastructure.

SMR from Seagate

On a completely separate but surprisingly related thread: one of the major developments in rotational hard drive technology in the last year has been focused on archival storage. Seagate announced Shingled Magnetic Recording (SMR) with densities up to 1.25TB per platter. SMR drives overlap groups of tracks, leaving narrower valid read tracks inside the boundaries of wider write tracks. SMR drives can store much more data this way, but rewrites are much slower with SMR than with existing perpendicular magnetic recording (PMR) drives. This is because when a block is updated, the entire group of overlapping tracks has to be rewritten, much like solid-state page writes. While the only known customer of SMR drives to date is Seagate’s subsidiary EVault, this technology would seem to line up well with HDFS workloads.
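As a toy model of why those rewrites hurt, consider updating one small block inside a shingled band; the band and block sizes below are illustrative assumptions, not Seagate specifications:

```python
# Toy write-amplification estimate for rewriting a shingled band in place.
# Band and block sizes are illustrative assumptions, not drive specifications.
BAND_SIZE_MB = 256       # one shingled group of overlapping tracks
UPDATE_SIZE_KB = 4       # a single small block update

amplification = (BAND_SIZE_MB * 1024) / UPDATE_SIZE_KB
print(f"Updating {UPDATE_SIZE_KB}KB in place means rewriting the whole "
      f"{BAND_SIZE_MB}MB band: ~{amplification:,.0f}x write amplification")
```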

Photo credit: _cheryl via Flickr

Comparing Cloud Storage Offerings: Critical Factors Beyond Pricing


It is amazing how quickly enterprises are looking to adopt cloud technologies. This kind of demand didn’t exist three years ago, and certainly a growing economy is contributing to it. With so many choices out there, it can get a little crazy trying to find the perfect cloud offering. Organizations must take a lot into consideration, particularly critical factors that go beyond pricing.

Pricing: More Complicated Than $/Gig

The first question I hear all the time is “How much per gig?” But in reality there is more to it than just the cost per GB alone. Cloud providers can make the financials of placing your data in the cloud much more complicated than they should be.

Cloud providers can make the financials of placing your data in the cloud much more complicated than they should be.

When you read between the lines, you realize that most providers are charging you all sorts of fees, and suddenly that dollars-per-usable-GB figure in large bold font is not the only number you should have paid attention to. Your invoice at the end of the month will include fees like a per-GB charge for data transferred out … yes, it’s a two-way street.

Fees, Fees and More Fees

Another fee is for the number of PUT and GET requests, as well as how many can be made simultaneously. Did you change your mind and want to delete your data? Well, there is an early-delete fee. How quickly do you need to restore your data? As you can guess, some providers will charge you a restore-speed fee. There is also a limit on the amount of data you can restore on a daily or monthly basis, and if you go over that, you guessed it, additional fees apply. When things go terribly wrong and you need to get someone on the phone, there is also a technical support fee.
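To see how quickly the fine print adds up, here is a minimal monthly-bill sketch; every rate and quantity below is a made-up placeholder, not any particular provider’s pricing:

```python
# Illustrative monthly bill for 50TB of archive data.
# Every rate and quantity here is a made-up placeholder, not real pricing.
stored_gb = 50_000
restored_gb = 5_000
requests = 2_000_000          # PUT/GET operations
early_deleted_gb = 1_000

bill = {
    "storage ($0.010/GB)":          stored_gb * 0.010,
    "egress ($0.090/GB out)":       restored_gb * 0.090,
    "requests ($0.005/1k ops)":     requests / 1_000 * 0.005,
    "expedited restore ($0.03/GB)": restored_gb * 0.03,
    "early delete ($0.01/GB)":      early_deleted_gb * 0.01,
    "support plan (flat)":          100.00,
}

for item, cost in bill.items():
    print(f"{item:<30} ${cost:>10,.2f}")
print(f"{'total':<30} ${sum(bill.values()):>10,.2f}")
print(f"effective $/GB: ${sum(bill.values()) / stored_gb:.4f} "
      f"vs. the advertised $0.0100")
```

With these made-up rates, the effective cost lands well above the advertised per-GB figure, which is exactly the point.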

The bigger the cloud provider, the less of that special concierge experience you will receive.

You are just another caller on the system waiting in queue. They don’t know your business or how your customers are affected. They don’t have engineering design diagrams that show how your data is created, how data flows in your environment, or how it is being leveraged.

Now that we have highlighted the economics of storing and accessing your data in the cloud, let’s take a look at more critical factors like security, durability, and availability.

Beyond Cost: Critical Cloud Factors

By now, many people in the IT industry are familiar with the concept of object-based dispersed storage. One of these methods uses a type of erasure code known as Cauchy Reed-Solomon. In layman’s terms, it is a software-defined storage approach that slices up your data and spreads it across a redundant array of inexpensive nodes, enabling you to provide much higher SLAs around security, availability, and durability, and to do so far more cost-effectively, without having to make additional full copies. This technology offers flexibility and efficiency far greater than what typical RAID schemes have provided over the last 27 years.

This technology offers flexibility and efficiency far greater than what typical RAID schemes have provided over the last 27 years.

A Software-Defined Storage Solution for Archiving

As Cloud Storage has gained popularity and business value in the Enterprise, this has become a big topic of discussion with our customers. At IDS, we chose to adopt Cleversafe’s software-defined storage solution in our Cloud as the storage foundation for our online archive services.

This has allowed us to set ourselves apart from most other cloud providers whose storage technologies rely on RAID and replication.

It has enabled IDS to offer our customers over 13 nines of data durability, with a mean time to data loss of 53,297,037,010,000 years. The Cleversafe storage cluster is deployed across four data centers, giving the IDS Cloud the ability to tolerate an entire data center outage (even one wiped off the face of the planet) while maintaining read and write access.
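Here is a simplified sketch of the dispersal idea. The 16-slice width, 10-slice read threshold, and even spread across four sites are illustrative assumptions rather than IDS’s or Cleversafe’s actual configuration, and the sketch models only the bookkeeping, not the Reed-Solomon math itself:

```python
# Illustrative dispersal configuration: 16 slices with a read threshold of 10,
# spread evenly across 4 data centers (4 slices each). These numbers are
# assumptions for illustration, not the actual production settings.
WIDTH, THRESHOLD, DATA_CENTERS = 16, 10, 4
slices_per_dc = WIDTH // DATA_CENTERS

print(f"Expansion factor: {WIDTH / THRESHOLD:.1f}x raw-to-usable "
      f"(vs. 3.0x for three full replicas)")

# Losing an entire data center loses its slices; the object survives
# as long as at least THRESHOLD slices remain readable.
surviving = WIDTH - slices_per_dc
print(f"Survive a full data-center outage: {surviving} slices left, "
      f"threshold {THRESHOLD}: {surviving >= THRESHOLD}")
```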

As an additional security measure, the information that sits in any individual data center is completely meaningless by itself. This is one of many reasons why the three-letter US government agencies have adopted the Cleversafe technology.

FREE Download: Excel Price Comparison List of Leading Cloud Storage Providers

Using the Cleversafe technology, IDS has been able to differentiate itself from the competition by offering our customers a simple, all-inclusive price-per-GB model, while at the same time offering an almost unimaginable SLA.

The next time you are shopping for a cloud provider, do the due diligence to check what is under the hood and what is between the lines.

With that I will leave you with this Dilbert strip and ask, “Who do you trust your data with?”

Photo credit: pacgov via Flickr
