Design & Architecture

Five Signs It’s Time to Invest in Your Data Center


In an industry where technology develops and advances incredibly fast, even top CIOs may feel like it's impossible to keep up. While they may feel like their organization is falling behind, how do they determine whether it really is? What counts as "up to date" in our constantly evolving IT landscape, and is that even good enough? It's easy to let Data Centers get out of control, and unfortunately doing so is risky business. To help cut through the confusion, we've compiled the top five signs that your Data Center needs some investment. See one or a few things on this list that sound eerily familiar? It may be time for a Data Center upgrade.

Five Signs Your Data Center Needs an Upgrade

  1. Your data center feels like a desert. If you're carrying around a personal fan while walking through your Data Center, you're definitely losing the Data Center cooling battle. Consider a Computational Fluid Dynamics (CFD) analysis to guide cooling system arrangements and hot- and cold-aisle containment. If your Data Center continuously suffers from heat stroke, it's probably not operating at its highest possible capacity.
  2. You skipped spring-cleaning the last 10 years. While it's easy to let gear pile up, it's vital to complete some fundamental analysis of the hardware in your Data Center. Equipment that no longer adds value, or is simply not being used, should be discarded or donated to a non-profit organization such as a school. Clearing out old equipment can have countless benefits, including increased power capacity and reclaimed valuable space.
  3. Your server lifecycle was up three cycles ago. There are multiple reasons why a server lifecycle may come to a close, and because lifecycles vary greatly with legacy applications and operating systems, determining usable life can be confusing. We follow a general rule of thumb: if a server can no longer meet your required needs after 3 years, replacement or an alternative solution will likely make more sense than simple upgrades. Replacing old servers, or incorporating innovative technologies like virtualization, cloud-based services, and converged infrastructure, can help consolidate and optimize the Data Center. In turn, consolidation can reduce cabling, management, heating, cooling, and ongoing maintenance costs.
  4. Your cabling looks like a rat’s nest. Cabling can easily consume a Data Center if it’s not managed properly. If you’re not labeling, tying down and properly organizing your Data Center cabling, you need a serious revamp of this vital part of the Data Center. This type of disorganization can even lead to human error that can cause downtime to business-critical applications. If a wrongly placed elbow could take your retail business offline for multiple days, it’s time to rethink your cabling strategy. In addition to organization, converged technologies can greatly decrease the cabling in your Data Center.
  5. People are walking around your data center and you don’t know who they are. If you’re finding strangers meandering through your Data Center, it’s probably time to reconsider the physical security measures in place to protect your valuable applications and data. While you may not need a full-time guard dog, your organization might implement key card access, security cameras, and a sign-in/sign-out process with regular audits. Keep in mind that the biggest threat often comes from within your organization, so checks and balances are critical. Moving your infrastructure services to the Cloud or a colocation facility can let you leverage enterprise-class security and controls without a massive upfront capital investment.
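To make the spring-cleaning and lifecycle checks above actionable, here is a minimal sketch of an inventory review script. The server names, dates, and the 3-year threshold are illustrative assumptions that mirror the rule of thumb above, not a standard:

```python
from datetime import date

# Illustrative inventory; names, dates, and the 3-year threshold are
# assumptions, not a fixed industry standard.
REFRESH_YEARS = 3

servers = [
    {"name": "db-01", "deployed": date(2010, 5, 1)},
    {"name": "web-02", "deployed": date(2014, 1, 15)},
]

def due_for_review(server, today, years=REFRESH_YEARS):
    """Flag servers that have passed the refresh window."""
    return (today - server["deployed"]).days >= years * 365

overdue = [s["name"] for s in servers
           if due_for_review(s, today=date(2014, 8, 1))]
print(overdue)  # ['db-01']
```

A real inventory would pull deployment dates from an asset database rather than hard-coded records, but the decision logic stays the same.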

Even with the tips above, determining when and how to update your Data Center can be a difficult decision. It’s often a good idea to bring in a third party for a Data Center assessment to make sure you’re receiving unbiased feedback. Taking the time to properly assess your current Data Center infrastructure and plan an integrated upgrade will help deter hasty decisions and ultimately save critical capital.

Creating a Technology Roadmap for your Organization

The Concepts Behind a Technology Roadmap


Information Technology is critical to an organization’s success in modern times. Yet too often we get comfortable with what we have inherited or have in place today. Yes, IT is concerned with growth and costs, and will “shrink” on demand, but does anyone know where IT is headed? The cost question is directly related to at least two other questions that not every IT department asks:

  • Where do we [IT] want to be in 3 years? In 5 years? In 10 years?
  • If we continue to do what we do today, will we get there?

The good news is, IT is getting wiser. IT knows the importance of strategy. Strategic thinking has made its way from textbooks to the real world. Today, IT leaders work with businesses to provide direction that translates to a strategy, which leads to a plan. A technology roadmap is a special kind of plan. Here is the Wikipedia definition:

“A technology roadmap is a plan that matches short-term and long-term goals with specific technology solutions to help meet those goals. It is a plan that applies to a new product or process, or to an emerging technology.”

Some will differentiate between product roadmaps and technology roadmaps. For our purposes, we will stick to implementation of IT technologies in a typical organization.

What Drives a Technology Roadmap?

A technology roadmap is only feasible when the goals are clear, and it is only as valuable as the business areas it serves. So, from an IT perspective, the rubber meets the road when we know:

  • How are business applications preparing for tomorrow’s challenges?
  • Which infrastructure changes will maximize the value to the business?

This gives a roadmap its purpose.

From there on, it is a matter of gaining a good understanding of “what is” and “what can be.” In each focus area, the IT team must evaluate the technology trends and uncertainties, relating that back to the current state and skills in the organization.

  • Do we know what else is out there?
  • Do we have what it takes to get there?

This gives a roadmap its framework.

What Can Happen Without a Technology Roadmap?

Without a technology roadmap, organizations carry unaddressed costs and risks due to outdated strategies and quick fixes that resemble patches in a quilt. Technology roadmaps bring consensus and improve planning, budgeting, and coordination.

As technology evolves, so does the roadmap. An outdated technology roadmap can be almost as harmful as not having one at all. Sedentary strategies mean organizations are likely to fall victim to unplanned costs and reduced value to the business. It is, therefore, critical to set up a recurring review period where key stakeholders refresh and revise the map as business requirements continue to transform.

Stay tuned for the next part of this two-part series, when we dive into the steps needed to create your own technology roadmap.

Nearline and Enterprise SAS vs. NVMe (PCI express) Storage Connections


Most enterprise storage arrays today have backend enterprise SATA (aka Nearline SAS) and enterprise SAS connections running 6 Gbps or 12 Gbps, where the actual disks and spindles connect to the controllers. The benefits of these enterprise-class communication protocols over standard SATA include:

  • Native command queuing.
  • Dual-redundant multipath I/O connections.
  • Plenty of throughput for each individual spindle.

You would think this is plenty of bandwidth, but now that SSDs are replacing HDDs, there is a case to be made for a newer, better technology. Many individual SSDs can push 500 MB/sec on their own. It’s not so much that 12 Gbps is a bottleneck today, but the future of storage isn’t just NAND flash memory. Technologies like PCM and MRAM will easily push the boundaries of moving large amounts of data in and out of individual drives, potentially on the order of 1000x.
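To put those link speeds in perspective, here is a rough sketch of the bandwidth arithmetic, assuming 8b/10b line encoding on a 12G SAS lane and the 500 MB/sec per-SSD figure above:

```python
# Rough link math: with 8b/10b encoding, 10 bits on the wire carry one
# byte of payload, so a 12 Gbps lane moves about 1.2 GB/sec of data.
line_rate_gbps = 12
usable_mb_per_s = line_rate_gbps * 1000 / 10   # ~1200 MB/sec per lane
ssd_mb_per_s = 500                             # per-SSD figure cited above

print(usable_mb_per_s / ssd_mb_per_s)  # 2.4 -- a few fast SSDs fill a lane
```

Real arrays use wide ports and multiple lanes, so the point is not an exact ceiling but how quickly a handful of fast SSDs consumes a shared backend link.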

How Can We Improve Existing Flash Performance Outputs?

We now might agree that newer technologies are on order for the long term, but even with NAND flash in use today, there could be big improvements in performance by looking at flash differently.

For example, most SSD drives today have multiple NAND chips on the circuit board. If we read and write to these chips in a more parallel fashion, we can get even faster performance. Take existing PCI express-based NAND flash systems out there today, like Fusion-io or OCZ’s RevoDrive. How can these devices achieve higher throughput and lower latency than a 12 Gbps SAS connection? For starters, they use the PCI express bus, which removes some controller latency. Taken a step further, NVMe (Non-Volatile Memory Express) is a new specification that can outperform AHCI, even over PCIe storage connections. See the graphic below for the latencies of the different stacks comparing the two.

Intel SSD P3700 Series NVMe Efficiency


What Other Benefits Does NVMe Provide?

Some of the other major benefits of NVMe include:

  • Multiple thread usage.
  • Parallel access.
  • Increase in queue depth.
  • Efficiency improvements.

Let’s look at queue depth specifically. AHCI supports 1 queue with 32 commands per queue; NVMe, on the other hand, supports up to 64,000 queues with 64,000 commands per queue. Since many SSD drives don’t perform well until there’s a big demand and a high queue depth, getting the most performance out of an SSD means hitting it with multiple outstanding requests: a 20,000 IOPS drive can often do 80,000-90,000 IOPS at the right queue depth. Newer NAND controller CPUs also have more than double the channels of SATA-based SSD controllers (18 instead of 8), as well as more DDR3 RAM for cache (1 GB instead of 128 or 256 MB). So we are starting to see miniature storage-array performance in a single SSD “spindle.”
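The queue-depth gap is easy to quantify. A quick sketch of the outstanding-command ceilings implied by the spec limits above:

```python
# Maximum outstanding commands per device, from the spec limits quoted
# above (AHCI: 1 queue x 32 commands; NVMe: 64,000 queues x 64,000).
ahci_total = 1 * 32
nvme_total = 64_000 * 64_000

print(ahci_total)                # 32
print(nvme_total)                # 4096000000
print(nvme_total // ahci_total)  # 128000000 -- 128 million times more headroom
```

No single drive needs four billion outstanding commands, but the headroom lets every CPU core own its own queues without lock contention.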

One more thing: Intel has a special way to convert a PCIe-based SSD into a standard 2.5” form factor with the use of the SFF-8639 connector. This connector is what we will start to see in enterprise systems. Wouldn’t it be nice if this connector could carry both SATA/SAS and PCIe over the same cable?

How Does NVMe Perform in Independent Tests?

In independent tests, these NVMe-based storage drives are able to hit 200,000-750,000 IOPS using 4 KB random reads with queue depths of 128-256. The 4 KB random write numbers are lower, from 80,000-220,000 IOPS at similar queue depths. Sequential read and write performance of many of these drives can easily exceed 2 GB/sec, peaking near 3 GB/sec for the largest transfer sizes. Average response time peaks at 935 µs, whereas peak latency has a much larger range, from 3 ms up to 98 ms depending on the model, brand, and queue depth.
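Those IOPS, queue-depth, and latency figures hang together via Little’s law (sustained throughput equals concurrency divided by average latency). A quick sanity check, using illustrative numbers within the reported ranges:

```python
def iops(queue_depth, avg_latency_s):
    """Little's law: sustained IOPS = outstanding I/Os / average latency."""
    return queue_depth / avg_latency_s

# 128 outstanding I/Os at an average service time of 640 microseconds
# lands at the low end of the reported random-read range.
print(iops(128, 640e-6))  # roughly 200,000 IOPS
```

The same relation explains why a drive that idles at 20,000 IOPS under a shallow queue can post several times that figure once enough requests are in flight.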

Those super-high latency numbers are proof that IOPS only matter in relation to latency, and it makes sense to choose an SSD drive that offers performance consistency if the application requires it (such as the Micron P320h – 700GB).

What Does NVMe Mean for the Future?

These are strong numbers from a single SSD drive, but the point of all this analysis is two-fold. On one hand, NVMe will lift a potential barrier as NL-SAS and SAS connections eventually become a bottleneck for newer flash-based technologies. On the other hand, much like the storage systems of the past decade that are being replaced by flash-based systems built from the ground up, we have the opportunity to see a new way of reading and writing to flash that yields even greater performance through more parallelism and concurrency. With existing PCIe-based SSDs already pushing the limits of SAS, NVMe has a promising future as storage becomes faster and faster.

Reconsidering 2-Node Architecture: Design Your Data Center With “3s and 5s”


In the enterprise IT world, most people would agree that putting all your eggs in one basket is not ideal. However, if you scale too wide, you have to manage all that overhead. That’s why a brilliant and experienced enterprise architect once told me that he prefers “3s and 5s.”

If anything in your data center has redundancy, consider rethinking any active/standby or active/active arrangement with only two nodes. With two active/active systems, you must monitor to ensure neither exceeds 50% utilization, or the survivor cannot absorb a failover. With active/standby, the standby node sits idle until a failure. In either case, you’re not getting efficiency out of your systems.

Design Comparison Matrix

Consider the following matrix comparing number of nodes versus efficiency, resiliency, and performance:

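The efficiency side of such a comparison falls out of simple arithmetic. A sketch, assuming each design must survive the loss of one node (N-1 redundancy):

```python
# Safe per-node utilization ceiling when a cluster of N nodes must
# absorb the load of one failed peer: (N - 1) / N of each node's capacity.
ceilings = {n: (n - 1) / n for n in range(2, 9)}
for n, c in ceilings.items():
    print(f"{n} nodes: {c:.1%} safe utilization per node")
# 2-node pairs stall at 50%, while 5 nodes reach 80% and 8 nodes 87.5%
```

Efficiency keeps climbing past 8 nodes, but each additional node adds management overhead, which is why the sweet spot sits in the middle.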

But why 3s and 5s and not 4s and 6s? “Because it forces a recode of the software and moves to a structure that is a truly cloud-like design instead of old-school, fault-tolerant or highly available designs,” says David Bolthouse, enterprise architect.

Look how inefficient and rigid 2-node systems are in this comparison. You cannot burst much when the business needs to burst, whether due to acquisitions, a sudden demand workload, or peak usage times that require headroom.

I’m suggesting that there is a sweet spot somewhere between 3- and 8-node systems.

If you have a blank slate on new designs for servers, networking, and storage, then consider looking at multi-node architectures. This concept can also be applied to DR and Cloud, as just having two sites—one production and one failover site—is not the future of cloud-based data centers.

The whole point of cloud is to distribute workloads and data across systems to achieve a higher level of overall efficiency, resiliency, and performance. Notice I didn’t say cost. We’re not quite there yet, but that’s coming. Distributing your risk across nodes and locations will eventually drive down costs, not just from the additional efficiencies gained, but also because you will be able to afford to invest in more value-driven products.

Take Cleversafe, which replicates data across multiple nodes and multiple locations. The low cost of this object-based storage allows for all those copies, while still keeping costs under control. Instead of thinking about your ability to recover from a failure, you will be thinking about how many failures you can sustain without much business impact. If the applications are written correctly, there may be very little business interruption after all.
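The failure-tolerance trade-off behind a dispersed design like Cleversafe’s can be sketched with k-of-n information dispersal, where any k of n slices reconstruct the data. The parameters below are illustrative, not the product’s actual defaults:

```python
def dispersal(n, k):
    """k-of-n dispersal: data survives any (n - k) slice losses,
    at a raw-storage expansion factor of n / k."""
    return n - k, n / k

losses, expansion = dispersal(n=16, k=10)
print(losses, expansion)  # survives 6 node losses at 1.6x raw storage
```

Compare that 1.6x overhead with full three-way replication at 3.0x, and the "how many failures can we sustain" framing starts to look affordable.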

Photo credit: Thomas Lieser via Flickr