Log Management


Advice from the Expert, Best Practices in Utilizing Storage Pools

By | Backup, Cisco, Data Loss Prevention, EMC, How To, Log Management, Networking, Storage, VMware | No Comments

Storage Pools for the CX4 and VNX have been around for a while now, but I still see a lot of people doing things that go against best practices. First, let’s start out by talking about RAID Groups.

Traditionally, to present storage to a host you would create a RAID Group consisting of up to 16 disks. The most commonly used RAID Group types were R1/0, R5, R6, and Hot Spare. After creating your RAID Group, you would need to create a LUN on that RAID Group to present to a host.

Let’s say you have (50) 600GB 15K disks that you want to create RAID Groups on. You could create (10) R5 4+1 RAID Groups. If you wanted to have (10) 1TB LUNs for your hosts, you could create a 1TB LUN on each RAID Group. Each LUN then has the guaranteed performance of (5) 15K disks behind it, but at the same time, each LUN has at most the performance of (5) 15K disks.
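As a rough check on the numbers in this example, the arithmetic can be sketched in Python. The capacity figure uses the raw 600GB per disk and ignores formatting overhead, which is a simplifying assumption made here, so real usable capacity will be lower.

```python
# Worked arithmetic for the (50)-disk R5 4+1 example.
# Assumption: raw 600GB per disk, with no formatting overhead subtracted.
DISKS = 50
DISK_GB = 600
DATA_DISKS = 4      # R5 4+1: four disks' worth of data...
PARITY_DISKS = 1    # ...plus one disk's worth of parity per RAID Group

group_width = DATA_DISKS + PARITY_DISKS
raid_groups = DISKS // group_width
usable_gb_per_group = DATA_DISKS * DISK_GB
lun_gb = 1024       # one 1TB LUN per RAID Group

print(f"RAID Groups: {raid_groups}")
print(f"Raw usable per group: {usable_gb_per_group} GB")
print(f"Spindles behind each LUN: {group_width}")
print(f"1TB LUN fits in one group: {lun_gb <= usable_gb_per_group}")
```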
[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”] What if your LUNs require even more performance?

1. Create metaLUNs to keep it easy and effective.

2. Make (10) 102.4GB LUNs on each RAID Group, totaling (100) 102.4GB LUNs for your (10) RAID Groups.

3. Select the meta head from a RAID Group and expand it by striping it with (9) of the other LUNs from other RAID Groups.

4. For each of the other LUNs to expand you would want to select the meta head from a different RAID Group and then expand with the LUNs from the remaining RAID Groups.

5. That would then provide each LUN with the ability to have the performance of (50) 15K drives shared between them.

6. Once you have your LUNs created, you also have the option of turning FAST Cache (if configured) on or off at the LUN level.

Depending on your performance requirement, things can quickly get complicated using traditional RAID Groups.

This is where CX4 and VNX Pools come into play.
[/framed_box] EMC took the typical RAID Group types (R1/0, R5, and R6) and made it so you can use them in Storage Pools. The chart below shows the different options for the Storage Pools. The asterisk notes that the 8+1 option for R5 and the 14+2 option for R6 are only available in the VNX OE 32 release.

Now on top of that you can have a Homogeneous Storage Pool – a Pool with only like drives, either all Flash, SAS, or NLSAS (SATA on CX4), or a Heterogeneous Storage Pool – a Storage Pool with more than one tier of storage.

If we take our example of having (50) 15K disks using R5 for RAID Groups and we apply them to pools we could just create (1) R5 4+1 Storage Pool with all (50) drives in it. This would then leave us with a Homogeneous Storage Pool, visualized below.

The chart to the right displays what happens underneath the Pool: it creates the same structure as the traditional RAID Groups. We would end up with a Pool that contains (10) R5 4+1 RAID Groups underneath that you wouldn’t see; you would only see the (1) Pool with the combined storage of the (50) drives. From there you would create your (10) 1TB LUNs on the Pool, and it will spread the LUNs across all of the RAID Groups underneath automatically. It does this by creating 1GB chunks and spreading them evenly across the hidden RAID Groups. You can also turn FAST Cache on or off at the Storage Pool level (if configured).
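To picture how 1GB chunks end up spread across the hidden RAID Groups, here is a minimal sketch. It assumes a simple round-robin placement; the array's actual allocation logic is not described here, so treat `place_slices` as a hypothetical illustration rather than how the CX4 or VNX really does it.

```python
from collections import Counter

SLICE_GB = 1              # chunk size mentioned in the text
HIDDEN_RAID_GROUPS = 10   # (10) R5 4+1 groups under the (50)-drive Pool


def place_slices(lun_gb, groups=HIDDEN_RAID_GROUPS, start=0):
    """Return the RAID Group index chosen for each 1GB slice of a LUN.

    Round-robin placement is an assumption made for illustration only.
    """
    return [(start + i) % groups for i in range(lun_gb // SLICE_GB)]


placement = place_slices(1024)    # one 1TB LUN
per_group = Counter(placement)    # slices landing on each hidden group

for group in sorted(per_group):
    print(f"RAID Group {group}: {per_group[group]} slices")
```

Under this assumption every hidden RAID Group ends up holding roughly a tenth of the LUN, which is why a Pool LUN can draw on all (50) drives without any metaLUN work.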

On top of that, the other advantage to using a Storage Pool is the ability to create a Heterogeneous Storage Pool, which allows you to have multiple tiers where the ‘hot’ data will move up to the faster drives and the ‘cold’ data will move down to the slower drives.

Another thing that can be done with a Storage Pool is to create thin LUNs. The only real advantage of thin LUNs is being able to over-provision the Storage Pool. For example, if your Storage Pool has 10TB worth of space available, you could create 30TB worth of LUNs and your hosts would think they have 30TB available to them, when in reality you only have 10TB worth of disks.

The problem is that the hosts think they have more space than they really do. When the Storage Pool starts to get full, there is the potential to run out of space and have hosts crash. They may not crash, but it’s safer to assume that they will crash or that data will become corrupt: when a host tries to write data because it thinks it has space, but really doesn’t, something bad will happen.
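A small sketch of the bookkeeping this implies, using the 10TB/30TB example. The warning and critical thresholds are hypothetical values picked for illustration, not EMC recommendations.

```python
def oversubscription_ratio(provisioned_tb, physical_tb):
    """How many times over the physical capacity has been promised to hosts."""
    return provisioned_tb / physical_tb


def pool_status(consumed_tb, physical_tb, warn_at=0.70, critical_at=0.85):
    """Classify pool fullness; the thresholds are illustrative assumptions."""
    used = consumed_tb / physical_tb
    if used >= critical_at:
        return "critical"
    if used >= warn_at:
        return "warning"
    return "ok"


print(oversubscription_ratio(30, 10))   # 30TB of thin LUNs on 10TB of disk
print(pool_status(5, 10))
print(pool_status(9, 10))
```

The point is that with thin LUNs the pool's consumed space, not what the hosts report, is the number to watch.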

In my experience, people typically want to use thin LUNs only for VMware, yet will also make the Virtual Machine disk thin as well. There is no real point in doing this. Creating a thin VM on a thin LUN will grant no additional space savings, just additional overhead, as there is a performance hit when using thin LUNs.

After the long intro to how Storage Pools work (and it was just a basic introduction; I left out quite a bit that I could’ve gone over in detail), we get to the part of what to do and what not to do.

Creating Storage Pools

Choose the correct RAID Type for your tiers. At a high level, R1/0 is for highly write-intensive applications, R5 is for read-heavy applications, and R6 is typically used on, and highly recommended for, large NLSAS or SATA drives due to the long rebuild times associated with those drives.

Use the number of drives in the preferred drive count options. This isn’t always the case as there are ways to manipulate how the RAID Groups underneath are created but as a best practice use that number of drives.

Keep in mind the size of your Storage Pool. If you have FAST Cache turned on for a very large Storage Pool and not a lot of FAST Cache, it is possible the FAST Cache will be used very inefficiently.

If there is a disaster, the larger your Storage Pool, the more data you can lose. For example, one of the RAID Groups underneath could suffer a dual drive fault in R5, a triple drive fault in R6, or the loss of the right (2) disks in R1/0.

Expanding Storage Pools

Use the number of drives in the preferred drive count options. If it is on a CX4 or a VNX that is pre VNX OE 32, the best practice is to expand by the same number of drives already in the tier you are expanding (doubling it), as the data will not relocate within the same tier. If it is a VNX on at least OE 32, you don’t need to double the size of the pool, as the Storage Pool has the ability to relocate data within the same tier of storage, not just up and down tiers.

Be sure to use the same drive speed and size for the tier you are expanding. For example, if you have a Storage Pool with 15K 600GB SAS drives, you don’t want to expand it with 10K 600GB SAS drives as they will be in the same tier and you won’t get consistent performance across that specific tier. This would go for creating Storage Pools as well.

Graphics by EMC


Letting Cache Acceleration Cards Do The Heavy Lifting

By | EMC, How To, Log Management, Networking, Storage, VMware | No Comments

Up until now there has not been a great deal of intelligence around SSD Cache cards and flash arrays because they have primarily been configured as DAS (Direct Attach Storage). By moving read intensive workload up to the server off of a storage array, both individual application performance as well as overall storage performance can be enhanced. There are great benefits to using SSD Cache cards in new ways, yet before exploring new capabilities it is important to remember the history of the products.

The biggest problem with hard drives, either local or SAN based, is that they have not been able to keep up with Moore’s Law of transistor density. In 1965 Gordon Moore, a co-founder of Intel, made the observation that the number of components in integrated circuits doubled every year; he later (in 1975) adjusted that prediction to doubling every two years. So system processors (CPUs), memory (DRAM), system busses, and hard drive capacity have been roughly doubling every two years, but hard drive performance has stagnated because of the mechanical limitations of increasing spindle speeds (mostly heat, stability, and signaling reliability). This effectively limits individual hard drives to around 180 IOPS or 45MB/sec under typical random workloads, depending on block size.

The next challenge is that, in an effort to consolidate storage and increase the number of spindles, availability, and efficiency, we have pulled the storage out of our servers and placed that data on SAN arrays. There is tremendous benefit to this; however, doing so introduces new considerations. The network bandwidth is a fraction of the system bus interconnect, roughly 1/16th going by these figures (8Gb FC = 1GB/sec vs PCIe 3.0 x16 = 16GB/sec). An array may have 8 or 16 front-end connections yielding an aggregate of 8-16GB/sec, where a single PCIe slot has the same amount of bandwidth. The difference is that the array and multiple servers share its resources, and each can potentially impact the other.
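The bandwidth comparison can be worked through with the round figures quoted above. These are the approximate per-link numbers from the text, not measured throughput.

```python
# Approximate figures from the text, in GB/sec.
FC_PORT_GBPS = 1.0       # one 8Gb Fibre Channel connection
PCIE_SLOT_GBPS = 16.0    # one PCIe 3.0 x16 slot

# Bandwidth of a single SAN link relative to a single PCIe slot.
ratio = FC_PORT_GBPS / PCIE_SLOT_GBPS

# Aggregate front-end bandwidth for arrays with 8 or 16 connections.
aggregates = {ports: ports * FC_PORT_GBPS for ports in (8, 16)}

print(f"One FC port is {ratio:.4f} of one PCIe x16 slot")
for ports, total in aggregates.items():
    print(f"{ports} front-end ports: {total:.0f} GB/sec aggregate")
```

In other words, an entire 16-port array front end, shared by every attached server, offers about what one server gets from a single local slot.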

Cache acceleration cards address both the mechanical limitations of hard drives and the shared-resource conflict of storage networks for a specific subset of data. These cards utilize NAND flash (either SLC or MLC, but more on that later) memory packaged on a PCIe card with an interface controller to provide high bandwidth and throughput for read intensive workloads on small datasets of ephemeral data.

[framed_box bgColor=”#F0F0F0″ textColor=”undefined” rounded=”true”] I realize there were a lot of qualifying statements there, so let’s break it down…

  • Why read intensive? As compared to SLC NAND flash, MLC NAND flash has a much higher write penalty, making writes more costly in terms of time and overall life expectancy of a drive/card.
  • Why small datasets? Most Cache acceleration cards are fairly small in comparison to hard drives. The largest top out at ~3TB (typical sizes are 300-700GB) and the cost per GB is much, much higher than comparable hard drive storage.
  • Why ephemeral data and what does that mean? Ephemeral data is data that is temporary, transient, or in process: things like page files, SQL Server TEMPDB, or spool directories.
[/framed_box] Cache acceleration cards address the shared-resource conflict by pulling resource-intensive activities back onto the server and off of the SAN arrays. How this is accomplished is the key differentiator of the products available today.


FusionIO is one of the companies that made a name for itself early in the enterprise PCI and PCIe Flash cache acceleration market. Their solutions have been primarily DAS (Direct Attach Storage) solutions based on SLC and MLC NAND Flash. In early 2011 FusionIO added write-through caching to their SSD cards with their acquisition of ioTurbine software to accelerate VMWare guest performance. More recently, in mid-2012, FusionIO released their ION enterprise flash array, which consists of a chassis containing several of their PCIe cards. They leverage RAID protection across these cards for availability. Available interconnects include 8Gb FC and Infiniband. EMC released VFCache in 2012 and has subsequently released two additional updates.

The EMC VFCache is a re-packaged Micron P320h or LSI WarpDrive PCIe SSD with a write-through caching driver targeted primarily at read intensive workloads. In the subsequent releases they have enhanced VMWare functionality and added the ability to run in “split-card” mode with half the card utilized for read caching and the other half as DAS. EMC’s worst kept secret is their “Project Thunder” release of the XTremIO acquisition. “Project Thunder” is an all SSD array that will support both read and write workloads similar to the FusionIO ION array.
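For readers unfamiliar with the term, write-through caching means every write goes to the backing array before it is acknowledged, while repeat reads are served from the local card. The sketch below illustrates the general idea only. The class name, the dict standing in for the SAN array, and the LRU eviction policy are all assumptions for illustration and say nothing about how VFCache or ioTurbine are implemented.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy write-through read cache in front of a slower backing store."""

    def __init__(self, backing_store, capacity):
        self.backing = backing_store   # dict standing in for the SAN array
        self.capacity = capacity       # number of blocks the "card" can hold
        self.cache = OrderedDict()     # least recently used entry first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]          # slow path: go to the array
        self._insert(block, data)
        return data

    def write(self, block, data):
        self.backing[block] = data          # the array always gets the write
        self._insert(block, data)           # keep the cached copy current

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used


array = {0: b"a", 1: b"b"}
card = WriteThroughCache(array, capacity=1)
card.read(0)           # miss: fetched from the array
card.read(0)           # hit: served from the card
card.write(1, b"z")    # written to the array, then cached
print(card.hits, card.misses, array[1])
```

Because the array always holds the current data, losing the card loses no writes, which is also why this design helps reads far more than writes.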

SSD caching solutions are extremely powerful for very specific workloads. By moving read intensive workload up to the server off of a storage array, both individual application performance as well as overall storage performance can be enhanced. The key to determining whether or not these tools will help is careful analysis of reads vs writes and of the locality of reference of active data. If random write performance is required, consider SLC based cards or caching arrays over MLC.


Images courtesy of “The Register” and “IRRI Images”


A Tale of Two Security Breaches: Sony vs RSA

By | Data Loss Prevention, Log Management, RSA, Security, Strong Authentication | No Comments

In March, when RSA announced that they experienced a network attack which may have compromised their multi-factor authentication systems, it was the attack heard around the world. A few weeks later consumers received notices that email marketing firm Epsilon had also suffered a major breach of customer information from companies like Chase Bank, Walgreens and TiVo. Even more recently, Sony has been in the news due to an attack on their network, which has caused them to shut down their Playstation network indefinitely.

These high profile cyber attacks have had varying degrees of severity and impact. From their response time, it would seem RSA was alerted quickly and then responded even faster to minimize their public exposure. Sony, on the other hand, is continuing to find new evidence that their system was breached at least two weeks before they had any idea what was going on. While Sony scrambles to figure out what went wrong and where it started, their Playstation network has been offline for a little over two weeks now.

So, what’s the difference?

I see a few pretty clear messages in these two breaches. While I hate to reference it this way, as a security company, RSA “eats its own dog food”, meaning that they use everything they in turn sell. From log consolidation and data loss prevention to governance, risk and compliance, they utilize every product they sell within their own data center. RSA also quickly announced the purchase of NetWitness following their attack, a tool which was instrumental in helping RSA find out what was breached as well as its severity.

Why wasn’t RSA able to stop this attack?

My answer is this: the primary fault concerning this particular breach lies in social engineering. Employees clicked on a link they shouldn’t have (and should have known better), and, unfortunately, security comes down to the one variable no computer program can control: human error. Basically, we are the weakest link. Despite being hit by one of the most advanced threats the security community has ever witnessed, RSA was able to react to the attack as it was taking place. Generally, with attacks of this nature, it takes companies weeks or even months before they realize that an attack has occurred or is occurring.

Looking at Sony, we see a different story. More than two weeks have passed following discovery of their attack, and they are still uncovering information. They’ve engaged multiple security firms to help them sift through the data, but what is most astonishing is that they have only now realized that they need to hire a Chief Information Security Officer to be in charge of their security infrastructure. (That’s right folks, Sony had no one focused on managing and mitigating their information risk.) The last thing you want to be called in the security industry is reactive.

Now Sony is in the process of implementing defense in depth and doing things they should have been doing all along, like implementing more firewalls, intrusion prevention systems and automated patch management. Sony is big enough that most customers and the general public will have forgotten about this in 6-8 months and they’ll be one of the most secure companies in the world.

But, what about your company?

How much exposure and risk can your company endure? When your customers read the paper or hear on the news that YOU lost their information, will they continue to be loyal? Will they forget in 6-8 months and continue to do business with you? Do you want to take the chance of finding out? If none of the above scenarios appeal to you, then take note of Sony’s reactive response and implement security practices BEFORE a breach—not in reaction to one.


Top 7 Log Management Pitfalls to Avoid

By | Log Management, Security | No Comments

As a security practitioner, log analysis is probably one of the most important parts of my job. Just as doctors check the history of their patients before they check their current vitals such as temperature and blood pressure, security practitioners review and interpret log history along with what is currently being reported. Armed with this information, we’re better able to diagnose the issues we find. Also like going to the doctor, we can’t put off log management until there is a security incident—by then, it could be too late.

I’ll admit, log management isn’t the most enjoyable job to do. However, since it is an integral component of many compliance regulations such as Payment Card Industry–Data Security Standard (PCI-DSS) and Sarbanes-Oxley (SOX), it’s important that we remain vigilant. If we don’t, we risk security breaches and loss of reputation, as well as failed compliance, which results in fines and sanctions.

When implementing a log management program, here are the top 7 pitfalls to avoid:

1. Not Logging at All (or Limited Logging)

Hopefully, the procedure for installing new devices such as a server, router, or firewall on your network includes turning on logging, and logging more than the minimum. This could cause some angst with others in IT (especially database administrators), but we have to weigh risk against performance where security is concerned.

2. Logs Aren’t Reviewed

Once we have logging turned on, I’m willing to bet that there are few people who actually look at the logs on a regular basis. Many people I’ve worked with only look at logs when they’re attempting to diagnose a problem. The reason is that logs are really cumbersome (boring) to read in their native format. You have to weed through all the normal events and hope that anything abnormal magically pops out at you. The person reviewing the logs must understand what’s contained in the logs as well as be able to determine when some action needs to be taken.

3. Reviewing Only a Limited Scope

If your router loses connection, how do you find out what caused it? You might check the logs and review what happened 10 minutes prior to the outage. If you don’t find anything, you might think it was a fluke. What if the same port loses connection at the same time every week? How would you know? How long would it take for you to figure this out? This would take intimate knowledge of your environment and logs as well as a system to record such outages.
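Spotting a weekly pattern like this is easy once outage times are recorded somewhere. As a minimal sketch, assuming you already have a list of outage timestamps (the sample dates below are made up for illustration), you can bucket them by weekday and hour and look for buckets that repeat:

```python
from collections import Counter
from datetime import datetime


def recurring_slots(timestamps, min_occurrences=3):
    """Group outage times by (weekday, hour) and return the repeated slots.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = Counter((t.weekday(), t.hour) for t in timestamps)
    return {slot: n for slot, n in counts.items() if n >= min_occurrences}


# Hypothetical outage records: three Tuesdays around 2am, plus one stray.
outages = [
    datetime(2011, 5, 3, 2, 14),
    datetime(2011, 5, 10, 2, 9),
    datetime(2011, 5, 17, 2, 21),
    datetime(2011, 5, 12, 15, 40),
]

print(recurring_slots(outages))
```

A check limited to the 10 minutes before each individual outage would never surface this; it only shows up when the history is reviewed as a whole.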

4. Logs Are Not Centralized

With all the different types of devices capable of creating logs, it is a daunting task to log into each of the devices and review the logs, not to mention trying to mentally correlate all these devices in order to draw conclusions as data passes through them. With all the differences in formats between firewalls, intrusion prevention systems, routers, servers and other log devices, it is very important to have logs in a format that can be easily searched and correlated in order to find that proverbial needle in a haystack. Centralized logging will help to ensure log format standardization and make it easier to correlate events among various devices.
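To make "format standardization" concrete, here is a small sketch that maps two different log line formats onto one common record so they can be sorted and searched together. Both sample formats, the field names, and the device names are hypothetical; real devices vary widely, and this is only an illustration of the idea.

```python
import re

# Hypothetical syslog-style line: "<timestamp> <host> <app>: <message>"
SYSLOG_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<app>[\w\-/]+): (?P<msg>.*)$")


def parse_syslog(line):
    """Normalize a syslog-style line, or return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    return {"time": m["ts"], "source": m["host"], "message": m["msg"]}


def parse_keyvalue(line):
    """Normalize a key=value line (values are assumed to contain no spaces)."""
    fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
    return {
        "time": fields.get("time"),
        "source": fields.get("dev"),
        "message": fields.get("msg"),
    }


router_line = "2011-05-03T02:14:00Z rtr01 link: Gi0/1 down"
firewall_line = "time=2011-05-03T02:14:05Z dev=fw01 action=deny msg=blocked"

# Once normalized, events from different devices sort on one timeline.
events = sorted(
    [parse_keyvalue(firewall_line), parse_syslog(router_line)],
    key=lambda e: e["time"],
)
for event in events:
    print(event["time"], event["source"], event["message"])
```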

5. Keeping Logs for Only a Short Period

Unfortunately, many security practitioners don’t figure this out until an incident happens and they discover that the problem had been going on for months. If the logs rotate after 30 days, the organization is handicapped when trying to piece together the severity of the breach. To make matters worse, regulations such as SOX and HIPAA are now requiring companies to keep logs anywhere from 6 months to 7 years.
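The retention problem comes down to simple date arithmetic. In this sketch the dates and retention periods are made-up examples, and `covers_incident` is a hypothetical helper:

```python
from datetime import date, timedelta


def covers_incident(today, retention_days, incident_start):
    """True if logs kept for retention_days still reach back to the incident."""
    oldest_log = today - timedelta(days=retention_days)
    return oldest_log <= incident_start


today = date(2011, 6, 1)
incident_start = date(2011, 2, 15)   # breach began months before discovery

print(covers_incident(today, 30, incident_start))    # 30-day rotation
print(covers_incident(today, 365, incident_start))   # one year of logs
```

With a 30-day rotation, the first months of the incident are simply gone by the time anyone looks.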

6. Focusing on Compliance Instead of Security

Many companies decide to implement log management because of the compliance regulations they must comply with in their industry. However, simply implementing a product does not ensure that any particular compliance requirement has been met adequately. Most regulations also require some level of real-time monitoring and response. These regulations are put into place to identify and prevent breaches of sensitive data. This can’t be done by simply dropping a box on the network. Without real-time threat management, companies are still leaving their data exposed.

7. Not Using an Automated Tool

In order to have an effective log management program, it is important to have an automated process that collects, normalizes, alerts and reports on logs. In these economic times, compliance mandates are growing but we don’t have the extra manpower or time to implement another tedious, manual process. There are numerous tools on the market with very different focuses. There are log management tools that only act as a centralized repository without the ability to normalize the logs to make them more searchable and manageable. There are also products that may collect and normalize but don’t have the ability to alert and report on security issues or abnormalities. Generally, tools classified as Security Information and Event Management (SIEM) will fit the bill for the full automation of log collection, alerting and reporting.

To conclude, log management may be a necessary evil for security practitioners, but it is extremely important in helping us to be successful in our jobs. When it is implemented and used correctly, we can not only satisfy our compliance mandates but also ensure that our corporate data is kept safe.

(Note: photo by OliBlac)