Category: Analytics


Breaking the Curse with Data Analytics


Growing up a diehard Red Sox fan through the ’70s, ’80s, and ’90s was tough. The Yankees always had our number, handing the Red Sox some spectacular playoff losses, including the Bucky Dent homer in 1978 and the Aaron Boone home run off Tim Wakefield in 2003. The old adage in New England was that “the Red Sox and lawn furniture would fold up and end up in the cellar after Labor Day.” Read More


Performance Tuning Citrix XenApp and XenDesktop 7.6 Part 2: HDX Insight Monitor and Policy Tuning


For more information on this topic, please see part one of this series, “Performance Tuning Citrix XenApp and XenDesktop 7.6 Part 1: Citrix CloudBridge.” We will also be publishing a follow-up series in the future on performance tuning and WAN optimization options for VMware Horizon View 6.

About the HDX Insight Monitor Tool

Citrix XenApp and XenDesktop 7.6 Director integrates the HDX Insight Monitor Tool, which gives you the ability to monitor and tune the following: Read More


Array of Things Plans to Revolutionize Data in Chicago


In a quickly evolving technology industry, we were especially excited to hear about a fascinating project coming right to our own backyard. The Array of Things project recently announced plans to revolutionize how citizens live in, and interact with, their cities. In partnership with the City of Chicago, the Array of Things team plans to install a network of interactive, modular sensor boxes that will collect real-time data throughout Chicago.

What Exactly Are Interactive, Modular Sensor Boxes and What Can They Do?

The Array of Things team explains that the first versions of these innovative boxes will be dedicated to collecting information on environmental factors like atmosphere and air quality. The sensors will also be able to collect and store information about human activity, but only at a very general level: they will gather data on noise level and the surface temperature of sidewalks and roads, and will be able to detect the number of wireless networks in a given area. Without recording any personal information, the sensors will be able to extrapolate human traffic statistics.

How Does the Array of Things Team Hope to Use the Data?

Members of the Array of Things team explain that while data collection may be rudimentary at first, they believe the sensors will grow more sophisticated over time, allowing everyone who accesses the data to use it in more exciting ways.

They state the following potential uses for data collected by the sensors:

  • Healthy walking route suggestions. Researchers could use air quality, sound, and vibration data to suggest the healthiest and unhealthiest walking times and routes in the city.
  • Targeted winter salt application. The city may choose to use sidewalk and street temperature data to save money and prevent environmental damage by planning targeted salt application based on traffic.
  • Block-by-block weather reports. Weather experts could use atmosphere data to provide real-time, micro-climate weather reports by neighborhood, or even by block.
  • Safe and efficient route suggestions. Data surrounding human activity might be used to find the safest and most efficient routes in the city during different times of the day.
  • Improved traffic light timing. The city could use vibration data and data surrounding human activity to improve traffic light timing and efficiency.

Not only does the Array of Things team predict remarkable uses for the projected data, they also believe that everyone should benefit from the experiment. The data collected by the Array of Things project will be available to everyone, including residents, software developers, scientists, policymakers, and researchers, to get the most value out of it. Data is expected to be published and updated multiple times per minute.

Wait, What About Personal Security?

Because the data will be available to everyone, security will be an extremely high priority. Due to the nature of the project, the sensors are designed to collect only general information and will not be capable of extracting personal information from people or devices. The entire project, including the software and hardware, will be heavily regulated and reviewed regularly to make sure standards are met and maintained.

The first 50 sensors are planned for installation during the winter of 2014-2015, with an additional eight nodes planned for spring 2015. Potential funding opportunities mean there could be at least 500 additional sensors installed between 2015 and 2017.

Learn more about the Array of Things project.

Image credit to Urban Center for Computation and Data.

Make Your Analytic Environment More Effective and Efficient With Optimized Infrastructure


Analytics and Big Data are at a crossroads: Hadoop MapReduce is being applied to larger and larger datasets, and those datasets are becoming expensive to store and protect.

The traditional deployment model runs the NameNode and DataNode services on the same hardware as the compute-layer services (job scheduling and execution). Hadoop Distributed File System (HDFS) data is protected at the server and drive level by replicating blocks to other nodes through the protocol. The number of copies is a tunable parameter; however, best practice recommends three copies.
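For reference, that copy count is governed by the dfs.replication property in hdfs-site.xml; a minimal sketch of the stock three-copy setting looks like this (the value shown is the Hadoop default):

```xml
<!-- hdfs-site.xml: HDFS block replication factor (3 is the Hadoop default) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```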

As systems scale to petabytes and beyond, the storage required to sustain three copies becomes astronomical: a 1 PB dataset consumes 3 PB of raw capacity.

Another key characteristic of HDFS is that the objects it stores are generally large blocks. These blocks are written sequentially and are never updated or overwritten. Essentially, they are WORM (Write Once, Read Many) objects until their relevance expires and they are deleted.
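To make that WORM pattern concrete, here is a minimal Java sketch against the standard Hadoop FileSystem API; the cluster URI and path are hypothetical placeholders. Note that the API exposes create and append, but no way to rewrite bytes in place:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WormExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster URI; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/events/2014-11-01.log");

        // Write once: the file is laid down sequentially and then sealed.
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeBytes("event-record-1\n");
        }

        // Read many: any number of sequential reads, but no in-place update.
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, "UTF-8"));
        }
        fs.close();
    }
}
```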

One of the tenets of traditional HDFS is that “moving computation is cheaper than moving data.” As a storage guy, I would like to rewrite this tenet as “let’s leverage computation for computation and optimize the data infrastructure to best serve the application’s data requirements.” I know, it’s a bit wordy, but it makes the point. A number of technologies have added support for the HDFS protocol.

Isilon

Isilon added HDFS support to its OneFS code base with the 7.0 release. An Isilon cluster can scale from 3 nodes up to 144 nodes. These nodes can be one of three tiers:

  1. SAS (Serial Attach SCSI) and SSD (Solid State Drive) based S Nodes for extreme performance
  2. SATA (Serial ATA drive interface) and SSD based X Nodes for typical medium performance workloads
  3. SATA based NL Nodes for Archive level performance

An Isilon cluster can be a combination of these node types, allowing data to be tiered based on access patterns. The advantage of using Isilon for HDFS is that Isilon provides the data protection, so the HDFS replication parameter can be set to a single copy.

This cuts the amount of storage required to support a Hadoop cluster nearly in half, while improving reliability and simplifying the environment.
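As a rough sketch, pointing Hadoop clients at an Isilon-backed namespace is largely a configuration change; the SmartConnect zone name below is a hypothetical placeholder, and since OneFS protects the data itself, replication can be dialed down to one copy:

```xml
<!-- core-site.xml: direct HDFS traffic at the Isilon SmartConnect zone
     (hostname is a hypothetical placeholder) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://isilon-hdfs.example.com:8020</value>
</property>

<!-- hdfs-site.xml: OneFS handles protection, so one logical copy suffices -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```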

Cleversafe

In very large environments, or in environments that require geo-dispersal, Cleversafe can be leveraged to provide storage via the HDFS protocol. Like Isilon, Cleversafe uses erasure coding techniques to distribute data across the nodes in its cluster architecture. Cleversafe, however, scales much larger and can be geo-dispersed, as its cluster interconnect uses TCP/IP over Ethernet as opposed to InfiniBand.
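To see why erasure coding changes the economics relative to triple replication, here is a toy Java calculation; the 10+6 coding scheme is an assumed example for illustration, not either vendor’s actual geometry:

```java
public class StorageOverhead {
    public static void main(String[] args) {
        double usableTB = 1000.0; // 1 PB of logical data

        // Triple replication: every byte is stored three times.
        double replicatedRaw = usableTB * 3.0;

        // Example erasure code: 10 data fragments + 6 parity fragments.
        // (Assumed scheme for illustration; real deployments vary.)
        int dataFragments = 10;
        int parityFragments = 6;
        double ecRaw = usableTB * (dataFragments + parityFragments) / (double) dataFragments;

        System.out.printf("3x replication:      %.0f TB raw%n", replicatedRaw); // 3000 TB
        System.out.printf("10+6 erasure coding: %.0f TB raw%n", ecRaw);         // 1600 TB
    }
}
```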

IDS has integrated both the Isilon and Cleversafe technologies in our cloud and has the capacity to support customer analytics environments on this infrastructure.

Our customers can efficiently stand up a Hadoop Ecosystem and produce valuable insights without having to purchase and manage a significant investment in infrastructure.

SMR from Seagate

On a completely separate, but surprisingly related, thread: one of the major developments in rotational hard drive technology in the last year has been a focus on archival storage. Seagate announced Shingled Magnetic Recording (SMR) with densities up to 1.25 TB per platter. SMR drives overlap groups of tracks, leaving valid read tracks inside the boundaries of wider write tracks. SMR drives can store much more data this way, but rewrites are much slower than on existing perpendicular magnetic recording (PMR) drives, because updating a block forces the entire group of overlapping tracks to be rewritten, much like solid-state page writes. While the only known customer of SMR drives to date is Seagate’s subsidiary EVault, this technology would seem to line up well with HDFS workloads.

Photo credit: _cheryl via Flickr
