Tiering storage primer: Data classification, archiving key

Tiering storage can be a challenge. Data must be classified into the right storage tiers, and it’s not always clear what data should go where. IT managers must also comb through several different types of storage tiering, such as automated storage tiering, dynamic storage tiering and sub-LUN tiering. Although tools (such as data migration tools) exist to help users tier their storage, they can often be complicated to use.

In this podcast interview, Ashish Nadkarni, senior analyst and consultant at Taneja Group, discusses the strategies and tools that can help users with their data storage tiering. Find out what data classification challenges you can run into and how to avoid them, the role of information lifecycle management (ILM) in storage tiering, and how data migration tools and dynamic storage tiering products can help tier storage. Read Nadkarni's answers below.

SearchStorage.com: What are the top two steps you should follow when classifying your data? How much of that classification is a manual process?

Nadkarni: Data classification is a tricky proposition, and it seems like there’s no real answer to how people classify or should classify their data. The traditional way has been a bottom-up approach: start by defining your storage tiers, assign each tier a cost value and a business value, and then apply those tiers to your applications and the data that's part of those applications.

The new way to do it is to start with data classification right at the application layer. So you're doing the classification within the application, whether it's a database or any other type of application, and then you work your way down, [meaning you should start] from the applications and [work] your way down the storage tiers. Matching the application types to storage tiers is the manual component, but the movement of data from one tier to another from within the application itself can be automated by deploying policies. So the two steps are, first, determining which applications belong on which tiers and, second, deciding what kind of policies you can apply from within the applications or the file system layer to move this data from one tier to another.
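As a rough illustration of those two steps, the sketch below pairs a manually maintained application-to-tier mapping with age-based policies that decide where data should sit. The application names, tier numbers and idle-day thresholds are all hypothetical and are not taken from the interview or from any product.

```python
from dataclasses import dataclass

# Step 1 (manual): a hypothetical mapping of application types to starting tiers.
# Tier 1 is assumed to be the fastest and most expensive tier.
APP_TIER = {"oltp_database": 1, "email": 2, "file_share": 3}


@dataclass
class Policy:
    """Hypothetical rule: data idle for at least `idle_days` belongs on `target_tier`."""
    idle_days: int
    target_tier: int


# Step 2 (automated): policies, listed from the longest idle period down.
POLICIES = [Policy(idle_days=365, target_tier=4), Policy(idle_days=90, target_tier=3)]


def place(app: str, days_since_access: int) -> int:
    """Return the tier for a piece of data; a policy never moves data above the app's starting tier."""
    tier = APP_TIER[app]
    for policy in POLICIES:
        if days_since_access >= policy.idle_days:
            return max(tier, policy.target_tier)
    return tier
```

Under these assumptions, `place("oltp_database", 10)` stays on the application's starting tier, while the same application's data that has been idle for 120 days is placed on tier 3.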

Automated tiering or dynamic tiering does help with the situation, but it’s not really a substitute for manual data classification.

SearchStorage.com: What are some of the most common challenges people run into when classifying their data for storage tiering?

Nadkarni: Matching classifications to storage tiers is often a big challenge because, many times when people talk about storage tiering, the attributes of the storage tiers, whether you take performance, capacity or provisioning methods and such . . . don’t match the business value of the applications themselves [or] the data they store. For example, you may have an application that’s not really mission critical and you may want to keep it on tier 3 or tier 4 storage. But that lower cost storage tier may not perform as well as the higher cost storage tiers, and just because this application isn't mission critical doesn’t mean it's low performance; it could have very high-performance requirements. So that’s a mismatch.

You can deploy solutions that provide burst capacity in terms of performance. Dynamic tiering or solid-state drive [SSD]-based caching mechanisms . . . allow you to temporarily assign a higher cost tier to an application as it needs it and then move it back down as necessary. The most common challenge is just the fact that your applications need to be promoted and demoted as needed.
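The promote-and-demote idea can be sketched as a small decision function. This is only an illustration under stated assumptions: the IOPS thresholds, the `home_tier` concept (the lower cost tier the application was classified into) and the one-tier-at-a-time movement are hypothetical choices, not a description of how any particular dynamic tiering product works.

```python
def next_tier(current_tier: int, home_tier: int, iops: float,
              promote_above: float = 5000, demote_below: float = 1000,
              top_tier: int = 1) -> int:
    """Return the tier an application should occupy next (lower number = faster tier).

    Assumed behavior: promote one tier when observed IOPS exceed `promote_above`,
    demote one tier back toward `home_tier` when IOPS fall below `demote_below`.
    """
    if iops > promote_above and current_tier > top_tier:
        return current_tier - 1  # temporary promotion during a burst
    if iops < demote_below and current_tier < home_tier:
        return current_tier + 1  # burst is over, drift back toward the home tier
    return current_tier
```

The gap between the two thresholds is a deliberate design choice in this sketch: it keeps an application with moderate load from bouncing between tiers on every evaluation.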

Another challenge that people run into is 'Who do you ask?' Every business is going to want to have its applications sitting on the most expensive tier or the highest-performance tier. So [you must determine] who makes that decision and what kind of rationale is used to put these applications into their appropriate buckets.

SearchStorage.com: What role does ILM play in storage tiering?

Nadkarni: ILM, or information lifecycle management, is a phrase that was coined a few years ago, and it seems to be slowly on its way out and is being replaced with other phrases. That’s just the nature of the industry. But the sense of ILM has been captured in solutions that are coming out today -- solutions like automated storage tiering, sub-LUN tiering, dynamic tiering -- that automatically move data from one tier to another based on its access profile, value, etc. The key element of ILM was really the information management part. Unfortunately, I think more and more of the solutions we see today operate at a level that's far removed from information intelligence, and information management does require content-aware solutions. I think the industry is still in the process of matching it to true storage tiering. So we’re not there yet, but if you take some of the high-level solutions that are out today, the idea is that you're able to move your data between various tiers based on the value of the information.

SearchStorage.com: What are some examples of data migration tools that can help with tiering storage?

Nadkarni: Data migration is a beast by itself. Most of the tools that I use for data migrations fall into three categories. There are ones that operate on the block I/O level, so you’ll see storage-based tools and such. The second type operates on the file system level or sub-file system level; those are either host or appliance based. And then there are the ones that work at the application level itself, so the tool will reside deep within the application and perform any data movement there. The one thing you have to keep in mind with data migration is that [it's] a one-time activity; most of the time the tools designed for data migrations are very efficient at one-time processes, whereas storage tiering is more of an ongoing process. You have to be careful that you aren’t just picking data migration tools off the shelf, which are really designed to be efficient at doing something once, and automatically assuming that they’ll be good at an ongoing process.

With that said, let’s look at the tools that work at the block level and handle data in a content-agnostic manner. You use [block-level tools] in an environment that treats all data types in the same way. They examine the access patterns of the blocks in question and move them accordingly from one location to another. Those operate at a very low level and can be adapted to ongoing storage tiering processes.
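A content-agnostic, block-level approach can be pictured as counting accesses per block and planning moves from the counts alone. The sketch below assumes a simple two-tier layout ("fast" and "slow") and a hypothetical `hot_threshold`; real block-level tools use their own heuristics, which are not described in the interview.

```python
from collections import Counter


def plan_block_moves(access_log, placement, hot_threshold=3):
    """Plan tier moves from access counts only, ignoring what the blocks contain.

    access_log: iterable of block ids, one entry per observed access (assumed input).
    placement:  dict mapping block id -> "fast" or "slow" (assumed two-tier layout).
    Returns a dict of block id -> new tier for blocks that should move.
    """
    counts = Counter(access_log)
    moves = {}
    for block, tier in placement.items():
        hot = counts[block] >= hot_threshold
        if hot and tier == "slow":
            moves[block] = "fast"   # frequently accessed block on the slow tier
        elif not hot and tier == "fast":
            moves[block] = "slow"   # rarely accessed block occupying the fast tier
    return moves
```

Because the function only looks at block ids and counts, it treats database pages, mail stores and user files identically, which is the trade-off the paragraph above describes.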

The ones that work at the file system or sub-LUN level are sometimes host based or appliance based. They can be made content-aware. So you can have a tool that's highly efficient at moving Exchange or SQL data, and you can have that be a tool sitting in the environment automatically moving data in an ongoing manner. And they can handle both structured and unstructured data. So you’re talking about user data or data sitting in databases or applications. And finally you have a data migration tool or a storage tiering tool sitting deep within the application, so it's very good at moving data for that particular application itself, but it’s got practically no value when it comes to a heterogeneous environment. The file system and sub-LUN tools and the block I/O tools, by contrast, are more suitable for heterogeneous environments or environments that have groups of sub-levels, both physical and virtual, and they can be deployed in multiple scenarios.

SearchStorage.com: How can dynamic storage tiering products help with your tiered data storage?

Nadkarni: If you take the above classification I talked about and apply it to dynamic storage tiering, the one thing that becomes clear is that in dynamic tiering it's all about the analysis. So the more efficient your analysis is, the more efficient your dynamic tiering is. If the analysis is done at the block level, the action taken is at the block level. And for better or worse, how efficient that is depends on what kind of application data is being analyzed and how the application accesses the data stored in that data set, or the blocks of data being analyzed. What I’m driving at here is that the dynamic storage tiering solution should be part of the overall storage tiering solution, but it's not a be-all and end-all replacement for storage tiering or, for that matter, smart manual tiering, classification, [or] data movement at the file system and application level. So as the situation warrants, you have to do all these things together.

Finally, what everyone should keep in mind is that data archiving should also be included in the solution, and part of that is purging information. There’s no point in keeping information that isn’t needed anymore, and whatever is needed for long-term archiving should be moved to separate tiers.
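One way to picture the archive-and-purge step is a function that assigns each item a disposition. This is a minimal sketch with assumed inputs: the `retain_until` date (after which the information is no longer needed) and the 365-day archive threshold are hypothetical and would in practice come from a business or compliance policy.

```python
from datetime import date
from typing import Optional


def disposition(last_access: date, retain_until: Optional[date], today: date,
                archive_after_days: int = 365) -> str:
    """Return "purge", "archive" or "keep" for one item of data.

    retain_until: assumed date after which the data is no longer needed;
                  None means it must be kept indefinitely.
    """
    if retain_until is not None and today > retain_until:
        return "purge"      # no longer needed, so do not keep it on any tier
    if (today - last_access).days >= archive_after_days:
        return "archive"    # still needed but idle: move to a separate archive tier
    return "keep"
```

Checking the purge condition first is intentional in this sketch, so that expired data is removed rather than moved to an archive tier.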

This story was previously published on SearchStorage.com.


This was first published in July 2011