File-system virtualization options to help reduce unstructured data growth

Jacob Gsoedl, Contributor
What you will learn in this tip: How file-system virtualization products differ from one another, which ones are delivered through software, which ones are appliance-based and which ones might be right for your storage shop.

Companies both large and small are faced with the challenge of unstructured data growth.

One solution to the problem is file virtualization, which creates an abstraction layer between file servers and clients that access those file servers.

File-system virtualization products come in several forms. This tip samples a few of them and describes what different vendors have to offer.

AutoVirt File Virtualization software:

Like Microsoft DFS, AutoVirt is a software-only file virtualization product that runs on Windows servers. The AutoVirt global namespace uses the CIFS protocol to interact with file servers, clients and DNS. When a client requests a file, DNS facilitates resolution to the appropriate storage device. The global namespace acts as an intermediary between client and DNS. With the AutoVirt global namespace in place, client shortcuts refer to the namespace. The namespace is the authority on the location of networked files and provides the final storage referral with the help of DNS.
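The referral idea can be illustrated with a simple lookup table. The following is a minimal sketch, not AutoVirt's implementation: the class name `GlobalNamespace`, its methods, and the server and share names are all hypothetical, and the DNS step is left out entirely.

```python
class GlobalNamespace:
    """Toy global namespace: maps logical share paths to physical locations."""

    def __init__(self):
        self._referrals = {}  # logical path (lowercased) -> physical UNC path

    def publish(self, logical, physical):
        """Register or update the physical location behind a logical path."""
        self._referrals[logical.lower()] = physical

    def resolve(self, path):
        """Translate a client path using the longest matching logical prefix."""
        lowered = path.lower()
        best = None
        for logical in self._referrals:
            if lowered == logical or lowered.startswith(logical + "\\"):
                if best is None or len(logical) > len(best):
                    best = logical
        if best is None:
            raise KeyError(path)
        # Keep the remainder of the client path and swap only the prefix.
        return self._referrals[best] + path[len(best):]


ns = GlobalNamespace()
ns.publish(r"\\corp\finance", r"\\filer01\fin$")   # hypothetical names
before = ns.resolve(r"\\corp\finance\q3\report.xlsx")

# After a migration, only the namespace entry changes; the client path does not.
ns.publish(r"\\corp\finance", r"\\filer02\finance")
after = ns.resolve(r"\\corp\finance\q3\report.xlsx")
```

In this sketch the client keeps using the same logical path before and after the data moves, which is the property that makes nondisruptive migration possible.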

AutoVirt can be introduced nondisruptively, without any changes on clients, by populating the AutoVirt namespace server with the shares of existing file stores. Although the namespace can be populated manually, a data discovery service automates the discovery of existing file stores and loads their metadata into the AutoVirt namespace server.

This differs from Microsoft DFS, which requires clients to be configured with the new DFS shares, rather than continuing to use existing file shares. Also unlike Microsoft DFS, AutoVirt provides a policy engine that enables rule-based data mobility across the environment to migrate, consolidate, replicate and tier data without affecting end-user access to networked files. AutoVirt is currently available for CIFS; the company plans to release a version for NFS by year's end.

EMC Rainfinity file virtualization appliance:

Rainfinity is a family of file-system virtualization products that virtualize access to unstructured data, and provide data mobility and file tiering services. The Rainfinity Global Namespace Appliance provides a single mount point for clients and applications; the Rainfinity File Management Appliance delivers policy-based management to automate relocation of files to different storage tiers; and the Rainfinity File Virtualization Appliance provides nondisruptive data movement.

Unlike F5's ARX, the Rainfinity File Virtualization Appliance architecture is designed to switch between in-band and out-of-band operations as needed. The appliance is out-of-band most of the time, and data flows between client systems and back-end file stores directly. It sits outside the data path until a migration is required and then switches to in-band operation.
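The mode switch can be pictured as a small state machine. This is a simplified sketch under stated assumptions, not EMC's design: the class and method names are hypothetical, and the model assumes the whole appliance goes in-band while any migration is active.

```python
from enum import Enum


class Mode(Enum):
    OUT_OF_BAND = "out-of-band"
    IN_BAND = "in-band"


class VirtualizationAppliance:
    """Toy model: stays out of the data path unless a migration is running."""

    def __init__(self):
        self.mode = Mode.OUT_OF_BAND
        self.active_migrations = set()

    def start_migration(self, share):
        self.active_migrations.add(share)
        self.mode = Mode.IN_BAND

    def finish_migration(self, share):
        self.active_migrations.discard(share)
        # Drop back out of the data path once no migration remains.
        if not self.active_migrations:
            self.mode = Mode.OUT_OF_BAND

    def route(self):
        """Describe the path client I/O takes in the current mode."""
        if self.mode is Mode.IN_BAND:
            return "client -> appliance -> file store"
        return "client -> file store"


appliance = VirtualizationAppliance()
idle_route = appliance.route()
appliance.start_migration("projects")      # hypothetical share name
migrating_route = appliance.route()
appliance.finish_migration("projects")
```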

F5 ARX Series:

The F5 ARX series, which came to F5 through its 2007 acquisition of Acopia and was then rebranded, is an inline file-system virtualization appliance. Usually deployed as an active-passive cluster, it sits between CIFS/NFS clients and heterogeneous CIFS/NFS file stores, presenting virtualized CIFS and NFS file systems to clients.

The ARX presents unstructured data in a global virtualized namespace. Built like a network switch, it's available with two 1 Gbps ports (ARX500), twelve 1 Gbps ports (ARX2000), or twelve 1 Gbps ports plus two 10 Gbps ports (ARX4000).

F5's ARX focuses on data mobility and automated storage tiering. Orchestrated by a policy engine, it moves data in both directions between different tiers of heterogeneous storage in real time and transparently to users. As with AutoVirt, policies are based on file metadata, such as last-accessed date, creation date, file size and file type.

Because F5 ARX is an appliance, it can deliver a performance-optimized product that's hard for a software-only solution to match. Built on a split-path architecture, it has both a data path that passes data straight through the device for tasks that don't involve policies, and a control path for anything that requires policies.
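Metadata-based tiering policies of the kind described for ARX and AutoVirt can be sketched as a small rule function. This is an illustration only, not either vendor's policy syntax: the `FileMeta` type, the tier names, the 90-day threshold and the file extensions are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileMeta:
    path: str
    size_bytes: int
    last_accessed: datetime


def choose_tier(meta, now, cold_after=timedelta(days=90),
                archive_types=(".bak", ".iso")):
    """Pick a storage tier from file metadata (hypothetical rules)."""
    # Rule 1: certain file types always go to the cheaper tier.
    if meta.path.lower().endswith(archive_types):
        return "tier2"
    # Rule 2: files not accessed recently are demoted.
    if now - meta.last_accessed > cold_after:
        return "tier2"
    # Everything else stays on primary storage.
    return "tier1"


now = datetime(2010, 6, 1)
hot = FileMeta("/eng/design.doc", 40_000, datetime(2010, 5, 20))
cold = FileMeta("/eng/old_spec.doc", 40_000, datetime(2009, 1, 15))
image = FileMeta("/it/install.ISO", 700_000_000, datetime(2010, 5, 31))
```

Because the rule is evaluated against current metadata, a file demoted to the second tier would be promoted again once it is accessed, which matches the bidirectional movement the article describes.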

Microsoft DFS:

Microsoft DFS is a set of client and server services that allow an organization using Microsoft Windows servers to organize distributed CIFS file shares into a distributed file system. DFS provides location transparency and redundancy to improve data availability in case of failure or heavy load by allowing shares in multiple locations to be logically grouped under a single DFS root. DFS supports the replication of data between servers using File Replication Service (FRS) in server versions up to Windows Server 2003, and DFS Replication (DFSR) in Server 2003 R2, Server 2008 and later versions. Microsoft DFS supports only Windows CIFS shares and has no provision for bringing NFS or NAS shares into the DFS global namespace. Furthermore, it lacks a policy engine that would enable intelligent data movements. As part of Windows Server, it's free and a good option for companies whose file stores reside mainly on Windows servers.
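The redundancy that comes from grouping shares in multiple locations under one root can be sketched as follows. This is a deliberately simplified model, not the DFS referral protocol: the function name, the namespace dictionary and the server names are hypothetical, and the availability check is passed in by the caller.

```python
def refer(namespace, folder, is_online):
    """Return the first reachable target for a DFS-style folder."""
    for target in namespace[folder]:
        if is_online(target):
            return target
    raise LookupError(f"no reachable target for {folder}")


# One logical folder backed by shares in two locations (hypothetical names).
namespace = {
    r"\\corp\public\docs": [r"\\nyc-fs1\docs", r"\\lon-fs1\docs"],
}

normal = refer(namespace, r"\\corp\public\docs", lambda t: True)
failover = refer(namespace, r"\\corp\public\docs",
                 lambda t: t != r"\\nyc-fs1\docs")
```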

More file-system virtualization options

File virtualization products are also available as open-source software. For instance, the Apache Hadoop Distributed File System (HDFS) handles distribution and redundancy of files, and enables logical files that far exceed the size of any one data storage device. HDFS is designed for commodity hardware and supports anywhere from a few nodes to thousands of nodes. Another example of an open-source file system is the Gluster clustered file system for building a scalable NAS with a single global namespace.
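How a distributed file system can hold a file larger than any single device, and keep redundant copies, can be shown with a toy placement function. This is not HDFS's actual placement policy (which also considers rack topology); the function name, the round-robin scheme and the node names are assumptions made for illustration.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and assign replicas round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = []
    for block in range(num_blocks):
        # Each block's replicas land on `replication` distinct nodes.
        replicas = [nodes[(block + r) % len(nodes)] for r in range(replication)]
        placement.append(replicas)
    return placement


# A 10-unit file with 4-unit blocks needs three blocks, spread over four nodes.
layout = place_blocks(10, 4, ["n1", "n2", "n3", "n4"], replication=3)
```

No single node holds the whole file, and losing any one node still leaves at least two copies of every block in this sketch.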

An open-source file system running on inexpensive hardware components can look like a good alternative to spending a lot of money on traditional network-attached storage (NAS) systems. But open-source file systems are usually not a good choice for the enterprise. They require significant tuning and maintenance effort, as well as experts intimately familiar with the intricacies of the chosen software, and they don't come with the support that traditional NAS vendors offer. Availability, reliability, performance and support come first for enterprise storage, and these attributes are difficult to achieve with open-source software. Open-source file systems are a great choice for cloud storage providers and other companies that have to make money on storage, as well as for the research and educational sector.

This article was previously posted on SearchStorage.com.