Other reasons to cluster NAS nodes are to increase performance and to increase the capacity of the address space. Clustering increases performance by distributing the data so that multiple clustered nodes participate in data transfers, which raises the aggregate bandwidth. Clustering increases address-space capacity by placing more storage under the clustered nodes.
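As a rough illustration of how distributing data lets several nodes share one transfer, the sketch below stripes a file round-robin across nodes. The node names, the stripe size, the `stripe` and `aggregate_bandwidth` helpers, and the idealized assumption that aggregate bandwidth is simply the sum of per-node bandwidth are all hypothetical; real products use their own layouts and rarely scale perfectly.

```python
from typing import Dict, List


def stripe(data: bytes, nodes: List[str], stripe_size: int) -> Dict[str, List[bytes]]:
    """Split data into fixed-size stripes and assign them round-robin to nodes.

    Hypothetical layout for illustration only; not any vendor's on-disk format.
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    if not nodes:
        raise ValueError("at least one node is required")
    placement: Dict[str, List[bytes]] = {node: [] for node in nodes}
    for offset in range(0, len(data), stripe_size):
        # Stripe k goes to node k mod N, so consecutive stripes land on different nodes.
        node = nodes[(offset // stripe_size) % len(nodes)]
        placement[node].append(data[offset:offset + stripe_size])
    return placement


def aggregate_bandwidth(per_node_mb_s: List[float]) -> float:
    """Idealized aggregate bandwidth: the sum of the participating nodes' bandwidth."""
    return sum(per_node_mb_s)


if __name__ == "__main__":
    print(stripe(b"ABCDEFGHIJ", ["nas1", "nas2", "nas3"], stripe_size=2))
    # {'nas1': [b'AB', b'GH'], 'nas2': [b'CD', b'IJ'], 'nas3': [b'EF']}
    print(aggregate_bandwidth([400.0, 400.0, 400.0]))  # 1200.0
```

Because each node holds only part of the file, a client reading the whole file can pull stripes from all three nodes in parallel; that parallelism is what the idealized sum models.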
There are several very different approaches to clustering, and they have different goals for the clustering implementation. The most common clustering technique, and the one that has been around the longest, is the dual-node cluster. In this implementation, two nodes, usually identical, are both connected to the same storage, with an interconnect between them. If one node fails, which the surviving node detects through a heartbeat mechanism over the interconnect, the surviving node takes over operation and maintains data availability. The two nodes may be in an active-active state, where both handle I/O operations, or one may be active while the other stands by. Active-active is now the most common arrangement. In that case, failure of one node roughly halves performance, because the surviving node must carry the entire workload.
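A minimal sketch of heartbeat-based failover in an active-active pair is shown below. It assumes each share is served by exactly one node, that a node is declared failed once its last heartbeat is older than a fixed timeout, and that the survivor simply takes ownership of the failed node's shares. The class, method names, share paths, and timeout are hypothetical. Real implementations also need fencing to prevent split-brain, where both nodes believe the other has failed; that is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0
    alive: bool = True


class DualNodeCluster:
    """Hypothetical active-active pair on shared storage; each share has one serving node."""

    def __init__(self, names: Iterable[str], shares: Dict[str, str], timeout_s: float) -> None:
        self.nodes = {n: Node(n) for n in names}
        self.owner = dict(shares)  # share path -> name of the node currently serving it
        self.timeout_s = timeout_s

    def heartbeat(self, name: str, now: float) -> None:
        """Record a heartbeat received over the interconnect from `name`."""
        self.nodes[name].last_heartbeat = now

    def check_peer(self, observer: str, now: float) -> None:
        """Run on `observer`: declare a silent peer failed and take over its shares."""
        for peer in self.nodes.values():
            if peer.name == observer or not peer.alive:
                continue
            if now - peer.last_heartbeat > self.timeout_s:
                peer.alive = False
                for share, node in list(self.owner.items()):
                    if node == peer.name:
                        self.owner[share] = observer

    def capacity_fraction(self) -> float:
        """Fraction of original throughput left, assuming identical, fully loaded nodes."""
        alive = sum(1 for n in self.nodes.values() if n.alive)
        return alive / len(self.nodes)


if __name__ == "__main__":
    c = DualNodeCluster(["a", "b"], {"/home": "a", "/projects": "b"}, timeout_s=3.0)
    c.heartbeat("a", 0.0)
    c.heartbeat("b", 0.0)
    c.heartbeat("a", 5.0)          # node b has gone silent
    c.check_peer("a", now=5.0)
    print(c.owner)                 # {'/home': 'a', '/projects': 'a'}
    print(c.capacity_fraction())   # 0.5
```

The final `capacity_fraction()` of 0.5 mirrors the point above: once one node of an active-active pair fails, the survivor serves every share with half the original hardware.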
A more advanced solution is a multinode cluster, in which the loss of one node is overcome by spreading access and workload over the remaining nodes. In this case, the performance impact depends on the number of nodes: with N identical nodes sharing the load evenly, losing one removes roughly 1/N of the aggregate performance, so the larger the cluster, the smaller the loss. Multinode clusters are also used extensively to increase performance by distributing the data and access across the nodes. The complexity (and the variation among implementations) lies in how that distribution is done and how access is routed. Several approaches are used, including distributed file systems spanning the nodes and intelligent Layer 2 switching for access.
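The article does not say how any particular product distributes data, so the sketch below swaps in one well-known technique, rendezvous (highest-random-weight) hashing, purely to illustrate redistribution after a node failure. Each file goes to the node with the highest hash score for that file; when a node is removed, only the files it owned move, and they spread across the survivors. The node names, file paths, and helper functions are hypothetical, and `remaining_fraction` encodes the 1/N estimate under the same even-load assumption.

```python
import hashlib
from typing import Dict, List


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: List[str]) -> str:
    """Rendezvous hashing: the node with the highest score for this key owns it."""
    return max(nodes, key=lambda n: _score(n, key))


def assign(keys: List[str], nodes: List[str]) -> Dict[str, str]:
    """Map every key (e.g. a file path) to its owning node."""
    return {k: place(k, nodes) for k in keys}


def remaining_fraction(n_nodes: int, failed: int = 1) -> float:
    """Fraction of aggregate performance left, assuming identical, evenly loaded nodes."""
    return (n_nodes - failed) / n_nodes


if __name__ == "__main__":
    nodes = ["nas1", "nas2", "nas3", "nas4"]
    files = [f"/data/file{i}" for i in range(1000)]
    before = assign(files, nodes)
    after = assign(files, [n for n in nodes if n != "nas3"])  # nas3 has failed
    moved = [f for f in files if before[f] != after[f]]
    print(all(before[f] == "nas3" for f in moved))  # True: only nas3's files moved
    print(remaining_fraction(4), remaining_fraction(2))  # 0.75 0.5
```

The comparison of `remaining_fraction(4)` with `remaining_fraction(2)` restates the contrast with the dual-node case: a four-node cluster loses about a quarter of its performance when one node fails, while a dual-node pair loses about half.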
An alternative for providing similar availability, address space, and performance is an approach commonly called file virtualization, in which NAS nodes are virtualized by another element and presented to clients as a single namespace. The virtualizer may be another type of NAS device (in a multinode configuration) or distributed software of some type. Although this is not really clustering by most industry definitions, many of the results are the same.
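To make the single-namespace idea concrete, the sketch below models a virtualizer as a table that maps path prefixes in one client-visible namespace to exports on individual NAS nodes, resolved by longest-prefix match. The `GlobalNamespace` class, its methods, the node names, and the export paths are all hypothetical; commercial file virtualization products have their own mechanisms. The point it shows is that data can be remapped to a different node without changing the path clients use.

```python
from typing import Dict, Optional, Tuple


class GlobalNamespace:
    """Hypothetical virtualizer: maps namespace prefixes to (NAS node, export path)."""

    def __init__(self, mounts: Dict[str, Tuple[str, str]]) -> None:
        self.mounts = dict(mounts)

    def remap(self, prefix: str, node: str, export: str) -> None:
        """Point a prefix at a different backend, e.g. after migrating its data."""
        self.mounts[prefix] = (node, export)

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (node, backend path) for a client path using the longest matching prefix."""
        best: Optional[str] = None
        for prefix in self.mounts:
            # Match whole path components only, so "/eng" does not match "/engineering".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(path)
        node, export = self.mounts[best]
        rest = path[len(best):].lstrip("/")
        backend = export.rstrip("/") + "/" + rest if rest else export
        return node, backend


if __name__ == "__main__":
    ns = GlobalNamespace({
        "/": ("nas1", "/vol0"),
        "/eng": ("nas2", "/export/eng"),
        "/eng/builds": ("nas3", "/fast/builds"),
    })
    print(ns.resolve("/eng/builds/r1.tar"))  # ('nas3', '/fast/builds/r1.tar')
    print(ns.resolve("/eng/docs/a.txt"))     # ('nas2', '/export/eng/docs/a.txt')
    ns.remap("/eng", "nas4", "/export/eng")  # data migrated; client path unchanged
    print(ns.resolve("/eng/docs/a.txt"))     # ('nas4', '/export/eng/docs/a.txt')
```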
Along with its value for resiliency and performance, NAS clustering raises some issues for customers. One of the most concerning is complexity. The clustering method and other vendor-specific implementation characteristics may require more complex administration. Some multinode implementations may require each node to be administered individually, while others may present a single image of all nodes for administration. The dual-node cluster is the simplest to administer. In any case, administering a clustered NAS will differ from administering a single-node NAS.
Another obvious issue is cost. The value clustering offers comes from additional hardware and software, which adds cost. That cost includes both the product price and the maintenance and support costs for the hardware and software.
NAS clustering has value and is increasingly being deployed by customers. Its use, initially driven by availability, is expanding to address performance and capacity as well.
About the author: Randy Kerns is an independent storage consultant. He previously served as vice president of strategy and planning for storage at Sun Microsystems Inc. His work covers storage and storage management software, including SAN and NAS analysis.
This was first published in November 2006