What is "Spectrum Scale"?

or

Why would I need a "General Parallel File System"?


 

Discover one of IBM's best-kept secrets

 

Antony Steel,  Belisama

Spoiler alert - Spectrum Scale is not just a file system - it is a fully-fledged Data Management Platform.

Introduction

Think about the advantages of an NFS file store ...

... but without the bottlenecks and a single point of failure.

Think about having a single massive file system for ease of management ...

... but able to control access and performance on a more granular level.

Think about managing your growth ...

... but able to expand either the data store or the throughput independently.

Think about a file system that was designed for multimedia files ...

... but was then given to the High-Performance Computing Industry.

 

It is easy to see that IBM produced a scalable, high-performance, general-purpose file system. However, its advocates tend to focus on the technology rather than on how incredibly useful it is and what it can do for your organisation.

What is Spectrum Scale?


IBM Spectrum Scale (formerly known as GPFS, the General Parallel File System) dates back to the mid-1990s, when it emerged from an IBM research project to provide high-speed access for processing multimedia files.  Over the next 20 years it quietly grew into the file system of choice in the HPC/supercomputer community.  About 6-8 years ago it became clear that the qualities that appealed to the HPC community also made it a perfect choice for commercial environments, where customers grapple with managing increasing amounts of unstructured data with dwindling resources.

 

Interesting, but why would I need the file system/object store services provided by Spectrum Scale?

 

  • HPC or HPC-like environment – fast access to large amounts of data.  Modern analytics applications, such as Artificial Intelligence, Machine Learning and Deep Learning, share these requirements.  Spectrum Scale can also integrate with Hadoop/Spark clusters.

  • Avoid data bottlenecks – Spectrum Scale has been shown to deliver data at over 2 TB/second from a file system larger than 200 PB.

  • Multiple applications, one data store – need a fast shared file system for one or more applications running on many servers, where both performance and data consistency/integrity are required.  These applications may be tightly or loosely linked.  Examples include DB2, Oracle, SAS, SAP HANA and Spectrum Protect.

  • HA/DR – need fast, reliable and secure file system access with local high availability and/or synchronous/asynchronous disaster recovery.  Data integrity is a given.

  • Streaming data from a range of applications/data types – have applications that stream different types of data, at constantly increasing rates and volumes, from many sources for amalgamation/analysis.  Results then need to be provided to clients on a range of platforms, each requiring different protocols.

  • Optimise costs and performance – need to automatically match the cost of the storage to the importance of the data, for example by moving “cold” data to an off-line tier (tape, cloud, etc.) from which it can be automatically recalled as required.  Also need to increase the throughput and/or the back-end storage independently as your requirements grow, without having to add a new data store or file system.

  • Remote office access to file store – need to provide remote offices with control of, and access to, a centralised data store.

  • Appliance or build your own – IBM offers a range of pre-built solutions, or you can build your own – either by yourself or with the assistance of IBM Business Partners or IBM Services.

The critical component of all of these use cases is that Spectrum Scale provides quick and reliable access to multiple types of data with both authentication and consistency.  It is a robust, fast and mature parallel file system that:

  • Is used by many supercomputers,

  • Has built-in parallelism, enabling a data layer that meets the performance and scaling requirements of data-intensive applications and workflows such as Big Data, Analytics and AI/ML/DL, and

  • Has built-in support for POSIX, NFS, SMB, HDFS and object access, which accelerates workflows that require multiple access methods.

 

Digging deeper, you will find that Spectrum Scale has all the features that you would expect from a commercial data store – Spectrum Scale is not just a file system – it is a full data management platform:

  • Control the access to files/parts of the file system,

  • Set the number of “copies” of data and the RAID configuration with a comprehensive range of high availability features,

  • High availability and DR options,

  • Purchase as an appliance or build your own,

  • Efficient management of snapshots,

  • Easy setup, management and monitoring via GUI,

  • Comprehensive monitoring and alerting features,

  • Synchronous and asynchronous replication to remote sites,

  • Configure part or all as an Object Store,

  • Storing and starting OpenStack VMs without copying them from object storage to a local file system,

  • Easy management of data placement – match the importance of the data to the cost of the underlying storage system, or move data to offline storage with end-user recall abilities,

  • Encryption, and

  • Access via NFS, CIFS or the GPFS client (for AIX, Linux on Power, Z or x86, or Windows).

 

Contact me for further details and to discuss how Spectrum Scale can help you manage your data.  We can examine your options, explore Spectrum Scale in greater detail, arrange a demonstration or provide training.