Dell ECS Recall Performance (from Dell PowerScale)


The Dell ECS is an object-based storage platform. In our case we use it as an archive layer behind our Dell PowerScale platform, where it provides a single-namespace file system: "cooler" data can be pushed to the cheaper, slower Dell ECS storage, freeing up the front-end storage for "hot" or "warm" data that is still in use. To the end user, all the files still appear to be there, as each archived file is replaced with a smartlink (stub) file on the front-end storage.

The Dell ECS platform consists of 3 sites of 7 nodes each (EX500s), and each site also has a pair of Kemp ECS Connection Manager H3 appliances, all of which are connected via 10Gbit edge ports and a 40Gbit backhaul.

The Dell PowerScale consists of 5 x F400 nodes and 6 x A200 nodes, providing a "top" tier and a kind of "middle" tier respectively, with 10Gbit node uplinks and a 40Gbit backhaul network between the PowerScale and the Dell ECS storage.

We performed tests when the platform was first installed; its performance was not particularly quick, so our expectations were low. More recently, however, some more elaborate tests were performed to understand its performance in various use cases.

The Dell PowerScale uses CloudPools to tier the "cooler" data from the PowerScale down to the ECS based on its last access time; for example, any file not accessed for over 2 years is automatically tiered. The tiering process (i.e. writing data down to the Dell ECS) is parallelised: say you have a 10-node PowerScale cluster and 1000 files have come of age and need to be tiered to the Dell ECS; roughly speaking, you'll find 100 files being archived by node 1, 100 files by node 2, and so on. It's not quite that prescriptive, but essentially all nodes in the cluster archive in parallel, rather than just one node. Doing this we saw a write speed of about 3-4Gbit/sec to the ECS (across all three sites).
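To make the parallelism concrete, here is a minimal Python sketch of the distribution described above. The round-robin split is an illustrative assumption, not CloudPools' actual scheduler, but it shows why archive throughput scales with node count:

```python
# Hypothetical sketch of how an archive job spreads across cluster
# nodes: files are divided roughly evenly, so throughput scales with
# node count. Numbers mirror the example in the text.

def distribute(files: list[str], node_count: int) -> dict[int, list[str]]:
    """Assign files to nodes round-robin, approximating the parallel
    archiving behaviour (the real scheduler is not this prescriptive)."""
    shares: dict[int, list[str]] = {n: [] for n in range(1, node_count + 1)}
    for i, f in enumerate(files):
        shares[(i % node_count) + 1].append(f)
    return shares

files = [f"file_{i:04d}" for i in range(1000)]  # 1000 aged-out files
shares = distribute(files, node_count=10)       # 10-node cluster
print({node: len(batch) for node, batch in shares.items()})
# -> each of the 10 nodes archives ~100 files in parallel

# Observed aggregate write speed: ~3-4 Gbit/s across all three sites,
# i.e. roughly 375-500 MB/s of archive throughput.
print(f"{3.5e9 / 8 / 1e6:.0f} MB/s (midpoint of 3-4 Gbit/s)")
```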

Reading it Back

The recall, i.e. reading the data back, behaves differently, however.

Let’s say a directory of files has not been accessed for over 2 years: the files were archived to the ECS and smartlink (stub) files took their place on the front-end (higher-tier) storage. When a user then attempts to read those files, they are pulled from the ECS into a cache on the higher tier; crucially, however, this function appears to be performed by a single node during an end-user read. There is seemingly no parallelised transfer.

An example test on the above platform:
Reading a folder stored on the ECS containing 5GB of data took 6m32s (13MB/s).
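For reference, this kind of read test can be approximated with a short script that walks the stubbed folder, reads every file (which triggers the recall through the single node) and reports the effective throughput. The folder path below is hypothetical; a minimal sketch:

```python
# Walk a folder of smartlink stubs, read every file to force the
# recall from the ECS, and compute the effective throughput.
import os
import time

folder = "/ifs/data/archived_project"  # hypothetical stubbed folder

start = time.monotonic()
total_bytes = 0
for dirpath, _dirs, names in os.walk(folder):
    for name in names:
        with open(os.path.join(dirpath, name), "rb") as fh:
            while chunk := fh.read(1 << 20):  # 1 MiB reads
                total_bytes += len(chunk)
elapsed = time.monotonic() - start

print(f"read {total_bytes / 2**30:.1f} GiB in {elapsed:.0f}s "
      f"-> {total_bytes / 2**20 / elapsed:.0f} MB/s")
# On our platform: 5GB in 6m32s, i.e. about 13MB/s.
```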

A second test was performed on the same folder of data, but this time it was “admin recalled”: rather than an end user simply accessing the smartlink files and triggering a recall, an administrator performed an “admin recall” to bring the data into the cache on the front-end storage.

In this case the process appears to be parallelised, and the following was observed.

Admin recall on a folder containing 5GB of data took 24s (213MB/s).
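If you want to script the admin recall, OneFS exposes it through the `isi cloud recall` CLI. The sketch below wraps that command and times it; the `--recursive` flag and the assumption that the command blocks until the recall finishes are assumptions that may vary by OneFS version, so treat this as a hedged outline rather than a definitive procedure:

```python
# Hedged sketch: trigger an admin recall of a folder and time it.
# The exact CLI flags are assumptions; check your OneFS version.
import subprocess
import time

folder = "/ifs/data/archived_project"  # same hypothetical folder

start = time.monotonic()
subprocess.run(
    ["isi", "cloud", "recall", "--recursive", "true", folder],
    check=True,  # raise if the recall command fails
)
elapsed = time.monotonic() - start
print(f"admin recall completed in {elapsed:.0f}s")
# On our platform: 5GB in 24s, i.e. about 213MB/s -- roughly 16x
# faster than the single-node end-user recall path.
```

Note that, depending on the OneFS version, the recall may run as an asynchronous job, in which case completion would need to be monitored (e.g. via `isi cloud jobs`) rather than timed inline as above.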

Conclusion

What this has taught us is that an end-user recall of the data is fairly slow, whereas an “admin recall” is significantly faster: in our tests, roughly 16 times faster (213MB/s versus 13MB/s).
