Performance Comparison of Scratch vs. Various ARC Filesystems

Test Results Summary

The table below shows informal timings of file operations performed on a relatively small sample dataset that contains a very large number of files. Datasets like this can be a major challenge for many filesystems because a large portion of the processing time is spent on overhead in the operating system, network, storage protocol, and storage subsystems.

When the filesystem is attached via a network (as are /home and /projects on ARC systems), there is an extra layer of overhead for network communication and storage protocols. While ARC systems are interconnected with some of the fastest, lowest-latency networks available, the aggregate impact of that latency becomes very noticeable when performing on the order of 10^5 file operations or more.
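As a rough back-of-the-envelope illustration (the per-operation latency here is an assumed figure, not a measurement): if each file operation incurs about 1 ms of combined network and metadata latency, then touching the roughly 1.29 million files in the sample dataset implies about 1,290,000 x 0.001 s = 1290 s, or around 21 minutes, of pure overhead before any file data is actually moved.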

Sample fileset properties:

format | size                       | number of files | mean file size (bytes) | stdev (bytes) | min (bytes) | median (bytes) | max (bytes)
tar    | 9244907520 bytes (8.7 GiB) | 1290284         | 7165                   | 26623         | 21          | 1785           | -
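For reference, summary statistics like these can be computed directly from the archive listing. The sketch below is one possible recipe, not necessarily the one used to produce the table; it assumes GNU tar, whose verbose listing puts the member size in bytes in the third column, and it reports the population standard deviation.

# Count, total size, mean, and stdev of member sizes in the archive.
# Zero-size entries (directories) are skipped.
tar -tvf mil.tar | awk '$3 > 0 { n++; sum += $3; sumsq += $3 * $3 }
    END { printf "files: %d  total: %d bytes  mean: %.0f  stdev: %.0f\n",
                 n, sum, sum/n, sqrt(sumsq/n - (sum/n)^2) }'

Minimum, median, and maximum additionally require sorting the sizes, e.g. piping the same listing through awk '$3 > 0 {print $3}' | sort -n.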

Table of results

target filesystem | copy from HOME (s) | untar (s) | find (s) | delete (s)
HOME              | n/a                | 6365.208  | 276.925  | 314.559
k80_q node NVMe   | 11.487             | 42.125    | 2.688    | -
A100 node NVMe    | 17.486             | 25.424    | 1.653    | 32.130
PROJECTS          | 9.352              | 2520      | 664.77   | -
/scratch          | 25.385             | 5906.447  | 89.391   | 2821.392

Lessons to infer from these results

Data needs to be close to compute

It is a widely repeated mantra that “data locality” is critical for compute performance, and these tests provide a near-real-world example.

Keep many-small-files datasets tarred on networked filesystems like /home and /projects, and unpack them onto node-local storage when you need to work on them (see the sketch after this list)

Recently transferred data is more likely to still be resident in a nearby cache, which can speed up subsequent access

NVMe drives' built-in parallelism can be a huge advantage (see the parallel-delete sketch after the NVMe session below)
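
A minimal sketch of how these lessons can be combined in a batch job is shown below. It assumes the scheduler provides a node-local scratch path in $TMPDIR (site conventions vary), and the /projects path, archive names, and results/ directory are placeholders for illustration, not specific ARC conventions.

#!/bin/bash
# Sketch: keep the dataset as one tar file on /projects; do all per-file work on node-local storage.

WORK="$TMPDIR/myjob"            # assumed node-local (e.g. NVMe) path; adjust for your site
mkdir -p "$WORK" && cd "$WORK"

# One large network transfer instead of millions of small ones
cp /projects/mygroup/mil.tar .

# Unpack locally, where per-file overhead is cheap
tar -xf mil.tar

# ... run the actual computation against the unpacked files here ...

# Repack the outputs and ship them back as a single transfer
tar -cf results.tar results/
cp results.tar /projects/mygroup/

# Free the node-local space before the job ends
cd / && rm -rf "$WORK"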

Tinkercliffs A100 node with NVMe drive tests

# Copy from $HOME to local NVMe
[user@tc-gpu001 tmp]$ time cp ~/fstest/mil.tar .
real	0m17.486s
user	0m0.002s
sys	0m5.363s

# Untar on local NVMe
[user@tc-gpu001 tmp]$ time tar -xf mil.tar
real	0m25.424s
user	0m2.717s
sys	0m22.601s

# Use find to count the files in the unpacked dataset
[user@tc-gpu001 tmp]$ time find ./10* | wc -l
1290284
real	0m1.653s
user	0m0.647s
sys	0m1.074s

# Delete the files from local NVMe
[user@tc-gpu001 tmp]$ time rm -rf ./10*
real	0m32.130s
user	0m0.786s
sys	0m26.716s

# Re-tar the dataset, discarding the output (read-only pass)
[user@tc-gpu001 tmp]$ time tar -c 10* > /dev/null
real	0m6.420s
user	0m3.210s
sys	0m3.188s

# Re-tar the dataset into a new archive on local NVMe
[user@tc-gpu001 tmp]$ time tar -cf mil2.tar 10*
real	0m13.066s
user	0m3.787s
sys	0m9.230s
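
The tar and rm steps above each run as a single process. Because NVMe drives service many concurrent requests well, metadata-heavy work can often be spread across several processes, as in the sketch below. The -P 8 process count is an arbitrary illustration, and the pattern assumes the top-level 10* directory names contain no whitespace.

# Delete the unpacked tree with up to 8 rm processes, one per top-level directory
printf '%s\n' 10*/ | xargs -P 8 -n 1 rm -rf

# The same pattern applies to other per-directory work, e.g. counting files in parallel
printf '%s\n' 10*/ | xargs -P 8 -I{} sh -c 'find "$1" | wc -l' _ {}

On networked filesystems the same trick may help much less, since the shared metadata service, rather than the client, tends to become the bottleneck.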

Tinkercliffs login node testing against /scratch

# Copy from $HOME to /scratch
[user@tinkercliffs2]$ time cp $HOME/fstest/mil.tar .
real	0m25.385s
user	0m0.002s
sys	0m6.788s

# Untar /scratch -> /scratch
[user@tinkercliffs2]$ time tar -xf mil.tar
real	98m26.447s
user	0m4.996s
sys	1m23.815s

#Use find to count the files in the unpacked dataset
[user@tinkercliffs2]$ time find ./10* | wc -l
1290284
real	1m29.391s
user	0m0.827s
sys	0m6.329s

# Delete files from /scratch
[user@tinkercliffs2]$ time rm -rf ./10*
real	47m1.392s
user	0m1.077s
sys	1m4.614s