Performance Comparison of Scratch vs. Various ARC Filesystems
Test Results Summary
The tables below show informal timings of file operations performed on a relatively small sample dataset that contains a very large number of files. Datasets like this are a major challenge for many filesystems because a large portion of the processing time is spent on overhead in the operating system, network, storage protocol, and storage subsystems.
When a filesystem is attached via a network (as /home and /projects are on ARC systems), there is an extra layer of overhead for network communication and storage protocols. While ARC systems are interconnected with some of the fastest, lowest-latency networks available, the aggregate impact of that latency when performing on the order of 10^5 operations or more can be very noticeable.
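The timings below were collected with ordinary shell commands. The following is a minimal sketch of how the same sequence could be repeated against a filesystem of your choice; the tarball name mil.tar and the 10* directory glob mirror the test fileset used in the sessions further down, and TARGET is a placeholder you would replace.

```bash
# Minimal sketch: time the same operations against a target filesystem.
# TARGET is a placeholder (node-local NVMe, a /projects directory, /scratch, ...).
TARGET=/tmp
cd "$TARGET"
time cp "$HOME/fstest/mil.tar" .   # one large sequential copy over the network
time tar -xf mil.tar               # unpack ~1.3 million small files
time find ./10* | wc -l            # walk the tree and count the files
time rm -rf ./10*                  # remove the unpacked files again
```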
Sample fileset properties:
| format | size | number of files | mean file size (bytes) | stdev (bytes) | min (bytes) | median (bytes) | max (bytes) |
|---|---|---|---|---|---|---|---|
| tar | 9244907520 bytes (8.7 GiB) | 1290284 | 7165 | 26623 | 21 | 1785 | |
Table of results
| target filesystem | copy from HOME (s) | untar (s) | find (s) | delete (s) |
|---|---|---|---|---|
| HOME | n/a | 6365.208 | 276.925 | 314.559 |
| k80_q node NVMe | 11.487 | 42.125 | 2.688 | - |
| A100 node NVMe | 17.486 | 25.424 | 1.653 | 32.130 |
| PROJECTS | 9.352 | 2520 | 664.77 | |
| /scratch | 25.385 | 5906.447 | 89.391 | 2821.392 |
Lessons to infer from these results
- Data needs to be close to the compute. "Data locality" is a widely repeated mantra for compute performance, and these tests provide a nearly real-world example: the /scratch untar averages roughly 4.6 ms per file, while the same untar on the A100 node's local NVMe averages about 0.02 ms per file.
- Keep many-small-files datasets tarred while they live on networked file systems like /home and /projects, and unpack them on storage close to the compute when you need them (see the sketch after this list).
- Transferring data makes it more likely to be in a nearby cache.
- The built-in parallelism of NVMe drives can be a huge advantage.
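The following is a minimal sketch of that staging pattern as a batch job. It assumes a Slurm environment where node-local storage is reachable via $TMPDIR (with /tmp as a fallback); the partition, time limit, tarball path, and 10* output glob are placeholders that mirror the test fileset used above rather than settings you must use verbatim.

```bash
#!/bin/bash
# Sketch of a job that stages a tarred dataset to node-local storage.
# All paths and resource requests are placeholders; adjust for your system.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=1:00:00

# Work in node-local storage; fall back to /tmp if $TMPDIR is not set.
WORKDIR=${TMPDIR:-/tmp/$USER.$SLURM_JOB_ID}
mkdir -p "$WORKDIR" && cd "$WORKDIR"

# One large sequential copy over the network instead of millions of small ops
cp "$HOME/fstest/mil.tar" .
tar -xf mil.tar

# ... run the analysis against the unpacked files here ...

# Re-pack outputs into a single file before sending them back over the network
tar -cf results.tar ./10*
cp results.tar "$HOME/fstest/"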
Tinkercliffs A100 node with NVMe drive tests
```
# Copy from $HOME to the node-local NVMe drive
[user@tc-gpu001 tmp]$ time cp ~/fstest/mil.tar .
real 0m17.486s
user 0m0.002s
sys 0m5.363s

# Untar on the NVMe drive
[user@tc-gpu001 tmp]$ time tar -xf mil.tar
real 0m25.424s
user 0m2.717s
sys 0m22.601s

# Use find to count the files in the unpacked dataset
[user@tc-gpu001 tmp]$ time find ./10* | wc -l
1290284
real 0m1.653s
user 0m0.647s
sys 0m1.074s

# Delete the unpacked files
[user@tc-gpu001 tmp]$ time rm -rf ./10*
real 0m32.130s
user 0m0.786s
sys 0m26.716s

# Re-tar the dataset, discarding the output stream (read-only pass)
[user@tc-gpu001 tmp]$ time tar -c 10* > /dev/null
real 0m6.420s
user 0m3.210s
sys 0m3.188s

# Re-tar the dataset into a new tarball on the NVMe drive
[user@tc-gpu001 tmp]$ time tar -cf mil2.tar 10*
real 0m13.066s
user 0m3.787s
sys 0m9.230s
```
Tinkercliffs login node testing against /scratch
```
# Copy from $HOME to /scratch
[user@tinkercliffs2]$ time cp $HOME/fstest/mil.tar .
real 0m25.385s
user 0m0.002s
sys 0m6.788s

# Untar /scratch -> /scratch
[user@tinkercliffs2]$ time tar -xf mil.tar
real 98m26.447s
user 0m4.996s
sys 1m23.815s

# Use find to count the files in the unpacked dataset
[user@tinkercliffs2]$ time find ./10* | wc -l
1290284
real 1m29.391s
user 0m0.827s
sys 0m6.329s

# Delete the files from /scratch
[user@tinkercliffs2]$ time rm -rf ./10*
real 47m1.392s
user 0m1.077s
sys 1m4.614s
```