On the Ubuntu-based machines I use at work, SIZE (the argument to sort's -S/--buffer-size option) defaults to the ridiculously small 1MB. It's somewhat hidden in the manpage: the documentation for the -S option doesn't mention it, but later it says (bold added):
SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.
You can verify this by running seq 1 1000000000000 | sort and, while that's happening, ls -lh /tmp/sort*.
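Those /tmp/sort* files are the sorted chunks that sort merges at the end. One rough way to picture the whole process is to do it by hand with other coreutils: split the input, sort each piece, then merge the already-sorted pieces with sort -m. This is only a sketch. The 50,000-line chunk size, the 200,000-number input and the file names are arbitrary assumptions. Real sort sizes its chunks by its memory buffer, not by a line count.

```shell
# Stand-in for a large input: the numbers 1..200000 in random order.
seq 1 200000 | shuf > big.txt

# Private scratch directory for the sorted chunks.
work=$(mktemp -d)

# 1. Split the input into chunks of 50,000 lines each.
split -l 50000 big.txt "$work/chunk_"

# 2. Sort each chunk on its own, overwriting it in place
#    (sort's -o may safely name one of its own input files).
for f in "$work"/chunk_*; do
  sort -n -o "$f" "$f"
done

# 3. Merge the already-sorted chunks; -m merges without re-sorting.
sort -n -m "$work"/chunk_* > merged.txt

rm -r "$work"
```

The result should be identical to running sort -n big.txt directly.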
I actually don't want it to write its data to disk at all. I'm usually on machines where the data fits very comfortably in RAM (maybe up to 1GB), and writing to disk, even SSD, just adds slowness. Splitting the data into chunks unnecessarily adds slowness too. For my use case, specifying a much bigger buffer is appropriate.
Wow! That's way too low for modern machines, especially since sort can query the environment to get a sense of how much memory it might be worth using.
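You can also pick a size yourself. This is a minimal sketch that assumes Linux and GNU sort. It reads MemAvailable from /proc/meminfo, which older kernels don't provide. The hypothetical mem_sort function gives sort roughly half of the currently available memory, using the K suffix from the manpage excerpt above. The one-half fraction is an arbitrary choice. Fixed sizes such as -S 2G or -S 50% (half of physical memory) are simpler alternatives.

```shell
# Hypothetical helper (not part of coreutils): run GNU sort with a
# buffer of about half the memory the kernel reports as available.
mem_sort() {
  # /proc/meminfo reports MemAvailable in kB, i.e. units of 1024 bytes.
  avail_kib=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  # The K suffix makes the unit explicit; K is also sort's default unit.
  sort -S "$((avail_kib / 2))K" "$@"
}

# Fixed-size alternatives using the other suffixes:
#   sort -S 2G  ...   # 2 GiB buffer
#   sort -S 50% ...   # half of physical memory

printf '3\n1\n2\n' | mem_sort -n
```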
The unix sort command is clever: to sort very large files it does a series of in-memory sorts, saving sorted chunks to temporary files, and then does a merge sort on those chunks. Except this often doesn't work anymore.

Here's what I see if I run man sort and look at the documentation for --buffer-size:

That's pretty terse! What does my Mac say?
Makes sense! But then the docs for --temporary-directory say:

And these days /tmp is often memory-backed, via tmpfs. This changed in Fedora 18 (2013) and Ubuntu 24.10 (2024), and is changing in Debian 13 (in a month or two).

It seems to me that it would now be better for --temporary-directory to default to /var/tmp. That directory is preserved across reboots, so it will generally be backed by disk even on systems that use tmpfs for /tmp. In the meantime, sort --temporary-directory /var/tmp will do the trick.
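Before changing anything, it may be worth checking what actually backs these directories. The first part of this sketch assumes GNU coreutils: stat -f -c %T prints a filesystem type name such as tmpfs, and the name shown for disk filesystems varies. The second part is a hypothetical wrapper, sortdisk, so the flag doesn't have to be typed each time. GNU sort also reads the TMPDIR environment variable, so setting it for a single command is an alternative that leaves every other program's temp directory alone.

```shell
# Show which filesystem backs each candidate temp directory;
# "tmpfs" means the files live in memory (or swap), not on disk.
for dir in /tmp /var/tmp; do
  printf '%s: %s\n' "$dir" "$(stat -f -c %T "$dir")"
done

# Hypothetical wrapper: keep sort's temporary files on /var/tmp.
sortdisk() {
  sort --temporary-directory=/var/tmp "$@"
}

printf 'b\nc\na\n' | sortdisk

# Alternative: set TMPDIR for this one sort invocation only.
printf 'b\nc\na\n' | TMPDIR=/var/tmp sort
```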