You can explicitly set windowLog with --zstd=windowLog=...

It's sometimes useful to combine a low-ish compression level with a large window size, e.g. when the input data contains multiple similar large chunks that do not fit into the low-compression-level window.
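To make the window effect concrete, here is a minimal sketch that uses Python's standard zlib module as a stand-in for zstd. That substitution is an assumption made purely for illustration: zlib's DEFLATE window is fixed at 32 KiB, so it shows in miniature what happens when a repeated chunk does or does not fit in the window. The helper name repeated_block_ratio is hypothetical.

```python
import os
import zlib

# zlib (DEFLATE) is only a stand-in for zstd here: its window is fixed at
# 32 KiB, so it cannot refer back to data further away than that.
ZLIB_WINDOW = 32 * 1024


def repeated_block_ratio(block_size: int) -> float:
    """Compress a random block followed by an exact copy of itself.

    Returns compressed_size / block_size. A value near 1.0 means the second
    copy was almost free; a value near 2.0 means it was not matched at all.
    """
    block = os.urandom(block_size)  # random bytes: incompressible on their own
    compressed = zlib.compress(block + block, 6)
    return len(compressed) / block_size


# The copy starts 16 KiB back, which is inside the 32 KiB window.
fits = repeated_block_ratio(ZLIB_WINDOW // 2)
# The copy starts 64 KiB back, which is outside the window.
too_far = repeated_block_ratio(ZLIB_WINDOW * 2)

print(f"16 KiB block repeated: {fits:.2f}x the size of one block")
print(f"64 KiB block repeated: {too_far:.2f}x the size of one block")
```

The first ratio should come out close to 1 and the second close to 2. The same reasoning, at much larger scales, is why raising windowLog can help a low compression level on inputs with large repeated sections.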
At work we've recently been using zstd as a better-compressing alternative to gzip, and overall I've been pretty happy with it. A minor documentation gripe, though, is that the behavior around multithreaded compression is a bit unclear. I understand it's chunking the work and sending chunks to different threads to parallelize the compression process, which means I should expect better use of threads on larger files because there are more chunks to spread around. But what exactly is the relationship?
When I look in man zstd I see that you can set -B<num> to specify the size of the chunks, and it's documented as "generally 4 * windowSize". Except the documentation doesn't say how windowSize is set.

From a bit of poking at the source, it looks to me like the way this works is that windowSize is 2**windowLog, and windowLog depends on your compression level. If I know I'm doing zstd -15, though, how does compressionLevel=15 translate into a value for windowLog? There's a table in lib/compress/clevels.h, with a section covering inputs >256KB. See the source if you're interested in other sizes.
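The arithmetic so far can be sketched in a few lines of Python. This is a rough model, not zstd's actual logic: it assumes the chunk size is exactly 4 * windowSize (the man page only says "generally"), and the function names and the example windowLog value are made up for illustration.

```python
import math


def window_size(window_log: int) -> int:
    """windowSize in bytes, taking windowSize = 2**windowLog."""
    return 2 ** window_log


def assumed_chunk_size(window_log: int) -> int:
    """Chunk size in bytes, ASSUMING it is exactly 4 * windowSize."""
    return 4 * window_size(window_log)


def assumed_chunk_count(input_bytes: int, window_log: int) -> int:
    """Number of chunks an input would be split into under that assumption."""
    return max(1, math.ceil(input_bytes / assumed_chunk_size(window_log)))


# windowLog=21 is just an example value, not tied to any particular level here.
example_log = 21
for input_mib in (1, 64, 1024):
    chunks = assumed_chunk_count(input_mib * 2**20, example_log)
    print(f"{input_mib:>5} MiB input, windowLog={example_log}: {chunks} chunk(s)")
```

Under this model an input only has work for N threads once it is at least N chunks long, which would be one way to read the "better use of threads on larger files" behavior.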
So it looks like windowSize is:

- ≤1: 524k
- 2: 1M
- 3-8 (default): 2M
- 9-16: 4M
- 17-19: 8M
- 20: 32M
- 21: 64M
- 22: 128M

Probably best not to rely on any of this, but it's good to know what zstd -<level> is doing by default!
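For quick lookups, the list above can be written down as a small Python table. This is only a transcription of those values for inputs >256KB, expressed as windowLog so that windowSize = 2**windowLog. The names are hypothetical, everything at or below level 1 is lumped together just as the list's "≤1" row does, and the numbers may differ between zstd versions.

```python
# (lowest level, highest level, windowLog), transcribed from the list above
# for inputs >256KB. Not read from zstd itself, so treat it as a snapshot.
WINDOW_LOG_BY_LEVEL = [
    (1, 1, 19),    # 524k (2**19 bytes)
    (2, 2, 20),    # 1M
    (3, 8, 21),    # 2M
    (9, 16, 22),   # 4M
    (17, 19, 23),  # 8M
    (20, 20, 25),  # 32M
    (21, 21, 26),  # 64M
    (22, 22, 27),  # 128M
]


def window_log_for_level(level: int) -> int:
    """Look up windowLog for a compression level, following the list above."""
    if level <= 1:
        return 19  # the list's "≤1" row
    for low, high, window_log in WINDOW_LOG_BY_LEVEL:
        if low <= level <= high:
            return window_log
    raise ValueError(f"level {level} is above the highest listed level (22)")


for level in (1, 3, 15, 19, 22):
    window_log = window_log_for_level(level)
    size_kib = 2 ** window_log // 1024
    print(f"zstd -{level}: windowLog={window_log}, windowSize={size_kib} KiB")
```

For example, window_log_for_level(15) gives 22, i.e. the 4M window from the list.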