common: Leverage a better CRC32C implementation
ISA-L provides a few different CRC32C implementations, of
which Ceph has only ever linked against one
(crc32_iscsi_00).
The second implementation of CRC32C provided by ISA-L
(crc32_iscsi_01) improves upon the first, as used by Ceph,
in two ways:
1) crc32_iscsi_01 explicitly checks for and handles buffers
shorter than 8 bytes, computing the CRC32C value with the
hardware-accelerated CRC32 instruction. In comparison,
crc32_iscsi_00 prefetches too far on small buffers,
requiring the Ceph code to detect this case and handle
it separately in software. That software fallback for
CRC32C carries its own set of LUTs (lookup tables) and
is less efficient because it does not use the CRC32
instruction.
2) crc32_iscsi_00 relies on large LUTs to perform the
modular reduction required to produce the CRC32C value.
In contrast, crc32_iscsi_01 uses the PCLMULQDQ
instruction to perform reductions 128 bits at a time
with smaller LUTs, resulting in greater throughput and
less data cache pollution.
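As a reference point for checking either routine on short
inputs, here is a minimal, portable sketch of a bitwise
CRC32C. It is not the ISA-L or Ceph code, and the names
crc32c_update_ref and crc32c_ref are hypothetical. It uses
no tables and no CRC32 instruction, has no minimum length,
and never reads past buf + len, so it is slow but safe for
buffers shorter than 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Hypothetical helper: fold len bytes into a running CRC state,
 * one bit at a time. No initial or final inversion is applied,
 * so the caller chooses the convention and may chain calls.
 */
static uint32_t crc32c_update_ref(uint32_t crc,
                                  const unsigned char *buf,
                                  size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, else zero. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc;
}

/*
 * Hypothetical wrapper: conventional CRC32C with an initial value
 * of 0xFFFFFFFF and a final XOR of 0xFFFFFFFF.
 */
static uint32_t crc32c_ref(const unsigned char *buf, size_t len)
{
    return crc32c_update_ref(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}
```

With the conventional wrapper, the 9-byte input "123456789"
yields the standard CRC32C check value 0xE3069283. Feeding
the same bytes in two pieces, such as 7 bytes and then 2,
gives the same result, which makes this handy for comparing
against an accelerated path on short tails.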
Fixes: https://tracker.ceph.com/issues/65791
Signed-off-by: Tyler Stachecki <tstachecki@bloomberg.net>
(cherry picked from commit 948392a41511f5a04b13a8bad43ddb6d2731a197)