abseil-cpp/absl/crc/internal
Abseil Team c4ff4d561c Use more efficient reduction algorithm in FinalizePclmulStream()
1. When reducing 4 vectors to 1, fold across 2 vectors first and then across 1,
   instead of across 1 and then across 2.  This works slightly better because the
   constants are then used in order.

2. Use a faster algorithm to reduce 1 vector to a scalar value.
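The folding order in item 1 can be illustrated with a scaled-down, portable model. This is a sketch under stated assumptions, not Abseil's code. All names (`ClMul32`, `XPowMod`, `FoldAcross`, `ReduceAcross1Then2`, `ReduceAcross2Then1`, `ReferenceMod`) are hypothetical. A "vector" here is a 64-bit polynomial split into two 32-bit halves, rather than a 128-bit vector split into 64-bit halves. Carry-less multiplication is done in software instead of with PCLMULQDQ. Bits are also kept in natural rather than bit-reflected order. Each fold preserves the value only modulo the generator polynomial, so both orders should end at the same remainder.

```cpp
#include <cassert>
#include <cstdint>

// Scaled-down model (assumption): a "vector" is a 64-bit GF(2) polynomial
// split into two 32-bit halves, mirroring how PCLMULQDQ-based code treats a
// 128-bit vector as two 64-bit halves. Bit i is the coefficient of x^i; real
// CRC32C code uses the bit-reflected layout instead.
constexpr uint64_t kPoly = 0x11EDC6F41;  // CRC32C generator polynomial, degree 32

// Software stand-in for a carry-less multiply: a and b have degree < 32, so
// the product has degree < 63 and fits in 64 bits.
uint64_t ClMul32(uint32_t a, uint32_t b) {
  uint64_t product = 0;
  for (int i = 0; i < 32; ++i) {
    if ((b >> i) & 1) product ^= uint64_t{a} << i;
  }
  return product;
}

// x^n mod kPoly, by repeated multiplication by x. A real implementation would
// precompute these folding constants.
uint32_t XPowMod(int n) {
  uint64_t r = 1;
  for (int i = 0; i < n; ++i) {
    r <<= 1;
    if (r >> 32) r ^= kPoly;  // cancel the x^32 term
  }
  return static_cast<uint32_t>(r);
}

// Remainder of a 64-bit polynomial modulo kPoly, by long division.
uint32_t PolyMod(uint64_t a) {
  for (int i = 63; i >= 32; --i) {
    if ((a >> i) & 1) a ^= kPoly << (i - 32);
  }
  return static_cast<uint32_t>(a);
}

// Folds `src` forward across `k` vectors into `dst`. The result is not fully
// reduced, but it is congruent to src * x^(64*k) + dst modulo kPoly: each
// half of `src` is multiplied by its own constant, x^(64*k+32) or x^(64*k).
uint64_t FoldAcross(uint64_t src, uint64_t dst, int k) {
  assert(k >= 1);
  uint32_t hi = static_cast<uint32_t>(src >> 32);
  uint32_t lo = static_cast<uint32_t>(src);
  return ClMul32(hi, XPowMod(64 * k + 32)) ^ ClMul32(lo, XPowMod(64 * k)) ^ dst;
}

// In the three functions below, v[0] holds the highest-order coefficients.

// Fold across 1 vector first (v0 into v1, v2 into v3), then across 2.
uint32_t ReduceAcross1Then2(const uint64_t v[4]) {
  uint64_t a = FoldAcross(v[0], v[1], 1);
  uint64_t b = FoldAcross(v[2], v[3], 1);
  return PolyMod(FoldAcross(a, b, 2));
}

// Fold across 2 vectors first (v0 into v2, v1 into v3), then across 1.
uint32_t ReduceAcross2Then1(const uint64_t v[4]) {
  uint64_t a = FoldAcross(v[0], v[2], 2);
  uint64_t b = FoldAcross(v[1], v[3], 2);
  return PolyMod(FoldAcross(a, b, 1));
}

// Bit-serial reference: the 256-bit polynomial v[0]:v[1]:v[2]:v[3] mod kPoly.
uint32_t ReferenceMod(const uint64_t v[4]) {
  uint64_t r = 0;
  for (int w = 0; w < 4; ++w) {
    for (int i = 63; i >= 0; --i) {
      r = (r << 1) | ((v[w] >> i) & 1);  // Horner step: r = r*x + bit
      if (r >> 32) r ^= kPoly;
    }
  }
  return static_cast<uint32_t>(r);
}
```

The model only shows that the fold order does not change the result. The small speed difference described above comes from how the real code loads and applies its constants, which this sketch does not capture.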

This approach is the same one I used in the assembly code I recently wrote for
the Linux kernel in the patch series
https://lore.kernel.org/lkml/20250210174540.161705-1-ebiggers@kernel.org/T/#u
(search for "reduce_128bits_to_crc").
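For item 2, one standard way to reduce a single vector to a scalar is Barrett reduction. It replaces bit-by-bit polynomial division with two carry-less multiplications by precomputed constants. The sketch below is not necessarily the exact sequence used in `reduce_128bits_to_crc` or in the new `FinalizePclmulStream()`, and its function names (`ClMul32`, `PolyMod`, `BarrettMu`, `BarrettReduce`) are hypothetical. It uses a scaled-down, non-reflected model in which a 64-bit polynomial is reduced to its 32-bit remainder modulo the CRC32C generator. For such inputs (degree below twice the generator's degree), the Barrett quotient estimate over GF(2) is exact.

```cpp
#include <cassert>
#include <cstdint>

// Scaled-down, non-reflected model (assumption): reduce one 64-bit GF(2)
// polynomial to its 32-bit remainder modulo the CRC32C generator. Bit i is the
// coefficient of x^i.
constexpr uint64_t kPoly = 0x11EDC6F41;  // degree 32

// Software carry-less multiply of two polynomials of degree < 32.
uint64_t ClMul32(uint32_t a, uint32_t b) {
  uint64_t product = 0;
  for (int i = 0; i < 32; ++i) {
    if ((b >> i) & 1) product ^= uint64_t{a} << i;
  }
  return product;
}

// Reference remainder by bitwise long division, for comparison.
uint32_t PolyMod(uint64_t a) {
  for (int i = 63; i >= 32; --i) {
    if ((a >> i) & 1) a ^= kPoly << (i - 32);
  }
  return static_cast<uint32_t>(a);
}

// Barrett constant mu = floor(x^64 / kPoly), which has degree 32. After the
// first quotient term (x^32) is taken, the partial remainder fits in 64 bits.
uint64_t BarrettMu() {
  uint64_t quotient = uint64_t{1} << 32;
  uint64_t rem = (kPoly ^ (uint64_t{1} << 32)) << 32;  // x^64 - x^32 * kPoly
  for (int i = 63; i >= 32; --i) {
    if ((rem >> i) & 1) {
      quotient |= uint64_t{1} << (i - 32);
      rem ^= kPoly << (i - 32);
    }
  }
  return quotient;
}

// Reduces `a` modulo kPoly with two carry-less multiplies (single instructions
// in hardware) instead of a bit-by-bit division. Both mu and kPoly have an
// x^32 term, which is applied as a shift so the multiplies stay 32 x 32 bits.
uint32_t BarrettReduce(uint64_t a) {
  static const uint32_t kMuLow = static_cast<uint32_t>(BarrettMu());
  const uint32_t kPolyLow = static_cast<uint32_t>(kPoly);

  uint32_t t = static_cast<uint32_t>(a >> 32);               // floor(a / x^32)
  uint64_t t_mu = (uint64_t{t} << 32) ^ ClMul32(t, kMuLow);  // t * mu
  uint32_t q = static_cast<uint32_t>(t_mu >> 32);            // floor(a / kPoly)

  uint64_t q_poly = (uint64_t{q} << 32) ^ ClMul32(q, kPolyLow);  // q * kPoly
  uint64_t r = a ^ q_poly;
  assert((r >> 32) == 0);  // the quotient estimate is exact for deg(a) < 64
  return static_cast<uint32_t>(r);
}
```

Real CRC32C code operates on bit-reflected 128-bit values, so its constants, shifts, and lane selections differ from this model. The structure is the same, though: estimate the quotient with mu, then subtract the quotient times the generator.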

On Skylake (which uses num_pclmul_streams=2), this improves CRC32C performance
on 2048-byte messages by about 2%.  The overall improvement is relatively small
since FinalizePclmulStream() is only called for messages >= 2048 bytes and is
only called num_pclmul_streams times per message.  So it's not really a
bottleneck, but the new code is definitely a bit shorter and faster.

PiperOrigin-RevId: 739002382
Change-Id: I0505e61f012e4a4f8b85958f7f00478f5b1a7026
2025-03-20 18:06:56 -07:00