abseil-cpp

mirror of https://github.com/abseil/abseil-cpp.git synced 2026-06-04 20:14:23 +08:00


Abseil Team c4ff4d561c Use more efficient reduction algorithm in FinalizePclmulStream()

1. When reducing 4 vectors to 1, fold across 2 vectors first and then across 1,
   instead of across 1 and then across 2.  This works slightly better because it
   makes the constants be used in order.

2. Use a faster algorithm to reduce 1 vector to a scalar value.

This approach is the same one I used in the assembly code I recently wrote for
the Linux kernel in the patch series
https://lore.kernel.org/lkml/20250210174540.161705-1-ebiggers@kernel.org/T/#u
(search for "reduce_128bits_to_crc").

On Skylake (which uses num_pclmul_streams=2), this improves CRC32C performance
on 2048-byte messages by about 2%.  The overall improvement is relatively small
since FinalizePclmulStream() is only called for messages >= 2048 bytes and is
only called num_pclmul_streams times per message.  So it's not really a
bottleneck, but the new code is definitely a bit shorter and faster.
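
The reordering in item 1 can be illustrated with a scalar toy model. This is a
sketch under stated assumptions, not the Abseil implementation: it uses 32-bit
words instead of 128-bit vectors, a plain bit-by-bit multiply instead of
PCLMULQDQ, and non-reflected polynomial arithmetic with no initial or final
XOR. All names (MulMod, XPowMod, Fold, and so on) are hypothetical. It only
shows that folding across 2 and then across 1 gives the same remainder as
folding across 1 and then across 2.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Low 32 bits of the CRC32C generator polynomial; the x^32 term is implicit.
// Non-reflected bit order is an assumption made to keep the sketch simple.
constexpr uint32_t kPolyLow = 0x1EDC6F41u;

// Multiplies two polynomials over GF(2) modulo the generator polynomial.
// Bit i of each operand is the coefficient of x^i.
inline uint32_t MulMod(uint32_t a, uint32_t b) {
  uint32_t result = 0;
  while (b != 0) {
    if (b & 1u) result ^= a;
    b >>= 1;
    // Advance a to a * x mod P.
    const bool carry = (a & 0x80000000u) != 0;
    a <<= 1;
    if (carry) a ^= kPolyLow;
  }
  return result;
}

// Returns x^n mod P. This plays the role of a precomputed folding constant.
inline uint32_t XPowMod(unsigned n) {
  uint32_t r = 1;
  for (unsigned i = 0; i < n; ++i) r = MulMod(r, 2u);
  return r;
}

// Folds `hi` into `lo`, where `hi` sits `distance_words` 32-bit words ahead
// of `lo` in the message: hi * x^(32 * distance_words) + lo (mod P).
inline uint32_t Fold(uint32_t hi, uint32_t lo, unsigned distance_words) {
  return MulMod(hi, XPowMod(32u * distance_words)) ^ lo;
}

// Order described in the commit message: across 2 first, then across 1.
inline uint32_t FoldTwoThenOne(const std::array<uint32_t, 4>& v) {
  const uint32_t a = Fold(v[0], v[2], 2);
  const uint32_t b = Fold(v[1], v[3], 2);
  return Fold(a, b, 1);
}

// The other order: across 1 first, then across 2.
inline uint32_t FoldOneThenTwo(const std::array<uint32_t, 4>& v) {
  const uint32_t ab = Fold(v[0], v[1], 1);
  const uint32_t cd = Fold(v[2], v[3], 1);
  return Fold(ab, cd, 2);
}

// Reference remainder computed word by word (Horner's rule).
inline uint32_t ReferenceRemainder(const std::array<uint32_t, 4>& v) {
  uint32_t r = 0;
  for (uint32_t w : v) r = MulMod(r, XPowMod(32)) ^ w;
  return r;
}

// Asserts that both fold orders agree with the reference remainder.
inline void CheckFoldOrdersAgree(const std::array<uint32_t, 4>& v) {
  assert(FoldTwoThenOne(v) == ReferenceRemainder(v));
  assert(FoldOneThenTwo(v) == ReferenceRemainder(v));
}
```

Calling CheckFoldOrdersAgree() on any four words should pass, since both
orders compute the same polynomial modulo P. In this model the two orders
differ only in which constant is needed first, which is the property the
commit message refers to.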

PiperOrigin-RevId: 739002382
Change-Id: I0505e61f012e4a4f8b85958f7f00478f5b1a7026

2025-03-20 18:06:56 -07:00

cpu_detect.cc

Crc: Remove the __builtin_cpu_supports path for SupportsArmCRC32PMULL

2025-01-17 15:56:53 -08:00

cpu_detect.h

Add entries for Neoverse N2, V1, and V2 into CRC dynamic dispatch table.

2023-10-06 14:07:43 -07:00

crc32_x86_arm_combined_simd.h

PR #1662 : Replace shift with addition in crc multiply

2024-05-07 10:33:09 -07:00

crc32c_inline.h

Fixes many compilation issues that come from having no external CI