Updates `absl::Mutex` and related RAII lockers (`absl::MutexLock`,
etc.) to deprecate legacy APIs in favor of standard-compliant
alternatives.
* `absl::Mutex`: Adds `[[deprecated]]` to legacy CamelCase methods
(e.g., `Lock`, `ReaderLock`) in favor of standard C++ lower-case
methods (`lock`, `lock_shared`) which support `std::scoped_lock`.
* `absl::MutexLock` (and friends): Adds `[[deprecated]]` to
constructors accepting raw pointers, favoring new
reference-accepting constructors.
To support this change, warnings coming from external repositories
are now suppressed in Bazel CI builds.
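
As a rough illustration of why the lower-case methods matter, the sketch below uses a hypothetical stand-in class, `SketchMutex` (not `absl::Mutex`), that exposes only `lock`/`unlock` and `lock_shared`/`unlock_shared`. Any type with that interface works with `std::scoped_lock` and `std::shared_lock`. The `Counter` class is likewise invented for the example.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in exposing the standard lower-case locking interface.
// It wraps std::shared_mutex and is NOT absl::Mutex.
class SketchMutex {
 public:
  void lock() { mu_.lock(); }
  void unlock() { mu_.unlock(); }
  void lock_shared() { mu_.lock_shared(); }
  void unlock_shared() { mu_.unlock_shared(); }

 private:
  std::shared_mutex mu_;
};

// Illustrative class guarded by the stand-in mutex.
class Counter {
 public:
  void Increment() {
    std::scoped_lock lock(mu_);  // Exclusive: calls lock() / unlock().
    ++value_;
  }

  int Get() const {
    // Shared: calls lock_shared() / unlock_shared().
    std::shared_lock<SketchMutex> lock(mu_);
    return value_;
  }

 private:
  mutable SketchMutex mu_;
  int value_ = 0;
};
```

With Abseil's own lockers, the corresponding migration described above is to pass the mutex by reference rather than by pointer.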
PiperOrigin-RevId: 852978576
Change-Id: I54ae951f28a1b7d90fcb46ceeaf09f192af257df
The class is intentionally designed for exactly what is needed here: no type
erasure, and therefore no allocations, and no locks taken. It's also not conceivable
that we would ever change this without introducing a new API, as doing so would
serve no purpose and harm performance.
PiperOrigin-RevId: 852955861
Change-Id: I67bb75cf17c1184392bbec6ed9d15faee2f6376b
This change introduces absl::chunked_queue, a sequence container
optimized for use as a FIFO (First-In, First-Out) queue. It is similar
in purpose to std::deque but with different performance trade-offs and
features.
absl::chunked_queue stores elements in a series of
exponentially-growing chunks of memory.
absl::chunked_queue is often a better choice than std::deque in the
following situations:
* Large queues: For very large numbers of elements, the exponential
growth strategy of absl::chunked_queue can lead to fewer, larger
memory allocations compared to std::deque, which can be a
performance advantage.
* Strict FIFO processing: When you only need to add elements to the
back (push_back) and remove them from the front (pop_front).
std::deque should be preferred in the following cases:
* Operations at both ends: std::deque is designed for efficient
insertions and deletions at both the front and the
back. absl::chunked_queue is optimized for push_back and pop_front
and does not offer a pop_back method.
* Random access: std::deque provides amortized O(1) random access to
elements via operator[]. absl::chunked_queue does not support
random access.
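
To make the chunk-growth idea concrete, here is a minimal sketch of a FIFO queue that stores elements in chunks whose capacity doubles. It is not the `absl::chunked_queue` implementation: the class name `ChunkedQueueSketch`, the initial and maximum chunk capacities, and the use of `std::list` plus `std::vector` for storage are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Illustrative FIFO queue with exponentially growing chunks.
// Simplification: popped elements are destroyed only when their whole chunk
// is released, which a production container would not do.
template <typename T>
class ChunkedQueueSketch {
 public:
  void push_back(T value) {
    // Start a new chunk when there is none or the last one is full.
    if (chunks_.empty() ||
        chunks_.back().size() == chunks_.back().capacity()) {
      chunks_.emplace_back();
      chunks_.back().reserve(next_capacity_);
      if (next_capacity_ < kMaxChunkCapacity) next_capacity_ *= 2;
    }
    // Never reallocates: the chunk still has spare capacity here.
    chunks_.back().push_back(std::move(value));
    ++size_;
  }

  // Precondition: !empty().
  T& front() { return chunks_.front()[head_]; }

  // Precondition: !empty().
  void pop_front() {
    ++head_;
    --size_;
    // Release the front chunk once every element in it has been consumed.
    if (head_ == chunks_.front().size()) {
      chunks_.pop_front();
      head_ = 0;
    }
  }

  std::size_t size() const { return size_; }
  bool empty() const { return size_ == 0; }
  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  static constexpr std::size_t kMaxChunkCapacity = 1024;  // Assumed cap.

  std::list<std::vector<T>> chunks_;
  std::size_t head_ = 0;           // Index of the front element in chunk 0.
  std::size_t size_ = 0;           // Number of live elements.
  std::size_t next_capacity_ = 2;  // Assumed initial chunk capacity.
};
```

Note that the sketch offers no `operator[]` and no `pop_back`, mirroring the trade-offs listed above.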
PiperOrigin-RevId: 850999629
Change-Id: Ie71737c10b6125b9e498109267cac87a4ca2f9e8
For lengths in [17, 32] we compute two chains of dependent CRC32 operations so that the resulting two 32-bit numbers have good entropy.
1. x := CRC32(CRC32(state, A), D)
2. y := CRC32(CRC32(bswap(state), C), B)
On ARM:
CRC32 has 2 cycles latency and throughput equal to 1.
Computations will be pipelined without any wait.
On x86:
CRC32 has 3 cycles latency and throughput equal to 1.
There will be 1 extra cycle wait, but we can do `cmp` in parallel.
At the end we multiply `(mul - x) * (y - mul)`, where `mul = rotr(kMul, len)`. `mul` is mixed in to fill the upper 32 bits of each CRC result with good entropy bits.
We also mix the length differently:
1. `state + 8 * len` (`lea` instruction); the one or two CRC operations that follow shuffle these bits well into the low 32 bits.
2. `rotr(kMul, len)` is used to fill the high 32 bits before the multiplication in `Mix`. This avoids reading from `kStaticRandomData`.
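
The data flow for the [17, 32] case can be sketched portably as follows. This is an illustration under several assumptions, not the actual implementation: the hardware CRC32 instruction is replaced by a bitwise software CRC32C step, `A`/`B` are taken to be the first two 8-byte words and `C`/`D` the last two (possibly overlapping), `MixSketch` is assumed to fold a 128-bit product by XOR-ing its halves, and `kMulSketch` is a placeholder constant. All function names are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using u32 = std::uint32_t;
using u64 = std::uint64_t;

// Placeholder odd 64-bit multiplier, chosen for illustration only.
constexpr u64 kMulSketch = 0xdcb22ca68cb134edULL;

// Bitwise software CRC32C over 8 bytes; a stand-in for the hardware CRC32
// instruction. Only the low 32 bits of `crc` are used.
inline u32 Crc32Sketch(u64 crc, u64 data) {
  u32 reg = static_cast<u32>(crc);
  for (int i = 0; i < 64; ++i) {
    const u32 bit = (reg ^ static_cast<u32>(data)) & 1u;
    reg >>= 1;
    data >>= 1;
    if (bit) reg ^= 0x82F63B78u;
  }
  return reg;
}

inline u64 Rotr64(u64 v, unsigned n) {
  n &= 63u;
  return n == 0 ? v : (v >> n) | (v << (64 - n));
}

inline u64 Bswap64(u64 v) {
  u64 r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xffu);
    v >>= 8;
  }
  return r;
}

inline u64 Load64(const unsigned char* p) {
  u64 v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Assumed shape of `Mix`: full 64x64 -> 128-bit product, halves XOR-ed.
inline u64 MixSketch(u64 a, u64 b) {
  const u64 a_lo = a & 0xffffffffu, a_hi = a >> 32;
  const u64 b_lo = b & 0xffffffffu, b_hi = b >> 32;
  const u64 p0 = a_lo * b_lo, p1 = a_lo * b_hi;
  const u64 p2 = a_hi * b_lo, p3 = a_hi * b_hi;
  const u64 mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);
  const u64 lo = (mid << 32) | (p0 & 0xffffffffu);
  const u64 hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return hi ^ lo;
}

struct CrcPair {
  u32 x;
  u32 y;
};

// Two independent CRC chains. Requires 17 <= len <= 32.
inline CrcPair CrcChains17To32(u64 state, const unsigned char* p,
                               std::size_t len) {
  const u64 a = Load64(p), b = Load64(p + 8);
  const u64 c = Load64(p + len - 16), d = Load64(p + len - 8);
  state += 8 * len;  // First place the length is mixed in.
  const u32 x = Crc32Sketch(Crc32Sketch(state, a), d);
  const u32 y = Crc32Sketch(Crc32Sketch(Bswap64(state), c), b);
  return {x, y};
}

inline u64 Hash17To32Sketch(u64 state, const unsigned char* p,
                            std::size_t len) {
  const CrcPair crc = CrcChains17To32(state, p, len);
  // Second place the length is mixed in: rotate the multiplier by len.
  const u64 mul = Rotr64(kMulSketch, static_cast<unsigned>(len));
  return MixSketch(mul - crc.x, crc.y - mul);
}
```

Because `x` and `y` do not depend on each other, the two chains can be issued back to back, which is what lets the CRC latency overlap as described above.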
For smaller strings we try to minimize binary size and register pressure as much as possible.
A CRC instruction fused with a memory read is used. llvm-mca reports 1 cycle lower latency compared to a separate `mov` + `crc`.
ASM analysis https://godbolt.org/z/e1xrKzhdc:
1. 100+ bytes binary size saving (per inline instance)
2. 25+ instruction saving
3. 2 registers are not used (r8 and r9).
Latency results in isolation, without accounting for the comparison, are mixed.
1. latency for 8 bytes in isolation is 1 cycle better: https://godbolt.org/z/zc39eM3K9
2. latency for 1-3 bytes in isolation is 2 cycles better: https://godbolt.org/z/qMKfbv438
3. latency for 16 bytes in isolation is 3 cycles worse: https://godbolt.org/z/vcqr8oGv3
4. latency for 32 bytes in isolation is 5 cycles worse: https://godbolt.org/z/nEPP5jP58
PiperOrigin-RevId: 850659551
Change-Id: I02a2434f2d98473b099c171ef1c56adffa821c60
prior to C++17. `absl::string_view` is now an alias for `std::string_view`.
It is recommended that clients simply use `std::string_view`.
PiperOrigin-RevId: 845822478
Change-Id: I220530c84118e5b9ef110baa002c232ac8f2c5f2
Modified ArgsList::ReadFromFlagfile to redact the content of unexpected lines from error messages.
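
The idea of redaction can be sketched as below. This is not the `ArgsList::ReadFromFlagfile` code: the function name, the result struct, the line-classification rules (blank lines and `#` comments skipped, lines starting with `-` treated as flags), and the error message wording are all assumptions made for illustration. The point is only that the offending line's content never reaches the error string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for the sketch.
struct FlagfileParseResult {
  std::vector<std::string> args;    // Lines accepted as flags.
  std::vector<std::string> errors;  // Redacted error messages.
};

// Parses flagfile contents line by line. Unexpected lines are reported by
// line number only; their content is deliberately left out of the message.
inline FlagfileParseResult ParseFlagfileSketch(const std::string& name,
                                               const std::string& contents) {
  FlagfileParseResult result;
  std::istringstream in(contents);
  std::string line;
  int line_number = 0;
  while (std::getline(in, line)) {
    ++line_number;
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;  // Blank line.
    if (line[first] == '#') continue;          // Comment line.
    const std::size_t last = line.find_last_not_of(" \t\r");
    if (line[first] == '-') {
      result.args.push_back(line.substr(first, last - first + 1));
      continue;
    }
    // The line may hold sensitive data (e.g. a pasted credential), so only
    // its position is reported.
    result.errors.push_back("Unexpected line " + std::to_string(line_number) +
                            " in flagfile " + name + ": <redacted>");
  }
  return result;
}
```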
PiperOrigin-RevId: 845327732
Change-Id: I6e0bf8f443b534cc9fa14e214e0d275e30116261
This reduces binary bloat by making mangled names smaller and by shrinking strings such as `__PRETTY_FUNCTION__`.
The default types can be inferred and do not provide extra information.
PiperOrigin-RevId: 844837029
Change-Id: I67f3c9445bd018ea829fa584784095e2e84cb739
Since this method is only called by `NonConst` which immediately converts it to a `std::string`, this is currently safe. Add annotation nevertheless to show the contract.
This currently accounts for the majority of the lifetime annotation suggestions provided by the lifetime analysis: https://godbolt.org/z/hKvrE1hG1
PiperOrigin-RevId: 843275432
Change-Id: Ib1a0513eca944a9c7c4c612c3111bf05881c746d
AES instructions are used when available. We load blocks of 64 bytes of the string into 4 independently hashed 128-bit vectors. We use AES encrypt and decrypt to mix the bits. The instructions run in parallel.
The last <=64 bytes are loaded into 4 (or 2, if the remaining length is <=32) overlapping vectors and encrypted once more. At the end we mix with another round of encryption, similar to the 33-64 byte case.
```
name CYCLES/op CYCLES/op vs base
BM_HASHING_Combine_contiguous_Fleet_hot 479.0m ± 1% 437.0m ± 0% -8.77% (p=0.000 n=30)
BM_HASHING_Combine_contiguous_Fleet_cold 1.700 ± 2% 1.526 ± 2% -10.24% (p=0.000 n=30)
arcadia-rome:
BM_HASHING_Combine_contiguous_Fleet_hot 465.0m ± 1% 452.0m ± 1% -2.80% (p=0.000 n=30)
BM_HASHING_Combine_contiguous_Fleet_cold 4.024 ± 1% 3.676 ± 0% -8.66% (p=0.000 n=30)
```
ASM analysis https://godbolt.org/z/5EzEnT46j shows a saving of 8 cycles for a 128-byte string. We also perform half as many load operations.
PiperOrigin-RevId: 842818076
Change-Id: Ib89f25e0bae2c8ba9ed340350408c27afe6fd222
Motivation: hash state being first allows for fewer unnecessary moves between registers since (a) this matches the argument order in CombineContiguousImpl and (b) hash state is also the return value (on ARM64, the return value and the first argument use the same register) - [example assembly diff](https://godbolt.org/z/c1h5dMe9K) for related change.
PiperOrigin-RevId: 842309048
Change-Id: I5b1f0fb381728ced2b3fba53fb9adbc0e4a45189
instead of just trying to be MSVC
This also fixes the new warnings that are caught.
These include:
* Unreachable code after GTEST_SKIP (this is kind of ugly)
* Some -Wundef warnings
* A -Wshadow warning in vlog_config.cc
PiperOrigin-RevId: 838046186
Change-Id: Ief48d6db2b8755d2173997d052560880593d5819
instead of just trying to be MSVC
This also fixes the new warnings that are caught.
These include:
* Unreachable code after GTEST_SKIP (this is kind of ugly)
* Some -Wundef warnings
* A -Wshadow warning in vlog_config.cc
PiperOrigin-RevId: 838017208
Change-Id: I39373c0ccc57c8660c22815c51ac5b4180aec53c
Add an extra variant to FunctionToCall: `relocate_from_to_and_query_rust`. It is identical to "relocate_from_to" for C++ managers, but instructs Rust managers to perform a special operation that can be detected by the caller.
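
A minimal sketch of the C++ side of this contract is shown below: the manager handles the new variant exactly like a plain relocation. The enum layout, the `Storage` struct, and `CppManagerSketch` are simplified stand-ins invented for the example, and the Rust-side behavior is intentionally not modeled.

```cpp
// Simplified stand-in for the dispatch enum; the real definition may differ.
enum class FunctionToCall {
  relocate_from_to,
  dispose,
  relocate_from_to_and_query_rust,
};

// Hypothetical type-erased storage holding a heap-allocated target.
struct Storage {
  void* ptr = nullptr;
};

// Hypothetical C++ manager for a heap-stored T. For C++ targets the new
// variant behaves identically to relocate_from_to.
template <typename T>
void CppManagerSketch(FunctionToCall op, Storage* from, Storage* to) {
  switch (op) {
    case FunctionToCall::relocate_from_to:
    case FunctionToCall::relocate_from_to_and_query_rust:
      to->ptr = from->ptr;  // Transfer ownership of the target.
      from->ptr = nullptr;
      return;
    case FunctionToCall::dispose:
      delete static_cast<T*>(from->ptr);
      from->ptr = nullptr;
      return;
  }
}
```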
PiperOrigin-RevId: 837257408
Change-Id: Idc270af2716252612a77a26b1b3cf83778aaa20d
Detection isn't perfect, but it is better than nothing:
https://godbolt.org/z/YzeMeb58j
PiperOrigin-RevId: 837204651
Change-Id: Id1027c4c27bd95ad923e4c5d242a28079b16db79
`std::span::subspan()` has stricter preconditions than its `absl::`
counterpart. Supplying a `len` that would extend beyond the end of the
span is undefined behavior for `std::span` (unless `len` is the default
value, `std::dynamic_extent`), whereas `absl::Span` simply truncates the result.
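
Code that relied on the truncating behavior can clamp the length explicitly before calling `std::span::subspan()`. The helper below is a small sketch of that clamp; the name `ClampedSubspanLength` is invented for the example, and it assumes the caller has already ensured `pos <= size`.

```cpp
#include <algorithm>
#include <cstddef>

// Returns a length that keeps [pos, pos + result) inside a span of `size`
// elements. Precondition (assumed, not checked): pos <= size.
constexpr std::size_t ClampedSubspanLength(std::size_t size, std::size_t pos,
                                           std::size_t len) {
  return std::min(len, size - pos);
}

// Intended use with a std::span `s` (C++20), shown as a comment so the
// sketch also compiles in earlier language modes:
//   s.subspan(pos, ClampedSubspanLength(s.size(), pos, len));
```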
PiperOrigin-RevId: 836331418
Change-Id: I0e9a11cb434deca0b88d761e8233a44d5a9273ce