Update advice for different devices

PiperOrigin-RevId: 700993687
2026-06-02 11:54:36 +08:00 · 2024-11-28 16:44:11 +00:00
parent 1490230430
commit e56abb7a55
2 changed files with 48 additions and 8 deletions
--- a/docs/known_issues.md
+++ b/docs/known_issues.md
@@ -1,9 +1,36 @@
 # Known Issues

-### Devices other than NVIDIA A100 or H100
+## Numerical performance for different GPU devices

-There are currently known unresolved numerical issues with using devices other
-than NVIDIA A100 and H100. For now, accuracy has only been validated for A100
-and H100 GPU device types. See
+There are numerical performance issues with some GPU types that are under
+investigation; see
 [this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
 tracking.
+
+### Verified devices
+
+We have run successful large-scale numerical tests on the following devices,
+up to the maximum number of tokens listed for each:
+
+-   H100 80 GB: up to 5,120 tokens.
+-   A100 80 GB: up to 5,120 tokens.
+-   A100 40 GB: up to 4,352 tokens with
+    [unified memory configuration](https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#nvidia-a100-40-gb).
+-   P100 16 GB: up to 1,024 tokens.
+
+Note that the 80 GB devices can run larger targets using unified memory, but
+outputs have only been verified on particular examples rather than a large-scale
+test set.
+
+#### CUDA Capability 7.x GPUs: known issues
+
+All CUDA Capability 7.x GPUs (e.g. V100) produce obviously bad output, with many
+clashing residues (the clashes cause a ranking score of -99 or lower). With a
+small fix relating to `bfloat16` conversion to `float32`, outputs look normal,
+but there are numerical performance regressions for some bucket sizes (tested on
+V100 devices).
+
+#### CUDA Capability 6.x GPUs: no known issues
+
+CUDA Capability 6.x GPUs give reasonable output, but large-scale numerical
+testing has only been done for P100.
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -98,14 +98,27 @@ AlphaFold 3 can run on inputs of size up to 4,352 tokens on a single NVIDIA A100
 While numerically accurate, this configuration will have lower throughput
 compared to the set up on the NVIDIA A100 (80 GB), due to less available memory.

-#### Devices other than NVIDIA A100 or H100
+#### NVIDIA P100

-There are currently known unresolved numerical issues with using devices other
-than NVIDIA A100 and H100. For now, accuracy has only been validated for A100
-and H100 GPU device types. See
+AlphaFold 3 can run on inputs of size up to 1,024 tokens on a single NVIDIA P100
+with no configuration changes needed.
+
+#### NVIDIA V100
+
+There are known issues with V100 devices. See
 [this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
 tracking.

+#### Other devices
+
+There are known issues with CUDA Capability 7.x devices. See
+[this Issue](https://github.com/google-deepmind/alphafold3/issues/59) for
+tracking.
+
+CUDA Capability 6.x and 8.x devices other than those explicitly listed here are
+believed to work with AlphaFold 3, but large-scale testing has only been
+performed on the devices mentioned above.
+
 ## Compilation Buckets

 To avoid excessive re-compilation of the model, AlphaFold 3 implements