Document the discrepancy between AlphaFold 3 and AlphaFold Server in Known Issues

Big thanks to @stianale for reporting this in https://github.com/google-deepmind/alphafold3/issues/492.

PiperOrigin-RevId: 872892863
Change-Id: Ia24bb492daea44534be4ac743fb304bc72fe9741
This commit is contained in:
Augustin Zidek
2026-02-20 07:33:38 -08:00
committed by Copybara-Service
parent ea1667690e
commit 4ca8a65692

View File

@@ -13,3 +13,50 @@ Between commits https://github.com/google-deepmind/alphafold3/commit/f8df1c7 and
https://github.com/google-deepmind/alphafold3/commit/4e4023c, AlphaFold 3
handled incorrectly any two-letter atoms (e.g. Cl, Br) in ligands defined using
SMILES strings.
## MSA discrepancy between AlphaFold 3 and AlphaFold Server
### The root cause of the problem
The released AlphaFold 3 and AlphaFold Server use the same model weights and
equivalent featurisation and model code. However, the way they run genetic
search is slightly different. The released AlphaFold 3 searches each database in
one go, while AlphaFold Server has a sharded version of each database (split
into multiple smaller FASTA files) and searches all of the shards in parallel.
The results of these parallel searches are then merged together at the end.
The discrepancy is caused by a different (deeper) MSA on AlphaFold Server in
some cases. We discovered that the issue is caused by running sharded Jackhmmer
in AlphaFold Server without the `--domZ` flag (has to be set together with the
`--Z` flag and set to the same value) which means that effectively the AlphaFold
Server is running with roughly 100× more permissive `--domE` filter. This means
more sequences are sometimes included in the MSA.
We are keeping behaviour unchanged in both the released AlphaFold 3 and in the
AlphaFold Server, however, we are giving users with local installs an option to
replicate AlphaFold Server behaviour locally. In our large scale tests the
difference did not matter, it is only very specific inputs that get better
accuracy with the deeper MSA.
See https://github.com/google-deepmind/alphafold3/issues/492 for an example
input where a protein-DNA complex gets significantly higher ipTM and pTM with
AlphaFold Server compared to a local run.
### Replicating AlphaFold Server behaviour locally
If you want to replicate AlphaFold Server behaviour (i.e. better folding
accuracy in some cases), you can increase the value of the Jackhmmer/Nhmmer
`--domE` flag by 100× compared to its default value.
Alternatively, you can run the sharded MSA search while not setting the `--domZ`
value you would have to modify the code to do it. We added support for
searching against sharded databases in AlphaFold 3 in
https://github.com/google-deepmind/alphafold3/commit/805adc3863841d83d631ccd18136ad58ce3ecb34
and the way to run AlphaFold 3 with sharded databases is documented in
https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#sharded-genetic-databases.
It can provide 1030× speedup (potentially even more, depending on hardware) of
the genetic search.
In general, we recommend experimenting with MSA if you are seeing a prediction
with low predicted confidence. Typically adding more *relevant* sequences in the
MSA will increase AlphaFold prediction accuracy and model confidence scores.