mirror of
https://github.com/KosinskiLab/AlphaPulldown.git
synced 2026-06-04 14:14:24 +08:00
* Harden MMseqs species ID resolution fallback * Reorganize tests for CPU coverage CI * New * Fix function coverage checker def-line false positives * Expand unit coverage for helper and backend manager utilities * New. * New. * Expand unit coverage for template and post-processing helpers * Expand unit coverage for objects.py edge cases * Publish HTML coverage reports via GitHub Pages * Add CPU unit coverage for AlphaFold3 backend helpers * Reorganize tests and expand backend coverage * Reset shared test flags between cases * Expand AF3 prepare_input unit coverage * Cover AF3 and truemultimer feature creation * Test AF3 multimer MSA translation paths * Cover AF3 duplicate-residue multimer fallback * Cover AF2 resume and postprocess edge paths * Cover AF3 template mmCIF preparation * Test small script entry points * Expand workflow and ModelCIF test coverage * Add backend extras and install guide * Clarify AF3 backend installation path * Stabilize cluster GPU test runners * Document AF3 CMake SQLite hints * Simplify backend installation guide * Align AF3 install with working cluster env * Backfill typing dataclass_transform for AF2 * Pin TensorFlow for cluster installs * Fallback AF2 relax when CUDA OpenMM is unavailable * Raise AF3 default minimum bucket size * Simplify backend cluster installation guide * Fix AF3 wrapper JSON output isolation * Fix AF3 JSON wrapper outputs and MMseqs ID parsing * Fix CI entrypoint stub and Python 3.8 typing * Document release readiness test gates
247 lines
6.9 KiB
Markdown
247 lines
6.9 KiB
Markdown
# Backend Installation Guide
|
|
|
|
This guide is for direct AlphaPulldown use without Snakemake.
|
|
|
|
Two points matter in practice:
|
|
|
|
1. Run the install commands from the AlphaPulldown repo root.
|
|
2. For AlphaFold3, building the vendored `alphafold3` package is a separate step. A root-level install alone is not enough.
|
|
|
|
The Docker files remain the long-term reference environments, but the commands below are the simpler cluster-facing paths.
|
|
We revalidated them on EMBL on March 30, 2026 by creating fresh environments and running the cluster test suites there.
|
|
|
|
## Known-good cluster stacks
|
|
|
|
These are the two EMBL environments that were already working and that we rechecked while validating the wrappers:
|
|
|
|
- `AlphaPulldown` for AF2:
|
|
- Python `3.10`
|
|
- `jax 0.5.3`
|
|
- `jaxlib 0.5.3`
|
|
- `numpy 1.26.4`
|
|
- `tensorflow-cpu 2.20.0`
|
|
- `openmm 8.1.1`
|
|
- `pdbfixer 1.12`
|
|
- `modelcif 1.6`
|
|
- `AlphaPulldown_alphafold3` for AF3:
|
|
- Python `3.11`
|
|
- `jax 0.5.3`
|
|
- `jaxlib 0.5.3`
|
|
- `numpy 1.26.4`
|
|
- `tensorflow-cpu 2.18.0`
|
|
- `openmm 8.3.1`
|
|
- `pdbfixer 1.12`
|
|
- `modelcif 1.6`
|
|
- `jax-triton 0.2.0`
|
|
- `triton 3.1.0`
|
|
- `rdkit 2024.3.5`
|
|
- `typeguard 2.13.3`
|
|
- compiled `alphafold3.cpp`
|
|
|
|
The installation steps below are written to stay close to those working environments and to keep user-facing environment variables to a minimum.
|
|
|
|
## AlphaFold2 backend
|
|
|
|
Create the environment, activate it, move into the repo, and install:
|
|
|
|
```bash
|
|
mamba create -y -n apd-af2 -c conda-forge -c bioconda \
|
|
python=3.10 \
|
|
kalign2 \
|
|
hmmer \
|
|
hhsuite
|
|
mamba activate apd-af2
|
|
cd /path/to/AlphaPulldown
|
|
python -m pip install ".[alphafold2,test]"
|
|
```
|
|
|
|
Check that GPU JAX is visible:
|
|
|
|
```bash
|
|
python - <<'PY'
|
|
import jax
|
|
print(jax.__version__)
|
|
print(jax.local_devices(backend="gpu"))
|
|
PY
|
|
```
|
|
|
|
## AlphaFold3 backend
|
|
|
|
AlphaFold3 needs both the AlphaPulldown install and the vendored `alphafold3` build.
|
|
|
|
```bash
|
|
mamba create -y -n apd-af3 -c conda-forge -c bioconda \
|
|
python=3.11 \
|
|
kalign2 \
|
|
hmmer \
|
|
hhsuite \
|
|
libcifpp \
|
|
sqlite
|
|
mamba activate apd-af3
|
|
cd /path/to/AlphaPulldown
|
|
python -m pip install ".[alphafold3,test]"
|
|
python -m pip install --no-deps -e ./alphafold3
|
|
build_data
|
|
```
|
|
|
|
Check that the compiled extension and GPU JAX are available:
|
|
|
|
```bash
|
|
python - <<'PY'
|
|
import alphafold3.cpp
|
|
import jax
|
|
print(jax.__version__)
|
|
print(jax.local_devices(backend="gpu"))
|
|
print(alphafold3.cpp.__file__)
|
|
PY
|
|
```
|
|
|
|
Notes:
|
|
|
|
- The working EMBL AF3 environment is closer to `.[alphafold3,test]` than to the upstream `alphafold3/dev-requirements.txt` stack.
|
|
- For cluster installs, prefer `python -m pip install ".[alphafold3,test]"` and then build the vendored `alphafold3` package.
|
|
- The compiled `alphafold3.cpp` extension comes from `python -m pip install --no-deps -e ./alphafold3`, not from the root install.
|
|
- The vendored AF3 package provides the `build_data` entry point. Use that directly.
|
|
- If you are actively developing inside the checkout and want the root package editable as well, add:
|
|
|
|
```bash
|
|
python -m pip install -e . --no-deps
|
|
```
|
|
|
|
## Cluster validation
|
|
|
|
On EMBL, the AF2 and AF3 functional tests already have default database roots baked into the test files, so the only environment variable you need for the standard cluster runs is:
|
|
|
|
```bash
|
|
export RUN_GPU_FUNCTIONAL_TESTS=1
|
|
```
|
|
|
|
Set `ALPHAFOLD_DATA_DIR` only if your databases are not in the EMBL default locations.
|
|
|
|
Two ways to run the cluster tests:
|
|
|
|
1. Direct `srun` + `pytest`
|
|
- Best for validating a fresh install end-to-end.
|
|
- Keeps everything on one allocated GPU node.
|
|
- More reliable than the wrappers when Slurm priority is poor.
|
|
- Important: when calling `pytest` directly, pass `-o addopts="-ra --strict-markers"`. The repo-level `pytest.ini` excludes cluster tests by default.
|
|
2. Wrapper scripts
|
|
- `test/cluster/run_alphafold2_predictions.py`
|
|
- `test/cluster/run_alphafold3_predictions.py`
|
|
- Faster when the queue is healthy, because they fan out one job per pytest node.
|
|
|
|
### AlphaFold2
|
|
|
|
Direct full-suite validation:
|
|
|
|
```bash
|
|
srun -p gpu-training --gres=gpu:1 \
|
|
--cpus-per-task=4 --mem=16G --time=12:00:00 \
|
|
bash -lc '
|
|
cd /path/to/AlphaPulldown
|
|
export RUN_GPU_FUNCTIONAL_TESTS=1
|
|
python -m pytest -o addopts="-ra --strict-markers" -vv -s \
|
|
test/cluster/check_alphafold2_predictions.py --use-temp-dir
|
|
'
|
|
```
|
|
|
|
Expected result on the standard suite:
|
|
|
|
```text
|
|
11 passed, 1 skipped
|
|
```
|
|
|
|
The skip is the opt-in MMseqs functional inference check, which only runs when `RUN_MMSEQS_FUNCTIONAL_TESTS=1` is set.
|
|
|
|
Preview the collected nodes for wrapper mode:
|
|
|
|
```bash
|
|
python test/cluster/run_alphafold2_predictions.py --list
|
|
```
|
|
|
|
Wrapper-based parallel submission:
|
|
|
|
```bash
|
|
python test/cluster/run_alphafold2_predictions.py \
|
|
--max-tests 6 \
|
|
--use-temp-dir \
|
|
--partition gpu-training
|
|
```
|
|
|
|
### AlphaFold3
|
|
|
|
Direct full-suite validation:
|
|
|
|
```bash
|
|
srun -p gpu-training --gres=gpu:1 \
|
|
--cpus-per-task=4 --mem=64G --time=12:00:00 \
|
|
bash -lc '
|
|
cd /path/to/AlphaPulldown
|
|
export RUN_GPU_FUNCTIONAL_TESTS=1
|
|
python -m pytest -o addopts="-ra --strict-markers" -vv -s \
|
|
test/cluster/check_alphafold3_predictions.py --use-temp-dir
|
|
'
|
|
```
|
|
|
|
On our fresh EMBL validation run, `32G` was not enough for the full AF3 suite on `hgx5`; `64G` was.
|
|
|
|
Preview the collected nodes for wrapper mode:
|
|
|
|
```bash
|
|
python test/cluster/run_alphafold3_predictions.py --list
|
|
```
|
|
|
|
Wrapper-based parallel submission:
|
|
|
|
```bash
|
|
python test/cluster/run_alphafold3_predictions.py \
|
|
--max-tests 6 \
|
|
--use-temp-dir \
|
|
--partition gpu-training
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### AlphaFold2: `Unknown backend: 'gpu' requested, ... Platforms present are: cpu`
|
|
|
|
That means the environment has CPU-only JAX:
|
|
|
|
```bash
|
|
python -m pip install --upgrade --no-cache-dir "jax==0.5.3" "jax[cuda12]==0.5.3"
|
|
python - <<'PY'
|
|
import jax
|
|
print(jax.__version__)
|
|
print(jax.local_devices(backend="gpu"))
|
|
PY
|
|
```
|
|
|
|
### AlphaFold2: `There is no registered Platform called "CUDA"`
|
|
|
|
That comes from OpenMM relaxation, not from JAX. Older working EMBL envs exposed a CUDA-enabled OpenMM platform, while a fresh pip-installed OpenMM may expose only `Reference`, `CPU`, and `OpenCL`.
|
|
|
|
Current AlphaPulldown falls back to CPU relax automatically if CUDA is unavailable, so the AF2 cluster tests still pass. If you want GPU-backed OpenMM relax as well, install the OpenMM stack from conda before the pip install.
|
|
|
|
### AlphaFold3: `ModuleNotFoundError: No module named 'alphafold3.cpp'`
|
|
|
|
The root AlphaPulldown install succeeded, but the vendored AF3 package was not built yet:
|
|
|
|
```bash
|
|
cd /path/to/AlphaPulldown
|
|
python -m pip install ".[alphafold3,test]"
|
|
python -m pip install --no-deps -e ./alphafold3
|
|
build_data
|
|
```
|
|
|
|
### AlphaFold3 build error: `Could NOT find SQLite3`
|
|
|
|
If SQLite is installed in the conda environment but CMake still cannot find it, rerun the AF3 build with the conda prefix exposed:
|
|
|
|
```bash
|
|
mamba activate apd-af3
|
|
export CMAKE_PREFIX_PATH="$CONDA_PREFIX"
|
|
export SQLite3_ROOT="$CONDA_PREFIX"
|
|
cd /path/to/AlphaPulldown
|
|
python -m pip install --no-deps -e ./alphafold3
|
|
build_data
|
|
```
|