alphafast/docs/input_format.md
2026-05-04 12:18:27 -04:00

Input JSON Format

AlphaFast uses the standard AlphaFold 3 JSON input format. Place one or more .json files in the directory you pass as --input_dir.

Field Reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Job name. Used for output directory naming. |
| sequences | array | Yes | Array of chain definitions (protein, ligand, etc.) |
| modelSeeds | array of int | Yes | Random seeds for inference. At least one seed required. |
| dialect | string | Yes | Must be "alphafold3" |
| version | int | Yes | Supported versions: 1, 2, 3, or 4 |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| bondedAtomPairs | array | Custom bonded atom pair constraints |
| userCCD | string | Custom Chemical Component Dictionary (CCD) entries |
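
The required fields above can be checked before a job is submitted. The following is a minimal sketch, not AlphaFast's own validation: the function name validate_fold_input and the error messages are hypothetical, and only the constraints listed in the field tables are enforced.

```python
# Hypothetical pre-flight check; covers only the rules stated in the tables above.
REQUIRED_FIELDS = {
    "name": str,
    "sequences": list,
    "modelSeeds": list,
    "dialect": str,
    "version": int,
}
SUPPORTED_VERSIONS = {1, 2, 3, 4}


def validate_fold_input(data: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")
    if problems:
        # Skip the value checks when the basic shape is already wrong.
        return problems

    seeds = data["modelSeeds"]
    if not seeds or not all(isinstance(s, int) for s in seeds):
        problems.append("modelSeeds must contain at least one integer seed")
    if data["dialect"] != "alphafold3":
        problems.append('dialect must be "alphafold3"')
    if data["version"] not in SUPPORTED_VERSIONS:
        problems.append("version must be 1, 2, 3, or 4")
    return problems
```

Calling validate_fold_input(json.load(f)) on each file before a run would surface a missing seed or a wrong dialect string early, instead of partway through a batch.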

Protein Chain

A single protein chain with chain ID "A":

{
  "name": "my_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    }
  ],
  "modelSeeds": [42],
  "dialect": "alphafold3",
  "version": 1
}

Homomeric Complex

A homodimer in which chains A and B share the same sequence. To create several copies of a chain, list every chain ID in the id array:

{
  "name": "my_homodimer",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGY"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
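
To see how the id array expands into individual chains, the sketch below lists every chain ID that an input defines. The helper name chain_ids is hypothetical. It also accepts a bare string for id, which the upstream AlphaFold 3 format allows for single chains, although the examples in this document always use arrays.

```python
def chain_ids(fold_input: dict) -> list[str]:
    """Collect every chain ID defined in the input, in order of appearance."""
    ids = []
    for entry in fold_input["sequences"]:
        # Each entry has one key naming the chain type, e.g. "protein".
        for chain in entry.values():
            chain_id = chain["id"]
            # One ID per copy: ["A", "B"] means two chains with one sequence.
            ids.extend([chain_id] if isinstance(chain_id, str) else chain_id)
    return ids


homodimer = {
    "sequences": [
        {"protein": {"id": ["A", "B"], "sequence": "GMRESYANENQFGFKTINSD"}}
    ]
}
print(chain_ids(homodimer))  # ['A', 'B']
```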

Heteromeric Complex

Two different protein chains:

{
  "name": "my_heterodimer",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "protein": {
        "id": ["B"],
        "sequence": "MHSSIVLATVLFVAIASASKTRELCMKSLEHAKVGTSKEAKQDGIDLYKHMFE"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

Protein with Ligand

A protein chain with an ATP ligand, specified using its CCD code:

{
  "name": "protein_atp",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "ligand": {
        "id": ["B"],
        "ccdCodes": ["ATP"]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

Multiple Seeds

Provide multiple seeds to generate independent structure samples:

{
  "name": "multi_seed",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    }
  ],
  "modelSeeds": [1, 2, 3, 4, 5],
  "dialect": "alphafold3",
  "version": 1
}

Alternatively, pass the --num_seeds flag to run_alphafold.py to generate N seeds automatically from a single seed listed in modelSeeds.
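
If you prefer to keep the seeds explicit in the JSON file, the list can be filled in programmatically. This is only a sketch: the helper with_explicit_seeds is hypothetical, and the consecutive integers it writes are an illustrative choice, not the way --num_seeds derives its seeds.

```python
import copy


def with_explicit_seeds(fold_input: dict, num_seeds: int, first_seed: int = 1) -> dict:
    """Return a copy of the input whose modelSeeds holds num_seeds consecutive integers."""
    if num_seeds < 1:
        raise ValueError("at least one seed is required")
    result = copy.deepcopy(fold_input)  # leave the caller's dict untouched
    result["modelSeeds"] = list(range(first_seed, first_seed + num_seeds))
    return result
```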

RNA-Protein Complex

RNA chains are supported through an nhmmer-based MSA search against the RNAcentral, Rfam, and NT databases:

{
  "name": "rna_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "rna": {
        "id": ["B"],
        "sequence": "GGGGACUGCGUUCGCGCUUUCCCC"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}

DNA-Protein Complex

DNA chains pass through with an empty MSA, matching AlphaFold 3's native behavior (AF3 does not perform MSA search for DNA):

{
  "name": "dna_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "dna": {
        "id": ["B", "C"],
        "sequence": "ACGTACGTACGT"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}

Batch Processing

Place multiple JSON files in a single directory and pass it as --input_dir:

inputs/
  protein_1.json
  protein_2.json
  protein_3.json

Each file is loaded as a separate fold input. When --batch_size is set, protein chains are collected and deduplicated across the loaded inputs, then grouped into efficient batched MMseqs2-GPU searches. Multiple protein chains from a single JSON file can therefore be searched in the same MMseqs2 batch.
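
The collect, deduplicate, and group steps described above can be pictured with the sketch below. It illustrates the idea only and is not AlphaFast's implementation; the function names are hypothetical, and the real pipeline may key or order its batches differently.

```python
import json
from pathlib import Path


def load_fold_inputs(input_dir: str) -> list[dict]:
    """Load every .json file in the directory as one fold input."""
    return [json.loads(p.read_text()) for p in sorted(Path(input_dir).glob("*.json"))]


def unique_protein_sequences(fold_inputs: list[dict]) -> list[str]:
    """Collect protein sequences across all inputs, keeping the first occurrence of each."""
    seen = {}  # dict preserves insertion order, so it doubles as an ordered set
    for fold_input in fold_inputs:
        for entry in fold_input["sequences"]:
            protein = entry.get("protein")
            if protein is not None:
                seen.setdefault(protein["sequence"], None)
    return list(seen)


def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split the items into consecutive groups of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With this grouping, a sequence that appears in several JSON files, or several times in one file, is searched once, and chains from different files can share a batch.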