Input JSON Format
AlphaFast uses the standard AlphaFold 3 JSON input format. Place one or more .json files in your --input_dir.
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Job name. Used for output directory naming. |
| sequences | array | Yes | Array of chain definitions (protein, ligand, etc.) |
| modelSeeds | array of int | Yes | Random seeds for inference. At least one seed required. |
| dialect | string | Yes | Must be "alphafold3" |
| version | int | Yes | Supported versions: 1, 2, 3, or 4 |
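The required fields in the table can be checked before a run is launched. The following is a minimal Python sketch of such a pre-flight check on a parsed input file. The function name check_fold_input is hypothetical and not part of AlphaFast, and it only inspects the top-level fields listed in the table, not the contents of each chain.

```python
# Required top-level fields and the Python type each should have after json.load(),
# following the field reference table (hypothetical pre-flight check).
REQUIRED_FIELDS = {
    "name": str,
    "sequences": list,
    "modelSeeds": list,
    "dialect": str,
    "version": int,
}
SUPPORTED_VERSIONS = {1, 2, 3, 4}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_fold_input(data: dict) -> list[str]:
    """Return a list of problems with the top-level fields (empty list if none found)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing required field: {field}")
            continue
        value = data[field]
        ok = _is_int(value) if expected is int else isinstance(value, expected)
        if not ok:
            problems.append(f"{field} should be of type {expected.__name__}")
    if isinstance(data.get("dialect"), str) and data["dialect"] != "alphafold3":
        problems.append('dialect must be "alphafold3"')
    version = data.get("version")
    if _is_int(version) and version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {version}")
    seeds = data.get("modelSeeds")
    # At least one seed is required, and every seed must be an integer.
    if isinstance(seeds, list) and (not seeds or not all(_is_int(s) for s in seeds)):
        problems.append("modelSeeds must be a non-empty list of integers")
    return problems


example = {
    "name": "my_protein",
    "sequences": [{"protein": {"id": ["A"], "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"}}],
    "modelSeeds": [42],
    "dialect": "alphafold3",
    "version": 1,
}
print(check_fold_input(example))  # []
```

To check files on disk, the same function can be applied to the result of json.load on each .json file in the input directory.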
Optional Fields
| Field | Type | Description |
|---|---|---|
| bondedAtomPairs | array | Custom bonded atom pair constraints |
| userCCD | string | Custom Chemical Component Dictionary (CCD) entries |
Protein Chain
A single protein chain with chain ID "A":
{
"name": "my_protein",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
}
],
"modelSeeds": [42],
"dialect": "alphafold3",
"version": 1
}
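When many jobs are prepared, the same input can be generated from a script instead of being written by hand. This is a minimal sketch using only the Python standard library. The helper write_protein_input is hypothetical, and naming the file after the job is just a convention chosen here: AlphaFast reads any .json file placed in --input_dir.

```python
import json
from pathlib import Path


def write_protein_input(input_dir: str, name: str, sequence: str, seeds: list[int]) -> Path:
    """Write a single-chain protein job (chain "A") as <input_dir>/<name>.json."""
    fold_input = {
        "name": name,
        "sequences": [{"protein": {"id": ["A"], "sequence": sequence}}],
        "modelSeeds": seeds,
        "dialect": "alphafold3",
        "version": 1,
    }
    out_dir = Path(input_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the input directory if needed
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(fold_input, indent=2))
    return path
```

For example, calling write_protein_input("inputs", "my_protein", sequence, [42]) would create inputs/my_protein.json with the same structure as the example above.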
Homomeric Complex
A homodimer where both chains A and B share the same sequence. List multiple chain IDs in the id array:
{
"name": "my_homodimer",
"sequences": [
{
"protein": {
"id": ["A", "B"],
"sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGY"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}
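Listing several IDs in one entry is shorthand for several chains with the same sequence. The sketch below makes that explicit by flattening the sequences array into one tuple per chain. The function expand_chains is a hypothetical helper for illustration, not an AlphaFast API, and the sequence used here is shortened.

```python
from typing import Optional


def expand_chains(fold_input: dict) -> list[tuple[str, str, Optional[str]]]:
    """Flatten "sequences" into one (chain_id, entity_type, sequence) tuple per chain.

    Each entry holds a single key such as "protein", "rna", "dna" or "ligand".
    An entry whose "id" lists several chain IDs yields one chain per ID, all with
    the same sequence. Ligands given by CCD code have no "sequence", so None is used.
    """
    chains = []
    for entry in fold_input["sequences"]:
        for entity_type, spec in entry.items():
            ids = spec["id"]
            if isinstance(ids, str):  # treat a bare string as a single chain ID
                ids = [ids]
            for chain_id in ids:
                chains.append((chain_id, entity_type, spec.get("sequence")))
    return chains


# Shortened homodimer sequence, for illustration only.
homodimer = {"sequences": [{"protein": {"id": ["A", "B"], "sequence": "GMRESYANENQ"}}]}
print(expand_chains(homodimer))
# [('A', 'protein', 'GMRESYANENQ'), ('B', 'protein', 'GMRESYANENQ')]
```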
Heteromeric Complex
A heterodimer of two different protein chains, A and B:
{
"name": "my_heterodimer",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
},
{
"protein": {
"id": ["B"],
"sequence": "MHSSIVLATVLFVAIASASKTRELCMKSLEHAKVGTSKEAKQDGIDLYKHMFE"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}
Protein with Ligand
A protein chain with an ATP ligand, specified using its CCD code:
{
"name": "protein_atp",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
},
{
"ligand": {
"id": ["B"],
"ccdCodes": ["ATP"]
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}
Multiple Seeds
Provide multiple seeds to generate independent structure samples:
{
"name": "multi_seed",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
}
],
"modelSeeds": [1, 2, 3, 4, 5],
"dialect": "alphafold3",
"version": 1
}
Alternatively, list a single seed in modelSeeds and pass the --num_seeds flag to run_alphafold.py to auto-generate N seeds from it.
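To illustrate the idea behind --num_seeds, the sketch below expands one seed into N seeds. It assumes the generated seeds are consecutive integers starting from the seed in modelSeeds; that is an assumption for illustration, so check run_alphafold.py before relying on the exact values. The function name expand_seeds is hypothetical.

```python
def expand_seeds(model_seeds: list[int], num_seeds: int) -> list[int]:
    """Expand a single seed into num_seeds seeds (assumed: consecutive integers)."""
    if len(model_seeds) != 1:
        raise ValueError("expected exactly one seed in modelSeeds")
    if num_seeds < 1:
        raise ValueError("num_seeds must be at least 1")
    start = model_seeds[0]
    return list(range(start, start + num_seeds))


print(expand_seeds([1], 5))  # [1, 2, 3, 4, 5]
```

Under this assumption, modelSeeds: [1] with N = 5 would give the same seeds as the explicit [1, 2, 3, 4, 5] list in the example above.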
RNA-Protein Complex
RNA chains are supported with nhmmer-based MSA search against RNAcentral, Rfam, and NT databases:
{
"name": "rna_protein",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
},
{
"rna": {
"id": ["B"],
"sequence": "GGGGACUGCGUUCGCGCUUUCCCC"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 3
}
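RNA sequences use U where DNA uses T, so a quick sanity check before submitting is to confirm that each nucleic-acid sequence stays within its expected alphabet. This sketch covers only the four standard nucleotides and ignores modified residues. The function unexpected_letters is a hypothetical helper, not part of AlphaFast.

```python
# Standard nucleotides only; this sketch does not account for modified residues.
ALPHABETS = {"rna": set("ACGU"), "dna": set("ACGT")}


def unexpected_letters(kind: str, sequence: str) -> set[str]:
    """Return letters in `sequence` outside the standard alphabet for `kind` ("rna" or "dna").

    Lowercase letters are reported too, since they are not in the uppercase alphabet.
    """
    return set(sequence) - ALPHABETS[kind]


print(unexpected_letters("rna", "GGGGACUGCGUUCGCGCUUUCCCC"))  # set()
print(unexpected_letters("rna", "GGGGACTGCG"))  # {'T'}
```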
DNA-Protein Complex
DNA chains are passed through with an empty MSA, matching AlphaFold 3's native behavior (AF3 does not perform an MSA search for DNA):
{
"name": "dna_protein",
"sequences": [
{
"protein": {
"id": ["A"],
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
}
},
{
"dna": {
"id": ["B", "C"],
"sequence": "ACGTACGTACGT"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 3
}
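Listing one DNA sequence under two chain IDs, as in this example, describes two identical strands. Two identical strands can pair into a fully matched antiparallel duplex only if the sequence is its own reverse complement, which ACGTACGTACGT is. For a non-palindromic duplex, the two strands would instead be given as separate dna entries with complementary sequences. The sketch below checks this property; the helper names are hypothetical and only standard A/C/G/T bases are handled.

```python
# Map each DNA base to its Watson-Crick partner (standard A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if two copies of `seq` can pair antiparallel into a fully matched duplex."""
    return seq == reverse_complement(seq)


print(is_self_complementary("ACGTACGTACGT"))  # True
print(reverse_complement("AAGC"))  # GCTT
```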
Batch Processing
Place multiple JSON files in a single directory and pass it as --input_dir:
inputs/
protein_1.json
protein_2.json
protein_3.json
Each file is loaded as a separate fold input. When --batch_size is set, protein chains are collected across all loaded inputs and deduplicated, then grouped into batched MMseqs2-GPU searches. As a result, protein chains from the same JSON file, as well as chains from different files, can be searched in the same MMseqs2 batch.
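The collect, deduplicate, then batch step can be sketched as follows. This is a simplified illustration, not AlphaFast's implementation: plan_protein_batches is a hypothetical name, it only returns lists of sequences rather than running MMseqs2-GPU, and it assumes batches are filled in file order, which the real grouping logic may not do.

```python
import json
from pathlib import Path


def plan_protein_batches(input_dir: str, batch_size: int) -> list[list[str]]:
    """Group the unique protein sequences found in input_dir into batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # A dict keeps first-seen order while dropping repeated sequences, whether the
    # repeat comes from another entry in the same file or from another file.
    unique: dict[str, None] = {}
    for path in sorted(Path(input_dir).glob("*.json")):
        fold_input = json.loads(path.read_text())
        for entry in fold_input["sequences"]:
            protein = entry.get("protein")
            if protein is not None:
                unique.setdefault(protein["sequence"], None)
    sequences = list(unique)
    # Slice the deduplicated list into consecutive batches.
    return [sequences[i:i + batch_size] for i in range(0, len(sequences), batch_size)]
```

Deduplicating before batching means a sequence that appears in several inputs is searched once rather than once per file.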