# Input JSON Format

AlphaFast uses the standard AlphaFold 3 JSON input format. Place one or more `.json` files in your `--input_dir`.

## Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Job name. Used for output directory naming. |
| `sequences` | array | Yes | Array of chain definitions. Each entry is a `protein`, `rna`, `dna`, or `ligand` object. |
| `modelSeeds` | array of int | Yes | Random seeds for inference. At least one seed is required. |
| `dialect` | string | Yes | Must be `"alphafold3"`. |
| `version` | int | Yes | Supported versions: `1`, `2`, `3`, or `4` |
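
As a rough illustration of the table above, the following Python sketch checks a parsed input for the required top-level fields. The function name `check_input` and its messages are hypothetical and not part of AlphaFast or AlphaFold 3; the actual loader may validate more strictly.

```python
import json

# Required top-level fields and supported versions, taken from the table above.
REQUIRED_FIELDS = ("name", "sequences", "modelSeeds", "dialect", "version")
SUPPORTED_VERSIONS = (1, 2, 3, 4)


def check_input(data: dict) -> list[str]:
    """Return a list of problems found in a parsed input (empty if none).

    Hypothetical helper for illustration only; not the AlphaFast loader.
    """
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_FIELDS
        if key not in data
    ]
    if "dialect" in data and data["dialect"] != "alphafold3":
        problems.append('dialect must be "alphafold3"')
    if "version" in data and data["version"] not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {data['version']}")
    if "modelSeeds" in data and not data["modelSeeds"]:
        problems.append("modelSeeds must contain at least one seed")
    if "sequences" in data and not data["sequences"]:
        problems.append("sequences must contain at least one chain")
    return problems


if __name__ == "__main__":
    text = '{"name": "my_protein", "sequences": [], "modelSeeds": [42]}'
    for problem in check_input(json.loads(text)):
        print(problem)
```

Running a check like this before a long job can surface a missing `dialect` or `version` early, before any search or inference starts.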

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `bondedAtomPairs` | array | Custom covalent bonds between pairs of atoms. |
| `userCCD` | string | Custom Chemical Component Dictionary (CCD) entries, given as an mmCIF-format string. |

## Protein Chain

A single protein chain with chain ID `A`:

```json
{
  "name": "my_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    }
  ],
  "modelSeeds": [42],
  "dialect": "alphafold3",
  "version": 1
}
```

## Homomeric Complex

A homodimer in which chains A and B share the same sequence. To create identical copies of a chain, list multiple chain IDs in the `id` array:

```json
{
  "name": "my_homodimer",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGY"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
```
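
For larger homo-oligomers, writing the `id` array by hand gets tedious. The sketch below builds the same structure programmatically. The helper `homomer_input` is hypothetical, and limiting it to single-letter chain IDs is a simplification made for this example.

```python
import json
import string


def homomer_input(name: str, sequence: str, copies: int, seed: int = 1) -> dict:
    """Build an input dict with `copies` identical protein chains (A, B, C, ...).

    Hypothetical helper for illustration; it only assigns single-letter IDs.
    """
    if not 1 <= copies <= len(string.ascii_uppercase):
        raise ValueError("this sketch supports between 1 and 26 copies")
    chain_ids = list(string.ascii_uppercase[:copies])
    return {
        "name": name,
        "sequences": [{"protein": {"id": chain_ids, "sequence": sequence}}],
        "modelSeeds": [seed],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    trimer = homomer_input(
        "my_homotrimer", "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGY", copies=3
    )
    print(json.dumps(trimer, indent=2))
```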

## Heteromeric Complex

Two different protein chains:

```json
{
  "name": "my_heterodimer",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "protein": {
        "id": ["B"],
        "sequence": "MHSSIVLATVLFVAIASASKTRELCMKSLEHAKVGTSKEAKQDGIDLYKHMFE"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
```

## Protein with Ligand

A protein chain with an ATP ligand, specified using its CCD code:

```json
{
  "name": "protein_atp",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "ligand": {
        "id": ["B"],
        "ccdCodes": ["ATP"]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
```

## Multiple Seeds

Provide multiple seeds to generate independent structure samples:

```json
{
  "name": "multi_seed",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    }
  ],
  "modelSeeds": [1, 2, 3, 4, 5],
  "dialect": "alphafold3",
  "version": 1
}
```

Alternatively, put a single seed in `modelSeeds` and use the `--num_seeds` flag of `run_alphafold.py` to auto-generate N seeds from it.
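
If you prefer the seeds to be explicit in the file, a small script can expand a single-seed input into a multi-seed one. This is a minimal sketch: `with_explicit_seeds` is a hypothetical helper, and numbering the seeds consecutively is a choice made for this example, not a statement about how `--num_seeds` derives its seeds.

```python
def with_explicit_seeds(data: dict, num_seeds: int) -> dict:
    """Return a copy of `data` whose modelSeeds lists `num_seeds` seeds.

    Assumption for this sketch: seeds are numbered consecutively,
    starting from the single seed already present in the input.
    """
    seeds = data["modelSeeds"]
    if len(seeds) != 1:
        raise ValueError("expected exactly one seed in modelSeeds")
    if num_seeds < 1:
        raise ValueError("num_seeds must be at least 1")
    first = seeds[0]
    # Shallow copy: only the modelSeeds entry is replaced.
    return {**data, "modelSeeds": list(range(first, first + num_seeds))}


if __name__ == "__main__":
    single = {"name": "multi_seed", "modelSeeds": [1]}
    print(with_explicit_seeds(single, 5)["modelSeeds"])
```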

## RNA-Protein Complex

RNA chains are supported. Their MSAs are built with nhmmer searches against the RNAcentral, Rfam, and NT databases:

```json
{
  "name": "rna_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "rna": {
        "id": ["B"],
        "sequence": "GGGGACUGCGUUCGCGCUUUCCCC"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}
```

## DNA-Protein Complex

DNA chains pass through with an empty MSA, matching AlphaFold 3's native behavior (AF3 does not perform an MSA search for DNA):

```json
{
  "name": "dna_protein",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
      }
    },
    {
      "dna": {
        "id": ["B", "C"],
        "sequence": "ACGTACGTACGT"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 3
}
```
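
The example sequence `ACGTACGTACGT` is its own reverse complement, so chains B and C can share one entry. Assuming the two chains are meant to pair as a duplex, a sequence that is not self-complementary would need two separate `dna` entries. The sketch below shows one way to generate them; `reverse_complement` and `duplex_entries` are hypothetical helpers, not part of AlphaFast.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence (A, C, G, T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_entries(forward: str, ids: tuple[str, str] = ("B", "C")) -> list[dict]:
    """Build the `dna` entries for a double-stranded segment.

    A self-complementary sequence needs one entry with two chain IDs;
    otherwise each strand gets its own entry.
    """
    reverse = reverse_complement(forward)
    if reverse == forward:
        return [{"dna": {"id": list(ids), "sequence": forward}}]
    return [
        {"dna": {"id": [ids[0]], "sequence": forward}},
        {"dna": {"id": [ids[1]], "sequence": reverse}},
    ]


if __name__ == "__main__":
    print(duplex_entries("ACGTACGTACGT"))
    print(duplex_entries("AAGC"))
```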

## Batch Processing

Place multiple JSON files in a single directory and pass it as `--input_dir`:

```
inputs/
  protein_1.json
  protein_2.json
  protein_3.json
```

Each file is loaded as a separate fold input. When `--batch_size` is set, protein chains are collected and deduplicated across the loaded inputs, then grouped into efficient batched MMseqs2-GPU searches. Multiple protein chains from a single JSON file can therefore be searched in the same MMseqs2 batch.
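
To get a feel for how much deduplication a batch allows, the sketch below collects the distinct protein sequences across a directory of inputs. It only illustrates the idea and is not AlphaFast's implementation; `unique_protein_sequences` is a hypothetical helper.

```python
import json
from pathlib import Path


def unique_protein_sequences(input_dir: str) -> dict[str, list[tuple[str, str]]]:
    """Map each distinct protein sequence to the (job name, chain ID) pairs using it."""
    owners: dict[str, list[tuple[str, str]]] = {}
    for path in sorted(Path(input_dir).glob("*.json")):
        data = json.loads(path.read_text())
        for entry in data.get("sequences", []):
            protein = entry.get("protein")
            if protein is None:
                continue  # skip rna, dna, and ligand entries
            ids = protein["id"]
            if isinstance(ids, str):
                ids = [ids]  # tolerate a single ID given as a plain string
            for chain_id in ids:
                owners.setdefault(protein["sequence"], []).append(
                    (data["name"], chain_id)
                )
    return owners


if __name__ == "__main__":
    owners = unique_protein_sequences("inputs")
    total = sum(len(chains) for chains in owners.values())
    print(f"{total} protein chains, {len(owners)} distinct sequences")
```

A large gap between the two counts suggests that many chains can share a single search result.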