AlphaPulldown/alphapulldown/scripts/parse_input.py
Dima 4d802be7d6 support both af2 and af3 data pipelines (#523)
* symmetrical refactoring to support both af2 and af3 data pipelines

* Clean tests

* Keep GPU tests in place

* Reverted accidentally deleted templates

* Add AlphaFold3 feature creation pipeline and per-chain input generation

- Implement `create_pipeline_af3` to construct the AlphaFold3 data pipeline with correct database and binary paths.
- Add `create_af3_individual_features` to generate AlphaFold3 input features for each chain in a FASTA, handling protein, RNA, and DNA sequences.
- Integrate new AF3 logic into the main entry point, dispatching to AF2 or AF3 as appropriate.
- Ensure output directory creation and error handling for missing dependencies or invalid sequences.

* Convert template dates to datetime for af3

* First check for nucleotides, then for amino-acids

* Skip existing features json if --skip_existing=true

* Check if DNA before RNA

* Bump 2.1.0

* Git ignore build/ dir
2025-07-16 12:30:18 +02:00
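
The ordering mentioned in the commit notes above (nucleotides before amino acids, DNA before RNA) matters because the alphabets overlap: A, C, G and T are all valid one-letter amino-acid codes, and a sequence made only of A, C and G is valid as both DNA and RNA. The following is a minimal sketch of that idea only. `classify_sequence` and the alphabet constants are hypothetical names for illustration, not the functions this commit adds, and the real pipeline may handle ambiguity codes or modified residues differently.

```python
# Hypothetical illustration; not the AlphaPulldown implementation.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def classify_sequence(seq: str) -> str:
    """Return 'dna', 'rna' or 'protein', testing the narrowest alphabets first."""
    letters = set(seq.upper())
    if not letters:
        raise ValueError("empty sequence")
    # DNA is tested first, so sequences without T or U default to DNA.
    if letters <= DNA_ALPHABET:
        return "dna"
    if letters <= RNA_ALPHABET:
        return "rna"
    # Only now test the protein alphabet, which is a superset of ACGT.
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    raise ValueError(f"unrecognised characters: {sorted(letters - PROTEIN_ALPHABET)}")


kinds = [classify_sequence(s) for s in ("ACGT", "ACGU", "ACG", "MKV")]
```

If the protein test came first, "ACGT" would be classified as a protein, which is the failure mode the ordering avoids.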


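#!/usr/bin/env python
"""Parse fold specifications from input lists and write them to a JSON file."""
import io
import json

from absl import app, flags, logging

from alphapulldown.utils.create_combinations import process_files
from alphapulldown.utils.modelling_setup import create_custom_info, parse_fold

logging.set_verbosity(logging.INFO)

flags.DEFINE_list(
    'input_list', None,
    'Path to input file list.')
flags.DEFINE_list(
    'features_directory', None,
    'Path to computed monomer features.')
flags.DEFINE_string(
    'protein_delimiter', '+',
    'Delimiter for proteins.')
flags.DEFINE_string(
    'output_prefix', None,
    'Prefix for output JSON files.')

FLAGS = flags.FLAGS

# These flags default to None and main() cannot run without them, so fail
# early with a clear message instead of a TypeError further down.
flags.mark_flags_as_required(
    ['input_list', 'features_directory', 'output_prefix'])


def main(argv):
    del argv  # Unused.

    # Collect the output of process_files in memory instead of in a file.
    buffer = io.StringIO()
    _ = process_files(
        input_files=FLAGS.input_list,
        output_path=buffer,
        exclude_permutations=True,
    )

    # Rewind the buffer and read the folds back, one per line.
    buffer.seek(0)
    all_folds = [line.strip() for line in buffer.readlines()]

    parsed = parse_fold(
        all_folds, FLAGS.features_directory, FLAGS.protein_delimiter)
    data = create_custom_info(parsed)

    # The prefix is used verbatim, so "run1_" yields "run1_data.json".
    with open(FLAGS.output_prefix + "data.json", 'w') as out_f:
        json.dump(data, out_f, indent=1)


if __name__ == '__main__':
    app.run(main)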
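
The script above passes an `io.StringIO` as `output_path`, then rewinds it and reads the lines back. The sketch below isolates that capture pattern so it can be run without AlphaPulldown installed. `write_combinations` is a hypothetical stand-in for `process_files`; the assumption that it writes one `+`-joined pair per line is for illustration only and is not taken from the library.

```python
import io
from itertools import combinations


def write_combinations(names, output, delimiter="+"):
    """Hypothetical stand-in for process_files: write one pair per line."""
    for first, second in combinations(names, 2):
        output.write(f"{first}{delimiter}{second}\n")


def capture_lines(names):
    """Run the writer against an in-memory buffer and return its lines."""
    buffer = io.StringIO()
    write_combinations(names, buffer)
    # After writing, the position is at the end; rewind before reading.
    buffer.seek(0)
    return [line.strip() for line in buffer.readlines()]


folds = capture_lines(["A", "B", "C"])
print(folds)  # ['A+B', 'A+C', 'B+C']
```

Without the `seek(0)` call, `readlines()` would start at the end of the buffer and return an empty list, which is why the script rewinds before reading.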