mirror of
https://github.com/KosinskiLab/AlphaPulldown.git
synced 2026-06-04 14:14:24 +08:00
* Harden MMseqs species ID resolution fallback * Reorganize tests for CPU coverage CI * New * Fix function coverage checker def-line false positives * Expand unit coverage for helper and backend manager utilities * New. * New. * Expand unit coverage for template and post-processing helpers * Expand unit coverage for objects.py edge cases * Publish HTML coverage reports via GitHub Pages * Add CPU unit coverage for AlphaFold3 backend helpers * Reorganize tests and expand backend coverage * Reset shared test flags between cases * Expand AF3 prepare_input unit coverage * Cover AF3 and truemultimer feature creation * Test AF3 multimer MSA translation paths * Cover AF3 duplicate-residue multimer fallback * Cover AF2 resume and postprocess edge paths * Cover AF3 template mmCIF preparation * Test small script entry points * Expand workflow and ModelCIF test coverage * Add backend extras and install guide * Clarify AF3 backend installation path * Stabilize cluster GPU test runners * Document AF3 CMake SQLite hints * Simplify backend installation guide * Align AF3 install with working cluster env * Backfill typing dataclass_transform for AF2 * Pin TensorFlow for cluster installs * Fallback AF2 relax when CUDA OpenMM is unavailable * Raise AF3 default minimum bucket size * Simplify backend cluster installation guide * Fix AF3 wrapper JSON output isolation * Fix AF3 JSON wrapper outputs and MMseqs ID parsing * Fix CI entrypoint stub and Python 3.8 typing * Document release readiness test gates
45 lines
1.1 KiB
Python
Executable File
#!/usr/bin/env python3
"""
Rename UniProt names in a FASTA file to UniProt IDs
(split the header on "|" or space and take the second element).
"""
import re
import sys
from itertools import groupby


def fasta_iter(fh):
    """Return an iterator over a FASTA file with multiple sequences.

    Modified from Brent Pedersen,
    "Correct Way To Parse A Fasta File In Python".
    Given a FASTA file, yield tuples of (header, sequence).

    :param fh: File handle to the FASTA file

    :return: 2-element tuple with header and sequence strings
    """

    # ditch the boolean (x[0]) and just keep the header or sequence since
    # we know they alternate.
    faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))

    for header in faiter:
        # drop the ">"
        headerStr = next(header)[1:].strip()

        # join all sequence lines to one; the empty default avoids a
        # RuntimeError when the last header has no sequence lines.
        seq = "".join(s.strip() for s in next(faiter, ()))

        yield (headerStr, seq)


out_lines = []

with open(sys.argv[1]) as f:
    for headerStr, seq in fasta_iter(f):
        # "db|ACCESSION|ENTRY_NAME description" -> items[1] is the accession
        items = re.split(r"[ |]", headerStr)
        out_lines.append(f'>{items[1]}')
        out_lines.append(seq)
print("\n".join(out_lines))
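The renaming step can be tried without a file on disk. The sketch below is a minimal illustration rather than part of the script: it assumes UniProt-style headers of the form `db|ACCESSION|ENTRY_NAME description`, the sample records are made up, and `iter_fasta` and `accession` are hypothetical names introduced here. Unlike the script, `accession` falls back to the first token when a header contains no `|`.

```python
import io
import re
from itertools import groupby


def iter_fasta(handle):
    """Yield (header, sequence) pairs using the same groupby idea as fasta_iter."""
    # groupby alternates between runs of header lines and runs of sequence lines.
    groups = (g for _, g in groupby(handle, lambda line: line.startswith(">")))
    for header_group in groups:
        header = next(header_group)[1:].strip()  # drop the leading ">"
        # Empty default: a trailing header without sequence yields "".
        seq = "".join(s.strip() for s in next(groups, ()))
        yield header, seq


def accession(header):
    """Hypothetical helper: return the second field of a '|'-delimited header.

    Assumption (not in the original script): headers without '|' keep
    their first whitespace-delimited token instead of raising IndexError.
    """
    items = re.split(r"[ |]", header)
    if "|" in header and len(items) > 1:
        return items[1]
    return items[0]


# Made-up records, for illustration only.
SAMPLE = (
    ">sp|A00001|DEMO1_TEST made-up protein one\n"
    "MKT\n"
    "AYI\n"
    ">tr|B00002|DEMO2_TEST made-up protein two\n"
    "GGS\n"
)

records = [(accession(h), s) for h, s in iter_fasta(io.StringIO(SAMPLE))]
for acc, seq in records:
    print(f">{acc}")
    print(seq)
```

Multi-line sequences are joined into a single line per record, which matches what the script writes to standard output.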