D-SCRIPT/dscript/tests/test_language_model.py
Samuel Sledzieski 1bed6a048a Claude/expand test coverage (#91)
* Expand test coverage with comprehensive test suites

Add extensive test coverage for previously untested modules:

- test_utils.py: Comprehensive tests for utility functions (setup_logger, log, RBF,
  parse_device, load_hdf5_parallel, PairedDataset, collate_paired_sequences)

- test_glider.py: Complete test suite for graph-based link prediction module
  (get_dim, densify, compute_X_normalized, scoring functions, GLIDE algorithms)

- test_loading.py: Tests for parallel HDF5 data loading with LoadingPool,
  including edge cases, error handling, and integration tests

- test_language_model.py: Expanded from 2 to 13 test methods, adding coverage
  for lm_embed and embed_from_fasta, including various edge cases and validations

These additions significantly improve test coverage for:
- dscript/utils.py (167 lines, previously untested)
- dscript/glider.py (346 lines, previously untested)
- dscript/loading.py (92 lines, previously untested)
- dscript/language_model.py (previously minimal coverage, now expanded)

Total: 200+ new assertions across 4 test modules

* Add comprehensive tests for command modules and worker functions

Create four new test modules to expand coverage of previously untested code:

1. test_extract_3di.py (19 test methods, ~370 lines)
   - Tests for 3Di sequence extraction from PDB/CIF files
   - Argument parsing, file filtering, FASTA output validation
   - Integration tests for full workflow
   - Covers dscript/commands/extract_3di.py (~58 lines)

2. test_par_writer.py (24 test methods, ~400 lines)
   - Tests for parallel prediction writer process
   - TSV output writing, threshold filtering, contact map storage
   - HDF5 contact map dataset handling
   - Progress tracking and data type validation
   - Covers dscript/commands/par_writer.py (~40 lines)

3. test_main.py (24 test methods, ~320 lines)
   - Tests for CLI entry point and argument parsing
   - CitationAction class testing
   - All subcommand registration and invocation
   - Version and help flag handling
   - Integration tests for command dispatch
   - Covers dscript/__main__.py (~87 lines; coverage increases from ~85% to ~95%)

4. test_load_worker.py (23 test methods, ~330 lines)
   - Direct unit tests for HDF5 loading worker function
   - Queue handling, data type conversion, memory sharing
   - Error handling for corrupted/missing files
   - Multi-dimensional array support
   - Covers dscript/load_worker.py (~25 lines, previously only indirect coverage)
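The CLI tests in item 3 rely on the fact that argparse's built-in `--help` and `--version` actions finish by raising `SystemExit`, so a test has to catch that exception and capture stdout. The sketch below shows the pattern with a hypothetical `CitationAction` that prints a placeholder citation and exits; its behavior, the `CITATION` string, and the `build_parser`/`run_cli` helpers are assumptions for illustration, not code taken from dscript/__main__.py.

```python
import argparse
import contextlib
import io

# Placeholder text; the real citation string is not reproduced here.
CITATION = "Please cite: <citation text>"


class CitationAction(argparse.Action):
    """Hypothetical flag action that prints a citation and exits (assumed behavior)."""

    def __init__(self, option_strings, dest, **kwargs):
        # nargs=0 makes the flag take no value, like argparse's --version
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        print(CITATION)
        parser.exit(0)  # raises SystemExit(0), just like --help/--version


def build_parser():
    """Build a small parser with a version flag and the citation flag."""
    parser = argparse.ArgumentParser(prog="dscript-sketch")
    parser.add_argument("--version", action="version", version="%(prog)s 0.0-sketch")
    parser.add_argument("--citation", action=CitationAction)
    return parser


def run_cli(argv):
    """Parse argv, returning (exit code or None, captured stdout)."""
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        try:
            build_parser().parse_args(argv)
            code = None  # parsing finished without an early exit
        except SystemExit as exc:
            code = exc.code
    return code, out.getvalue()
```

In a test, `run_cli(["--citation"])` returns exit code `0` with the citation in the captured output, while `run_cli([])` returns `None` and empty output, so both the "flag exits cleanly" and "normal dispatch" paths can be asserted without spawning a subprocess.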

Total additions:
- ~1,420 lines of new test code
- 90+ test methods with comprehensive assertions
- ~210 lines of source code now directly tested
- Addresses high-priority gaps identified in coverage analysis

These tests complement the existing suite and focus on command-line
interface components and parallel processing infrastructure.

* Fix linting issues and apply code formatting

- Remove unused variables flagged by ruff
- Apply ruff formatting to all test files
- Ensure all pre-commit hooks pass

Changes:
- test_loading.py: Remove unused 'f' variable
- test_main.py: Remove unused 'fake_out' and 'output' variables
- test_utils.py: Remove unused 'log_file' variable and tmp_path param
- Applied ruff formatting to maintain code style consistency

* Fix test_load_worker.py hanging issue in CI

Rewrote test_load_worker.py to prevent CI hangs that occurred when
tests called the blocking worker function directly. The worker function
_hdf5_load_partial_func runs in an infinite loop waiting on a queue,
which caused tests to hang indefinitely.

Changes:
- Created run_worker_with_timeout() helper that wraps worker execution
  in a daemon thread with configurable timeout (default 5 seconds)
- Modified all tests to use this helper and assert successful completion
- Changed queue operations from blocking get() to non-blocking get_nowait()
- Reduced test count from 23 to 16 focused tests
- Added documentation noting worker is primarily tested via LoadingPool

This should resolve the CI timeout issue where tests hung at 43% completion.
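The timeout wrapper described above can be sketched as follows. The worker here is a hypothetical `toy_worker` that blocks on `Queue.get()` until it receives a `None` sentinel; it only stands in for `_hdf5_load_partial_func`, whose real signature and stop condition are not shown in this commit. The `drain` helper is likewise illustrative and reads results with non-blocking `get_nowait()`.

```python
import queue
import threading


def run_worker_with_timeout(target, args=(), timeout=5.0):
    """Run a possibly-blocking worker in a daemon thread.

    Returns True if the worker returned within `timeout` seconds and False if
    it is still running. Because the thread is a daemon, a stuck worker does
    not keep the interpreter alive once the test session ends.
    """
    thread = threading.Thread(target=target, args=args, daemon=True)
    thread.start()
    thread.join(timeout)
    return not thread.is_alive()


def toy_worker(in_q, out_q):
    """Hypothetical queue-driven worker: loops until it sees a None sentinel."""
    while True:
        item = in_q.get()  # blocks forever if nothing is ever enqueued
        if item is None:
            return
        out_q.put(item * 2)


def drain(q):
    """Collect everything currently in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

With 1, 2, 3 and a trailing `None` enqueued, `run_worker_with_timeout(toy_worker, (in_q, out_q), timeout=1.0)` returns `True` and `drain(out_q)` gives `[2, 4, 6]`. Without the sentinel it returns `False` once the timeout expires, so the test can fail with an assertion instead of hanging CI.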

* Rewrite test_language_model.py to use mocks instead of real model

The original tests were calling the real language model which:
- Downloads/loads pretrained model weights (slow, can fail)
- Runs actual neural network inference (resource intensive)
- Causes test failures when model files aren't available

Changes:
- Rewrote unit tests to mock get_pretrained() function
- Mock model returns realistic tensor shapes but doesn't load weights
- Tests are now fast, reliable, and don't require model files
- Moved real model tests to TestLanguageModelIntegration class
- Marked integration tests with @pytest.mark.slow so they can be skipped
- Removed unnecessary loguru import that caused import errors
- Removed problematic setup.py install step from setup_class

This should fix the 4 failing tests reported by CI.
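Marking tests with `@pytest.mark.slow` does not skip them by itself: they still run unless deselected, for example with `pytest -m "not slow"`. One way to make skipping the default is the `--runslow` recipe from the pytest documentation, sketched here as a `conftest.py`. The `--runslow` option name is an assumption, not something this repository is known to define; registering the marker in `pytest_configure` also silences pytest's unknown-mark warning.

```python
# conftest.py (sketch): skip tests marked "slow" unless --runslow is given
import pytest


def pytest_addoption(parser):
    # Assumed option name, following the pytest documentation's example
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow tests"
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark
    config.addinivalue_line("markers", "slow: mark test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # user asked for slow tests: leave everything enabled
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        # Class-level marks (e.g. on TestLanguageModelIntegration) show up here too
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, a plain `pytest` run reports the real-model integration tests as skipped, and `pytest --runslow` runs them.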

* fix failing tests

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 10:24:04 -05:00

243 lines
7.4 KiB
Python

"""
Tests for language model embedding functionality in dscript.language_model
"""

from unittest.mock import Mock, patch

import h5py
import pytest
import torch

from dscript.fasta import parse
from dscript.language_model import (
    embed_from_fasta,
    lm_embed,
)


class TestLanguageModelUnit:
    """Unit tests with mocked model"""

    @pytest.fixture
    def mock_model(self):
        """Create a mock model that behaves like the real one"""
        model = Mock()
        model.eval = Mock()
        model.cuda = Mock(return_value=model)
        model.cpu = Mock(return_value=model)

        # Mock the proj layer
        model.proj = Mock()
        model.proj.weight = torch.randn(6165, 6165)
        model.proj.bias = torch.zeros(6165)

        # Mock transform to return realistic embeddings
        def mock_transform(x):
            batch_size = x.shape[0]
            seq_len = x.shape[1]
            # Return (batch, seq_len, embedding_dim=6165)
            return torch.randn(batch_size, seq_len, 6165)

        model.transform = Mock(side_effect=mock_transform)
        return model

    @patch("dscript.language_model.get_pretrained")
    def test_lm_embed_shape(self, mock_get_pretrained, mock_model):
        """Test that lm_embed returns correct shape"""
        mock_get_pretrained.return_value = mock_model

        test_seq = "MKTAYIAKQRQISFVKSHFSRQ"
        x = lm_embed(test_seq, use_cuda=False)

        # Should be (batch=1, seq_len, embedding_dim=6165)
        assert x.shape[0] == 1
        assert x.shape[1] == len(test_seq)
        assert x.shape[2] == 6165

    @patch("dscript.language_model.get_pretrained")
    def test_lm_embed_returns_tensor(self, mock_get_pretrained, mock_model):
        """Test that lm_embed returns a torch tensor"""
        mock_get_pretrained.return_value = mock_model

        test_seq = "MKTAYIAKQR"
        x = lm_embed(test_seq, use_cuda=False)

        assert isinstance(x, torch.Tensor)

    @patch("dscript.language_model.get_pretrained")
    def test_lm_embed_short_sequence(self, mock_get_pretrained, mock_model):
        """Test embedding a very short sequence"""
        mock_get_pretrained.return_value = mock_model

        short_seq = "MK"
        x = lm_embed(short_seq, use_cuda=False)

        assert x.shape[1] == 2
        assert x.shape[2] == 6165

    @patch("dscript.language_model.get_pretrained")
    def test_lm_embed_single_amino_acid(self, mock_get_pretrained, mock_model):
        """Test embedding a single amino acid"""
        mock_get_pretrained.return_value = mock_model

        single_aa = "M"
        x = lm_embed(single_aa, use_cuda=False)

        assert x.shape[1] == 1
        assert x.shape[2] == 6165

    @patch("dscript.language_model.get_pretrained")
    def test_embed_from_fasta_creates_h5(self, mock_get_pretrained, mock_model, tmp_path):
        """Test that embed_from_fasta creates HDF5 file"""
        mock_get_pretrained.return_value = mock_model

        output_path = tmp_path / "test_embed.h5"

        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,  # Force CPU
            verbose=False,
        )

        # Verify the output file was created
        assert output_path.exists()

        # Verify it's a valid HDF5 file
        with h5py.File(output_path, "r") as f:
            assert len(f.keys()) > 0

    @patch("dscript.language_model.get_pretrained")
    def test_embed_from_fasta_correct_names(
        self, mock_get_pretrained, mock_model, tmp_path
    ):
        """Test that embed_from_fasta uses correct sequence names"""
        mock_get_pretrained.return_value = mock_model

        output_path = tmp_path / "test_embed.h5"

        # Parse original sequences to get names
        names, _ = parse("dscript/tests/test.fasta")

        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,
            verbose=False,
        )

        # Verify all sequence names are in the output
        with h5py.File(output_path, "r") as f:
            for name in names:
                assert name in f

    @patch("dscript.language_model.get_pretrained")
    def test_embed_from_fasta_skips_existing(
        self, mock_get_pretrained, mock_model, tmp_path
    ):
        """Test that embed_from_fasta skips existing embeddings"""
        mock_get_pretrained.return_value = mock_model

        output_path = tmp_path / "test_embed.h5"

        # First embedding
        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,
            verbose=False,
        )

        # Get count of embeddings
        with h5py.File(output_path, "r") as f:
            count_before = len(f.keys())

        # Second embedding (should skip existing)
        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,
            verbose=False,
        )

        # Count should be the same (no duplicates)
        with h5py.File(output_path, "r") as f:
            count_after = len(f.keys())

        assert count_before == count_after

    @patch("dscript.language_model.get_pretrained")
    def test_embed_from_fasta_cpu_device(self, mock_get_pretrained, mock_model, tmp_path):
        """Test embedding with explicit CPU device"""
        mock_get_pretrained.return_value = mock_model

        output_path = tmp_path / "test_embed_cpu.h5"

        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,  # Force CPU
            verbose=False,
        )

        assert output_path.exists()

    @patch("dscript.language_model.get_pretrained")
    @patch("dscript.language_model.log")
    def test_embed_from_fasta_verbose_output(
        self, mock_log, mock_get_pretrained, mock_model, tmp_path
    ):
        """Test that verbose mode produces log output"""
        mock_get_pretrained.return_value = mock_model

        output_path = tmp_path / "test_embed.h5"

        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,
            verbose=True,
        )

        # Verbose mode should call log
        assert mock_log.called


@pytest.mark.slow
class TestLanguageModelIntegration:
    """Integration tests that use the real model (marked as slow)"""

    def test_lm_embed_real(self):
        """Test lm_embed with real model (slow)"""
        # This test actually loads the model and runs inference
        test_seq = "MKTAYIAK"
        x = lm_embed(test_seq, use_cuda=False)

        assert x.shape[0] == 1
        assert x.shape[1] == len(test_seq)
        assert x.shape[2] == 6165
        assert isinstance(x, torch.Tensor)

    def test_embed_from_fasta_real(self, tmp_path):
        """Test embed_from_fasta with real model (slow)"""
        output_path = tmp_path / "test_embed_real.h5"

        embed_from_fasta(
            "dscript/tests/test.fasta",
            str(output_path),
            device=-1,
            verbose=False,
        )

        assert output_path.exists()

        # Verify HDF5 structure
        names, sequences = parse("dscript/tests/test.fasta")
        with h5py.File(output_path, "r") as f:
            for name, seq in zip(names, sequences):
                assert name in f
                embedding = f[name][:]
                assert embedding.shape[0] == 1
                assert embedding.shape[1] == len(seq)
                assert embedding.shape[2] == 6165