36 Commits

Author SHA1 Message Date
Samuel Sledzieski
1bed6a048a Claude/expand test coverage (#91)
* Expand test coverage with comprehensive test suites

Add extensive test coverage for previously untested modules:

- test_utils.py: Comprehensive tests for utility functions (setup_logger, log, RBF,
  parse_device, load_hdf5_parallel, PairedDataset, collate_paired_sequences)

- test_glider.py: Complete test suite for graph-based link prediction module
  (get_dim, densify, compute_X_normalized, scoring functions, GLIDE algorithms)

- test_loading.py: Tests for parallel HDF5 data loading with LoadingPool,
  including edge cases, error handling, and integration tests

- test_language_model.py: Expanded from 2 to 13 test methods, adding coverage
  for lm_embed, embed_from_fasta with various edge cases and validations

These additions significantly improve test coverage for:
- dscript/utils.py (167 lines, previously untested)
- dscript/glider.py (346 lines, previously untested)
- dscript/loading.py (92 lines, previously untested)
- dscript/language_model.py (minimal coverage expanded)

Total new coverage: 200+ assertions across 4 test modules

* Add comprehensive tests for command modules and worker functions

Create four new test modules to expand coverage of previously untested code:

1. test_extract_3di.py (19 test methods, ~370 lines)
   - Tests for 3Di sequence extraction from PDB/CIF files
   - Argument parsing, file filtering, FASTA output validation
   - Integration tests for full workflow
   - Covers dscript/commands/extract_3di.py (~58 lines)

2. test_par_writer.py (24 test methods, ~400 lines)
   - Tests for parallel prediction writer process
   - TSV output writing, threshold filtering, contact map storage
   - HDF5 contact map dataset handling
   - Progress tracking and data type validation
   - Covers dscript/commands/par_writer.py (~40 lines)

3. test_main.py (24 test methods, ~320 lines)
   - Tests for CLI entry point and argument parsing
   - CitationAction class testing
   - All subcommand registration and invocation
   - Version and help flag handling
   - Integration tests for command dispatch
   - Covers dscript/__main__.py (~87 lines, increasing coverage from ~85% to ~95%)

4. test_load_worker.py (23 test methods, ~330 lines)
   - Direct unit tests for HDF5 loading worker function
   - Queue handling, data type conversion, memory sharing
   - Error handling for corrupted/missing files
   - Multi-dimensional array support
   - Covers dscript/load_worker.py (~25 lines, previously only indirect coverage)

Total additions:
- ~1,420 lines of new test code
- 90+ test methods with comprehensive assertions
- ~210 lines of source code now directly tested
- Addresses high-priority gaps identified in coverage analysis

These tests complement the existing suite and focus on command-line
interface components and parallel processing infrastructure.

* Fix linting issues and apply code formatting

- Remove unused variables flagged by ruff
- Apply ruff formatting to all test files
- Ensure all pre-commit hooks pass

Changes:
- test_loading.py: Remove unused 'f' variable
- test_main.py: Remove unused 'fake_out' and 'output' variables
- test_utils.py: Remove unused 'log_file' variable and tmp_path param
- Applied ruff formatting to maintain code style consistency

* Fix test_load_worker.py hanging issue in CI

Rewrote test_load_worker.py to prevent CI hangs that occurred when
tests called the blocking worker function directly. The worker function
_hdf5_load_partial_func runs in an infinite loop waiting on a queue,
which caused tests to hang indefinitely.

Changes:
- Created run_worker_with_timeout() helper that wraps worker execution
  in a daemon thread with configurable timeout (default 5 seconds)
- Modified all tests to use this helper and assert successful completion
- Changed queue operations from blocking get() to non-blocking get_nowait()
- Reduced test count from 23 to 16 focused tests
- Added documentation noting that the worker is primarily tested via LoadingPool

This should resolve the CI timeout issue where tests hung at 43% completion.
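
The timeout pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from test_load_worker.py: only the helper name run_worker_with_timeout and the 5-second default come from the message above; the helper's signature and the looping_worker stand-in are assumptions made for the example.

```python
import queue
import threading


def run_worker_with_timeout(worker, args=(), timeout=5.0):
    """Run worker(*args) in a daemon thread.

    Returns True if the worker finished within `timeout` seconds,
    False if it was still running (for example, blocked on a queue).
    """
    thread = threading.Thread(target=worker, args=args, daemon=True)
    thread.start()
    thread.join(timeout)
    return not thread.is_alive()


def looping_worker(in_queue, out_queue):
    # Hypothetical stand-in for a queue-driven worker (not the real
    # _hdf5_load_partial_func): loop until a None sentinel arrives.
    while True:
        item = in_queue.get()
        if item is None:
            break
        out_queue.put(item * 2)


in_q, out_q = queue.Queue(), queue.Queue()
in_q.put(21)
in_q.put(None)  # sentinel so the loop can exit

finished = run_worker_with_timeout(looping_worker, (in_q, out_q))
# Non-blocking read: raises queue.Empty instead of hanging if nothing arrived.
result = out_q.get_nowait()
```

Because the thread is a daemon, a worker that never returns cannot keep the test process alive; the helper simply reports False and the test can fail with an assertion instead of hanging.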

* Rewrite test_language_model.py to use mocks instead of real model

The original tests were calling the real language model which:
- Downloads/loads pretrained model weights (slow, can fail)
- Runs actual neural network inference (resource intensive)
- Causes test failures when model files aren't available

Changes:
- Rewrote unit tests to mock get_pretrained() function
- Mock model returns realistic tensor shapes but doesn't load weights
- Tests are now fast, reliable, and don't require model files
- Moved real model tests to TestLanguageModelIntegration class
- Marked integration tests with @pytest.mark.slow so they can be skipped
- Removed unnecessary loguru import that caused import errors
- Removed problematic setup.py install step from setup_class

This should fix the 4 failing tests reported by CI.
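
A rough sketch of the mocking approach, under stated assumptions: the module object `lm` and its `embed` function below are hypothetical stand-ins built for this example and do not reproduce the dscript/language_model.py API. Only the idea of patching get_pretrained() so that no weights are loaded comes from the message above.

```python
import types
from unittest import mock

# Hypothetical stand-in module; the real module's functions and
# signatures may differ.
lm = types.ModuleType("fake_language_model")


def get_pretrained():
    # Placeholder for an expensive download/load step.
    raise RuntimeError("would load pretrained weights")


def embed(sequence):
    # Looks up get_pretrained on the module at call time, so a patch applies.
    model = lm.get_pretrained()
    return model(sequence)


lm.get_pretrained = get_pretrained
lm.embed = embed


def embed_with_mocked_model(sequence):
    # The fake model returns one 4-dimensional vector per residue,
    # so the output has a realistic shape without any inference.
    fake_model = mock.Mock(return_value=[[0.0] * 4 for _ in sequence])
    with mock.patch.object(lm, "get_pretrained", return_value=fake_model):
        out = lm.embed(sequence)
    fake_model.assert_called_once_with(sequence)
    return out


embedding = embed_with_mocked_model("MKV")
```

The patch is active only inside the `with` block, so tests that need the real model (such as the slow-marked integration tests mentioned above) are unaffected.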

* fix failing tests

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 10:24:04 -05:00
Samuel Sledzieski
c7682041e2 update foldseek commands to use biotite interface 2025-08-12 14:05:49 +02:00
Samuel Sledzieski
8cbf288b86 massively increase test coverage and continuous integration 2025-07-21 14:21:48 -04:00
Samuel Sledzieski
687f5b09a3 Update version, fix model cuda bug 2024-09-16 09:35:49 -04:00
Samuel Sledzieski
bb4ee75a66 Upload models to huggingface hub, update prediction script to take hf path 2024-09-13 11:02:23 -04:00
Samuel Sledzieski
b0ca7aeb33 add tt3d to pretrained 2023-09-28 16:00:07 -04:00
samsledje
ce46ba69b1 added foldseek prediction, untested 2023-07-24 08:34:48 -04:00
samsledje
620bd1f1e1 add extract 3di command 2023-05-09 10:14:43 -04:00
Kapil Devkota
0e9403eab7 Added optimized interaction.py 2023-05-08 01:32:04 -04:00
Kapil Devkota
3afad84144 Added fseek changes 2023-05-08 00:43:40 -04:00
Kapil Devkota
341dc0a219 Changed versioning 2023-03-30 11:07:12 -04:00
samsledje
b7834b6caa update version number 2022-08-18 11:40:29 -04:00
samsledje
2dcf240101 add topsy turvy pretrained model to api and docs 2022-08-18 11:32:07 -04:00
samsledje
128d360c03 update setup.py requirements 2022-06-28 11:31:59 -04:00
samsledje
e883489d03 update docs and citations 2022-06-23 10:19:16 -04:00
samsledje
aa855550b8 Merge master into topsy turvy 2022-04-29 09:58:13 -04:00
samsledje
3099b7d09a version update 2022-03-16 09:59:11 -04:00
samsledje
4e3ba637b7 version 0.1.8b0 2022-03-07 15:17:55 -05:00
samsledje
cac9271dd9 versioning 2022-03-07 14:53:38 -05:00
samsledje
f2429adeaa initial topsy turvy port 2022-03-07 13:42:10 -05:00
samsledje
b052dbe30e fix issue relating to pretrained models and updated model arguments/parameters 2022-02-11 11:55:55 -05:00
samsledje
4d3259d293 v0.1.8 release 2022-02-08 13:19:59 -05:00
samsledje
a82dd7d6b8 code style updated with black 2022-02-08 12:00:11 -05:00
samsledje
65ebd2381e training seems to be fixed, require a few more cleaning passes and tests before publishing 2022-02-07 12:56:32 -05:00
samsledje
45d693bc2d updated docstrings from legacy code 2021-12-07 12:18:38 -05:00
samsledje
df4e1bf729 0.1.7-dev7 -- training fixed but code extremely messy 2021-12-07 11:46:13 -05:00
samsledje
3fa7ea685f v0.1.6 Fixed bug in data augmentation 2021-09-06 13:18:27 -04:00
Samuel Sledzieski
7e773be9cb update changelog and init for 0.1.5 2021-06-23 20:45:37 -04:00
Samuel Sledzieski
e66a95d768 updated package level imports 2021-04-23 10:09:04 -04:00
Samuel Sledzieski
7034c880a1 Bug fix issue #7 - typo in ContactModule.forward() 2021-03-05 10:14:17 -05:00
Samuel Sledzieski
2bce0cdbb4 fixed wrong variable name in loading from sequence file 2021-02-03 10:26:00 -05:00
Samuel Sledzieski
dc04e373e3 put model in eval mode before making new predictions 2020-11-30 15:58:43 -05:00
Samuel Sledzieski
4ff927a06b updated versioning 2020-11-24 11:32:12 -05:00
Samuel Sledzieski
3cbebbfa89 fix argparser and add citation cmd 2020-11-09 17:51:38 -05:00
Samuel Sledzieski
1cfeb6f0a3 updated data loading in train and reformatted everything 2020-11-09 15:37:11 -05:00
Samuel Sledzieski
a97f232db2 restructured so dscript is a module and can be installed 2020-11-09 15:21:37 -05:00