170 Commits

Author SHA1 Message Date
Samuel Sledzieski
1bed6a048a Claude/expand test coverage (#91)
* Expand test coverage with comprehensive test suites

Add extensive test coverage for previously untested modules:

- test_utils.py: Comprehensive tests for utility functions (setup_logger, log, RBF,
  parse_device, load_hdf5_parallel, PairedDataset, collate_paired_sequences)

- test_glider.py: Complete test suite for graph-based link prediction module
  (get_dim, densify, compute_X_normalized, scoring functions, GLIDE algorithms)

- test_loading.py: Tests for parallel HDF5 data loading with LoadingPool,
  including edge cases, error handling, and integration tests

- test_language_model.py: Expanded from 2 to 13 test methods, adding coverage
  for lm_embed, embed_from_fasta with various edge cases and validations

These additions significantly improve test coverage for:
- dscript/utils.py (167 lines, previously untested)
- dscript/glider.py (346 lines, previously untested)
- dscript/loading.py (92 lines, previously untested)
- dscript/language_model.py (minimal coverage expanded)

Total new coverage: 200+ assertions across 4 test modules

* Add comprehensive tests for command modules and worker functions

Create four new test modules to expand coverage of previously untested code:

1. test_extract_3di.py (19 test methods, ~370 lines)
   - Tests for 3Di sequence extraction from PDB/CIF files
   - Argument parsing, file filtering, FASTA output validation
   - Integration tests for full workflow
   - Covers dscript/commands/extract_3di.py (~58 lines)

2. test_par_writer.py (24 test methods, ~400 lines)
   - Tests for parallel prediction writer process
   - TSV output writing, threshold filtering, contact map storage
   - HDF5 contact map dataset handling
   - Progress tracking and data type validation
   - Covers dscript/commands/par_writer.py (~40 lines)

3. test_main.py (24 test methods, ~320 lines)
   - Tests for CLI entry point and argument parsing
   - CitationAction class testing
   - All subcommand registration and invocation
   - Version and help flag handling
   - Integration tests for command dispatch
   - Covers dscript/__main__.py (~87 lines, increasing from ~85% to ~95%)

4. test_load_worker.py (23 test methods, ~330 lines)
   - Direct unit tests for HDF5 loading worker function
   - Queue handling, data type conversion, memory sharing
   - Error handling for corrupted/missing files
   - Multi-dimensional array support
   - Covers dscript/load_worker.py (~25 lines, previously only indirect coverage)

Total additions:
- ~1,420 lines of new test code
- 90+ test methods with comprehensive assertions
- ~210 lines of source code now directly tested
- Addresses high-priority gaps identified in coverage analysis

These tests complement the existing suite and focus on command-line
interface components and parallel processing infrastructure.

* Fix linting issues and apply code formatting

- Remove unused variables flagged by ruff
- Apply ruff formatting to all test files
- Ensure all pre-commit hooks pass

Changes:
- test_loading.py: Remove unused 'f' variable
- test_main.py: Remove unused 'fake_out' and 'output' variables
- test_utils.py: Remove unused 'log_file' variable and tmp_path param
- Applied ruff formatting to maintain code style consistency

* Fix test_load_worker.py hanging issue in CI

Rewrote test_load_worker.py to prevent CI hangs that occurred when
tests called the blocking worker function directly. The worker function
_hdf5_load_partial_func runs in an infinite loop waiting on a queue,
which caused tests to hang indefinitely.

Changes:
- Created run_worker_with_timeout() helper that wraps worker execution
  in a daemon thread with configurable timeout (default 5 seconds)
- Modified all tests to use this helper and assert successful completion
- Changed queue operations from blocking get() to non-blocking get_nowait()
- Reduced test count from 23 to 16 focused tests
- Added documentation noting worker is primarily tested via LoadingPool

This should resolve the CI timeout issue where tests hung at 43% completion.
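
A minimal sketch of the timeout-wrapper pattern described above, not the code from this commit. The helper name and the 5-second default come from the message; the real worker's signature is not shown in this log, so `toy_worker`, its two queues, and the `None` sentinel are hypothetical stand-ins.

```python
import queue
import threading


def run_worker_with_timeout(worker, args=(), timeout=5.0):
    """Run worker(*args) in a daemon thread; return True if it finished in time."""
    thread = threading.Thread(target=worker, args=args, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most `timeout` seconds
    return not thread.is_alive()


def toy_worker(in_q, out_q):
    """Hypothetical stand-in for a queue-driven worker loop (assumption)."""
    while True:
        item = in_q.get()  # blocks until an item arrives
        if item is None:  # sentinel: leave the otherwise infinite loop
            break
        out_q.put(item * 2)


in_q, out_q = queue.Queue(), queue.Queue()
in_q.put(21)
in_q.put(None)  # without the sentinel the worker would block forever
finished = run_worker_with_timeout(toy_worker, (in_q, out_q), timeout=5.0)
result = out_q.get_nowait()  # non-blocking: raises queue.Empty instead of hanging
```

Because the thread is a daemon, a worker that never returns cannot keep the test process alive; the helper simply reports `False` and the test can fail with an assertion instead of hanging.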

* Rewrite test_language_model.py to use mocks instead of real model

The original tests were calling the real language model which:
- Downloads/loads pretrained model weights (slow, can fail)
- Runs actual neural network inference (resource intensive)
- Causes test failures when model files aren't available

Changes:
- Rewrote unit tests to mock get_pretrained() function
- Mock model returns realistic tensor shapes but doesn't load weights
- Tests are now fast, reliable, and don't require model files
- Moved real model tests to TestLanguageModelIntegration class
- Marked integration tests with @pytest.mark.slow so they can be skipped
- Removed unnecessary loguru import that caused import errors
- Removed problematic setup.py install step from setup_class

This should fix the 4 failing tests reported by CI.
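
The mocking approach above could look roughly like the following sketch. It is illustrative only: `lm`, `toy_embed`, and `fake_model` are hypothetical stand-ins (the actual module layout and the signature of `get_pretrained()` are not shown in this log), and plain nested lists stand in for tensors so the example needs no model files or extra packages.

```python
from types import SimpleNamespace
from unittest import mock


def _real_get_pretrained():
    # Stand-in for the expensive call that would download or load weights.
    raise RuntimeError("would load pretrained weights")


# Hypothetical namespace playing the role of the module under test.
lm = SimpleNamespace(get_pretrained=_real_get_pretrained)


def toy_embed(sequence):
    """Hypothetical function under test: fetch the model, then embed."""
    model = lm.get_pretrained()
    return model(sequence)


def fake_model(sequence, dim=4):
    """Return one row of `dim` zeros per residue, mimicking an embedding shape."""
    return [[0.0] * dim for _ in sequence]


# Patch the loader so no weights are touched while the test runs.
with mock.patch.object(lm, "get_pretrained", return_value=fake_model):
    emb = toy_embed("MKTAYIAK")

shape = (len(emb), len(emb[0]))  # (sequence length, embedding dimension)
```

Once the `with` block exits, the original loader is restored, so tests that do need the real model can still run separately.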

* fix failing tests

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/autorun-tests.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 10:24:04 -05:00
Daniel E. Schaffer
9e58dad665 ruff format 2025-09-08 23:37:36 +02:00
Daniel E. Schaffer
8fdb4d4b2e Update device argument to embed and predict_serial 2025-09-08 23:37:36 +02:00
Samuel Sledzieski
c21dee6059 Ruff check and format 2025-08-12 14:05:49 +02:00
Samuel Sledzieski
103eab70b2 update to use biotite for version 0.3.1 2025-08-12 14:05:49 +02:00
Samuel Sledzieski
c7682041e2 update foldseek commands to use biotite interface 2025-08-12 14:05:49 +02:00
Samuel Sledzieski
10587a4249 fix documentation 2025-08-12 06:10:15 -04:00
Samuel Sledzieski
c0a4c1a84b fix glider docstring 2025-08-12 11:53:32 +02:00
Samuel Sledzieski
92bff2b530 skip gpu tests if gpu not available 2025-07-22 11:27:35 -04:00
Samuel Sledzieski
349a4ae0ee one day I will remember to lint 2025-07-22 11:18:00 -04:00
Samuel Sledzieski
132291da55 added tests for bipartite prediction and fixed silent failure causing hanging 2025-07-22 11:10:42 -04:00
Samuel Sledzieski
e2a4792091 minor cleanup 2025-07-22 10:10:41 -04:00
Samuel Sledzieski
fa467edd40 formatting and cleaning device code 2025-07-22 10:00:20 -04:00
Daniel E. Schaffer
92f1582719 Extend CPU support to bipartite 2025-07-21 19:28:52 -04:00
Daniel E. Schaffer
f13926a718 Improve CPU mode 2025-07-21 18:58:51 -04:00
Samuel Sledzieski
0215215a1f I forgot to run the linter 2025-07-21 17:03:40 -04:00
Samuel Sledzieski
6b2cbac3d6 slight update to testing to not run gpu test on gh actions 2025-07-21 16:59:38 -04:00
Samuel Sledzieski
3fe16083f4 Merge 2025-07-21 16:58:16 -04:00
Samuel Sledzieski
9f21e01f0b add broader device support 2025-07-21 16:53:25 -04:00
Daniel E. Schaffer
854985b787 try/except for local model loading 2025-07-21 16:52:49 -04:00
Samuel Sledzieski
2cd60a892b edit to pass tests 2025-07-21 16:20:47 -04:00
Samuel Sledzieski
50f3d9750e ruff formatting and linting 2025-07-21 15:09:08 -04:00
Samuel Sledzieski
faad88d684 Merge branch 'main' into parallel 2025-07-21 14:52:13 -04:00
Samuel Sledzieski
e80af9d123 formatting 2025-07-21 14:44:37 -04:00
Samuel Sledzieski
f7073f5c16 fix security issues 2025-07-21 14:40:26 -04:00
Samuel Sledzieski
5b9e836fa7 update default model to load from huggingface for predict or eval 2025-07-21 14:33:17 -04:00
Samuel Sledzieski
f4bfcd824f migrate to loguru while maintaining legacy interface 2025-07-21 14:31:12 -04:00
Samuel Sledzieski
8cbf288b86 massively increase test coverage and continuous integration 2025-07-21 14:21:48 -04:00
Daniel E. Schaffer
a1f4af2e65 Refactor csv 2025-07-21 13:21:11 -04:00
Daniel E. Schaffer
ff72bc3fb8 Restore two lines (incorrectly deleted) 2025-07-21 13:14:36 -04:00
Daniel E. Schaffer
7ddbbea10c Correct default model; move file check to main 2025-07-21 13:13:33 -04:00
Daniel E. Schaffer
875ff4af82 Change loading process defaults 2025-07-21 12:29:58 -04:00
Daniel E. Schaffer
6ecae4ed8f Change loading process defaults 2025-07-21 12:25:19 -04:00
Daniel E. Schaffer
de7707d3f8 Remove log file from GPU process 2025-07-21 12:25:19 -04:00
Samuel Sledzieski
01125ef013 Update predict.py 2025-07-21 10:39:38 -04:00
Samuel Sledzieski
c1b88c909f Update evaluate.py 2025-07-21 10:38:35 -04:00
Daniel E. Schaffer
1f0c9a2bf8 Swap command names 2025-07-21 10:26:34 -04:00
Samuel Sledzieski
724c402b79 Update train.py 2025-07-21 10:24:21 -04:00
Daniel E. Schaffer
44fc7ab723 Remove old file 2025-07-21 10:16:42 -04:00
Daniel E. Schaffer
f3e4672e3c tweak help messages 2025-07-18 15:29:25 -04:00
Daniel E. Schaffer
3a272cb719 Fix negation bug 2025-06-11 13:36:50 -04:00
Daniel E. Schaffer
650f3b5a36 Add bipartite prediction command (with blocking and multi-GPU support); minor tweaks 2025-06-09 18:53:58 -04:00
Daniel E. Schaffer
9caca8c672 Remove predict_par as a separate command 2025-06-09 14:41:57 -04:00
Daniel E. Schaffer
881ad4bcc2 Cleanup & tweaks 2025-06-09 14:23:45 -04:00
Daniel E. Schaffer
967adda9d4 Rework/expand blocked prediction to support ~all use cases (protein list/dense pair list/sparse pair list) 2025-04-30 20:21:12 -04:00
Daniel E. Schaffer
8f2d3286f0 explicitly clear worker-side references to shared tensors 2025-04-27 15:41:00 -04:00
Daniel E. Schaffer
cc023e5ff0 Enable multi-GPU blocked prediction (also, fix issue in foldseek loading) 2025-04-27 15:35:01 -04:00
Daniel E. Schaffer
58495dc437 Re-enable foldseek usage with blocked prediction 2025-04-27 14:56:30 -04:00
Daniel E. Schaffer
16bbf70acb Add waiting so only 3 blocks are logically needed at any time 2025-04-24 02:58:49 -04:00
Daniel E. Schaffer
a21dd6bfeb Add prediction mode with blocked embedding loading 2025-04-24 02:07:11 -04:00