D-SCRIPT

mirror of https://github.com/samsledje/D-SCRIPT.git synced 2026-06-04 15:04:24 +08:00

Author	SHA1	Message	Date
Samuel Sledzieski	1bed6a048a	Claude/expand test coverage (#91)	2025-12-16 10:24:04 -05:00

    * Expand test coverage with comprehensive test suites

      Add extensive test coverage for previously untested modules:
      - test_utils.py: Comprehensive tests for utility functions (setup_logger, log, RBF, parse_device, load_hdf5_parallel, PairedDataset, collate_paired_sequences)
      - test_glider.py: Complete test suite for graph-based link prediction module (get_dim, densify, compute_X_normalized, scoring functions, GLIDE algorithms)
      - test_loading.py: Tests for parallel HDF5 data loading with LoadingPool, including edge cases, error handling, and integration tests
      - test_language_model.py: Expanded from 2 to 13 test methods, adding coverage for lm_embed, embed_from_fasta with various edge cases and validations

      These additions significantly improve test coverage for:
      - dscript/utils.py (167 lines, previously untested)
      - dscript/glider.py (346 lines, previously untested)
      - dscript/loading.py (92 lines, previously untested)
      - dscript/language_model.py (minimal coverage expanded)

      Total new test methods: ~200+ assertions across 4 test modules

    * Add comprehensive tests for command modules and worker functions

      Create four new test modules to expand coverage of previously untested code:

      1. test_extract_3di.py (19 test methods, ~370 lines)
         - Tests for 3Di sequence extraction from PDB/CIF files
         - Argument parsing, file filtering, FASTA output validation
         - Integration tests for full workflow
         - Covers dscript/commands/extract_3di.py (~58 lines)
      2. test_par_writer.py (24 test methods, ~400 lines)
         - Tests for parallel prediction writer process
         - TSV output writing, threshold filtering, contact map storage
         - HDF5 contact map dataset handling
         - Progress tracking and data type validation
         - Covers dscript/commands/par_writer.py (~40 lines)
      3. test_main.py (24 test methods, ~320 lines)
         - Tests for CLI entry point and argument parsing
         - CitationAction class testing
         - All subcommand registration and invocation
         - Version and help flag handling
         - Integration tests for command dispatch
         - Covers dscript/__main__.py (~87 lines, increasing from ~85% to ~95%)
      4. test_load_worker.py (23 test methods, ~330 lines)
         - Direct unit tests for HDF5 loading worker function
         - Queue handling, data type conversion, memory sharing
         - Error handling for corrupted/missing files
         - Multi-dimensional array support
         - Covers dscript/load_worker.py (~25 lines, previously only indirect coverage)

      Total additions:
      - ~1,420 lines of new test code
      - 90+ test methods with comprehensive assertions
      - ~210 lines of source code now directly tested
      - Addresses high-priority gaps identified in coverage analysis

      These tests complement the existing suite and focus on command-line interface components and parallel processing infrastructure.

    * Fix linting issues and apply code formatting

      - Remove unused variables flagged by ruff
      - Apply ruff formatting to all test files
      - Ensure all pre-commit hooks pass

      Changes:
      - test_loading.py: Remove unused 'f' variable
      - test_main.py: Remove unused 'fake_out' and 'output' variables
      - test_utils.py: Remove unused 'log_file' variable and tmp_path param
      - Applied ruff formatting to maintain code style consistency

    * Fix test_load_worker.py hanging issue in CI

      Rewrote test_load_worker.py to prevent CI hangs that occurred when tests called the blocking worker function directly. The worker function _hdf5_load_partial_func runs in an infinite loop waiting on a queue, which caused tests to hang indefinitely.

      Changes:
      - Created run_worker_with_timeout() helper that wraps worker execution in a daemon thread with configurable timeout (default 5 seconds)
      - Modified all tests to use this helper and assert successful completion
      - Changed queue operations from blocking get() to non-blocking get_nowait()
      - Reduced test count from 23 to 16 focused tests
      - Added documentation noting worker is primarily tested via LoadingPool

      This should resolve the CI timeout issue where tests hung at 43% completion.

    * Rewrite test_language_model.py to use mocks instead of real model

      The original tests were calling the real language model which:
      - Downloads/loads pretrained model weights (slow, can fail)
      - Runs actual neural network inference (resource intensive)
      - Causes test failures when model files aren't available

      Changes:
      - Rewrote unit tests to mock get_pretrained() function
      - Mock model returns realistic tensor shapes but doesn't load weights
      - Tests are now fast, reliable, and don't require model files
      - Moved real model tests to TestLanguageModelIntegration class
      - Marked integration tests with @pytest.mark.slow so they can be skipped
      - Removed unnecessary loguru import that caused import errors
      - Removed problematic setup.py install step from setup_class

      This should fix the 4 failing tests reported by CI.

    * fix failing tests

    * Update .github/workflows/autorun-tests.yml

      Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

    * Update .github/workflows/autorun-tests.yml

      Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

    ---------

    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
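The commit above describes a run_worker_with_timeout() helper that runs a blocking worker in a daemon thread with a timeout, paired with non-blocking get_nowait() reads. The repository's actual helper is not shown in this log, so the following is only a minimal sketch of that pattern under stated assumptions: the signature of run_worker_with_timeout() and the echo_worker stand-in are hypothetical, and only the Python standard library is used.

```python
import queue
import threading


def run_worker_with_timeout(worker, args=(), timeout=5.0):
    """Run worker(*args) in a daemon thread (hypothetical signature).

    Returns True if the worker returned within `timeout` seconds and
    False if it is still running. Because the thread is a daemon, a
    stuck worker cannot keep the test process alive.
    """
    thread = threading.Thread(target=worker, args=args, daemon=True)
    thread.start()
    thread.join(timeout)
    return not thread.is_alive()


def echo_worker(in_q, out_q):
    """Hypothetical stand-in for a queue-driven worker loop.

    Doubles each item it receives and stops on a None sentinel.
    """
    while True:
        item = in_q.get()  # blocks until an item is available
        if item is None:
            break
        out_q.put(item * 2)


in_q, out_q = queue.Queue(), queue.Queue()
for item in (1, 2, None):
    in_q.put(item)

finished = run_worker_with_timeout(echo_worker, args=(in_q, out_q), timeout=5.0)

# Drain results without blocking, so an empty queue cannot hang the test.
results = []
while True:
    try:
        results.append(out_q.get_nowait())
    except queue.Empty:
        break
```

A test built this way would assert that `finished` is true before inspecting `results`. If the worker never receives its sentinel, the helper returns False after the timeout instead of hanging the run.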
Daniel E. Schaffer	9e58dad665	ruff format	2025-09-08 23:37:36 +02:00
Daniel E. Schaffer	8fdb4d4b2e	Update device argument to embed and predict_serial	2025-09-08 23:37:36 +02:00
Samuel Sledzieski	c21dee6059	Ruff check and format	2025-08-12 14:05:49 +02:00
Samuel Sledzieski	103eab70b2	update to use biotite for version 0.3.1	2025-08-12 14:05:49 +02:00
Samuel Sledzieski	c7682041e2	update foldseek commands to use biotite interface	2025-08-12 14:05:49 +02:00
Samuel Sledzieski	10587a4249	fix documentation	2025-08-12 06:10:15 -04:00
Samuel Sledzieski	c0a4c1a84b	fix glider docstring	2025-08-12 11:53:32 +02:00
Samuel Sledzieski	92bff2b530	skip gpu tests if gpu not available	2025-07-22 11:27:35 -04:00
Samuel Sledzieski	349a4ae0ee	one day I will remember to lint	2025-07-22 11:18:00 -04:00
Samuel Sledzieski	132291da55	added tests for bipartite prediction and fixed silent failure causing hanging	2025-07-22 11:10:42 -04:00
Samuel Sledzieski	e2a4792091	minor cleanup	2025-07-22 10:10:41 -04:00
Samuel Sledzieski	fa467edd40	formatting and cleaning device code	2025-07-22 10:00:20 -04:00
Daniel E. Schaffer	92f1582719	Extend CPU support to bipartite	2025-07-21 19:28:52 -04:00
Daniel E. Schaffer	f13926a718	Improve CPU mode	2025-07-21 18:58:51 -04:00
Samuel Sledzieski	0215215a1f	I forgot to run the linter	2025-07-21 17:03:40 -04:00
Samuel Sledzieski	6b2cbac3d6	slight update to testing to not run gpu test on gh actions	2025-07-21 16:59:38 -04:00
Samuel Sledzieski	3fe16083f4	Merge	2025-07-21 16:58:16 -04:00
Samuel Sledzieski	9f21e01f0b	add broader device support	2025-07-21 16:53:25 -04:00
Daniel E. Schaffer	854985b787	try/except for local model loading	2025-07-21 16:52:49 -04:00
Samuel Sledzieski	2cd60a892b	edit to pass tests	2025-07-21 16:20:47 -04:00
Samuel Sledzieski	50f3d9750e	ruff formatting and linting	2025-07-21 15:09:08 -04:00
Samuel Sledzieski	faad88d684	Merge branch 'main' into parallel	2025-07-21 14:52:13 -04:00
Samuel Sledzieski	e80af9d123	formatting	2025-07-21 14:44:37 -04:00
Samuel Sledzieski	f7073f5c16	fix security issues	2025-07-21 14:40:26 -04:00
Samuel Sledzieski	5b9e836fa7	update default model to load from huggingface for predict or eval	2025-07-21 14:33:17 -04:00
Samuel Sledzieski	f4bfcd824f	migrate to loguru while maintaining legacy interface	2025-07-21 14:31:12 -04:00
Samuel Sledzieski	8cbf288b86	massively increase test coverage and continuous integration	2025-07-21 14:21:48 -04:00
Daniel E. Schaffer	a1f4af2e65	Refactor csv	2025-07-21 13:21:11 -04:00
Daniel E. Schaffer	ff72bc3fb8	Restore two lines (incorrectly deleted)	2025-07-21 13:14:36 -04:00
Daniel E. Schaffer	7ddbbea10c	Correct default model; move file check to main	2025-07-21 13:13:33 -04:00
Daniel E. Schaffer	875ff4af82	Change loading process defaults	2025-07-21 12:29:58 -04:00
Daniel E. Schaffer	6ecae4ed8f	Change loading process defaults	2025-07-21 12:25:19 -04:00
Daniel E. Schaffer	de7707d3f8	Remove log file from GPU process	2025-07-21 12:25:19 -04:00
Samuel Sledzieski	01125ef013	Update predict.py	2025-07-21 10:39:38 -04:00
Samuel Sledzieski	c1b88c909f	Update evaluate.py	2025-07-21 10:38:35 -04:00
Daniel E. Schaffer	1f0c9a2bf8	Swap command names	2025-07-21 10:26:34 -04:00
Samuel Sledzieski	724c402b79	Update train.py	2025-07-21 10:24:21 -04:00
Daniel E. Schaffer	44fc7ab723	Remove old file	2025-07-21 10:16:42 -04:00
Daniel E. Schaffer	f3e4672e3c	tweak help messages	2025-07-18 15:29:25 -04:00
Daniel E. Schaffer	3a272cb719	Fix negation bug	2025-06-11 13:36:50 -04:00
Daniel E. Schaffer	650f3b5a36	Add bipartite prediction command (with blocking and multi-GPU support); minor tweaks	2025-06-09 18:53:58 -04:00
Daniel E. Schaffer	9caca8c672	Remove predict_par as a separate command	2025-06-09 14:41:57 -04:00
Daniel E. Schaffer	881ad4bcc2	Cleanup & tweaks	2025-06-09 14:23:45 -04:00
Daniel E. Schaffer	967adda9d4	Rework/expand blocked prediction to support ~all use cases (protein list/dense pair list/sparse pair list)	2025-04-30 20:21:12 -04:00
Daniel E. Schaffer	8f2d3286f0	explicitly clear worker-side references to shared tensors	2025-04-27 15:41:00 -04:00
Daniel E. Schaffer	cc023e5ff0	Enable multi-GPU blocked prediction (also, fix issue in foldseek loading)	2025-04-27 15:35:01 -04:00
Daniel E. Schaffer	58495dc437	Re-enable foldseek usage with blocked prediction	2025-04-27 14:56:30 -04:00
Daniel E. Schaffer	16bbf70acb	Add waiting so only 3 blocks are logically needed at any time	2025-04-24 02:58:49 -04:00
Daniel E. Schaffer	a21dd6bfeb	Add prediction mode with blocked embedding loading	2025-04-24 02:07:11 -04:00


170 Commits