Skip to content

[ENH]: perform collection hard deletes from garbage collector #4605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: feat-gc-hard-delete-collection-sysdb
Choose a base branch
from

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 22, 2025

Description of changes

This updates the garbage collector to transition soft deleted collections to hard deleted when eligible.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

I updated the proptest with a new transition for deleting collections.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 5b21248 to 5b6a52b Compare May 22, 2025 17:29
@codetheweb codetheweb changed the base branch from feat-validate-file-paths-during-gc to feat-gc-hard-delete-collection-sysdb May 22, 2025 17:29
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from f545524 to 822e31d Compare May 22, 2025 18:06
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 5b6a52b to 65e13b1 Compare May 22, 2025 18:06
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 822e31d to afe5084 Compare May 22, 2025 18:13
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 65e13b1 to 9ab64f5 Compare May 22, 2025 18:13
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from afe5084 to 9aed3e4 Compare May 22, 2025 20:11
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 2 times, most recently from 8aff535 to d78d710 Compare May 22, 2025 21:54
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in this file provide a little more debugging info + a new defensive check

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 9aed3e4 to ca845a3 Compare May 22, 2025 22:23
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 4 times, most recently from 720af86 to c815632 Compare May 22, 2025 23:11
@codetheweb codetheweb changed the title [ENH]: move collection hard deletes to garbage collector [ENH]: perform collection hard deletes from garbage collector May 22, 2025
@codetheweb codetheweb marked this pull request as ready for review May 22, 2025 23:13
@codetheweb codetheweb requested a review from sanketkedia May 22, 2025 23:13
Copy link
Contributor

propel-code-bot bot commented May 22, 2025

Garbage Collector Now Performs and Finalizes Collection Hard Deletes

This PR reworks the garbage collection pipeline to transition eligible soft-deleted collections to hard-deletion from within the garbage collector, rather than relying on side effects elsewhere. The change spans the garbage collector's orchestrator, its operators, system database abstraction, test suites, and API error types, introducing a new batch interface for querying soft delete status, wiring through relevant status checks, and implementing the full hard delete logic when conditions are met (e.g., no live forks). Defensive checks and additional test coverage for fork scenarios are added to ensure correctness.

Key Changes:
• Garbage collector orchestrator (rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs) identifies soft-deleted collections and, after pruning versions/files, finalizes deletion by invoking sysdb hard delete.
• Introduced batch_get_collection_soft_delete_status interface and error type in sysdb and types to efficiently check collection states.
• sysdb now exposes finish_collection_deletion to finalize/hard delete collections as needed, implemented for GrpcSysDb and TestSysDb.
• Garbage collection test harness and reference state machine reworked to explicitly model collection status transitions (Alive -> SoftDeleted -> Deleted); proptests extended to exercise these states.
• Operators (compute_versions_to_delete_from_graph.rs) and orchestrator now propagate soft-deleted collection information throughout the deletion pipeline, ensuring all versions are marked for deletion.
• Additional and improved proptests and defensive graph checks added, particularly to handle forked collection trees and ensure graph invariants.
• API/types extended for soft delete batch status error propagation and handling.

Affected Areas:
• Garbage collector orchestrator logic
• Garbage collector operators (version deletion logic)
• SysDb abstraction (Rust and test backends, Grpc implementation)
• Test harness and property-based (proptest) tests
• API/types error and status handling

Potential Impact:

Functionality: Enables fully-automated, stateful transition from soft-deletion to true ('hard') deletion of collections only when all fork-children are deleted. Reduces possibility for orphaned data and greatly clarifies deletion flow.

Performance: Batch status queries for soft delete state should scale well, avoiding N calls. Deletion up to O(number of collections) hard deletes may marginally increase load during GC tasks.

Security: No negative impact; possibly positive in ensuring prompt, correct, final deletion for user data.

Scalability: Batching and explicit transition handling should enable robust handling even in large forked-collection graphs.

Review Focus:
• Correctness of forked collection deletion logic (ensuring no live children pre-hardelete).
• SysDb API compatibility across backends.
• Atomicity and idempotency of hard delete transitions (especially on crash/retry).
• Proptest/test coverage for edge GC/delete cases.
• Backward compatibility for collection state transitions and error error propagation.

Testing Needed

• Run the full test suite, especially all proptests in garbage_collector/tests/ to validate complex fork/GC/delete transitions.
• Test end-to-end deletion cycle (create, fork, delete, GC) in integration/staging with both normal and forked collections.
• Check sysdb Grpc and Test implementations for hard delete propagation.
• Exercise batch soft delete query API for regressions.

Code Quality Assessment

rust/garbage_collector/tests/proptest_helpers/garbage_collector_under_test.rs: Adapts to the new logic; validation and check_invariant logic improved to cover all status transitions.

rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs: Clear organization, correct async/await discipline, explicit error propagation, effective defensive checks.

rust/garbage_collector/src/operators/compute_versions_to_delete_from_graph.rs: Improved logic for status filtering; code is thoroughly tested via added and updated property-based tests.

rust/sysdb/src/sysdb.rs: Well-structured, new API introduces minimal disruption, maintainable extension.

rust/garbage_collector/tests/proptest_helpers/garbage_collector_reference.rs: Complex but clear; new status model brings explicitness.

Best Practices

Error Handling:
• Consistent error typing and propagation
• Use of defensive programming (graph component check)

Test Coverage:
• Extensive use of property-based/proptest and new integration tests

Modularity:
• Changes localized per-responsibility (operators, orchestrator, sysdb, types)
API boundary expansion performed cleanly

Potential Issues

• Edge case: If sysdb or GC crashes mid-hard-delete, double-deletion should be idempotent but should be validated.
• Compatibility: TestSysDb and SQLite sysdb backends have unimplemented stubs for new calls - should not be used in production.
• Potential race if forked collections are quickly created/deleted in parallel; needs attention in prod load.

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ca845a3 to ccc9b82 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from c815632 to d56d085 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ccc9b82 to 4dcc3a4 Compare May 23, 2025 00:01
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from d56d085 to ab68c6d Compare May 23, 2025 00:02
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 4dcc3a4 to f2adff1 Compare May 23, 2025 00:49
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from ab68c6d to 3c756a0 Compare May 23, 2025 00:49
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from f2adff1 to 660013f Compare May 23, 2025 01:27
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 3c756a0 to 7e8bfb4 Compare May 23, 2025 01:28
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 660013f to b10b4b0 Compare May 23, 2025 02:08
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 7e8bfb4 to 59c1d1c Compare May 23, 2025 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant