diff --git a/docs/implementation.html b/docs/implementation.html
index f679e0f0620..54598db41fd 100644
--- a/docs/implementation.html
+++ b/docs/implementation.html
@@ -219,7 +219,15 @@
- Data is deleted one log segment at a time. The log manager allows pluggable delete policies to choose which files are eligible for deletion. The current policy deletes any log with a modification time of more than N days ago, though a policy which retained the last N GB could also be useful. To avoid locking reads while still allowing deletes that modify the segment list we use a copy-on-write style segment list implementation that provides consistent views to allow a binary search to proceed on an immutable static snapshot view of the log segments while deletes are progressing.
+ Data is deleted one log segment at a time. The log manager applies two metrics to identify segments which are
+ eligible for deletion: time and size. For time-based policies, the record timestamps are considered, with the
+ largest timestamp in a segment file (order of records is not relevant) defining the retention time for the
+ entire segment. Size-based retention is disabled by default. When enabled, the log manager keeps deleting the
+ oldest segment file until the overall size of the partition is within the configured limit again. If both
+ policies are enabled at the same time, a segment that is eligible for deletion due to either policy will be
+ deleted. To avoid locking reads while still allowing deletes that modify the segment list, we use a copy-on-write
+ style segment list implementation that provides consistent views to allow a binary search to proceed on an
+ immutable static snapshot view of the log segments while deletes are progressing.
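The retention rules in the new text can be illustrated with a minimal Java 16+ sketch. It is not Kafka's broker code: `Segment`, `SegmentList`, `deletable`, `floor`, `retentionMs` and `retentionBytes` are hypothetical names chosen for illustration. The sketch assumes the following conventions of its own: a negative limit disables a policy, the last (active) segment is never deleted, and time-based deletion stops at the first segment that has not expired so the log stays contiguous. Because both policies select a prefix of the oldest segments, "eligible under either policy" reduces to taking the longer prefix. The copy-on-write part publishes a fresh immutable list on every change, so a reader's snapshot stays valid for binary search while deletes proceed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class RetentionSketch {

    // Hypothetical segment: first offset, size on disk, largest record timestamp in it.
    public record Segment(long baseOffset, long sizeBytes, long maxTimestampMs) {}

    // Copy-on-write list: writers publish a new immutable list; readers keep their snapshot.
    public static final class SegmentList {
        private final AtomicReference<List<Segment>> current = new AtomicReference<>(List.of());

        public List<Segment> snapshot() { return current.get(); }

        public void append(Segment s) {
            current.updateAndGet(old -> {
                List<Segment> copy = new ArrayList<>(old);
                copy.add(s);
                return List.copyOf(copy);
            });
        }

        public void deleteAll(List<Segment> doomed) {
            current.updateAndGet(old -> {
                List<Segment> copy = new ArrayList<>(old);
                copy.removeAll(doomed);
                return List.copyOf(copy);
            });
        }
    }

    // Binary search on an immutable snapshot: last segment with baseOffset <= offset, else null.
    public static Segment floor(List<Segment> snap, long offset) {
        int lo = 0, hi = snap.size() - 1;
        Segment found = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (snap.get(mid).baseOffset() <= offset) {
                found = snap.get(mid);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Time policy: count oldest segments whose largest timestamp is outside the window.
    // Stops at the first unexpired segment and never counts the last (active) one.
    static int timeBreaches(List<Segment> snap, long nowMs, long retentionMs) {
        if (retentionMs < 0) return 0;
        int n = 0;
        while (n < snap.size() - 1 && nowMs - snap.get(n).maxTimestampMs() > retentionMs) n++;
        return n;
    }

    // Size policy: drop oldest segments until the total is within retentionBytes.
    static int sizeBreaches(List<Segment> snap, long retentionBytes) {
        if (retentionBytes < 0) return 0;
        long total = 0;
        for (Segment s : snap) total += s.sizeBytes();
        int n = 0;
        while (n < snap.size() - 1 && total > retentionBytes) {
            total -= snap.get(n).sizeBytes();
            n++;
        }
        return n;
    }

    // Either policy suffices: both results are prefixes, so the union is the longer one.
    public static List<Segment> deletable(List<Segment> snap, long nowMs,
                                          long retentionMs, long retentionBytes) {
        int n = Math.max(timeBreaches(snap, nowMs, retentionMs), sizeBreaches(snap, retentionBytes));
        return List.copyOf(snap.subList(0, n));
    }

    public static void main(String[] args) {
        SegmentList log = new SegmentList();
        log.append(new Segment(0, 100, 1_000));
        log.append(new Segment(50, 100, 2_000));
        log.append(new Segment(120, 100, 9_000));
        log.append(new Segment(200, 100, 9_500)); // active segment

        List<Segment> readerView = log.snapshot();
        List<Segment> doomed = deletable(readerView, 10_000, 5_000, 150);
        log.deleteAll(doomed);

        System.out.println(doomed.size());                       // 3
        System.out.println(log.snapshot().size());               // 1
        System.out.println(floor(readerView, 130).baseOffset()); // 120
    }
}
```

In the usage example, the time policy alone would select the first two segments. The 150-byte size limit extends that to three, and the reader's earlier snapshot can still be searched after the delete has been published.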