kvs: call content.flush before checkpoint #6240
base: master
e1f7562 to b019755
src/modules/kvs/kvs.c
Outdated
-if (!(f = kvs_checkpoint_commit (h, NULL, rootref, rootseq, 0, 0))
-    || flux_rpc_get (f, NULL) < 0)
+/* first must ensure all content is flushed */
+if (!(f1 = flux_rpc (h, "content.flush", NULL, 0, 0))
+    || flux_rpc_get (f1, NULL) < 0) {
+    /* fallthrough to kvs_checkpoint_commit(), ENOSYS may be due
+     * to no backing store, but checkpoint can still be done to
+     * content cache.
+     */
+    if (errno != ENOSYS)
+        goto error;
+}
+
+if (!(f2 = kvs_checkpoint_commit (h, NULL, rootref, rootseq, 0, 0))
+    || flux_rpc_get (f2, NULL) < 0)
     goto error;
Why fall through to the checkpoint commit? If there is no backing store then wouldn't it also fail?
this admittedly surprised me, but you can checkpoint to the content cache without it going through to the backing store.
i.e. if there is no backing store, the checkpoint can succeed.
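The control flow discussed here can be sketched without a running broker. The following is a minimal simulation, not flux-core code: `struct sim_broker`, `sim_content_flush()`, `sim_checkpoint_commit()`, and `sim_unload_checkpoint()` are hypothetical stand-ins, and the assumption that a checkpoint always succeeds against the content cache is taken from the comment above rather than verified against the implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for broker state; not a flux-core type. */
struct sim_broker {
    bool has_backing_store;  /* is a backing store module loaded? */
    int flush_errno;         /* nonzero: flush fails with this errno */
    bool checkpointed;       /* set once a checkpoint is recorded */
};

/* Stand-in for the content.flush RPC.
 * Assumption: it fails with ENOSYS when no backing store is loaded.
 */
static int sim_content_flush (struct sim_broker *b)
{
    if (!b->has_backing_store) {
        errno = ENOSYS;
        return -1;
    }
    if (b->flush_errno) {
        errno = b->flush_errno;
        return -1;
    }
    return 0;
}

/* Stand-in for kvs_checkpoint_commit().
 * Assumption: the content cache accepts the checkpoint even when
 * there is no backing store behind it.
 */
static int sim_checkpoint_commit (struct sim_broker *b)
{
    b->checkpointed = true;
    return 0;
}

/* Flush first, tolerate only ENOSYS, then checkpoint. */
int sim_unload_checkpoint (struct sim_broker *b)
{
    if (sim_content_flush (b) < 0 && errno != ENOSYS)
        return -1;
    return sim_checkpoint_commit (b);
}
```

In this model a broker with no backing store still records a checkpoint, while any other flush failure (for example ENOSPC) aborts before the checkpoint is attempted.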
ya know, thinking about this a bit .... do we want that behavior? Do we want checkpoints to work with the content cache and w/o a backing store? It seems like a nuanced corner case. I'm not entirely sure why it was initially supported that way. This would be for a different issue I think, as this PR solves a specific content.flush before "checkpoint" corner case.
If there is a hook in the content cache for the checkpoint, why don't we do the cache flush there?
Sorry I've forgotten so many details here!
hmmm. supporting checkpoint w/o a backing module simply may have been a fallout from supporting the none backing module. #4492
perhaps checkpoint specifically should not be supported by the content cache.
I guess it allows the kvs module to be reloaded and not lose all the data when there is no backing store?
> I guess it allows the kvs module to be reloaded and not lose all the data when there is no backing store?

I do think that's the point. But that is sort of a nuanced use case, possibly only useful in testing?
> If there is a hook in the content cache for the checkpoint, why don't we do the cache flush there?

That's not a bad idea. Lemme look into that, although it'd be for a different PR b/c FLUX_KVS_SYNC does two calls and that should probably be collapsed down into one as well.
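A rough sketch of the idea in this exchange, with the flush performed inside the cache's checkpoint handler so a caller makes one call instead of two. Everything here is hypothetical: `struct sim_cache`, `sim_cache_flush()`, and `sim_cache_checkpoint()` are illustrative names, not flux-core functions, and tolerating ENOSYS is simply carried over from the hunk under review.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a content cache; not a flux-core type. */
struct sim_cache {
    int dirty;               /* entries not yet written to the backing store */
    bool has_backing_store;  /* is a backing store module loaded? */
    int checkpoint_seq;      /* last checkpointed root sequence */
};

/* Write all dirty entries to the backing store.
 * Assumption: fails with ENOSYS when there is no backing store.
 */
static int sim_cache_flush (struct sim_cache *c)
{
    if (!c->has_backing_store) {
        errno = ENOSYS;
        return -1;
    }
    c->dirty = 0;
    return 0;
}

/* Checkpoint handler that flushes before recording the checkpoint,
 * so callers no longer issue a separate flush request.
 */
int sim_cache_checkpoint (struct sim_cache *c, int rootseq)
{
    if (c->dirty > 0 && sim_cache_flush (c) < 0 && errno != ENOSYS)
        return -1;
    c->checkpoint_seq = rootseq;
    return 0;
}
```

With the flush folded into the handler, the ordering guarantee lives in one place, and a caller such as the FLUX_KVS_SYNC path would not need to sequence two requests itself.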
b019755 to 76e9d9f
Marking this WIP; I feel #6251 should come first now.
76e9d9f to cb88b17
Re-pushed; the PR is now built on top of #6255 and has been logically reworked.
cb88b17 to 50be094
e8cb848 to 6d023a5
Problem: There is no coverage to ensure FLUX_KVS_SYNC fails when there is no longer space on disk. Add coverage to t0090-content-enospc.t.
Problem: There is no coverage to ensure FLUX_KVS_SYNC does not work if there is no backing store. Add coverage in t1010-kvs-commit-sync.t.
Problem: When the KVS module is unloaded, a checkpoint of the root reference is attempted. However, a content.flush is not done beforehand. This could result in an invalid checkpoint reference as data is not guaranteed to be flushed to the backing store. Solution: Call content.flush before checkpointing. Fixes flux-framework#6237
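The failure mode described in this commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not flux-core code: `struct sim_content` and the `sim_*` functions are hypothetical, blob references are plain strings, and the checkpoint record itself is assumed to survive a restart while unflushed cache entries do not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX 8

/* Hypothetical model: blobs live in the cache until flushed to the store. */
struct sim_content {
    const char *cache[SIM_MAX];
    int ncache;
    const char *store[SIM_MAX];
    int nstore;
    const char *checkpoint;  /* root reference recorded at checkpoint */
};

static bool sim_in_store (const struct sim_content *c, const char *ref)
{
    for (int i = 0; i < c->nstore; i++) {
        if (strcmp (c->store[i], ref) == 0)
            return true;
    }
    return false;
}

/* Add a blob reference to the cache only (not yet persistent). */
void sim_put (struct sim_content *c, const char *ref)
{
    if (c->ncache < SIM_MAX)
        c->cache[c->ncache++] = ref;
}

/* Copy every cached blob that is not yet stored into the backing store. */
void sim_flush (struct sim_content *c)
{
    for (int i = 0; i < c->ncache; i++) {
        if (!sim_in_store (c, c->cache[i]) && c->nstore < SIM_MAX)
            c->store[c->nstore++] = c->cache[i];
    }
}

/* Record the root reference; does not check that it was flushed. */
void sim_checkpoint (struct sim_content *c, const char *rootref)
{
    c->checkpoint = rootref;
}

/* Simulate a restart: the cache is lost, so the checkpoint is only
 * usable if the root reference made it into the backing store.
 */
bool sim_restart_ok (struct sim_content *c)
{
    c->ncache = 0;
    return c->checkpoint != NULL && sim_in_store (c, c->checkpoint);
}
```

Calling `sim_checkpoint()` without a preceding `sim_flush()` leaves a checkpoint that points at a reference the store never received, which is the invalid-reference case; flushing first makes the same checkpoint resolvable after the simulated restart.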
6d023a5 to 5cee656
Now that #6255 is in, removing WIP from this.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6240      +/-   ##
==========================================
- Coverage   83.83%   83.80%   -0.04%
==========================================
  Files         535      535
  Lines       89313    89318       +5
==========================================
- Hits        74880    74855      -25
- Misses      14433    14463      +30
Problem: When the KVS module is unloaded, a checkpoint of the root
reference is attempted. However, a content.flush is not done
beforehand. This could result in an invalid checkpoint reference
as data is not guaranteed to be flushed to the backing store.
Solution: Call content.flush before checkpointing.
Fixes #6237
I threw in a few new tests for some extra coverage of FLUX_KVS_SYNC too.