Remove timerange root search #5760
base: main
Since the change is pretty big, I haven't gone through all of it yet. Here are a few comments so far.
pub(crate) fn extract_start_end_timestamp_from_ast(
    query_ast: QueryAst,
    timestamp_field: &str,
    start_timestamp: &mut Option<i64>,
    end_timestamp: &mut Option<i64>,
) -> QueryAst {
This signature is confusing as it is both transforming an owned input and mutating references.
Hmm, would you maybe prefer this instead?
pub(crate) fn extract_start_end_timestamp_from_ast(
    query_ast: QueryAst,
    timestamp_field: &str,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> (QueryAst, Option<i64>, Option<i64>) {
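A minimal sketch of that tuple-returning style, written against a simplified, hypothetical stand-in for QueryAst (the real type has many more variants). It assumes range lower bounds are inclusive and upper bounds exclusive, and it only extracts ranges in conjunctive (AND) position, since pulling a range out from under an OR would change what the query matches.

```rust
/// Hypothetical, simplified stand-in for QueryAst.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryAst {
    MatchAll,
    Term { field: String, value: String },
    /// Assumed semantics: `lower` inclusive, `upper` exclusive.
    Range { field: String, lower: Option<i64>, upper: Option<i64> },
    And(Vec<QueryAst>),
    Or(Vec<QueryAst>),
}

/// Keeps the tighter (larger) of two optional start bounds.
fn tighter_start(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a.max(b)),
        (a, b) => a.or(b),
    }
}

/// Keeps the tighter (smaller) of two optional end bounds.
fn tighter_end(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (a, b) => a.or(b),
    }
}

/// Consumes the AST and returns it without the timestamp ranges found in
/// conjunctive position, together with the narrowed bounds (no `&mut` out-params).
pub fn extract_start_end_timestamp_from_ast(
    query_ast: QueryAst,
    timestamp_field: &str,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> (QueryAst, Option<i64>, Option<i64>) {
    match query_ast {
        // A range on the timestamp field becomes a bound; the clause itself becomes MatchAll.
        QueryAst::Range { field, lower, upper } if field == timestamp_field => (
            QueryAst::MatchAll,
            tighter_start(start_timestamp, lower),
            tighter_end(end_timestamp, upper),
        ),
        QueryAst::And(children) => {
            let (mut start, mut end) = (start_timestamp, end_timestamp);
            let mut kept = Vec::new();
            for child in children {
                let (child, s, e) =
                    extract_start_end_timestamp_from_ast(child, timestamp_field, start, end);
                start = s;
                end = e;
                // MatchAll is the identity of AND, so it can be dropped.
                if child != QueryAst::MatchAll {
                    kept.push(child);
                }
            }
            let ast = match kept.len() {
                0 => QueryAst::MatchAll,
                1 => kept.pop().unwrap(),
                _ => QueryAst::And(kept),
            };
            (ast, start, end)
        }
        // Anything else, including OR, is returned untouched.
        other => (other, start_timestamp, end_timestamp),
    }
}
```

Returning the bounds alongside the AST keeps the function pure: the caller decides what to do with the previous bounds instead of having them silently overwritten through references.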
I think the gain from doing both the boundary refinement and the extraction at once is marginal. Keeping refine_start_end_timestamp_from_ast() as is and adding a separate method for the extraction would make this PR a lot easier to review (and the code easier to read).
The problem is that transform accepts a QueryAst by value rather than by reference, meaning I would need to clone it to keep the old signature of refine_start_end_timestamp_from_ast, which accepts a &QueryAst. Are you ok with this clone?
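One way to keep the two concerns apart, sketched on a simplified, hypothetical QueryAst: the refinement only reads the tree, so it can keep borrowing &QueryAst, while the removal consumes it. strip_timestamp_range is a hypothetical name (not the existing RemoveTimestampRange), and both functions only look at conjunctive (AND) positions.

```rust
/// Hypothetical, simplified stand-in for QueryAst.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryAst {
    MatchAll,
    Term { field: String, value: String },
    /// Assumed semantics: `lower` inclusive, `upper` exclusive.
    Range { field: String, lower: Option<i64>, upper: Option<i64> },
    And(Vec<QueryAst>),
}

/// Read-only pass: narrows the bounds and never modifies the AST.
pub fn refine_start_end_timestamp_from_ast(
    query_ast: &QueryAst,
    timestamp_field: &str,
    start_timestamp: &mut Option<i64>,
    end_timestamp: &mut Option<i64>,
) {
    match query_ast {
        QueryAst::Range { field, lower, upper } if field == timestamp_field => {
            if let Some(lower) = *lower {
                *start_timestamp = Some(start_timestamp.map_or(lower, |s| s.max(lower)));
            }
            if let Some(upper) = *upper {
                *end_timestamp = Some(end_timestamp.map_or(upper, |e| e.min(upper)));
            }
        }
        QueryAst::And(children) => {
            for child in children {
                refine_start_end_timestamp_from_ast(child, timestamp_field, start_timestamp, end_timestamp);
            }
        }
        _ => {}
    }
}

/// Owned pass: drops timestamp ranges in conjunctive position.
pub fn strip_timestamp_range(query_ast: QueryAst, timestamp_field: &str) -> QueryAst {
    match query_ast {
        QueryAst::Range { field, .. } if field == timestamp_field => QueryAst::MatchAll,
        QueryAst::And(children) => {
            let mut kept: Vec<QueryAst> = children
                .into_iter()
                .map(|child| strip_timestamp_range(child, timestamp_field))
                // MatchAll is the identity of AND.
                .filter(|child| *child != QueryAst::MatchAll)
                .collect();
            match kept.len() {
                0 => QueryAst::MatchAll,
                1 => kept.pop().unwrap(),
                _ => QueryAst::And(kept),
            }
        }
        other => other,
    }
}
```

A caller that owns the AST can run the borrowing refinement first and then hand the AST to the consuming pass, with no clone. A clone only becomes necessary if the &QueryAst signature has to be implemented on top of an owned transform.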
fixed
Branch updated from a8afa5d to 28e5e76.
To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.
It does the same as RemoveTimestampRange, and a little more.
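To illustrate why a simpler AST after extraction matters, here is a hypothetical sketch (SplitMeta and count_from_split_metadata are illustrative names, not existing APIs): once the time range has been pulled out and the remaining AST is MatchAll, a split whose entire time range lies inside the requested bounds can report its document count from metadata without being searched. Bounds are assumed start-inclusive and end-exclusive, the split's time range inclusive on both ends, all in the same unit.

```rust
/// Hypothetical, simplified stand-in for QueryAst.
#[derive(Debug, PartialEq)]
pub enum QueryAst {
    MatchAll,
    Term { field: String, value: String },
}

/// Hypothetical per-split metadata.
pub struct SplitMeta {
    pub num_docs: u64,
    /// Inclusive (min, max) timestamp of the docs in the split, if known.
    pub time_range: Option<(i64, i64)>,
}

/// Returns the hit count straight from metadata when that is safe,
/// or None when the split has to be searched.
pub fn count_from_split_metadata(
    query_ast: &QueryAst,
    split: &SplitMeta,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> Option<u64> {
    // Only a query that matches every document can be answered from num_docs.
    if *query_ast != QueryAst::MatchAll {
        return None;
    }
    match split.time_range {
        Some((min, max)) => {
            let start_ok = start_timestamp.map_or(true, |start| start <= min);
            let end_ok = end_timestamp.map_or(true, |end| max < end);
            if start_ok && end_ok {
                Some(split.num_docs)
            } else {
                None
            }
        }
        // Without a known time range, only an unbounded request qualifies.
        None if start_timestamp.is_none() && end_timestamp.is_none() => Some(split.num_docs),
        None => None,
    }
}
```

If the range clause were still inside the AST, the check above would fail on the first line even though the range adds no constraint for this split.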
Branch updated from 28e5e76 to d10c70c.
@trinity-1686a can you review?
before i review further, i have a major concern to express. as far as i can tell, in its current state this PR may return results it shouldn't for a query with sub-second precision (for instance
@trinity-1686a My memory is failing me. I remember you already added some optimization around extracting the timestamp from the AST. What is the current status of timestamp checking before and after this PR? The seconds vs. sub-seconds issue is real, but we can probably work around it.
I understand your concern now. It also makes me realize that there's no reason to try to extract twice (once in the root and once in the leaf). Maybe we should also fix the root to not convert to seconds on extraction, and work with
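A small worked illustration of the precision concern, assuming the root floors a nanosecond lower bound to whole seconds and the leaf no longer re-checks the exact bound once the range has been removed from the AST. The helper names are hypothetical. With a requested start of 1.5 s, a document at 1.2 s is correctly rejected by the exact check but accepted after truncation.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Floors a nanosecond timestamp to whole seconds (also correct for negative values).
pub fn floor_to_secs(ts_ns: i64) -> i64 {
    ts_ns.div_euclid(NANOS_PER_SEC)
}

/// Exact check against the original, nanosecond-precision lower bound.
pub fn matches_exact(doc_ns: i64, start_ns: i64) -> bool {
    doc_ns >= start_ns
}

/// Check against a lower bound that was truncated to seconds before being applied.
pub fn matches_after_truncation(doc_ns: i64, start_ns: i64) -> bool {
    doc_ns >= floor_to_secs(start_ns) * NANOS_PER_SEC
}
```

Flooring is fine for coarse split pruning, since it never drops a matching split; the false positive only appears when the truncated bound is the last check a document goes through.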
before:
after: (please correct me @tontinton if i say anything wrong)
pros: would remove a bunch of nearly duplicate code, and allow us to do a single pass over the ast instead of the two currently done
cons: as is, this doesn't always return the correct result.
path forward: to make this PR return the correct result, we'd need start/end_timestamp to have ns precision. To that effect, every usage of start/end_timestamp would need to be checked. To avoid searching manually (which would be error-prone both to write and to review), we could retire start/end_timestamp from SearchRequest, introduce start/end_timestamp_ns in their place, and add start/end_timestamp() helpers that return the second-precision value as before. The compiler should then be able to find every use of the second-precision bounds for us.
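A minimal sketch of that path forward, using a plain hypothetical struct in place of the real SearchRequest (which has many more fields). The ns fields are the source of truth; the second-precision helpers round outward (start floored, end rounded up) so code that uses them for coarse pruning never excludes a matching second. That rounding choice is an assumption, not something settled in this thread.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical stand-in for the request type; only the time bounds are modeled.
#[derive(Debug, Default)]
pub struct SearchRequest {
    /// Inclusive lower bound, in nanoseconds.
    pub start_timestamp_ns: Option<i64>,
    /// Exclusive upper bound, in nanoseconds.
    pub end_timestamp_ns: Option<i64>,
}

impl SearchRequest {
    /// Second-precision start bound, floored so it never excludes a matching second.
    pub fn start_timestamp(&self) -> Option<i64> {
        self.start_timestamp_ns.map(|ns| ns.div_euclid(NANOS_PER_SEC))
    }

    /// Second-precision end bound, rounded up so it never excludes a matching second.
    pub fn end_timestamp(&self) -> Option<i64> {
        self.end_timestamp_ns.map(|ns| {
            let secs = ns.div_euclid(NANOS_PER_SEC);
            if ns.rem_euclid(NANOS_PER_SEC) == 0 {
                secs
            } else {
                secs + 1
            }
        })
    }
}
```

Because the old field names disappear, every former access to start_timestamp or end_timestamp stops compiling until it is ported to either the exact ns field or the explicit helper, which is exactly the compiler-driven audit described above.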
i think
Cool, I'll work on it once I have some time, hopefully this week or next.
@trinity-1686a @tontinton My take:
i would like to understand what this is about, because it could change what the ideal solution is
i think a good solution should deduplicate the code; having both a QueryAstVisitor and a QueryAstTransformer doing almost the same thing is definitely not ideal.
this is something we already do on all requests while turning them into a tantivy Query. Also, that's a lot more error-prone than it seems; there is a subtle bug with the
tldr: depending on what the optimization from the first message in the PR is, i think removing start/end_timestamp is probably the best solution (which implies the other points in the list)
@tontinton would that make sense?
The optimizations I'm talking about are things like
True, this should be fixed.
To be honest, I'm not 100% following. You're saying we should keep the start and end timestamp in the AST, and let leaves optimize them out when they are contained in the time range, to convert to a
But what about the optimization PR I linked for getting the count from the metastore (first link)? That code is still in the root; would it also convert the query to
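A sketch of the alternative under discussion, again on a simplified hypothetical QueryAst (simplify_for_split is an illustrative name, not an existing function): the range stays in the AST, and each split rewrites a timestamp range into MatchAll when the split's time range lies fully inside it, or into MatchNone when the two are disjoint. Because the rewritten leaf is equivalent to the original on that split, the rewrite is valid under OR as well as AND. Bounds are assumed start-inclusive and end-exclusive; the split range is inclusive.

```rust
/// Hypothetical, simplified stand-in for QueryAst.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryAst {
    MatchAll,
    MatchNone,
    Term { field: String, value: String },
    Range { field: String, lower: Option<i64>, upper: Option<i64> },
    And(Vec<QueryAst>),
    Or(Vec<QueryAst>),
}

/// Builds the parent node from the surviving children.
fn collapse(mut kept: Vec<QueryAst>, empty: QueryAst, wrap: fn(Vec<QueryAst>) -> QueryAst) -> QueryAst {
    match kept.len() {
        0 => empty,
        1 => kept.pop().unwrap(),
        _ => wrap(kept),
    }
}

/// Rewrites the AST for one split whose docs all have timestamps in split_min..=split_max.
pub fn simplify_for_split(query_ast: QueryAst, timestamp_field: &str, split_min: i64, split_max: i64) -> QueryAst {
    match query_ast {
        QueryAst::Range { field, lower, upper } if field == timestamp_field => {
            let covers = lower.map_or(true, |l| l <= split_min) && upper.map_or(true, |u| split_max < u);
            let disjoint = lower.map_or(false, |l| l > split_max) || upper.map_or(false, |u| u <= split_min);
            if covers {
                QueryAst::MatchAll
            } else if disjoint {
                QueryAst::MatchNone
            } else {
                QueryAst::Range { field, lower, upper }
            }
        }
        QueryAst::And(children) => {
            let mut kept = Vec::new();
            for child in children {
                match simplify_for_split(child, timestamp_field, split_min, split_max) {
                    QueryAst::MatchAll => {}                           // identity of AND
                    QueryAst::MatchNone => return QueryAst::MatchNone, // absorbs AND
                    other => kept.push(other),
                }
            }
            collapse(kept, QueryAst::MatchAll, QueryAst::And)
        }
        QueryAst::Or(children) => {
            let mut kept = Vec::new();
            for child in children {
                match simplify_for_split(child, timestamp_field, split_min, split_max) {
                    QueryAst::MatchNone => {}                        // identity of OR
                    QueryAst::MatchAll => return QueryAst::MatchAll, // absorbs OR
                    other => kept.push(other),
                }
            }
            collapse(kept, QueryAst::MatchNone, QueryAst::Or)
        }
        other => other,
    }
}
```

When the per-split result is MatchAll, the same split-level count shortcut applies, so in this sketch the root would not need to extract anything for it.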
@trinity-1686a I leave it to you to review, decide what should be merged and when, and merge the PRs: #5760 #5759 #5758
Extract and remove time range from query ast.