Remove timerange root search #5760

Open: tontinton wants to merge 2 commits into main from remove-timerange-root-search

Conversation

@tontinton (Contributor)

Extract and remove time range from query ast.

To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

@rdettai (Collaborator) left a comment

The change being pretty big, I couldn't go through all of it yet. Here are a few comments already.

Comment on lines +14 to +33
pub(crate) fn extract_start_end_timestamp_from_ast(
query_ast: QueryAst,
timestamp_field: &str,
start_timestamp: &mut Option<i64>,
end_timestamp: &mut Option<i64>,
) -> QueryAst {
rdettai (Collaborator):

This signature is confusing as it is both transforming an owned input and mutating references.

tontinton (Contributor, PR author):

Hmm, do you maybe prefer:

pub(crate) fn extract_start_end_timestamp_from_ast(
    query_ast: QueryAst,
    timestamp_field: &str,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> (QueryAst, Option<i64>, Option<i64>) {

?

rdettai (Collaborator):

I think the gain from doing both the boundary refinement and the extraction in a single pass is marginal, and it would make this PR a lot easier to review (and the code easier to read) to keep refine_start_end_timestamp_from_ast() as is and add a separate method for the extraction.
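
(A rough sketch of what that separate method could look like; the function name and crate path below are assumptions, not Quickwit's actual API. refine_start_end_timestamp_from_ast() would keep its current read-only &QueryAst signature, and the removal of timestamp clauses would get its own transformation.)

    use quickwit_query::query_ast::QueryAst;

    pub(crate) fn remove_timestamp_range_from_ast(
        query_ast: QueryAst,
        timestamp_field: &str,
    ) -> QueryAst {
        // A real implementation would walk the AST with a QueryAstTransformer
        // and drop the range clauses on `timestamp_field`; this placeholder
        // simply returns the AST unchanged.
        let _ = timestamp_field;
        query_ast
    }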

tontinton (Contributor, PR author):

The problem is that transform accepts a QueryAst by value rather than by reference, so I would need to clone to keep the old signature of refine_start_end_timestamp_from_ast, which accepts a &QueryAst. Are you ok with this clone?

tontinton (Contributor, PR author):

fixed

tontinton requested a review from rdettai on April 29, 2025.
tontinton force-pushed the remove-timerange-root-search branch 2 times, most recently from a8afa5d to 28e5e76 on April 29, 2025.
To allow for the optimization stage to run if the resulting ast after
extracting the time range is a simple query.
Does the same as RemoveTimestampRange, but even does a little more.
tontinton force-pushed the remove-timerange-root-search branch from 28e5e76 to d10c70c on April 30, 2025.
@fulmicoton-dd (Collaborator)

@trinity-1686a can you review?

@trinity-1686a (Contributor)

before i review further, i have a major concern to express.
for mostly historic reasons, start/end_timestamps are only precise up to the second, but the ast can express sub-second precision, up to the nanosecond.
the previous code purposefully left the range query inside the ast for that reason (it had some handling of rounding already, but that's only to make sure the extracted timestamps are never more restrictive than the query is meant to be, leaving the exact precision to the ast)

as far as i can tell, it seems in the state this PR is, a query with sub-second precision may return results it shouldn't (for instance ts:[2025-05-13T12:00:00.123 TO *] could return doc with a ts of 2025-05-13T12:00:00.000), and not return a result it should (ts:{2025-05-13T12:00:00.123 TO *] would round up because the bound is excluding, not matching docs with a ts of 2025-05-13T12:00:00.999)
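
(For illustration, a minimal sketch in plain Rust, not Quickwit code, of the rounding rule described above: nanosecond bounds are widened to whole seconds, so the extracted window can only be broader than the query, and the exact sub-second filtering is left to the range clause in the ast.)

    fn widen_ns_bounds_to_seconds(
        start_ns: Option<i64>,
        end_ns: Option<i64>,
    ) -> (Option<i64>, Option<i64>) {
        const NS_PER_SEC: i64 = 1_000_000_000;
        // Lower bound rounds down: 12:00:00.123 becomes 12:00:00.
        let start_sec = start_ns.map(|ns| ns.div_euclid(NS_PER_SEC));
        // Upper bound rounds up: 12:00:00.123 becomes 12:00:01.
        let end_sec = end_ns.map(|ns| {
            if ns.rem_euclid(NS_PER_SEC) == 0 {
                ns / NS_PER_SEC
            } else {
                ns.div_euclid(NS_PER_SEC) + 1
            }
        });
        (start_sec, end_sec)
    }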

@fulmicoton-dd (Collaborator)

@trinity-1686a My memory is failing me. I remember you already added some optimization around extracting timestamps from the AST. What is the current status of timestamp checking before and after this PR?
(e.g. if we have a time-range Elasticsearch query, do we have time pruning, and where does the range filtering happen: post-filtering? range query?)

The secs/subsecs problem is a problem but we can probably work around it.

@tontinton (Contributor, PR author)

> before i review further, i have a major concern to express. for mostly historic reasons, start/end_timestamps are only precise up to the second, but the ast can express sub-second precision, up to the nanosecond. the previous code purposefully left the range query inside the ast for that reason (it had some handling of rounding already, but that's only to make sure the extracted timestamps are never more restrictive than the query is meant to be, leaving the exact precision to the ast)
>
> as far as i can tell, it seems in the state this PR is, a query with sub-second precision may return results it shouldn't (for instance ts:[2025-05-13T12:00:00.123 TO *] could return doc with a ts of 2025-05-13T12:00:00.000), and not return a result it should (ts:{2025-05-13T12:00:00.123 TO *] would round up because the bound is excluding, not matching docs with a ts of 2025-05-13T12:00:00.999)

I understand your concern now; it also makes me realize that there's no reason to try to extract twice (once in the root and once in the leaf).

Maybe we should also fix the root to not convert to seconds on extraction, and work with Bounds<DateTime>?

@trinity-1686a (Contributor) commented May 14, 2025

before:

  • in root.rs, we have a QueryAstVisitor that is used to extract approximate (second precision) start/end timestamps. It does not mutate the AST. This is called rather early in query evaluation, and allows for time pruning.
  • in leaf.rs, we have a QueryAstTransformer: it removes all must timestamp ranges and summarizes them (along with start/end_timestamps) into a single range. It checks whether that range is more restrictive than the split bounds, and drops the range entirely if it isn't. I don't expect people to put multiple timestamp ranges in their queries, but the use for this code is twofold: we don't naively add a range based on query parameters (which may have been inferred from the ast), which saves us from duplicating the timestamp ranges (some compute); and we remove bounds entirely, before warmup, when we know they are useless, saving both compute and io. (after that step, we do some query simplification while transforming into a tantivy query, which may or may not be able to do a better job because we removed some clauses, but this isn't specific to timestamps)
  • The resulting range queries are eventually turned into tantivy FastFieldRangeQuery. There is no longer any collector post-filtering for timestamps.

after: (please correct me @tontinton if i say anything wrong)

  • in root.rs, a QueryAstTransformer is used to extract timestamp ranges into the query params and remove them from the ast (as is, causing a loss of precision). this allows for time pruning.
  • in leaf.rs, the same QueryAstTransformer is used, though in all likelihood it won't find anything. This could actually be simplified further by adding the bounds only if they are more restrictive than the split bounds (assuming start/end_timestamps are capable of full ns precision), without looking into the ast.
  • the ranges are eventually turned into FastFieldRangeQuery, as before

pros: would remove a bunch of nearly-duplicate code, and allow us to do only a single pass on the ast instead of two as currently done
@tontinton could you clarify this? it's not obvious to me what optimization you mean.

> To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

cons: as is, this doesn't always return the correct result.

path forward: to make this PR return the correct result, we'd need start/end_timestamp to be ns precision. To that effect, every usage of start/end_timestamp would need to be checked. To avoid searching manually (which would be error-prone both to write and to review), we could retire start/end_timestamp from SearchRequest, introduce start/end_timestamp_ns in their place, and add start/end_timestamp() helpers to get the second-precision value as before. The compiler should then be able to find every use of the second-precision bounds for us.
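
(A rough sketch of that migration; the struct and field names below are hypothetical, not the actual SearchRequest. The request would carry nanosecond bounds, and second-precision accessors under the old names would only ever widen, so removing the old fields lets the compiler flag every remaining caller.)

    struct SearchRequestBounds {
        start_timestamp_ns: Option<i64>,
        end_timestamp_ns: Option<i64>,
    }

    const NS_PER_SEC: i64 = 1_000_000_000;

    impl SearchRequestBounds {
        /// Second-precision start, rounded down so it is never more restrictive.
        fn start_timestamp(&self) -> Option<i64> {
            self.start_timestamp_ns.map(|ns| ns.div_euclid(NS_PER_SEC))
        }

        /// Second-precision end, rounded up for the same reason.
        fn end_timestamp(&self) -> Option<i64> {
            self.end_timestamp_ns
                .map(|ns| ns.div_euclid(NS_PER_SEC) + i64::from(ns.rem_euclid(NS_PER_SEC) != 0))
        }
    }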

@trinity-1686a (Contributor) commented May 14, 2025

> Maybe we should also fix the root to not convert to seconds on extraction, and work with Bounds<DateTime>?

i think Bounds<DateTime> would be a tad annoying because SearchRequest must be protobuf-serializable, but an Option<i64> that happens to contain ns is likely good enough

@tontinton (Contributor, PR author) commented May 14, 2025

> i think Bounds<DateTime> would be a tad annoying because SearchRequest must be protobuf-serializable, but an Option<i64> that happens to contain ns is likely good enough

Cool I'll work on it once I have some time, hopefully this week / next week.

@fulmicoton-dd (Collaborator) commented May 14, 2025

@trinity-1686a @tontinton
What would be the ideal solution according to you?

My take:

  1. the start_timestamp and end_timestamp fields would be removed from the protobuf SearchRequest object.
  2. the quickwit API (not elastic) would integrate the bounds from the REST API into the query ast
  3. we would work on the ast as an optimization to bubble up/merge/remove time ranges from the tree whenever possible in the root
  4. we would extract start/end timestamps from the ast for split pruning (in the root)
  5. in leaf, individually for each split, we would run an extra optimization pass to apply split-specific optimizations (see the sketch below).
    e.g:
    We would replace the time-range node with an always-match query when the split boundaries are within the range, and run a simplification pass to remove those always-match nodes. Maybe there is another optimization for splits that simply do not have this timestamp field.
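
(A hedged illustration of point 5; the types and names below are made up for the example and are not Quickwit's actual AST. If a split's time bounds sit entirely inside the query's range, the range clause cannot exclude anything in that split and can be rewritten to an always-match node before the simplification pass.)

    /// Illustrative only: the time bounds covering every document of a split.
    struct SplitTimeBounds {
        min_ts: i64,
        max_ts: i64,
    }

    /// Illustrative only: an inclusive time range taken from the query ast.
    struct QueryTimeRange {
        start_inclusive: i64,
        end_inclusive: i64,
    }

    /// True when the range clause is a no-op for this split and can be
    /// replaced with an always-match query.
    fn range_is_trivial_for_split(split: &SplitTimeBounds, range: &QueryTimeRange) -> bool {
        range.start_inclusive <= split.min_ts && split.max_ts <= range.end_inclusive
    }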

@trinity-1686a (Contributor)

i would like to understand what this is about, because it could change what is the ideal solution

> To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

i think a good solution should deduplicate the code; having both a QueryAstVisitor and a QueryAstTransformer doing almost the same thing is definitely not ideal.
i don't think visiting the ast multiple times is really expensive, so we could get rid of start/end_timestamp and have the query ast be the only source of truth, but that might interfere with the optimisation @tontinton has in mind

> and run a simplification pass to remove those always-match nodes.

this is something we already do on all requests while turning them into a tantivy Query. Also, that's a lot more error-prone than it seems: there is a subtle bug with the .filter(|result| result != &Ok(QueryAst::MatchAll)) added in this PR, when a bool query has a single .must that gets removed and it also has some .should.

tldr: depending on what that optimisation from the 1st message in the PR is, i think removing start/end_timestamp is probably the best solution (which implies the other points in the list)

@fulmicoton-dd (Collaborator)

@tontinton would that make sense?

@tontinton (Contributor, PR author) commented May 14, 2025

@fulmicoton-dd @trinity-1686a

The optimizations I'm talking about are things like is_metadata_count_request_with_ast checking that the query is a MatchAll query. I thought of this PR after making this PR.
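
(For context, a tiny sketch of the kind of check meant here; the function name below is hypothetical, while is_metadata_count_request_with_ast is the real Quickwit helper being referenced. Once the time range has been pulled out of the AST, a request whose remaining AST is a plain MatchAll can be short-circuited, e.g. answered as a count from split metadata.)

    use quickwit_query::query_ast::QueryAst;

    /// Hypothetical helper: true when the remaining AST matches every document,
    /// which is what enables the count-from-metadata style of optimization.
    fn remaining_ast_is_match_all(ast: &QueryAst) -> bool {
        matches!(ast, QueryAst::MatchAll)
    }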

> this is something we already do on all requests while turning them into a tantivy Query. Also, that's a lot more error-prone than it seems: there is a subtle bug with the .filter(|result| result != &Ok(QueryAst::MatchAll)) added in this PR, when a bool query has a single .must that gets removed and it also has some .should.

True, this should be fixed.

> @tontinton would that make sense?

To be honest, I'm not 100% following. You're saying: keep the start and end timestamps in the AST, and let leaves optimize them out when the split is contained in the time range (converting to a MatchAll), which would allow things like fn optimize( in leaf.rs to apply even with a time range, something like what I did here.

But what about the optimization PR I've linked to get the count from the metastore (first link)? That code is still in the root; would it also convert the query to MatchAll?

@fulmicoton (Collaborator)

@trinity-1686a I leave it to you to review and judge when and what should be merged, and to merge the PRs: #5760 #5759 #5758
