Remove timerange root search #5760

Open: tontinton wants to merge 2 commits into main from remove-timerange-root-search

Conversation

@tontinton (Contributor)

Extract and remove time range from query ast.

To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

@rdettai (Collaborator) left a comment

The change being pretty big, I couldn't go through all of it yet. Here are a few comments already.

Comment on lines +14 to +33
pub(crate) fn extract_start_end_timestamp_from_ast(
query_ast: QueryAst,
timestamp_field: &str,
start_timestamp: &mut Option<i64>,
end_timestamp: &mut Option<i64>,
) -> QueryAst {
rdettai (Collaborator):

This signature is confusing as it is both transforming an owned input and mutating references.

tontinton (Contributor, PR author):

Hmm, do you maybe prefer:

pub(crate) fn extract_start_end_timestamp_from_ast(
    query_ast: QueryAst,
    timestamp_field: &str,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> (QueryAst, Option<i64>, Option<i64>) {

?

rdettai (Collaborator):

I think the gain from doing both the boundary refinement and the extraction in a single pass is marginal, and it would make this PR a lot easier to review (and the code easier to read) to keep refine_start_end_timestamp_from_ast() as is and add a separate method for the extraction.
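
(A rough sketch of what that separate method could look like; the function name and crate path below are assumptions, not Quickwit's actual API. refine_start_end_timestamp_from_ast() would keep its current read-only &QueryAst signature, and the removal of timestamp clauses would get its own transformation.)

    use quickwit_query::query_ast::QueryAst;

    pub(crate) fn remove_timestamp_range_from_ast(
        query_ast: QueryAst,
        timestamp_field: &str,
    ) -> QueryAst {
        // A real implementation would walk the AST with a QueryAstTransformer
        // and drop the range clauses on `timestamp_field`; this placeholder
        // simply returns the AST unchanged.
        let _ = timestamp_field;
        query_ast
    }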

tontinton (Contributor, PR author):

The problem is that transform accepts a QueryAst by value rather than by reference, so I would need to clone to keep the old signature of refine_start_end_timestamp_from_ast, which accepts a &QueryAst. Are you ok with this clone?

tontinton (Contributor, PR author):

fixed

tontinton requested a review from rdettai on April 29, 2025.
tontinton force-pushed the remove-timerange-root-search branch 2 times, most recently from a8afa5d to 28e5e76 on April 29, 2025.
To allow for the optimization stage to run if the resulting ast after
extracting the time range is a simple query.
Does the same as RemoveTimestampRange, but even does a little more.
tontinton force-pushed the remove-timerange-root-search branch from 28e5e76 to d10c70c on April 30, 2025.
@fulmicoton-dd (Collaborator)

@trinity-1686a can you review?

@trinity-1686a (Contributor)

before i review further, i have a major concern to express.
for mostly historic reasons, start/end_timestamps are only precise up to the second, but the ast can express sub-second precision, up to the nanosecond.
the previous code purposefully left the range query inside the ast for that reason (it had some handling of rounding already, but that's only to make sure the extracted timestamps are never more restrictive than the query is meant to be, leaving the exact precision to the ast)

as far as i can tell, it seems in the state this PR is, a query with sub-second precision may return results it shouldn't (for instance ts:[2025-05-13T12:00:00.123 TO *] could return doc with a ts of 2025-05-13T12:00:00.000), and not return a result it should (ts:{2025-05-13T12:00:00.123 TO *] would round up because the bound is excluding, not matching docs with a ts of 2025-05-13T12:00:00.999)
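
(For illustration, a minimal sketch in plain Rust, not Quickwit code, of the rounding rule described above: nanosecond bounds are widened to whole seconds, so the extracted window can only be broader than the query, and the exact sub-second filtering is left to the range clause in the ast.)

    fn widen_ns_bounds_to_seconds(
        start_ns: Option<i64>,
        end_ns: Option<i64>,
    ) -> (Option<i64>, Option<i64>) {
        const NS_PER_SEC: i64 = 1_000_000_000;
        // Lower bound rounds down: 12:00:00.123 becomes 12:00:00.
        let start_sec = start_ns.map(|ns| ns.div_euclid(NS_PER_SEC));
        // Upper bound rounds up: 12:00:00.123 becomes 12:00:01.
        let end_sec = end_ns.map(|ns| {
            if ns.rem_euclid(NS_PER_SEC) == 0 {
                ns / NS_PER_SEC
            } else {
                ns.div_euclid(NS_PER_SEC) + 1
            }
        });
        (start_sec, end_sec)
    }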

@fulmicoton-dd (Collaborator)

@trinity-1686a My memory is failing me. I remember you already added some optimization around extracting timestamps from the AST. What is the current status of timestamp checking before and after this PR?
(e.g. if we have a time-range Elasticsearch query, do we have time pruning, and where does the range filtering happen: post-filtering? range query?)

The secs/subsecs problem is a problem but we can probably work around it.

@tontinton (Contributor, PR author)

> before i review further, i have a major concern to express. for mostly historic reasons, start/end_timestamps are only precise up to the second, but the ast can express sub-second precision, up to the nanosecond. the previous code purposefully left the range query inside the ast for that reason (it had some handling of rounding already, but that's only to make sure the extracted timestamps are never more restrictive than the query is meant to be, leaving the exact precision to the ast)
>
> as far as i can tell, it seems in the state this PR is, a query with sub-second precision may return results it shouldn't (for instance ts:[2025-05-13T12:00:00.123 TO *] could return doc with a ts of 2025-05-13T12:00:00.000), and not return a result it should (ts:{2025-05-13T12:00:00.123 TO *] would round up because the bound is excluding, not matching docs with a ts of 2025-05-13T12:00:00.999)

I understand your concern now; it also makes me realize that there's no reason to try to extract twice (once in the root and once in the leaf).

Maybe we should also fix the root to not convert to seconds on extraction, and work with Bounds<DateTime>?

@trinity-1686a (Contributor) commented May 14, 2025

before:

  • in root.rs, we have a QueryAstVisitor that is used to extract approximate (second precision) start/end timestamps. It does not mutate the AST. This is called rather early in query evaluation, and allows for time pruning.
  • in leaf.rs, we have a QueryAstTransformer: it removes all must timestamp ranges and summarizes them (along with start/end_timestamps) into a single range. It checks whether that range is more restrictive than the split bounds, and drops the range entirely if it isn't. I don't expect people to put multiple timestamp ranges in their queries, but the use for this code is twofold: we don't naively add a range based on query parameters (which may have been inferred from the ast), which saves us from duplicating the timestamp ranges (some compute); and we remove bounds entirely, before warmup, when we know they are useless, saving both compute and io. (after that step, we do some query simplification while transforming into a tantivy query, which may or may not be able to do a better job because we removed some clauses, but this isn't specific to timestamps)
  • The resulting range queries are eventually turned into tantivy FastFieldRangeQuery. There is no longer any collector post-filtering for timestamps.

after: (please correct me @tontinton if i say anything wrong)

  • in root.rs, a QueryAstTransformer is used to extract timestamp ranges into the query params and remove them from the ast (as is, causing a loss of precision). this allows for time pruning.
  • in leaf.rs, the same QueryAstTransformer is used, though in all likelihood it won't find anything. This could actually be simplified further by adding the bounds only if they are more restrictive than the split bounds (assuming start/end_timestamps are capable of full ns precision), without looking into the ast.
  • the ranges are eventually turned into FastFieldRangeQuery, as before

pros: would remove a bunch of nearly-duplicate code, and allow us to do only a single pass on the ast instead of two as currently done
@tontinton could you clarify this? it's not obvious to me what optimization you mean.

> To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

cons: as is, this doesn't always return the correct result.

path forward: to make this PR return the correct result, we'd need start/end_timestamp to be ns precision. To that effect, every usage of start/end_timestamp would need to be checked. To avoid searching manually (which would be error-prone both to write and to review), we could retire start/end_timestamp from SearchRequest, introduce start/end_timestamp_ns in their place, and add start/end_timestamp() helpers to get the second-precision value as before. The compiler should then be able to find every use of the second-precision bounds for us.
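
(A rough sketch of that migration; the struct and field names below are hypothetical, not the actual SearchRequest. The request would carry nanosecond bounds, and second-precision accessors under the old names would only ever widen, so removing the old fields lets the compiler flag every remaining caller.)

    struct SearchRequestBounds {
        start_timestamp_ns: Option<i64>,
        end_timestamp_ns: Option<i64>,
    }

    const NS_PER_SEC: i64 = 1_000_000_000;

    impl SearchRequestBounds {
        /// Second-precision start, rounded down so it is never more restrictive.
        fn start_timestamp(&self) -> Option<i64> {
            self.start_timestamp_ns.map(|ns| ns.div_euclid(NS_PER_SEC))
        }

        /// Second-precision end, rounded up for the same reason.
        fn end_timestamp(&self) -> Option<i64> {
            self.end_timestamp_ns
                .map(|ns| ns.div_euclid(NS_PER_SEC) + i64::from(ns.rem_euclid(NS_PER_SEC) != 0))
        }
    }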

@trinity-1686a (Contributor) commented May 14, 2025

> Maybe we should also fix the root to not convert to seconds on extraction, and work with Bounds<DateTime>?

i think Bounds<DateTime> would be a tad annoying because SearchRequest must be protobuf-serializable, but an Option<i64> that happens to contain ns is likely good enough

@tontinton (Contributor, PR author) commented May 14, 2025

> i think Bounds<DateTime> would be a tad annoying because SearchRequest must be protobuf-serializable, but an Option<i64> that happens to contain ns is likely good enough

Cool I'll work on it once I have some time, hopefully this week / next week.

@fulmicoton-dd (Collaborator) commented May 14, 2025

@trinity-1686a @tontinton
What would be the ideal solution according to you?

My take:

  1. the start_timestamp and end_timestamp fields would be removed from the protobuf SearchRequest object.
  2. the quickwit API (not elastic) would integrate the bounds from the REST API into the query ast
  3. we would work on the ast as an optimization to bubble up/merge/remove time ranges from the tree whenever possible in the root
  4. we would extract start/end timestamps from the ast for split pruning (in the root)
  5. in leaf, individually for each split, we would run an extra optimization pass to apply split-specific optimizations (see the sketch below).
    e.g:
    We would replace the time-range node with an always-match query when the split boundaries are within the range, and run a simplification pass to remove those always-match nodes. Maybe there is another optimization for splits that simply do not have this timestamp field.
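
(A hedged illustration of point 5; the types and names below are made up for the example and are not Quickwit's actual AST. If a split's time bounds sit entirely inside the query's range, the range clause cannot exclude anything in that split and can be rewritten to an always-match node before the simplification pass.)

    /// Illustrative only: the time bounds covering every document of a split.
    struct SplitTimeBounds {
        min_ts: i64,
        max_ts: i64,
    }

    /// Illustrative only: an inclusive time range taken from the query ast.
    struct QueryTimeRange {
        start_inclusive: i64,
        end_inclusive: i64,
    }

    /// True when the range clause is a no-op for this split and can be
    /// replaced with an always-match query.
    fn range_is_trivial_for_split(split: &SplitTimeBounds, range: &QueryTimeRange) -> bool {
        range.start_inclusive <= split.min_ts && split.max_ts <= range.end_inclusive
    }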

@trinity-1686a (Contributor)

i would like to understand what this is about, because it could change what is the ideal solution

> To allow for the optimization stage to run if the resulting ast after extracting the time range is a simple query.

i think a good solution should deduplicate the code; having both a QueryAstVisitor and a QueryAstTransformer doing almost the same thing is definitely not ideal.
i don't think visiting the ast multiple times is really expensive, so we could get rid of start/end_timestamp and have the query ast be the only source of truth, but that might interfere with the optimisation @tontinton has in mind

> and run a simplification pass to remove those always-match nodes.

this is something we already do on all requests while turning them into a tantivy Query. Also, that's a lot more error-prone than it seems: there is a subtle bug with the .filter(|result| result != &Ok(QueryAst::MatchAll)) added in this PR, when a bool query has a single .must that gets removed and it also has some .should.

tldr: depending on what that optimisation from the 1st message in the PR is, i think removing start/end_timestamp is probably the best solution (which implies the other points in the list)

@fulmicoton-dd (Collaborator)

@tontinton would that make sense?

@tontinton (Contributor, PR author) commented May 14, 2025

@fulmicoton-dd @trinity-1686a

The optimizations I'm talking about are things like is_metadata_count_request_with_ast checking that the query is a MatchAll query. I thought of this PR after making this PR.
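
(For context, a tiny sketch of the kind of check meant here; the function name below is hypothetical, while is_metadata_count_request_with_ast is the real Quickwit helper being referenced. Once the time range has been pulled out of the AST, a request whose remaining AST is a plain MatchAll can be short-circuited, e.g. answered as a count from split metadata.)

    use quickwit_query::query_ast::QueryAst;

    /// Hypothetical helper: true when the remaining AST matches every document,
    /// which is what enables the count-from-metadata style of optimization.
    fn remaining_ast_is_match_all(ast: &QueryAst) -> bool {
        matches!(ast, QueryAst::MatchAll)
    }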

> this is something we already do on all requests while turning them into a tantivy Query. Also, that's a lot more error-prone than it seems: there is a subtle bug with the .filter(|result| result != &Ok(QueryAst::MatchAll)) added in this PR, when a bool query has a single .must that gets removed and it also has some .should.

True, this should be fixed.

> @tontinton would that make sense?

To be honest, I'm not 100% following. You're saying: keep the start and end timestamps in the AST, and let leaves optimize them out when the split is contained in the time range (converting to a MatchAll), which would allow things like fn optimize( in leaf.rs to apply even with a time range, something like what I did here.

But what about the optimization PR I've linked to get the count from the metastore (first link)? That code is still in the root; would it also convert the query to MatchAll?

@fulmicoton (Collaborator)

@trinity-1686a I leave it to you to review and judge when and what should be merged, and to merge the PRs: #5760 #5759 #5758
