Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CreateIndex Logic to Avoid Index Overwriting and Conflicts #17274

Open
drmingdrmer opened this issue Jan 14, 2025 · 5 comments
Open

Fix CreateIndex Logic to Avoid Index Overwriting and Conflicts #17274

drmingdrmer opened this issue Jan 14, 2025 · 5 comments

Comments

@drmingdrmer
Copy link
Member

In the current CreateIndex implementation, the process only requires an index name to create an index for a table, and the IndexMeta stores a tableId. However, an index should be uniquely associated with a specific table. This implies that the index’s identity should include both the tableId and the IndexName, instead of relying solely on the IndexName.

Problems with the Current Design:
1. Index Overwriting: When creating an index with the same name for another table, it may overwrite an existing index unintentionally.
2. Data Loss: Due to identifier conflicts, the index data of another table might be deleted.

Proposed Solution:

The correct CreateIndex logic should:
• Retrieve the tableId of the target table during the creation process.
• Include the tableId as part of the IndexKey. For instance:
• IndexKey = {tableId}_{IndexName}

This adjustment will ensure that each index is uniquely tied to its respective table, preventing overwriting and conflicts across different tables.

@drmingdrmer
Copy link
Member Author

drmingdrmer commented Jan 14, 2025

does this make sense? @SkyFan2002 @sundy-li
The migration might be challenging, as we need to maintain backward compatibility with the currently stored data.

@b41sh
Copy link
Member

b41sh commented Jan 14, 2025

In the original design, the aggregate index's SQL could contain join statements, so a single aggregate index could correspond to multiple tables. In this case IndexKey may be IndexKey = {tableId1}_{tableId2}_..._{IndexName}

@drmingdrmer
Copy link
Member Author

So, what does IndexMeta.table_id represent for a aggregate index?
https://github.com/datafuselabs/databend/blob/88fb9e3e1e20d57cf26571fa72feeb9861b5fb2a/src/meta/app/src/schema/index.rs#L56

@b41sh
Copy link
Member

b41sh commented Jan 14, 2025

So, what does IndexMeta.table_id represent for a aggregate index? https://github.com/datafuselabs/databend/blob/88fb9e3e1e20d57cf26571fa72feeb9861b5fb2a/src/meta/app/src/schema/index.rs#L56

Since we have not yet implemented join for the aggregate index, a `table_id' is sufficient. The design is not very rigorous, and perhaps we can take this opportunity to redesign it.

@drmingdrmer
Copy link
Member Author

Since we have not yet implemented join for the aggregate index, a `table_id' is sufficient. The design is not very rigorous, and perhaps we can take this opportunity to redesign it.

Aye, we could use this opportunity to refine the design and make it more robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants