Flink support #8849
Replies: 3 comments 4 replies
-
Thank everyone putting efforts on Flink support. Glad to see Flink experts willing to join. For the implementation, my 2 cents so far is to extract a Java accelerator library out for our next framework support. I have been working on one https://github.com/velox4j/velox4j at which you may want to take a look. @shuai-xu Substrait haven't been helping much on the extensibility of Gluten. We have to do a bunch of customizations on it that makes Gluten difficult to integrate with other Susbtrait consumer libraries. Moreover, Velox and CH both have incompatible usages on same Substrait features, for example the My feeling is a layered design will be the better arch for such accelerators. We build a Java library that tightly binds to the native libraries through JNI, then translate framework's query plan into library's query plan in Java in one shot. Things will be made much clearer and maintenance will be much simpler with such architecture. |
Beta Was this translation helpful? Give feedback.
-
How much code we can reuse from Gluten? if we can't reuse it may be a better idea to create a subproject. @weiting-chen |
Beta Was this translation helpful? Give feedback.
-
Some Ideas:
Velox does not have an internal state. We can introduce RocksDB as the state, which is not too difficult. However, this could become a potential performance bottleneck, so careful consideration is necessary. Since RocksDB involves disk operations, it is essential to orchestrate IO as asynchronous batch reads.
This is one of the fundamental differences between stream and batch processing. Streams are updated, while batches are not. For example, a table from MySQL naturally has updates and deletes. Flink introduces a stream retract mechanism to address this issue, which involves carrying metadata on the data to indicate whether it is an update, delete, or insert. Clearly, Velox does not have this capability. Several modifications are needed:
Once both the operators and the data stream are modified to support rollback, the overall execution flow should be fine.
|
Beta Was this translation helpful? Give feedback.
-
There is a PoC PR of Flink support recently submitted: #8839 (author: @shuai-xu)
The background doc: https://docs.google.com/document/d/1VNMs1oR0c02kuQLFLQXZeTyk3CoPjrtAfiKGoLGxGzE/edit?usp=sharing (author: @weiting-chen )
The design doc: WIP
There are no existing issues / discussions tracking on Flink support. So let's discuss here.
Beta Was this translation helpful? Give feedback.
All reactions