Commit 86794c2

feat: Clickhouse offline store (feast-dev#4725)
* Clickhouse offline store - initial working version
* Remove untested `pull_all_from_table_or_query`
* Reorder functions
* Remove commented line
* Fix frozen mypy errors
* mypy fixes; remove online source creator
* Remove commented code
* Added docs
* Python 3.9 deps
* Python 3.10 deps
* Python 3.11 deps (updated)
* Remove unused ClickhouseOnlineStoreConfig
* Regenerate requirements.txt files
* Lint & format fixes
* Regenerate requirements.txt files
* Add clickhouse to pyproject.toml
* Fix dependencies
* Simplify names
* Skip problematic Clickhouse tests
* format & lint
* Post-merge `make lock-python-dependencies-all`
* Pin torch to 2.2.2

---------

Signed-off-by: Tomasz Wrona <[email protected]>
1 parent fba66fe commit 86794c2

28 files changed

+1312
-111
lines changed

Makefile

+22-1
@@ -246,7 +246,28 @@ test-python-universal-postgres-offline:
 				not gcs_registry and \
 				not s3_registry and \
 				not test_snowflake and \
-				not test_universal_types" \
+				not test_spark" \
+			sdk/python/tests
+
+test-python-universal-clickhouse-offline:
+	PYTHONPATH='.' \
+		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.clickhouse_repo_configuration \
+		PYTEST_PLUGINS=sdk.python.feast.infra.offline_stores.contrib.clickhouse_offline_store.tests \
+		python -m pytest -v -n 8 --integration \
+			-k "not test_historical_retrieval_with_validation and \
+				not test_historical_features_persisting and \
+				not test_universal_cli and \
+				not test_go_feature_server and \
+				not test_feature_logging and \
+				not test_reorder_columns and \
+				not test_logged_features_validation and \
+				not test_lambda_materialization_consistency and \
+				not test_offline_write and \
+				not test_push_features_to_offline_store and \
+				not gcs_registry and \
+				not s3_registry and \
+				not test_snowflake and \
+				not test_spark" \
 			sdk/python/tests

 test-python-universal-postgres-online:
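
The new `test-python-universal-clickhouse-offline` target deselects tests through one long `-k` expression. The sketch below shows how such an expression is composed from a list of names. It is an illustration only and not part of the commit: the helper names `build_k_expression` and `build_pytest_command` are hypothetical, and the exclusion list is copied from the target above.

```python
import shlex

# Test name fragments excluded by the clickhouse offline target (copied from the Makefile hunk).
EXCLUDED = [
    "test_historical_retrieval_with_validation",
    "test_historical_features_persisting",
    "test_universal_cli",
    "test_go_feature_server",
    "test_feature_logging",
    "test_reorder_columns",
    "test_logged_features_validation",
    "test_lambda_materialization_consistency",
    "test_offline_write",
    "test_push_features_to_offline_store",
    "gcs_registry",
    "s3_registry",
    "test_snowflake",
    "test_spark",
]


def build_k_expression(excluded):
    """Join the names into a pytest -k expression: 'not a and not b and ...'."""
    return " and ".join(f"not {name}" for name in excluded)


def build_pytest_command(excluded, workers=8):
    """Assemble the argument list used by the Makefile target (hypothetical helper)."""
    return [
        "python", "-m", "pytest", "-v",
        "-n", str(workers),
        "--integration",
        "-k", build_k_expression(excluded),
        "sdk/python/tests",
    ]


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_pytest_command(EXCLUDED)))
```

Keeping the exclusions in a list makes it easier to see which tests are skipped for Clickhouse; the Makefile itself still spells the expression out inline.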

docs/SUMMARY.md

+1
@@ -101,6 +101,7 @@
   * [PostgreSQL (contrib)](reference/offline-stores/postgres.md)
   * [Trino (contrib)](reference/offline-stores/trino.md)
   * [Azure Synapse + Azure SQL (contrib)](reference/offline-stores/mssql.md)
+  * [Clickhouse (contrib)](reference/offline-stores/clickhouse.md)
   * [Remote Offline](reference/offline-stores/remote-offline-store.md)
 * [Online stores](reference/online-stores/README.md)
   * [Overview](reference/online-stores/overview.md)

docs/reference/data-sources/README.md

+4
@@ -53,3 +53,7 @@ Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a
 {% content-ref url="mssql.md" %}
 [mssql.md](mssql.md)
 {% endcontent-ref %}
+
+{% content-ref url="clickhouse.md" %}
+[clickhouse.md](clickhouse.md)
+{% endcontent-ref %}
+36
@@ -0,0 +1,36 @@
+# Clickhouse source (contrib)
+
+## Description
+
+Clickhouse data sources are Clickhouse tables or views.
+These can be specified either by a table reference or a SQL query.
+
+## Disclaimer
+
+The Clickhouse data source does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Examples
+
+Defining a Clickhouse source:
+
+```python
+from feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse_source import (
+    ClickhouseSource,
+)
+
+driver_stats_source = ClickhouseSource(
+    name="feast_driver_hourly_stats",
+    query="SELECT * FROM feast_driver_hourly_stats",
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created",
+)
+```
+
+The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse_source.ClickhouseSource).
+
+## Supported Types
+
+Clickhouse data sources support all eight primitive types and their corresponding array types.
+The support for Clickhouse Decimal type is achieved by converting it to double.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
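
The added documentation notes that Clickhouse Decimal values are converted to double. A minimal sketch of what that implies for precision follows. It is independent of Feast and Clickhouse and assumes only that the target type behaves like a Python `float` (an IEEE 754 double); `decimal_to_double` is a hypothetical helper, not the conversion code from this commit.

```python
from decimal import Decimal


def decimal_to_double(value: Decimal) -> float:
    """Convert an exact decimal to the nearest double-precision float."""
    return float(value)


# Exact decimal arithmetic.
exact = Decimal("0.1") + Decimal("0.2")

# The same sum after each operand has been converted to double.
approx = decimal_to_double(Decimal("0.1")) + decimal_to_double(Decimal("0.2"))

# A value with more significant digits than a double can hold.
large = Decimal("12345678901234567890.123456789")
roundtrip = Decimal(decimal_to_double(large))

print(exact == Decimal("0.3"))  # True: decimals add exactly
print(approx == 0.3)            # False: binary floating-point rounding
print(roundtrip == large)       # False: digits beyond double precision are lost
```

Features that need exact decimal semantics (for example monetary amounts) may therefore need extra care when read through this source.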
@@ -0,0 +1,69 @@
+# Clickhouse offline store (contrib)
+
+## Description
+
+The Clickhouse offline store provides support for reading [ClickhouseSource](../data-sources/clickhouse.md).
+* Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe. A Pandas dataframes will be uploaded to Clickhouse as a table (temporary table by default) in order to complete join operations.
+
+## Disclaimer
+
+The Clickhouse offline store does not achieve full test coverage.
+Please do not assume complete stability.
+
+## Getting started
+In order to use this offline store, you'll need to run `pip install 'feast[clickhouse]'`.
+
+## Example
+
+{% code title="feature_store.yaml" %}
+```yaml
+project: my_project
+registry: data/registry.db
+provider: local
+offline_store:
+    type: feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse.ClickhouseOfflineStore
+    host: DB_HOST
+    port: DB_PORT
+    database: DB_NAME
+    user: DB_USERNAME
+    password: DB_PASSWORD
+    use_temporary_tables_for_entity_df: true
+online_store:
+    path: data/online_store.db
+```
+{% endcode %}
+
+Note that `use_temporary_tables_for_entity_df` is an optional parameter.
+The full set of configuration options is available in [ClickhouseOfflineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse.ClickhouseOfflineStore).
+
+## Functionality Matrix
+
+The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
+Below is a matrix indicating which functionality is supported by the Clickhouse offline store.
+
+| | Clickhouse |
+| :----------------------------------------------------------------- |:-----------|
+| `get_historical_features` (point-in-time correct join) | yes |
+| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
+| `pull_all_from_table_or_query` (retrieve a saved dataset) | no |
+| `offline_write_batch` (persist dataframes to offline store) | no |
+| `write_logged_features` (persist logged features to offline store) | no |
+
+Below is a matrix indicating which functionality is supported by `ClickhouseRetrievalJob`.
+
+| | Clickhouse |
+| ----------------------------------------------------- |------------|
+| export to dataframe | yes |
+| export to arrow table | yes |
+| export to arrow batches | no |
+| export to SQL | yes |
+| export to data lake (S3, GCS, etc.) | yes |
+| export to data warehouse | yes |
+| export as Spark dataframe | no |
+| local execution of Python-based on-demand transforms | yes |
+| remote execution of Python-based on-demand transforms | no |
+| persist results in the offline store | yes |
+| preview the query plan before execution | yes |
+| read partitioned data | yes |
+
+To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).
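
The functionality matrix above lists `get_historical_features` as a "point-in-time correct join". The sketch below illustrates that idea in plain Python: for each entity row, take the most recent feature row for the same key whose timestamp is not later than the entity timestamp. It is a conceptual illustration under stated assumptions, not the SQL this store generates; the column names `driver_id` and `conv_rate`, the sample data, and the helper `point_in_time_lookup` are hypothetical, and details such as TTL handling are left out.

```python
from datetime import datetime
from typing import Optional

# Hypothetical feature rows, as they might exist in a source table.
feature_rows = [
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 10), "conv_rate": 0.1},
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 12), "conv_rate": 0.2},
    {"driver_id": 2, "event_timestamp": datetime(2024, 1, 1, 11), "conv_rate": 0.5},
]

# Hypothetical entity rows: the keys and timestamps we want features "as of".
entity_rows = [
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 11)},
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 13)},
    {"driver_id": 2, "event_timestamp": datetime(2024, 1, 1, 9)},
]


def point_in_time_lookup(driver_id: int, as_of: datetime, rows: list) -> Optional[dict]:
    """Return the latest feature row for driver_id with timestamp <= as_of, if any."""
    candidates = [
        row
        for row in rows
        if row["driver_id"] == driver_id and row["event_timestamp"] <= as_of
    ]
    return max(candidates, key=lambda row: row["event_timestamp"], default=None)


joined = []
for entity in entity_rows:
    match = point_in_time_lookup(entity["driver_id"], entity["event_timestamp"], feature_rows)
    # Rows with no earlier feature value get None, never a value from the future.
    joined.append({**entity, "conv_rate": match["conv_rate"] if match else None})

for row in joined:
    print(row["driver_id"], row["event_timestamp"].isoformat(), row["conv_rate"])
```

The third entity row gets no value because the only feature row for that driver is later than the requested timestamp, which is exactly the leakage a point-in-time join is meant to prevent.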

pyproject.toml

+3-2
@@ -55,6 +55,7 @@ azure = [
     "pymssql"
 ]
 cassandra = ["cassandra-driver>=3.24.0,<4"]
+clickhouse = ["clickhouse-connect>=0.7.19"]
 couchbase = ["couchbase==4.3.2", "couchbase-columnar==1.0.0"]
 delta = ["deltalake"]
 docling = ["docling>=2.23.0"]
@@ -95,7 +96,7 @@ opentelemetry = ["prometheus_client", "psutil"]
 spark = ["pyspark>=3.0.0,<4"]
 trino = ["trino>=0.305.0,<0.400.0", "regex"]
 postgres = ["psycopg[binary,pool]>=3.0.0,<4"]
-pytorch = ["torch>=2.2.2", "torchvision>=0.17.2"]
+pytorch = ["torch==2.2.2", "torchvision>=0.17.2"]
 qdrant = ["qdrant-client>=1.12.0"]
 redis = [
     "redis>=4.2.2,<5",
@@ -150,7 +151,7 @@ ci = [
     "types-setuptools",
     "types-tabulate",
     "virtualenv<20.24.2",
-    "feast[aws, azure, cassandra, couchbase, delta, docling, duckdb, elasticsearch, faiss, gcp, ge, go, grpcio, hazelcast, hbase, ibis, ikv, k8s, milvus, mssql, mysql, opentelemetry, spark, trino, postgres, pytorch, qdrant, redis, singlestore, snowflake, sqlite_vec]"
+    "feast[aws, azure, cassandra, clickhouse, couchbase, delta, docling, duckdb, elasticsearch, faiss, gcp, ge, go, grpcio, hazelcast, hbase, ibis, ikv, k8s, milvus, mssql, mysql, opentelemetry, spark, trino, postgres, pytorch, qdrant, redis, singlestore, snowflake, sqlite_vec]"
 ]
 nlp = ["feast[docling, milvus, pytorch]"]
 dev = ["feast[ci]"]
@@ -0,0 +1,37 @@
+feast.infra.offline\_stores.contrib.clickhouse\_offline\_store package
+======================================================================
+
+Subpackages
+-----------
+
+.. toctree::
+   :maxdepth: 4
+
+   feast.infra.offline_stores.contrib.clickhouse_offline_store.tests
+
+Submodules
+----------
+
+feast.infra.offline\_stores.contrib.clickhouse\_offline\_store.clickhouse module
+--------------------------------------------------------------------------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+feast.infra.offline\_stores.contrib.clickhouse\_offline\_store.clickhouse\_source module
+----------------------------------------------------------------------------------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_offline_store.clickhouse_source
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_offline_store
+   :members:
+   :undoc-members:
+   :show-inheritance:
@@ -0,0 +1,21 @@
+feast.infra.offline\_stores.contrib.clickhouse\_offline\_store.tests package
+============================================================================
+
+Submodules
+----------
+
+feast.infra.offline\_stores.contrib.clickhouse\_offline\_store.tests.data\_source module
+----------------------------------------------------------------------------------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_offline_store.tests.data_source
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_offline_store.tests
+   :members:
+   :undoc-members:
+   :show-inheritance:

sdk/python/docs/source/feast.infra.offline_stores.contrib.rst

+9
@@ -9,6 +9,7 @@ Subpackages

    feast.infra.offline_stores.contrib.athena_offline_store
    feast.infra.offline_stores.contrib.couchbase_offline_store
+   feast.infra.offline_stores.contrib.clickhouse_offline_store
    feast.infra.offline_stores.contrib.mssql_offline_store
    feast.infra.offline_stores.contrib.postgres_offline_store
    feast.infra.offline_stores.contrib.spark_offline_store
@@ -33,6 +34,14 @@ feast.infra.offline\_stores.contrib.couchbase\_columnar\_repo\_configuration mod
    :undoc-members:
    :show-inheritance:

+feast.infra.offline\_stores.contrib.clickhouse\_repo\_configuration module
+--------------------------------------------------------------------------
+
+.. automodule:: feast.infra.offline_stores.contrib.clickhouse_repo_configuration
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 feast.infra.offline\_stores.contrib.mssql\_repo\_configuration module
 ---------------------------------------------------------------------

@@ -0,0 +1,29 @@
+feast.infra.utils.clickhouse package
+====================================
+
+Submodules
+----------
+
+feast.infra.utils.clickhouse.clickhouse\_config module
+------------------------------------------------------
+
+.. automodule:: feast.infra.utils.clickhouse.clickhouse_config
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+feast.infra.utils.clickhouse.connection\_utils module
+-----------------------------------------------------
+
+.. automodule:: feast.infra.utils.clickhouse.connection_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: feast.infra.utils.clickhouse
+   :members:
+   :undoc-members:
+   :show-inheritance:

sdk/python/docs/source/feast.infra.utils.rst

+1
@@ -8,6 +8,7 @@ Subpackages
    :maxdepth: 4

    feast.infra.utils.couchbase
+   feast.infra.utils.clickhouse
    feast.infra.utils.postgres
    feast.infra.utils.snowflake

sdk/python/feast/infra/offline_stores/contrib/clickhouse_offline_store/__init__.py

Whitespace-only changes.
