[VL] Memory Leak possibly TableScan #9456
Comments
@zhouyuan any clue?
@nimesh1601 Can you try setting
@zhztheplayer Thanks for your suggestion. I tried the given configuration and it worked; I didn't get any failures. But wouldn't it impact performance?
I also tried running the same application with the new logs you added, but couldn't see them.
Got it, thanks for trying.
It usually depends. Perhaps you could run some tests in your own environment? We thought the memory leak issues related to the IO threads should already have been fixed by
I am also trying a few things to fix this and will post updates here. It would be great if you could tag me in any further discussion on this issue; I am happy to help try out any fixes.
Sure. The Velox PR I referred to was wrong; I have updated it inline.
This should be the reason, and the PR has no follow-up. @rui-mo do we? I remember there was a quick fix.
Backend
VL (Velox)
Bug description
There seems to be a memory leak, possibly from TableScan.
Got this response from the Velox team: Velox Issue
Sample query
WITH
cdl_summary AS (
    SELECT
        user_id,
        CASE
            WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
            ELSE <TRACKING_LABEL_COL>
        END AS source,
    FROM <SCHEMA_CDL>.<TABLE_CDL> cdl
    WHERE TRUE
        AND datestr BETWEEN :START_112D AND :END_DATE
        AND name IN (<IMPRESSION_EVENTS>, <CLICK_EVENTS>, <ORDER_EVENTS>)
        AND COALESCE(<SESSION_ID_COL>, user_id) IS NOT NULL
        AND is_first_event = TRUE
        AND user_id IS NOT NULL AND user_id <> ''
        AND <FEED_CONTEXT_COL> IN ('home','vertical','allstores','all_stores')
    GROUP BY 1,2
),
xlb_summary AS (
    SELECT
        user_id,
        CASE
            WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
            ELSE <TRACKING_LABEL_COL>
        END AS source,
    FROM <SCHEMA_XLB>.<TABLE_XLB> xlb
    WHERE TRUE
    GROUP BY 1,2
),
data_summary AS (
    SELECT
        COALESCE(cdl.user_id, xlb.user_id) AS user_id,
        COALESCE(cdl.source, xlb.source) AS source,
        COALESCE(cdl.impression_count_7d, 0)
            + COALESCE(xlb.impression_count_7d, 0) AS impression_count_7d,
    FROM cdl_summary cdl
    FULL OUTER JOIN xlb_summary xlb
        ON cdl.user_id = xlb.user_id
        AND cdl.source = xlb.source
),
summary_agg AS (
    SELECT
        user_id,
        SUM(impression_count_7d) AS total_impression_7d,
    FROM data_summary
    GROUP BY 1
)
INSERT OVERWRITE TABLE <SCHEMA_TARGET>.<TABLE_TARGET>
PARTITION (datestr)
SELECT
    CONCAT(d.user_id, '|', d.source) AS uuid,
    d.user_id,
    d.source,
    d.impression_count_7d,
    agg.total_impression_7d,
    d.latest_interaction_time,
    :END_DATE AS datestr
FROM data_summary d
JOIN summary_agg agg
    ON d.user_id = agg.user_id
WHERE d.source IS NOT NULL;
Gluten version
No response
Spark version
None
Spark configurations
NA
System information
NA
Relevant logs