-
Notifications
You must be signed in to change notification settings - Fork 329
clear min and max on every ExportView #1182
base: master
Are you sure you want to change the base?
Conversation
stats/view/worker.go
Outdated
@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) { | |||
e.ExportView(viewData) | |||
} | |||
exportersMu.Unlock() | |||
v.clearRows() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will affect sum/count and bucket count in Distribution. It will also affect count aggregation which is expected to be cumulative.
I understand the problem of min and max that you are trying to solve. Reseting just min and max might be the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia Thanks for the review :)
So I don't think it's just the min/max. I'm not sure about the sum/count in Distribution just yet, but for the Count-Aggregation, it should probably also be reset and let the Exporter be responsible for the cumulativeness side of things.
The reason I say that is because if the server restarts for whatever reason, then the count is completely gone. Furthermore, if the Count is displayed on a timeseries graph, then a lot of these values will be falsely duplicated.
For example, imagine a server receives 10 requests per second..after 10 seconds, the graph will say there are 100 requests in the last second (which is wrong, there were only 10) and if the serve restarts you will see a sharp drop from 100 back to 10.
If we collect the "count per interval", then that should solve the issue.
Would also love to know more about which data should never be reset, even if the process restarts and the data gets "forced" to be reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distributions and Counters have Start-time. Start-time lets backend detect the restart.
It also lets it calculate the rate correctly.
Example:
At time T0: Application starts
T1: First request served. Start-time is set to T1
T2: 100 request served.
T3: 200 request served. Export enabled.
T4: 300 request served. First export happens, Count = 300, Start-time = T1. Rate=300/(T4-T1)
T5: 400 request served.
T6: Application Restarts
T7: First request served after restart, Start-time= T7
T8: 100 request served. Export count = 100, Start-time = T7, Rate = 100/(T8-T7)
Resetting all metrics would be a behavior change and would have an adverse consequence on some backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia awesome. However, from what I can see there's no way for a backend to determine that:
The CountData only has access to Value
and no idea of start time: https://github.com/census-instrumentation/opencensus-go/blob/master/stats/view/aggregation_data.go#L42
In your example, the Exporter does not have access to T1
(does it?), it can have access to T0
by just putting func init() { t1 = time.Now() }
I imagine but that's not accurate.
Furthermore, dividing Count by (Tnow-T1) will just make an even distribution across time, and not the real distribution. For example, if the first minute we got 300 requests, but the next minute we got 10 requests, it will not show that accurately no?
Finally, do you have an example of an exporter doing this time calculation?
Thanks again :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startTime per view is stored here
There are two interface for exporters to get the data
- ExportView(vd *ViewData)
starttime for ViewData is set here - ExportMetric(context.Context, []*metricdata.Metric)
startTime for Metric is retrieved here
and set here
Regarding even distribution - If backend has individual data points then it can correctly represent the rate for each time window. For example
T0: Count = 300
T1: Count = 400
T2: Count = 410
Rate for the window T1-T0 = (400-300)/(T1-T0)
Rate for the window T2-T1 = (410-400)/(T2-T1)
ViewData and Metric both have end time as well. So the backend can compute the rate accordingly.
I'll post a screenshot shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer
So in this case, min/max are restarted, count/sum/avg are diffed, and SumOfSquaredDev is an unsolvable bug. This means Exporters should probably be aware of those details to export the metrics accurately.
over
Would you prefer the above over view.DisableMonotonicity(true) configuration?
because, the later changes the meaning of aggregation as currently defined. It would be sort of a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia sounds good, then feel free to review/merge this pr 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia there's a potential new issue: how can we ensure that ExportView does not get called if a stats.Record()
was never called after the last interval?
AFAIK, in the StatsD model, you only report to the backend when there's data. But OpenCensus will infinitely call ExportData even if the code hasn't called stats.Record since the last ReportingPeriod...right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original solution, clearRows
made sure that there are no rows, and therefore no new data is reported.
I'm not sure how I can replicate that or find a way to determine that no new data has been given. This is because if I take the diff from the current data minus the previous data, if the value is zero it can either mean that the there's no new data, or that by accident the current data is equal to the previous data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Create Delta types for both Count [and Sum?] and DistributionData and implement them everywhere
@marwan-at-work @rghetia I'm not sure if that ^ option has been discarded or not. Wouldn't this solve all issues mentioned if implemented ?
This reverts commit 2be56c0.
@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) { | |||
e.ExportView(viewData) | |||
} | |||
exportersMu.Unlock() | |||
v.resetValues() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This should be also called from w.Read() method. This is for new mechanism (ExportMetrics)
- Apart from that reportUsage() should not go through all the views unless there is an exporter registered (used for old mechanism ExportView())
If user configures both ways or multiple exporter using new mechanism then values of min/max is unpredictable. - Min/Max doc should be updated to reflect the behavior.
- Add tests for min/max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia done, let me know if the last commit is what you were looking for 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test is good.
Item 2 above is not taken care of.
There should be new option in ReadAndExport method. If the option to reset min/max is specified then only do the reset.
Similarly, for old approach (export viewdata) there should be a new method view.ResetMinMaxOnExport() - It is rather a long name but I don't have a good suggestion. Alternatively, provide reset functionality only using new approach with ReadAndExport().
d, ok := vd.Rows[0].Data.(*CountData) | ||
if !ok { | ||
debug.PrintStack() | ||
panic("BYE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) { | |||
e.ExportView(viewData) | |||
} | |||
exportersMu.Unlock() | |||
v.resetValues() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test is good.
Item 2 above is not taken care of.
There should be new option in ReadAndExport method. If the option to reset min/max is specified then only do the reset.
Similarly, for old approach (export viewdata) there should be a new method view.ResetMinMaxOnExport() - It is rather a long name but I don't have a good suggestion. Alternatively, provide reset functionality only using new approach with ReadAndExport().
Fixes #1181
Or at least I think so :)
I am not familiar with the code that much so would love some close eyes on this. I am not sure what kind of impact this one line of code would have on performance. However, it does seem to fix the behavior described in the issue above and all the unit tests did pass locally.