There are few approaches to measure the performance of the system by using
- P90 percentile latency
- P95 percentile latency
- P50 percentile latency
- Median And how we calculate the sample data points over time for the metrics.
Median
AVG: By getting all the data point in the range of time then calculate to the mean value of them, it will be represented as an unique sample data point at that range of time.
P95
Formula to find the index: P|k = k/100 × (n+1) P95: By ordering all the value of data points for a range of time, we collects all the latency data points in that range of time, sort them by ascending order. The index at the position 95% of that array will the sample value of that range of time. By using the same method, it reveals how to calculate the with P50 P99 latency method.
All of them can be used for the latency metric system, and prove the performance of the system with the given SLA. Like the requirement of the customer they need our system must have 100ms P99 SLA for all of our endpoints performance.
SLO SLI SLA
SLA: Agreement contract with the customer about the stabability, performance of the system. Penalty fee can be applied if the agreement is broken. SLI: Service level indicator, like the P99 must be under 500ms. SLO: Service level objectives, use to monitor system performance before it reaches to the SLA. Can use P99 P90 For the performance of the SLA, Use the percent of requests under 100ms for the report.