Improve metrics #3847
@@ -144,75 +144,25 @@ var (
completedJobsTotalLabels,
)

jobStartupDurationSeconds = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
jobLastStartupDurationSeconds = prometheus.NewGaugeVec(
Worth adding a code comment (in addition to the commit message) explaining why a Gauge is used when a Histogram seems like the better data type. It might avoid a lot of WTFs and time wasted on someone basically reverting this change ;)
typo in commit description?
They cause a creation of new services with each job execution.
vs
They cause a creation of new series with each job execution.
Also worth mentioning cardinality explosion and OOMs
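To make the cardinality concern above concrete, here is a rough back-of-the-envelope sketch. It does not use the real Prometheus client; it only counts series, assuming a classic histogram exposes one series per bucket bound plus `+Inf`, `_sum` and `_count`. The bucket count, job count, and scale-set count are hypothetical numbers, and the assumption that the histogram carried a per-job label while the gauge is keyed per scale set is mine, not something confirmed by the diff.

```go
package main

import "fmt"

// histogramSeries returns the number of time series a classic Prometheus
// histogram exposes for ONE label combination: one per finite bucket bound,
// plus the +Inf bucket, plus the _sum and _count series.
func histogramSeries(bucketBounds int) int {
	return bucketBounds + 3
}

// gaugeSeries returns the number of time series a gauge exposes for ONE
// label combination.
func gaugeSeries() int {
	return 1
}

// totalSeries scales the per-label-set cost by the number of distinct
// label combinations that have been observed.
func totalSeries(perLabelSet, labelSets int) int {
	return perLabelSet * labelSets
}

func main() {
	// All three values below are hypothetical, chosen only for illustration.
	const bucketBounds = 11 // assumed number of finite bucket bounds
	const jobsExecuted = 10000
	const scaleSets = 5

	// Assumption: the histogram has a label whose value is unique per job,
	// so every job execution adds a fresh label combination.
	perJob := totalSeries(histogramSeries(bucketBounds), jobsExecuted)

	// Assumption: the "last duration" gauge is keyed per scale set only,
	// so the series count stays flat no matter how many jobs run.
	perScaleSet := totalSeries(gaugeSeries(), scaleSets)

	fmt.Println("histogram with per-job labels:", perJob)
	fmt.Println("gauge per scale set:", perScaleSet)
}
```

Under these assumptions the histogram's series count grows linearly with the number of jobs executed, while the gauge's stays bounded by the number of scale sets, which is the trade-off a code comment next to the gauge could spell out.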
…on times. It's difficult to calculate any duration if intervals between jobs are not frequent enough. Last duration would give a better overview.
They cause a creation of new series with each job execution leading to OOM kills and degraded performance.
…n other words waiting for a runner
…query memory, cpu and cpu throttling metrics
8c32cd2 to 34f2a62
This PR makes the following changes:
- gha_runner_job, which can be used to link jobs to runner pods (the metric is only exported while the job is running).
- name label to always contain the clean runnerScaleSetName (the value used in the GHA job runs-on property to select a runner).

Example queries:
Memory usage per job:
CPU usage per job:
CPU Throttling: