Sounds almost like "no-op mode"… Pardon me if I'm being naive here (I don't have much context on the compliance test suite), but is Alertmanager really needed for compliance testing if it isn't doing anything? Couldn't the test suite itself emulate the Alertmanager API and be configured as the Alertmanager of the alert-generator under test?
-
The test suite acts as an Alertmanager that receives alerts from the alert-generator (for example, a Prometheus instance); there is no real Alertmanager in the picture. The specific problem is with cloud providers: an Alertmanager sits between the alert-generator and the test suite, which causes problems, and it is not trivial to let alerts bypass this Alertmanager in the cloud. Hence this feature request.
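In a self-hosted setup, the arrangement described above amounts to listing the test suite as an Alertmanager target in the alert-generator's configuration. The fragment below is a minimal sketch assuming a Prometheus alert-generator and a test suite listening on the hypothetical address `compliance-suite.example:9095`. It is exactly this kind of setting that cloud offerings do not expose to their users.

```yaml
# prometheus.yml (sketch): send alerts directly to the compliance test suite,
# which emulates the Alertmanager API. The target address is hypothetical.
alerting:
  alertmanagers:
    - api_version: v2          # Prometheus will POST alerts to <target>/api/v2/alerts
      static_configs:
        - targets: ['compliance-suite.example:9095']
```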
-
I have opened #2873 as a reference implementation for this.
-
Originally, compliance testing was meant to cover only the alert generator (Prometheus), not the alert receiver (Alertmanager). However, after looking at the broader landscape, compliance switched its approach to include the Alertmanager. Why? If we look at how cloud offerings (AWS, as a non-Grafana example) ship "Prometheus-based alerts" as a product, they don't distinguish between the alert generator and the alert receiver. Furthermore, they don't let their users configure an external alert receiver, and with reason: it adds product complexity, and in most cases end users don't care about it. In a cloud offering, you set up an alert and eventually it gets delivered (more often than not, through the Alertmanager). I'm not saying this is correct, but it's essential to understand the precedent when making changes.

The Goal of Compliance

Compliance's end goal and main premise is: any user must be able to download the compliance test suite and execute the tests themselves. This forces compliance's hand to include the Alertmanager as part of the workflow. Without it, we would be creating a compliance test suite that forces cloud offerings to change their alerting product implementation before compliance is even out. Now, if there is a strong reason (and this is where I would need help from the Alertmanager maintainers) why cloud offerings should expose end-user configuration that lets the alert generator specify its alert receiver, I'm all ears, but from where I sit this doesn't look right, for the reasons detailed above.

What is blocking compliance?

With all that said, what compliance (Ganesh in this context) is proposing actually makes sense to me, and for reasons beyond compliance, too. The Alertmanager was designed as an AP system (available and partition-tolerant, in CAP terms), and deduplication was built in as a feature to support that design.
However, when configuring, debugging, and understanding an HA alerting environment (two Prometheus servers forwarding alerts to two Alertmanagers), I can see the usefulness of disabling group-level semantics at the receiver level and notifying about alerts from a group as they are received. I have reservations about the proposed implementation (e.g. I don't think we should do
-
Hi, could someone please help me set up a UI for Alertmanager? I need it to make it easier to manage blackouts and silences of alerts for hundreds of servers and databases.
-
What
I would like to propose a feature (via config) to specify
forward_alerts: true
(or some other name) under the routing config that entirely ignores group interval, group wait, repeat interval, and group by, and simply forwards every alert it comes across to the configured receivers.
Why
Grafana Cloud and Amazon Managed Prometheus (and probably others) use Prometheus Alertmanager in their cloud offerings to send alerts. This adds hurdles to testing Prometheus alert-generator compliance, because alerts cannot be sent back to the test suite as expected. This can be fixed to some extent by setting
group_interval=1s, group_wait=1s, repeat_interval=100y, group_by=...
but then it won't forward the duplicate alerts that the alert-generator re-sends every minute, because Alertmanager only re-notifies about an unchanged group once repeat_interval has elapsed (and setting repeat_interval=1m has its own problems). There might also be other use cases for this that I don't know of yet.
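The workaround above could look roughly like the following Alertmanager route. This is a minimal sketch: the receiver name and webhook URL are hypothetical stand-ins for however the test suite is actually reached, and `forward_alerts` appears only as a commented-out line because it is the option being proposed here, not one Alertmanager currently supports. The alert-generator re-sends still-firing alerts periodically (in Prometheus this is governed by `--rules.alert.resend-delay`, which defaults to `1m`); with `repeat_interval: 100y`, Alertmanager treats those re-sends as an unchanged group and does not forward them, which is the gap the proposal is meant to close.

```yaml
# alertmanager.yml (sketch of the workaround; names and URL are hypothetical)
route:
  receiver: compliance-test-suite
  group_by: ['...']       # special value: group by all labels, i.e. disable aggregation
  group_wait: 1s          # notify about a new group almost immediately
  group_interval: 1s      # notify almost immediately when new alerts join an already-notified group
  repeat_interval: 100y   # effectively never re-send a notification for an unchanged group
  # Proposed in this request, NOT an existing Alertmanager option:
  # forward_alerts: true

receivers:
  - name: compliance-test-suite
    webhook_configs:
      - url: 'http://compliance-suite.example:9095/alerts'  # hypothetical endpoint
        send_resolved: true
```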