-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Custom Health Checks for Kustomization using Common Expression Language(CEL) #4528
base: main
Are you sure you want to change the base?
Conversation
…ion Language(CEL) Signed-off-by: Soule BA <[email protected]>
cc @tmmorin |
// Kind of the custom health check. | ||
// +required | ||
Kind string `json:"kind"` | ||
// InProgress is the CEL expression that verifies that the status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that it would be nice to here have a Name
field, to allow definitng a healthCheck applicable to a specific resource, with a finer grained granularity than the CRD.
The kind of scenario where this would be relevant would be situations where a Kustomization would produce multiple resources of the same kind but the right healthcheck to do would be different among those resources.
An optional Name
field would be interpreted as: the healthCheck to use for a resource of Kind Y and Name X, would be the item in CustomHealthCheckExprs that specifies Kind Y and Name X if any, or if there is none an item for Kind Y and Name unspecified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The health check expression is common for all instances of a Kind version, as it targets the status conditions which are the same for all. Can you give an example where you would write different CEL expressions for the same Kind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will gladly admit not having any exact use case in mind.
The idea expressed here might be one, though, but whether or not it's relevant is certainly debatable. The intersection of "I want this Deployment to be healhcheck'd in a specific way" and "my Kustomization produces multiple Deploments" is perhaps empty.
From a user standpoint, it simply feels like "if this is going to be extensible, why not make it fully flexible". But from an implementation and code maintenance standpoint, I would understand that this idea might be perceived as over-engineering.
I just wanted to share this to see how relevant and implementable the idea might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you group objects based on expected behaviour? Or chain Kustomizations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the slow feedback
yes @souleb there would be workarounds
and this is overall a very useful evolution to what we have today which works pretty well for many cases already, so the wise path is of course, to keep it simple
Thanks for starting this RFC, which looks very promising. 👍 |
|
||
### Non-Goals | ||
|
||
- We do not plan to support custom `healthChecks` for core resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this restriction ?
It seems perhaps arbitrary to not allow custom heathChecks for core resources. For instance for a Deployment, depending on the situation we might sometimes want ready to mean "all pods are ready", or "at least one pod is ready", or "at least minAvailable pods are ready".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this is that kstatus covers all the builtin types, excluding them on a per kind bases is not possible, so we would need to drop them all and make users write hundreds of lines of CEL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seen from a user standpoint it feel like this might be implemented as "use kstatus for kind XXX unless CustomHealthChecksExprs defines something for XXX".
Looking closer, I understand that the proposal is to reuse the kstatus statusreaders extension point.
However, after looking at kstatus code, I was wondering if the built-in statusreaders couldn't be all dropped and then the ones not covered by any CustomHealthChecksExprs would be added back (NewReplicaSetStatusReader/NewDeploymentResourceReader/NewStatefulSetResourceReader).
(Of course, don't delay this RFC just based on this topic, I haven't digged enough to have any idea of what is reasonably feasible with kstatus.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this RFC is to allow defining health checks for custom resources. Once this is implemented, we could consider expanding CEL to native resources in a followup RFC, if it deems to be feasible and if Flux users will ask for it. So far, no one asked for more than what kstatus does for builtin kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand on Stefan answer, the default poller uses a slice of readers:
statusReaders = append(statusReaders,
deploymentStatusReader,
statefulSetStatusReader,
replicaSetStatusReader,
defaultStatusReader,
)
There is a Support()
method that will be used to find a reader that support a given objet Kind
. The defaultStatusReader
is the one that is used for all custom resources. It is able to compute the status if the CR is compliant with Kstatus
. The actual proposal is to expand this list. So in order to support native kinds
we would need to remove the other Readers.
Another thing to consider is the SSA
. Objects are applied in stages, and we perform a normalization of all native kinds
mainly working around various upstream Kubernetes API bugs. This implies that we expect a certain behaviour. Also normalization could invalidate a CEL
expression as we pass the whole object to the evaluator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to keeping the thing simple
my ideas of refinements may prove relevant later... or not ;)
This RFC proposes to add support for custom heath checks during the health check phase of a
kustomization
reconciliation usingCEL
expressions.