Adding worker autoscaling support with KEDA #277

Open: wants to merge 2 commits into base: main

sdaberdaku
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Dec 13, 2024
@sdaberdaku
Member Author

For some reason, the trino_execution_resourcegroups_InternalResourceGroup_RunningQueries JMX metric is not exposed by Trino 446, although it is exposed by versions 435 and 467. My KEDA test starts with 0 worker replicas. Then a TPCH query is launched on the coordinator, which increases this value from 0 to 1 and triggers the creation of a worker pod. Unfortunately, queries submitted when no workers are available enter the "WAITING_FOR_RESOURCES" state, which is neither blocked nor queued.

@nineinchnick I am open to suggestions for a better metric or testing strategy.

@sdaberdaku sdaberdaku force-pushed the feature/add-keda-support branch 2 times, most recently from 35dc778 to b9dcd3b Compare December 14, 2024 18:20
@sdaberdaku sdaberdaku marked this pull request as ready for review December 14, 2024 18:31
@sdaberdaku sdaberdaku force-pushed the feature/add-keda-support branch 2 times, most recently from 52c79e9 to 757de90 Compare December 15, 2024 13:00
@sdaberdaku
Member Author

I found the trino_execution_ClusterSizeMonitor_RequiredWorkers metric that works with all versions of Trino and allows us to test scaling up from 0 workers.
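For illustration, a ScaledObject driven by this metric might look roughly like the sketch below. The Deployment name, Prometheus address, and replica limits are placeholders rather than values taken from this PR, and the exact PromQL depends on how the JMX exporter labels the metric.

```yaml
# Sketch only: names, address, and limits are assumptions, not the chart's rendered output.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: trino-worker                 # hypothetical name
spec:
  scaleTargetRef:
    name: trino-worker               # hypothetical worker Deployment
  pollingInterval: 30                # seconds
  cooldownPeriod: 300                # seconds
  minReplicaCount: 0                 # lets KEDA scale the workers down to zero
  maxReplicaCount: 5                 # arbitrary upper bound for the example
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090   # hypothetical address
        query: sum(trino_execution_ClusterSizeMonitor_RequiredWorkers)
        threshold: "1"
```

With `minReplicaCount: 0`, the trigger becoming active (a query requiring workers) is what brings the first worker up.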

I am also wondering whether the Chart should support the creation of TriggerAuthentication objects to cover the cases where Prometheus requires authentication. One could always create this object outside the chart and reference it in the ScaledObject trigger, so it is not mandatory. What is blocking me is how to implement the assignment of TriggerAuthentication objects to triggers. One TriggerAuthentication can be referenced by multiple triggers. Naming also needs to be deterministic, since the trigger needs to reference the TriggerAuthentication object by name. I could "inject" the {{ .Release.Name }} prefix into these objects, but I don't like it very much.
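To make the naming problem concrete, here is a rough sketch of a TriggerAuthentication created outside the chart and referenced by name from a trigger. The object name, Secret name, and Prometheus address are hypothetical; bearer-token authentication is just one of the modes the Prometheus scaler accepts.

```yaml
# Sketch only: object names and the Secret are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prometheus-auth              # the trigger below must use exactly this name
spec:
  secretTargetRef:
    - parameter: bearerToken         # parameter consumed by the prometheus scaler
      name: prometheus-token         # hypothetical Secret holding the token
      key: token
---
# Fragment of a ScaledObject's spec.triggers list
triggers:
  - type: prometheus
    metadata:
      serverAddress: https://prometheus.example.svc:9090   # hypothetical address
      query: sum(trino_execution_ClusterSizeMonitor_RequiredWorkers)
      threshold: "1"
      authModes: "bearer"
    authenticationRef:
      name: prometheus-auth          # reference by name, hence the need for deterministic naming
```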

Member

@nineinchnick nineinchnick left a comment

First pass, I haven't yet checked all the new properties.

@@ -114,6 +114,70 @@ server:
# selectPolicy: Max
# ```

# -- Configure [KEDA](https://keda.sh/) for workers.
Member

We should document that this is mutually exclusive with server.autoscaling and, if possible, help users choose between the two if they're just starting with autoscaling and aren't familiar with either option.

Member Author

This makes total sense. I will improve the documentation on this.

Member Author

I tried to clarify this in the documentation. I also added a warning in NOTES.txt to indicate that KEDA takes precedence over HPA if both are enabled.

tests/trino/test-values.yaml (resolved)
@sdaberdaku sdaberdaku force-pushed the feature/add-keda-support branch from 757de90 to d20836f Compare December 21, 2024 16:19
Member

@nineinchnick nineinchnick left a comment

Are there any other popular autoscalers for Kubernetes? I see this just creates the ScaledObject resource. What's the value in including this in the Trino chart? How hard is it to manage this as an add-on, for example using an umbrella chart?

@@ -42,7 +42,12 @@ spec:
- --password
{{- end }}
- --debug
{{- if .Values.server.keda.enabled }}
{{/* When testing KEDA we need a query that requires workers to run. */}}
Member

Can we always use that query?

enabled: false
pollingInterval: 30
# -- Period to wait after the last trigger reported active before scaling the resource back to 0
cooldownPeriod: 300
Member

Nit: it would be nice to know the unit. I was about to suggest including seconds in the property name, but I see it's consistent with KEDA's API, so just mention that in the comment.
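As a rough sketch of what that could look like, the values comments might state the unit explicitly. KEDA documents both fields in seconds; the wording of the comments and the nesting under `server.keda` (inferred from `.Values.server.keda.enabled`) are assumptions.

```yaml
server:
  keda:
    enabled: false
    # -- Interval, in seconds, at which KEDA checks each trigger (name matches KEDA's API)
    pollingInterval: 30
    # -- Period, in seconds, to wait after the last trigger reported active before scaling the resource back to 0
    cooldownPeriod: 300
```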

@@ -50,7 +50,7 @@ data:

config.properties: |
coordinator=true
- {{- if gt (int .Values.server.workers) 0 }}
+ {{- if or .Values.server.keda.enabled (gt (int .Values.server.workers) 0) }}
Member

This should not depend on KEDA, but on an explicit property.

Member Author

Could you elaborate a bit more on your comment?
My reasoning was that you need to set this property if you have more than 0 workers or if you are using KEDA (and you could have the minimum number of workers set to 0 in this case).

@@ -1,4 +1,4 @@
- {{- if .Values.server.autoscaling.enabled -}}
+ {{- if and .Values.server.autoscaling.enabled (not .Values.server.keda.enabled) -}}
Member

Isn't .Values.server.autoscaling.enabled exclusive with .Values.server.keda.enabled?

Member Author

I just added a warning telling the user that KEDA will override HPA. I have seen this solution in other charts (KEDA superseding classic HPA). Maybe it is better to avoid confusion and fail with an explicit error message if both are enabled? Now that I think about it, it makes more sense to just fail since we are not re-using properties of .Values.server.autoscaling in .Values.server.keda.
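A minimal sketch of the fail-fast variant, using Helm's `fail` function, could look like this. Where the check lives (for example in a helpers file or at the top of the ScaledObject template) and the exact message are assumptions, not what the PR currently contains.

```yaml
{{- /* Sketch: abort rendering when both autoscalers are enabled. */ -}}
{{- if and .Values.server.autoscaling.enabled .Values.server.keda.enabled }}
{{- fail "server.autoscaling and server.keda are mutually exclusive; enable only one of them" }}
{{- end }}
```

With a check like this in place, the HPA template would no longer need the extra `(not .Values.server.keda.enabled)` condition, since the combination could never render.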

@sdaberdaku
Member Author

> Are there any other popular autoscalers for Kubernetes?

I am not aware of other open-source Kubernetes autoscalers for Pods/Jobs.

> I see this just creates the ScaledObject resource. What's the value in including this in the Trino chart?

KEDA allows for scaling worker Pods down to zero replicas when the cluster is unused, and to me this is a pretty interesting feature that cannot be achieved with vanilla HPA.

> How hard is it to manage this as an add-on, for example using an umbrella chart?

It is definitely doable, I am currently managing it with Kustomize and ArgoCD. One pain point in having to do this outside of the Chart is that I cannot remove the spec.replicas field in the worker deployment and keep the HPA disabled (as discussed here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas, this field should not be set when using horizontal scaling). Moreover, I have to explicitly configure ArgoCD so that it ignores this particular field so that it does not try to reconcile it.
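For reference, the Argo CD workaround mentioned above is typically expressed with `ignoreDifferences` on the Application. The fragment below is a sketch; the Deployment name is hypothetical and would need to match the worker Deployment rendered by the chart.

```yaml
# Sketch only: fragment of an Argo CD Application spec; the Deployment name is hypothetical.
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      name: trino-worker
      jsonPointers:
        - /spec/replicas             # let the autoscaler own the replica count
```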

2 participants