Nested aggregations #20601

Closed
francescomandruvs opened this issue Jan 7, 2025 · 2 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@francescomandruvs

Description

Following #14361 and #12051, I would like to nest aggregations within a single expression, but this is not permitted. For example:

df.with_columns( 
    pl.col(col).cum_sum().over([some_columns]).alias("cum_sum").last().over([maybe_some_other_columns]).alias("final")
)

InvalidOperationError: window expression not allowed in aggregation

Is there any particular reason why this is not permitted? Is it related to how Polars plans the expression? Are you planning to enable chained aggregations in the future, or is that not possible?

francescomandruvs added the enhancement label Jan 7, 2025
francescomandruvs changed the title from "nested aggregations" to "Nested aggregations" Jan 7, 2025
@orlp
Collaborator

orlp commented Jan 9, 2025

> Is there any particular reason why this is not permitted?

Nested aggregations / window expressions are really hard, and almost impossible to do efficiently in general. We would of course like to support them, but doing them right would take a lot of engineering effort and would still leave it easy to write inefficient queries.

Note that what you wrote isn't a "chained" aggregation; it is a nested one. I don't think we have any plans to change the semantics of our expression syntax to make this "chained" in some way, so you'll have to do it in multiple select/with_columns calls.

@francescomandruvs
Author

Thank you, I will close this issue then.


2 participants