Universal table transformer combining univariate transformations dispatched on schema #288
Comments
I'm inclined to go with option 2, which is more user-friendly. The other issue (the extra copying of data) ought to be solved on the tables-interface side, in my opinion.
What about the opposite: a way to restrict a multivariate transform to a subset of columns? This seems more general, since every univariate transform can be used as a multivariate one (applied column by column), but not vice versa (e.g. PCA). I'm not sure how I'd go about implementing this, though (given only MLJ primitives). Is there a package or interface used by MLJ for messing about with tables?
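The idea of restricting a multivariate transform to a subset of columns can be sketched as a wrapper. This is a minimal Python illustration, not MLJ code: it assumes a table is a dict mapping column names to lists (a rough analogue of a Tables.jl column table), and `restrict`, `row_proportions` and `lift` are hypothetical names. `lift` shows the univariate case as a special case of the multivariate one.

```python
from typing import Callable, Dict, List, Sequence

# A column table: column name -> column values (a rough analogue of a
# Tables.jl column table; an assumption made for this sketch).
Table = Dict[str, List]
TableTransform = Callable[[Table], Table]


def restrict(transform: TableTransform, cols: Sequence[str]) -> TableTransform:
    """Hypothetical wrapper: apply a multivariate `transform` to `cols` only.

    The selected columns are replaced by whatever columns `transform` returns
    (their names may differ, as with PCA); all other columns pass through.
    In the result, untouched columns come first.
    """
    selected = list(cols)

    def run(table: Table) -> Table:
        sub = {c: table[c] for c in selected}
        out = transform(sub)
        rest = {c: list(v) for c, v in table.items() if c not in sub}
        clash = set(rest) & set(out)
        if clash:
            raise ValueError(f"output columns clash with untouched ones: {sorted(clash)}")
        return {**rest, **out}

    return run


def row_proportions(table: Table) -> Table:
    """A genuinely multivariate transform: divide each entry by its row sum."""
    names = list(table)
    nrows = len(table[names[0]])
    sums = [sum(table[c][i] for c in names) for i in range(nrows)]
    return {c: [table[c][i] / sums[i] for i in range(nrows)] for c in names}


def lift(f: Callable[[List], List]) -> TableTransform:
    """View a univariate column function as a column-wise table transform."""
    return lambda table: {c: f(v) for c, v in table.items()}


X = {"a": [1.0, 3.0], "b": [3.0, 1.0], "id": ["r1", "r2"]}
Xp = restrict(row_proportions, ["a", "b"])(X)                 # multivariate on a, b only
Xs = restrict(lift(lambda c: [10 * x for x in c]), ["a"])(X)  # univariate special case
print(Xp)
print(Xs)
```

Because the wrapped transform only ever sees the selected columns, it needs no knowledge of the rest of the table; the price is that the result is assembled as a new table rather than updated in place.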
"Is there a package or interface used by MLJ for messing about with tables?" In MLJ, a "table" is anything implementing the Tables.jl interface and satisfying … . The method … . MLJModelInterface has methods … . A package called TableOperations.jl provided some useful tools for tables, but it is no longer maintained, as far as I can tell. Aside: another interface, …
It has been proposed on Slack that it be possible to have a single table transformer that transforms individual columns according to user-specified univariate transformations. This sounds like a good idea, and it would also impose some uniformity that is a little bit lacking in the current collection of table transformers.
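In its simplest form, such a transformer takes a mapping from column names to univariate functions. The following is a minimal Python sketch of that idea only, not MLJ's API: it assumes a table is a dict of column name to list, and `transform_columns` and `standardize` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

# Column table: column name -> column values (assumed representation).
Table = Dict[str, List]
Univariate = Callable[[Sequence], List]


def transform_columns(table: Table, transformers: Dict[str, Univariate]) -> Table:
    """Apply a user-specified univariate transformation to each named column.

    Columns with no entry in `transformers` pass through unchanged, and a new
    table is returned, so the input is never mutated.
    """
    unknown = set(transformers) - set(table)
    if unknown:
        raise KeyError(f"no such columns: {sorted(unknown)}")
    return {name: transformers[name](col) if name in transformers else list(col)
            for name, col in table.items()}


def standardize(col: Sequence[float]) -> List[float]:
    """Rescale a column to zero mean and unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    std = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
    return [(x - mean) / std for x in col]


X = {"height": [1.0, 2.0, 3.0], "name": ["a", "b", "c"]}
Xt = transform_columns(X, {"height": standardize})
print(Xt)
```

A real transformer would typically learn the mean and standard deviation when fitted and reuse them when transforming; this stateless version only illustrates the column-by-column dispatch.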
In the most general case I can imagine implementing, the univariate transformation applied to a particular column is defined by a function of both the `name` and the `scitype` of that column (as encoded in the table `schema`). This has the disadvantage that the user must specify a function of two arguments, or interact through some other complicated interface.

The alternative would be a compositional approach. Each table transformer applies only a single univariate transformation, to all columns with the specified `names` and `scitypes` (or "not"-names and "not"-scitypes, through an `ignore` Boolean parameter); columns not referred to are left alone. Chaining such transformers would cover all conceivable use cases. However, as we are currently locked into Tables.jl (whose tables are not mutable in general), we get a lot more copying of data.

Thoughts, anyone?