Universal table transformer combining univariate transformations dispatched on schema #288

ablaom · 2020-08-04T22:06:41Z

It has been proposed on Slack that it be possible to have a single table transformer that transforms individual columns according to user-specified univariate transformations. This sounds like a good idea, which would also force some uniformity that's a little bit lacking in the current collection of table transformers.

In the most general case I can imagine implementing, the univariate transformer applied to a particular column is determined by a function of both the name and the scitype of the column (as encoded in the table schema). This has the disadvantage that the user must specify a function of two arguments, or interact through some other complicated interface.
The alternative would be a compositional approach. Each tabular transformer carries out a single univariate transformation, applied to all columns with the specified names and scitypes (or "not"-names and "not"-scitypes, through a Boolean ignore parameter); columns not referred to are left alone. Composing such transformers would cover all conceivable use-cases. However, as we are currently locked into Tables.jl (whose tables are immutable in general), we get a lot more copying of data.
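
For concreteness, the compositional option could look roughly like the sketch below. It is an illustration built on Tables.jl only, not MLJ code: `transform_columns` and its `features` and `ignore` keywords are hypothetical names, selection is by column name only (selection by scitype would need ScientificTypes.jl and is left out), and the result is always returned as a named tuple of columns.

```julia
using Tables

# Hypothetical helper (not part of MLJ): apply the univariate function `f`
# to each selected column and pass every other column through unchanged.
function transform_columns(f, X; features=Symbol[], ignore=false)
    cols = Tables.columns(X)
    colnames = Tuple(Tables.columnnames(cols))
    # With `ignore=true` the selection is inverted ("not"-names).
    isselected(name) = ignore ? !(name in features) : (name in features)
    newcols = map(colnames) do name
        col = Tables.getcolumn(cols, name)
        isselected(name) ? f(col) : col
    end
    # Assemble a new column table; the input table is never mutated.
    return NamedTuple{colnames}(newcols)
end

X = (height = [1.0, 2.0, 3.0], weight = [10.0, 20.0, 30.0], id = [1, 2, 3])
center(v) = v .- sum(v) / length(v)

# Centre every column except `id`.
Xc = transform_columns(center, X; features=[:id], ignore=true)
```

Chaining two such calls gives a composition of univariate transformations. In this sketch, when the input is already a column table, untouched columns are passed through by reference, so only the transformed columns are newly allocated.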

Thoughts anyone?

ablaom · 2020-10-13T23:15:17Z

I'm inclined to go with option 2, which is more user-friendly. The other issue ought to be solved on the tables interface side, in my opinion.

ParadaCarleton · 2023-10-24T23:52:43Z

What about the opposite--a way to limit a multivariate transform to a subset of columns? This seems more general, since all multivariate transforms can be used as a univariate transform, but not vice-versa (e.g. PCA).

I'm not sure how I'd go about implementing this, though (given only MLJ primitives). Is there a package or interface used by MLJ for messing about with tables?
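
One way this might be approached with Tables.jl alone is sketched below. Everything here is an assumption for illustration: `transform_subset` is a hypothetical helper, the multivariate transform is taken to be a plain function from table to table, and `sumdiff` is a toy stand-in for something like PCA.

```julia
using Tables

# Hypothetical helper (not an MLJ or Tables.jl function): apply the
# table-to-table transform `g` to the columns named in `features` only.
function transform_subset(g, X, features::Tuple{Vararg{Symbol}})
    cols = Tables.columntable(X)            # named tuple of column vectors
    selected = NamedTuple{features}(cols)   # the columns handed to `g`
    restnames = Tuple(n for n in keys(cols) if !(n in features))
    rest = NamedTuple{restnames}(cols)      # columns left alone
    out = Tables.columntable(g(selected))
    # Output columns are appended; on a name clash the output of `g` wins.
    return merge(rest, out)
end

# A toy multivariate transform: replace two columns by their sum and difference.
sumdiff(t) = (s = t.a .+ t.b, d = t.a .- t.b)

X = (a = [1, 2], b = [3, 4], c = ["x", "y"])
Y = transform_subset(sumdiff, X, (:a, :b))
```

Because the transform may change both the number and the names of the columns it receives, the sketch drops the selected columns and appends whatever `g` returns, rather than trying to write results back in place.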

ablaom · 2023-10-29T23:02:27Z

> Is there a package or interface used by MLJ for messing about with tables?

In MLJ a "table" is anything implementing the Tables.jl interface and satisfying Tables.istable(X) == true. Unfortunately, the generality of Tables.jl makes it less than ideal for our purposes, as it aims to include out-of-memory tables and tables with an unknown number of rows (e.g., lazily iterated). The maintainers are very thoughtful, but reluctant to add any complexity. The API has no method to mutate columns in-place. There is now a Tables.subset method for random access of rows, but this took a very long time to get. Maybe a new specialised pkg is needed, but no-one has ventured to write one.

The method MLJModelInterface.nrows(X) will get you the number of rows, by basically materialising an entire column if necessary (see also "Aside" below).

MLJModelInterface has methods selectrows and selectcols, based on Tables.jl primitives, but I'd now recommend Tables.subset over selectrows. I expect TableTransforms.jl is your best bet for general table manipulations, although it's probably too heavy a dep for MLJModels.jl.
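
As a rough illustration of the Tables.jl primitives mentioned above (assuming a Tables.jl version recent enough to include Tables.subset), the following sketch checks that an object is a table, reads its schema, counts rows by looking at one column, and takes a subset of rows. The table `X` is made-up example data.

```julia
using Tables

# A named tuple of equal-length vectors is one of the simplest tables.
X = (a = [1, 2, 3], b = ["x", "y", "z"])

Tables.istable(X)                    # true
sch = Tables.schema(X)               # column names and element types

# Row count read off a single column, since Tables.jl itself does not
# guarantee a cheap way to ask for the number of rows.
first_col = Tables.getcolumn(Tables.columns(X), 1)
n = length(first_col)

# Random access to rows 2 and 3; the result is again a table.
Xsub = Tables.subset(X, 2:3)
a_sub = Tables.getcolumn(Tables.columns(Xsub), :a)
```

Columns of the subset are read back through Tables.columns and Tables.getcolumn rather than by field access, since the concrete type returned by Tables.subset is not something the sketch relies on.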

A package called TableOperations.jl provided some useful tools for tables, but is no longer maintained, as far as I can tell.

Aside: Another interface, DataAPI.jl, provides the method DataAPI.nrow (not DataAPI.nrows), which is implemented by DataFrames.jl and, more recently, by some of the table types actually owned by Tables.jl, such as a matrix table wrapper. I'd consider restricting MLJ's definition of table to require an implementation of DataAPI.nrow, but that would be breaking. One reason for doing so is that tables with this feature also fit into the MLUtils.jl API.

ablaom mentioned this issue Aug 5, 2020

Fixes to FillImputer #289

Merged

1 task

ablaom mentioned this issue Aug 14, 2020

Add UnivariateTimeTypeToContinuous transformer to builtins #245

Closed

ablaom mentioned this issue Oct 14, 2020

Meta issue: Issues for possible collaboration with UCL JuliaAI/MLJ.jl#673

Closed

7 tasks

ablaom mentioned this issue Nov 12, 2020

MLJ Tangent space transformer JuliaManifolds/ManifoldML.jl#5

Open

ablaom mentioned this issue Jul 13, 2021

MinMaxScaler (and more) JuliaAI/MLJ.jl#816

Open
