Make DesignMatrixBuilders pickleable and saveable #26
Comments
Hi there, I was wondering if there are any news regarding this issue. Many thanks in advance! |
Hey folks, |
Really? That's bad! The problem isn't so much making it work at all, it's making it work in Also, the way dill "works" at the moment is basically to pickle the whole On Wed, Apr 15, 2015 at 3:25 PM, Doron-Wiser [email protected]
Nathaniel J. Smith -- http://vorpus.org |
Yeah. Don't use dill to pickle v0.3 DesignMatrixBuilder objects. Patsy v0.4 will support pickling. (The harder bits are done.) @njsmith I guess now that #25 is taken care of, the remaining bit is adding |
Created pull request #67 to work on this. |
…ignInfo

Notable changes:
- DesignInfo now exposes lots more metadata about how exactly different factors and terms are coded.
- In fact, it exposes enough more metadata that you can now reconstruct a design matrix entirely from a DesignInfo, so DesignMatrixBuilder becomes redundant and is removed in favor of DesignInfo.
- DesignInfo's constructor is very different; in particular, removed the option of specifying terms as strings, which was only useful for interoperability with competing formula libraries. Four years later, no such competitors have appeared, so I can't be bothered to keep maintaining this. Will re-add later if someone actually wants to use it.

This version works and passes tests, but a bunch more tests need to be added. This fixes #61, and sets us up to implement pickling support (#26, #67).
The remaining step is to write unit tests for the serialization objects, to make sure that patsy doesn't (unknowingly) break support for formulas, etc. pickled with past versions. I've been kept busy with other things, but my goal is to get this finished before PyCon 2016, so I'm starting work on this piece in the coming weekends. |
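For readers wondering what such a backward-compatibility test could look like, here is a minimal sketch. It is not the code from pull request #67: the `Center` class, the `version` field, the older `"mu"` key, and the in-memory `OLD_FIXTURES` dict are all hypothetical stand-ins for patsy's real objects and for fixture files written by past releases.

```python
import pickle

CURRENT_VERSION = 2  # hypothetical state-format version, not a patsy constant


class Center:
    """Toy stateful transform that remembers the mean seen during training."""

    def __init__(self, mean=0.0):
        self.mean = mean

    def to_state(self):
        # Tag the state with a version so later releases can recognise it.
        return {"version": CURRENT_VERSION, "mean": self.mean}

    @classmethod
    def from_state(cls, state):
        version = state["version"]
        if version == 1:
            # Hypothetical older layout that stored the mean under "mu".
            return cls(mean=state["mu"])
        if version == CURRENT_VERSION:
            return cls(mean=state["mean"])
        raise ValueError(f"unsupported state version: {version}")


# Stand-in for fixture files produced by past releases and kept in the repo.
OLD_FIXTURES = {
    "v1": pickle.dumps({"version": 1, "mu": 2.5}),
    "v2": pickle.dumps({"version": 2, "mean": 2.5}),
}


def check_old_fixtures():
    """Load every stored fixture and confirm it still restores correctly."""
    for name, payload in OLD_FIXTURES.items():
        restored = Center.from_state(pickle.loads(payload))
        assert restored.mean == 2.5, name
    return len(OLD_FIXTURES)


checked = check_old_fixtures()
```

The idea is that the fixtures are never regenerated: a change that silently alters the state layout makes the test fail instead of breaking users' saved objects.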
@chrish42: that would be great! Of course feel free to ask for help as well if you are stalled -- maybe @alexdamour wants to help, for example ;-) |
I'm using the PyCon sprints to start working on this again. As a first step, I'm cleaning up the description of pull request #67 to have an actual list of tasks that must be done to close this. My next step is to update the pull request with enough code so people can see what the approach would look like. |
@chrish42 hey guys, wondering if there's any new update on this front? thanks! |
Sure. I've had a very productive sprint at PyCon. I know the "0 of 11 tasks complete" hasn't moved, but if you go look at the "code" tab of the pull request, you'll now see a pretty fleshed out testing framework for pickling. Once @njsmith is happy with that part, I can start implementing If you want to follow the progress, have a look at the pull request, as this bug report will stay pretty quiet until we close it. |
pleaseee fix this issue for the love of god !
|
+1 Also need to pickle |
+1 For anyone interested, I've made a fork of I'm continuing to follow this thread so that we can switch back to using the |
@christang I tried your branch, but it does not work for me. Can you confirm that this is supposed to work:
I still get the |
@saroele Thanks for the note. I believe this branch only adds support for design matrix/info so it may be your other objects still remain without pickling support. I can confirm that the code does not work for me. |
@christang, can you help me out with this, please? I'm currently facing a NotImplementedError even though I believe I'm doing the pickling right. Any advice would be greatly appreciated.
|
This is really needed, guys. What's blocking this implementation? |
@bertomartin Patsy maintenance is done on a purely-volunteer basis, and I haven't really had time to work on it (or even review PRs) in several years now. If someone needs this and has funding to spend on it, we could talk about some kind of consulting contract... |
@njsmith thanks for the great work so far. Ok, I'll take a stab at it. |
@bertomartin any news? |
I have also tried looking into it:

    import h5py

    def save_patsy(patsy_step, filename):
        """Save the coefficients of a linear model into a .h5 file."""
        with h5py.File(filename, 'w') as hf:
            hf.create_dataset("design_info", data=patsy_step.design_info_)

    def load_coefficients(patsy_step, filename):
        """Attach the saved coefficients to a linear model."""
        with h5py.File(filename, 'r') as hf:
            design_info = hf['design_info'][:]
        patsy_step.design_info_ = design_info

    save_patsy(pipe['patsy'], "clf.h5")

Perhaps something simple like this? However, it is still not working. |
Hi @petrhrobar . I recommend you check out formulaic if you are wanting support for pickling. |
Here is a partial solution. Before you first import patsy:
Then, to "serialize" a DesignInfo, do:
To "deserialize" a DesignInfo, do:
To get a design matrix from a design info, you can use the method Depending on your usage of patsy, there may be other The author of this library deserves major credit for almost completely implementing |
Use cases:
DesignMatrixBuilder (and/or DesignInfo, same issue)
center(...) with different means across the different analyses.

The easy part of this is reviewing the inner structure of DesignMatrixBuilder (column builders and all that) to make sure it's sensible, and similarly for factor state dicts.

The more complicated part is capturing the evaluation environment in a reasonable way.

Precondition: #25
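To illustrate why the evaluation environment is the hard part, here is a rough sketch of one possible approach: keep only the variables that a formula expression actually references, so that unrelated and unpicklable objects in the caller's namespace never reach the pickle. This is an assumption for illustration only, not how patsy handles it; `referenced_names`, `capture_env`, and `evaluate` are hypothetical helpers.

```python
import ast
import pickle


def referenced_names(expr):
    """Return the set of bare names that a Python expression refers to."""
    tree = ast.parse(expr, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def capture_env(expr, namespace):
    """Keep only the variables from `namespace` that `expr` actually uses."""
    return {
        name: namespace[name]
        for name in referenced_names(expr)
        if name in namespace
    }


def evaluate(expr, env, data):
    """Evaluate `expr`; data columns shadow captured environment variables."""
    return eval(expr, {}, {**env, **data})


# The caller's namespace holds one value the expression needs and one
# unpicklable lambda that it does not need.
namespace = {"offset": 10.0, "scratch": lambda v: v}

env = capture_env("x + offset", namespace)     # only "offset" survives
restored = pickle.loads(pickle.dumps(env))     # the small env round-trips
result = evaluate("x + offset", restored, {"x": 1.5})
```

A sketch like this still fails when the expression itself calls a user-defined function that cannot be pickled, which is part of what makes a general solution difficult.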