Added array optimization fuse notebook #89
base: main
Conversation
Thanks a lot for writing this up, very useful!
"metadata": {},
"outputs": [],
"source": [
"inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)"
Nice trick, didn't know about this :-)
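The trick being discussed is the `da.Array.blocks` accessor, which indexes a dask array by chunk rather than by element, so only a slice of the graph needs to be rendered by `.visualize(optimize_graph=True)`. A minimal sketch (the array shape and chunking here are illustrative, not from the notebook):

```python
import dask.array as da

# .blocks indexes by chunk: this array has a 4 x 1 grid of blocks
x = da.ones((100, 100), chunks=(25, 100))
print(x.numblocks)      # (4, 1)

# Select only the first two row-blocks; the resulting graph is small
# enough to render with .visualize(optimize_graph=True)
first_two = x.blocks[0:2, :]
print(first_two.shape)  # (50, 100)
```

This keeps the rendered graph readable even when the full array has hundreds of chunks.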
Thanks @alimanfoo, I've applied your suggestions. @mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?
Yes, to me this notebook seems perhaps overly-specific to a single use
case. I'm having trouble finding ways to generalize this notebook to other
situations. I think that a general example of optimization would be
useful. There are plenty of cases where this comes up, such as in ML
workloads where you really want X and y to be co-allocated. That case
might also be a bit simpler.
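For concreteness, a minimal sketch of the ML co-allocation concern mentioned above (the names and shapes are hypothetical): if X and y share identical chunking along the sample axis, each (X block, y block) pair lines up one-to-one and can be computed together.

```python
import dask.array as da

# Hypothetical ML-style inputs: chunk X and y identically along the
# sample axis so corresponding blocks match one-to-one.
X = da.random.random((1_000, 10), chunks=(100, 10))
y = da.random.random(1_000, chunks=100)

print(X.chunks[0] == y.chunks[0])      # True: sample-axis chunks align
print(X.numblocks[0], y.numblocks[0])  # 10 10
```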
Although in general, in many of these cases, I think we can improve them just by expanding Blockwise and HighLevelGraph operator fusion out to data access operations.
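As a rough illustration of the Blockwise fusion being referred to (the exact optimizer entry point here is an assumption; `dask.blockwise.optimize_blockwise` is one way to invoke it directly): each elementwise step adds a Blockwise layer to the HighLevelGraph, and fusion collapses the chain without materializing per-chunk tasks.

```python
import dask.array as da
from dask.blockwise import optimize_blockwise

# Each elementwise operation adds a Blockwise layer to the graph.
x = da.ones(10, chunks=5)
y = (x + 1) * 2

hlg = y.__dask_graph__()
fused = optimize_blockwise(hlg)
# Fusion merges consecutive Blockwise layers at the high level,
# so the fused graph has no more layers than the original.
print(len(hlg.layers), "->", len(fused.layers))
```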
@TomAugspurger, did you have plans to try to make the story here more general?
Not at the moment.
@mrocklin question on the HLG fusion: would you expect adding additional […] I ask because when I look at just the creation / stacking / rechunking, we don't […]

import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]
inputs_stacked = da.vstack(inputs)
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))
inputs_rechunked.visualize(optimize_graph=True)

So unless adding a […]
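One way to check empirically whether these steps fuse (this check is not from the thread; `dask.optimize` is the generic optimization entry point) is to compare task counts before and after optimization:

```python
import dask
import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000) for _ in range(5)]
stacked = da.vstack(inputs)
rechunked = stacked.rechunk((50, 90_000))

# dask.optimize returns optimized copies of the collections passed in.
(optimized,) = dask.optimize(rechunked)
before = len(dict(rechunked.__dask_graph__()))
after = len(dict(optimized.__dask_graph__()))
print(before, "->", after)  # task count never increases; drops if fusion occurs
```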
From dask/dask#5105.
https://mybinder.org/v2/gh/TomAugspurger/dask-examples/array-fuse (building an image now)