
Running on large images #71

Open
carsonswope opened this issue Jun 28, 2022 · 0 comments

Hi,

I want to run inference with the MiDaS model (DPT-large) on large images (2k, 4k, etc.). My GPU memory maxes out just before reaching the 2k image size.

For a CNN, my solution would be to run the model on smaller patches and then assemble the full-resolution output from those patches. To avoid stitching artifacts, I would run the model on the full receptive field of each output patch, roughly as sketched below.
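For reference, this is approximately what I mean by the tiled approach. It is only a minimal PyTorch sketch: the `tile` and `overlap` values, the triangular blending window, and the assumption that the model maps a 1x3xHxW tensor to a 1xHxW depth map are illustrative, not specific to DPT.

```python
import torch

def tiled_inference(model, image, tile=512, overlap=128, device="cuda"):
    # image: 1x3xHxW tensor, already normalized the way the model expects.
    # Assumes H and W are both >= tile; tile/overlap values are illustrative.
    _, _, H, W = image.shape
    stride = tile - overlap
    out = torch.zeros(1, 1, H, W)
    acc = torch.zeros(1, 1, H, W)

    # Triangular blending window so overlapping tiles feather into each other.
    ramp = torch.minimum(torch.arange(1, tile + 1), torch.arange(tile, 0, -1)).float()
    window = (ramp[:, None] * ramp[None, :])[None, None]

    # Tile origins covering the whole image, with the last tile snapped to the border.
    ys = list(range(0, H - tile, stride)) + [H - tile]
    xs = list(range(0, W - tile, stride)) + [W - tile]
    for y in ys:
        for x in xs:
            patch = image[:, :, y:y + tile, x:x + tile].to(device)
            with torch.no_grad():
                pred = model(patch)  # assuming output shape (1, tile, tile)
            out[:, :, y:y + tile, x:x + tile] += pred.unsqueeze(1).cpu() * window
            acc[:, :, y:y + tile, x:x + tile] += window
    return out / acc  # weighted average of overlapping predictions
```

The blending hides the seams, but each tile still only sees local context, which is why the receptive-field question below matters for the transformer.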

It's not clear to me whether it's possible to do that with the transformer architecture. Does each output pixel have a cleanly defined receptive field of input pixels?

Or if not, would you have any recommended approach for running the model on large images?

Thank you!
