Hi,

I want to run inference with the MiDaS model (DPT-Large) on large images (2k, 4k, etc.). My GPU memory maxes out just before reaching the 2k image size.
For a CNN, my solution would be to run the model on smaller patches and then assemble a larger image from those patches. To avoid artifacts from stitching the patches back together, I would run the model on the full receptive field of each output patch.
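For concreteness, below is a rough sketch of the overlap-and-blend tiling I have in mind (a hypothetical `tiled_inference` helper in PyTorch, assuming the model takes an already-preprocessed `(1, C, H, W)` tensor and returns one depth value per input pixel; since MiDaS predicts relative depth, the per-tile outputs might additionally need a scale/shift alignment before blending):

```python
import torch

def tiled_inference(model, image, tile=384, overlap=64):
    """Run `model` on overlapping square tiles of `image` (1, C, H, W)
    and feather the per-tile outputs together with a linear ramp."""
    _, _, h, w = image.shape
    assert h >= tile and w >= tile, "image must be at least tile-sized"
    stride = tile - overlap
    out = torch.zeros(1, 1, h, w, device=image.device)
    weight = torch.zeros(1, 1, h, w, device=image.device)

    # 1-D feathering mask: ramps up over the left/top overlap and down
    # over the right/bottom overlap, so blended regions sum smoothly.
    ramp = torch.linspace(0.0, 1.0, overlap, device=image.device)
    mask = torch.ones(tile, device=image.device)
    mask[:overlap] = ramp
    mask[-overlap:] = ramp.flip(0)
    mask2d = mask[None, :] * mask[:, None]  # (tile, tile)

    # Tile origins, always including a final tile flush with the border.
    ys = list(range(0, h - tile, stride)) + [h - tile]
    xs = list(range(0, w - tile, stride)) + [w - tile]
    for y0 in ys:
        for x0 in xs:
            patch = image[:, :, y0:y0 + tile, x0:x0 + tile]
            with torch.no_grad():
                pred = model(patch)
            # Assumes one depth value per input pixel; reshape to
            # (1, 1, tile, tile) whether or not a channel dim is present.
            pred = pred.reshape(1, 1, tile, tile)
            out[:, :, y0:y0 + tile, x0:x0 + tile] += pred * mask2d
            weight[:, :, y0:y0 + tile, x0:x0 + tile] += mask2d
    return out / weight.clamp(min=1e-6)
```

The feathering hides hard seams, but each tile still only sees its own crop, which is why the question about the receptive field matters.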
It's not clear to me whether it's possible to do that with the transformer architecture. Does each output pixel have a cleanly defined receptive field of input pixels?
Or if not, would you have any recommended approach for running the model on large images?
Thank you!