Hi,

I want to run inference with the MiDaS model (DPT-Large) on large images (2k, 4k, etc.). My GPU memory maxes out just before reaching the 2k image size.
For a CNN, my solution would be to run the model on smaller patches and then assemble a larger image from those patches. To avoid artifacts from stitching the patches back together, I would run the model on the full receptive field of each output patch.
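For concreteness, below is a rough sketch of the overlap-and-blend tiling I have in mind (a hypothetical `tiled_inference` helper in PyTorch, assuming the model takes an already-preprocessed `(1, C, H, W)` tensor and returns one depth value per input pixel; since MiDaS predicts relative depth, the per-tile outputs might additionally need a scale/shift alignment before blending):

```python
import torch

def tiled_inference(model, image, tile=384, overlap=64):
    """Run `model` on overlapping square tiles of `image` (1, C, H, W)
    and feather the per-tile outputs together with a linear ramp."""
    _, _, h, w = image.shape
    assert h >= tile and w >= tile, "image must be at least tile-sized"
    stride = tile - overlap
    out = torch.zeros(1, 1, h, w, device=image.device)
    weight = torch.zeros(1, 1, h, w, device=image.device)

    # 1-D feathering mask: ramps up over the left/top overlap and down
    # over the right/bottom overlap, so blended regions sum smoothly.
    ramp = torch.linspace(0.0, 1.0, overlap, device=image.device)
    mask = torch.ones(tile, device=image.device)
    mask[:overlap] = ramp
    mask[-overlap:] = ramp.flip(0)
    mask2d = mask[None, :] * mask[:, None]  # (tile, tile)

    # Tile origins, always including a final tile flush with the border.
    ys = list(range(0, h - tile, stride)) + [h - tile]
    xs = list(range(0, w - tile, stride)) + [w - tile]
    for y0 in ys:
        for x0 in xs:
            patch = image[:, :, y0:y0 + tile, x0:x0 + tile]
            with torch.no_grad():
                pred = model(patch)
            # Assumes one depth value per input pixel; reshape to
            # (1, 1, tile, tile) whether or not a channel dim is present.
            pred = pred.reshape(1, 1, tile, tile)
            out[:, :, y0:y0 + tile, x0:x0 + tile] += pred * mask2d
            weight[:, :, y0:y0 + tile, x0:x0 + tile] += mask2d
    return out / weight.clamp(min=1e-6)
```

The feathering hides hard seams, but each tile still only sees its own crop, which is why the question about the receptive field matters.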
It's not clear to me whether it's possible to do that with the transformer architecture. Does each output pixel have a cleanly defined receptive field of input pixels?
Or if not, would you have any recommended approach for running the model on large images?
Thank you!