
The steps to obtain absolute depth on custom dataset #48

Open · 07hyx06 opened this issue Sep 16, 2021 · 8 comments

@07hyx06 commented Sep 16, 2021

Hi! I looked through some discussions in the MiDaS repo's issues and summarized the steps to obtain absolute depth from the estimated dense inverse depth. Am I right?

Step 0: Run SfM to get some sparse 3D points with correct absolute depth, e.g. (x1,y1,d1), ..., (xn,yn,dn)
Step 1: Invert the 3rd dimension to get 3D points with correct inverse depth, e.g. (x1,y1,1/d1), ..., (xn,yn,1/dn)
Step 2: Run the DPT model to estimate the dense inverse-depth map D
Step 3: Compute scale S and shift T to align D with {(x1,y1,1/d1), ..., (xn,yn,1/dn)}
Step 4: Output 1/(S·D + T) as the depth
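In code, a minimal sketch of steps 1-4 might look like this (NumPy; `sparse` is a hypothetical (n, 3) array of SfM points (x_i, y_i, d_i) in pixel coordinates with absolute depth, and `D` is the dense inverse-depth map from DPT):

    import numpy as np

    def absolute_depth_from_inverse(D, sparse):
        xs, ys = sparse[:, 0].astype(int), sparse[:, 1].astype(int)
        inv_d = 1.0 / sparse[:, 2]                    # step 1: invert the sparse depths

        D_i = D[ys, xs]                               # DPT predictions at the sparse points
        A = np.stack([D_i, np.ones_like(D_i)], axis=1)
        (S, T), *_ = np.linalg.lstsq(A, inv_d, rcond=None)  # step 3: fit scale and shift

        return 1.0 / np.maximum(S * D + T, 1e-8)      # step 4: depth = 1 / (S*D + T)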

@ranftlr (Contributor) commented Sep 16, 2021

The steps are correct for aligning the estimates to the SfM reconstruction, but note that SfM cannot recover absolute depth either. The aligned depth maps will have a consistent scale for the given scene, as opposed to an arbitrary scale per image before alignment, but a global scale is still missing, so you won't obtain absolute metric measurements.

These slides give a good overview of SfM ambiguities: https://slazebni.cs.illinois.edu/spring19/lec17_sfm.pdf. Slide 7 shows the relevant issue.
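The relevant ambiguity fits in one line of projection algebra: for a pinhole camera with intrinsics $K$ and pose $(R, \mathbf{t})$,

$$\lambda\,\mathbf{x} = K(R\mathbf{X} + \mathbf{t}) \quad\Longrightarrow\quad (s\lambda)\,\mathbf{x} = K\big(R(s\mathbf{X}) + s\mathbf{t}\big),$$

so scaling every 3D point $\mathbf{X}$ and every camera translation $\mathbf{t}$ by the same factor $s$ reproduces identical image observations, and the global scale $s$ cannot be recovered from images alone.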

@07hyx06 (Author) commented Sep 16, 2021

Got it. Thanks for your kind reply!

Another question: in EVALUATION.md I notice that when evaluating on KITTI, the argument absolute_depth is specified and the prediction is scaled by 256. Are there no further post-processing steps to compute the scale and shift because the dpt_hybrid_kitti-cb926ef4.pt model is trained (or fine-tuned) specifically on KITTI?

    if model_type == "dpt_hybrid_kitti":
        prediction *= 256  # KITTI ground truth stores depth * 256 in 16-bit PNGs

DPT/util/io.py, lines 180-181 in f43ef9e:

    if absolute_depth:
        out = depth

If I want to evaluate the dpt_large model on KITTI, do I still need to follow steps 0-4 above to convert the inverse-depth map?

@ranftlr (Contributor) commented Sep 16, 2021

Yes to both questions.

A word of caution when evaluating the large model this way: since the existing large model doesn't estimate absolute depth, the resulting numbers are no longer directly comparable to Table 3, because the alignment step will "remove" part of the error. They are only comparable to Table 1 (or Table 11 in the MiDaS paper), where we performed the alignment for all methods to ensure a fair comparison.

@AlexeyAB (Contributor) commented

@07hyx06

Another question: in EVALUATION.md I notice that when evaluating on KITTI, the argument absolute_depth is specified and the prediction is scaled by 256. Are there no further post-processing steps to compute the scale and shift ...

Additionally, invert=True and the scale and shift parameters are used; these depend on the model weights and the dataset (or, in real-world use, on the model weights, the camera intrinsics, and the unit of measurement for depth). A sketch of what they do follows the snippets below:

  • when you use the KITTI weights:

    DPT/run_monodepth.py, lines 53 to 65 in f43ef9e:

        elif model_type == "dpt_hybrid_kitti":
            net_w = 1216
            net_h = 352
            model = DPTDepthModel(
                path=model_path,
                scale=0.00006016,
                shift=0.00579,
                invert=True,
                backbone="vitb_rn50_384",
                non_negative=True,
                enable_attention_hooks=False,
            )

  • or the NYU weights:

    DPT/run_monodepth.py, lines 68 to 80 in f43ef9e:

        elif model_type == "dpt_hybrid_nyu":
            net_w = 640
            net_h = 480
            model = DPTDepthModel(
                path=model_path,
                scale=0.000305,
                shift=0.1378,
                invert=True,
                backbone="vitb_rn50_384",
                non_negative=True,
                enable_attention_hooks=False,
            )
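For intuition, here is a rough sketch of what invert=True plus scale and shift do to the network output (an assumed paraphrase, not code copied from the repo):

    import numpy as np

    # Assumed paraphrase of the invert/scale/shift behaviour: the network
    # predicts relative inverse depth; scale and shift map it to the dataset's
    # metric inverse depth, and invert=True turns that into metric depth.
    def to_metric_depth(inv_depth, scale, shift):
        metric_inv = scale * inv_depth + shift
        metric_inv = np.maximum(metric_inv, 1e-8)  # guard against division by zero
        return 1.0 / metric_inv                    # depth in meters for these weights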

@07hyx06 (Author) commented Sep 17, 2021

@ranftlr @AlexeyAB Thanks for your help!

07hyx06 closed this as completed Sep 17, 2021
@07hyx06 (Author) commented Sep 18, 2021

Hi! I did some experiments on the flower dataset over the past few days. Could you give me some advice on improving the alignment result?

I ran COLMAP with the default configuration to obtain the camera parameters and sparse 3D points (already converted to camera coordinates). The goal is to align DPT's estimates to the SfM scale and obtain a dense depth map for every image.

Denote the depth map output by DPT as D, with shape [h, w], and the collection of sparse 3D points as {[x_i, y_i, d_i]}. First I extract the values of D at the locations {[x_i, y_i]}, giving {D_i}. Then I simply compute a scale and shift to align {D_i} with {1/d_i} using np.linalg.lstsq. The fitting result is shown in the figure below: the blue points are (D_i, scale * D_i + shift) and the orange points are (D_i, 1/d_i).

[Figure: scatter plot of the least-squares fit of scale * D_i + shift against 1/d_i]

I use the aligned inverse-depth map, combined with the SfM-scaled camera parameters, to warp a source image to the target viewpoint; the results (warped vs. target image) are shown below:

[Figure: warped source image vs. target image]
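For reference, the warp described above can be sketched roughly as follows (all names are illustrative assumptions; occlusions and out-of-bounds pixels are ignored):

    import numpy as np
    import cv2  # used only for bilinear resampling

    # depth_t: aligned metric depth of the target view, shape [h, w]
    # K: 3x3 intrinsics; R, t: relative pose mapping target-camera points
    # into the source camera frame
    def warp_source_to_target(src_img, depth_t, K, R, t):
        h, w = depth_t.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

        # back-project target pixels to 3D with the aligned depth
        pts_t = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)

        # transform into the source camera and project
        pts_s = R @ pts_t + t.reshape(3, 1)
        proj = K @ pts_s
        su = (proj[0] / proj[2]).reshape(h, w).astype(np.float32)
        sv = (proj[1] / proj[2]).reshape(h, w).astype(np.float32)

        # sample the source image at the projected coordinates
        return cv2.remap(src_img, su, sv, interpolation=cv2.INTER_LINEAR)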

It seems that some pixels are misaligned between the warped image and the target image. Is it a reasonable result? Can I do something to improve the fitting process?

07hyx06 reopened this Sep 18, 2021
@ranftlr (Contributor) commented Sep 20, 2021

As the results of the model are not perfect, a residual error is expected. How much error will likely vary per image.

Here are some works that try to address the consistency issue:

https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/
https://robust-cvd.github.io/

These works tackle the case of dynamic objects in the reconstruction. If you expect no independently moving objects in the scene, you can also use MVS directly, which will give consistent results out of the box.

@tdsuper commented Oct 21, 2021

It seems that some pixels are misaligned between the warped image and the target image. Is it a reasonable result? Can I do something to improve the fitting process?

@07hyx06 Have you found a solution to this problem?
