The steps to obtain absolute depth on custom dataset #48
Comments
The steps are correct for aligning the estimates to the SfM reconstruction, but SfM cannot recover absolute depth either. The aligned depth maps would therefore have a consistent scale for the given scene, as opposed to an arbitrary scale per image before the alignment, but a global scale would still be missing to get absolute metric measurements. These slides give a good overview of SfM ambiguities: https://slazebni.cs.illinois.edu/spring19/lec17_sfm.pdf. Slide 7 shows the relevant issue.
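To make the scale ambiguity concrete, here is a small sketch with a made-up pinhole camera (the intrinsics, pose, and points are arbitrary assumptions, not values from any dataset). Scaling the 3D points and the camera translation by the same factor leaves every pixel projection unchanged, so the images alone cannot reveal that factor:

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points with a pinhole camera; return pixels and depths."""
    Xc = X @ R.T + t              # world -> camera coordinates
    uv = Xc @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3], Xc[:, 2]

rng = np.random.default_rng(0)
# Hypothetical camera and scene, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, -0.2, 0.5])
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(5, 3))

k = 3.7                           # arbitrary global scale factor
pix_a, depth_a = project(K, R, t, X)
pix_b, depth_b = project(K, R, k * t, k * X)

same_pixels = np.allclose(pix_a, pix_b)   # projections are identical
depth_ratio = depth_b / depth_a           # every depth is scaled by k
```

Both reconstructions explain the images equally well, which is why an external measurement (a known distance, for example) is needed to fix the metric scale.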
Got it. Thanks for your kind reply! Another question: in EVALUATION.md I noticed that when evaluating on KITTI, the argument absolute_depth is specified and the prediction is scaled by 256. Are there no further post-processing steps to compute the scale and shift because the dpt_hybrid_kitti-cb926ef4.pt model is trained (or fine-tuned) specifically on KITTI? (Referenced code: lines 165-166 and 180-181 in f43ef9e.)
If I want to evaluate the dpt_large model on KITTI, do I still need to follow steps 0-4 above to convert the inverse depth map?
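For context on the factor of 256, here is a rough sketch of the 16-bit encoding as I understand the KITTI depth convention (depth in metres = stored value / 256, with 0 marking pixels without a value). The function names are hypothetical and this is not the repository's own code:

```python
import numpy as np

def encode_depth_uint16(depth_m, factor=256.0):
    """Scale metric depth and store it as 16-bit integers (hypothetical helper)."""
    scaled = np.round(depth_m * factor)
    return np.clip(scaled, 0, 65535).astype(np.uint16)

def decode_depth_uint16(png, factor=256.0):
    """Recover metric depth and a validity mask; 0 is assumed to mean 'no value'."""
    depth = png.astype(np.float32) / factor
    valid = png > 0
    return depth, valid

depth_m = np.array([[0.0, 5.5], [80.0, 12.25]])
png = encode_depth_uint16(depth_m)
decoded, valid = decode_depth_uint16(png)
```

With this factor the representable range tops out just under 256 m at a resolution of 1/256 m, and only a model that already predicts metric depth can be written out this way without a prior alignment step.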
Yes to both questions. A word of caution for evaluating the large model this way: because the existing large model doesn't estimate absolute depth and has to be aligned first, its numbers are no longer directly comparable to the numbers in Table 3, since the alignment step will "remove" part of the error. They are only comparable to Table 1 (or Table 11 in the MiDaS paper), where we did the alignment for all methods to have a fair comparison.
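A toy illustration of why the alignment "removes" part of the error, using synthetic numbers (the distortion and noise levels are assumptions picked for the example, not measurements). The least-squares fit chooses the best scale and shift against the ground truth, and since scale 1, shift 0 is one of the candidates, the aligned squared error in inverse depth can never exceed the unaligned one:

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimising ||s * pred + t - target||^2."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s, t

rng = np.random.default_rng(0)
gt_inv = 1.0 / rng.uniform(2.0, 50.0, size=1000)        # synthetic ground-truth inverse depth
pred_inv = 1.3 * gt_inv + 0.01 + rng.normal(0.0, 0.002, size=1000)  # distorted "prediction"

mse_raw = np.mean((pred_inv - gt_inv) ** 2)             # error without alignment
s, t = align_scale_shift(pred_inv, gt_inv)
mse_aligned = np.mean((s * pred_inv + t - gt_inv) ** 2)  # error after alignment
```

A method evaluated without alignment does not get this advantage, so the two kinds of numbers should not be put in the same table.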
Additionally there are used
Hi! I have been running some experiments on the flower dataset. Could you help me improve the alignment result?
I ran COLMAP with the default configuration to obtain the camera parameters and sparse 3D points (I have already converted them to camera coordinates). The goal is to align the DPT estimate to the SfM scale and get a dense depth map for every image.
Denote the depth map output by DPT as D, with shape [h,w], and the collection of sparse 3D points as {[x_i,y_i,d_i]}. First I extract the values of D at {[x_i,y_i]} to get {[D_i]}. Then I simply compute a scale and shift to align {[D_i]} with {[1/d_i]}.
I use the aligned inverse depth map, combined with the SfM-scaled camera parameters, to warp a source image to the target viewpoint. The results (warped vs. target image) are shown below:
It seems that some pixels are misaligned between the warped image and the target image. Is this a reasonable result? Can I do something to improve the fitting process?
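One thing that might be worth trying, offered as a sketch rather than something confirmed in this thread: look up D at the nearest pixel of each keypoint with a bounds check, then refit after dropping the points with the largest residuals, in case a few bad SfM points dominate the plain least-squares fit. All function names and the `keep` fraction are assumptions:

```python
import numpy as np

def sample_at_points(D, xs, ys):
    """Nearest-pixel lookup of D at (x, y) keypoints; NaN where out of bounds."""
    h, w = D.shape
    cols = np.rint(xs).astype(int)
    rows = np.rint(ys).astype(int)
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    vals = np.full(len(xs), np.nan)
    vals[inside] = D[rows[inside], cols[inside]]
    return vals, inside

def fit(pred, target):
    """Least-squares scale and shift mapping pred onto target."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s, t

def trimmed_fit(pred, target, keep=0.8):
    """Fit once, keep the `keep` fraction with the smallest residuals, fit again."""
    s, t = fit(pred, target)
    resid = np.abs(s * pred + t - target)
    inliers = resid <= np.quantile(resid, keep)
    s, t = fit(pred[inliers], target[inliers])
    return s, t, inliers
```

Here `pred` would be the sampled {[D_i]} and `target` the {[1/d_i]}. This only makes the fit less sensitive to outliers; it does not fix errors in the predicted depth map itself.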
As the results of the model are not perfect, a residual error is expected. How much error there is will likely vary per image. Here are some works that try to address the consistency issue: https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/
These works tackle the case of dynamic objects in the reconstruction. If you expect no independently moving objects in the scene, you can also directly use MVS, which will lead to consistent results out of the box.
@07hyx06 Have you found a solution to this problem?
Hi! I looked through some discussions in the MiDaS repo's issues and summarized the steps to obtain absolute depth from the estimated dense inverse depth. Am I right?
Step 0: Run SfM to get some sparse 3D points with correct absolute depth, e.g. (x1,y1,d1), ..., (xn,yn,dn)
Step 1: Invert the 3rd dimension to get 3D points with correct inverse depth, e.g. (x1,y1,1/d1), ..., (xn,yn,1/dn)
Step 2: Run the DPT model to estimate the dense inverse depth map D
Step 3: Compute scale S and shift T to align D with {(x1,y1,1/d1), ..., (xn,yn,1/dn)}
Step 4: Output 1/(S×D+T) as the depth
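Steps 1 to 4 could be sketched roughly as follows, assuming a plain least-squares fit in inverse-depth space and nearest-pixel lookup. The function name `align_to_sfm` and the `eps` guard are my own assumptions, and the output is in the units of the SfM reconstruction:

```python
import numpy as np

def align_to_sfm(D, xs, ys, d, eps=1e-6):
    """Fit S, T so that S * D + T matches 1/d at the sparse points, then invert.

    D      : (h, w) dense inverse depth predicted by the network (Step 2)
    xs, ys : pixel coordinates of the sparse SfM points, all inside the image
    d      : depths of those points in the camera frame (Step 0)
    """
    rows = np.rint(ys).astype(int)
    cols = np.rint(xs).astype(int)
    pred = D[rows, cols]                       # prediction at the sparse pixels
    target = 1.0 / d                           # Step 1: inverse depth of SfM points
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (S, T), *_ = np.linalg.lstsq(A, target, rcond=None)   # Step 3
    inv = S * D + T
    depth = 1.0 / np.maximum(inv, eps)         # Step 4, guarded against division by zero
    return depth, S, T
```

The `eps` clamp is there because S×D+T can come out zero or negative for far-away pixels; those pixels get a very large depth instead of a division error.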