DarkVGGT: Seeing Through Darkness Using Thermal Geometry without Daylight Tax

Kweon, Minseong; Zhao, Wenyuan; Chen, Nuo; Liu, Lulin; Han, Huiwen; Zhu, Zihao; Shakkottai, Srinivas; Tian, Chao; Fan, Zhiwen

Minseong Kweon¹, Wenyuan Zhao², Nuo Chen², Lulin Liu¹
Huiwen Han³, Zihao Zhu², Srinivas Shakkottai², Chao Tian², Zhiwen Fan²*

¹University of Minnesota ²Texas A&M University ³Stanford University
Under Review
*Corresponding Author

VGGT fails in dark scenes because RGB cues are degraded; DarkVGGT leverages thermal geometry to produce reliable 3D reconstructions.

Abstract

Recent feed-forward 3D reconstruction methods have demonstrated strong performance and flexibility in efficient end-to-end scene geometry estimation from image streams. However, their reliance on visible-light appearance makes them vulnerable in dark and low-visibility environments, where RGB cues are severely degraded and geometric evidence becomes ambiguous. To address this challenge, we propose DarkVGGT, an RGB-T feed-forward geometry framework that uses physics-aware thermal modeling for robust 3D estimation in low-light scenes. DarkVGGT introduces two complementary modules. First, physics-inspired thermal factorization extracts emissive-dominant, geometry-consistent thermal cues while isolating sparse reflective residuals that may introduce geometric ambiguity. Second, geometry-shared thermal routing isolates modality-invariant geometric structures from thermal-specific patterns, selectively injecting reliability-aware structural guidance into the RGB stream. Together, these components enable accurate thermal-informed geometry estimation under degraded RGB conditions while largely preserving performance in well-lit environments. Experiments on low-visibility RGB-T benchmarks demonstrate consistent improvements in both depth and camera pose estimation over existing feed-forward geometry baselines.

Qualitative Visualization

**Qualitative comparison of nighttime 3D geometry estimation.** DarkVGGT outperforms the RGB-only baselines (Dark3R, VGGT) and produces poses that are better aligned with the ground truth, as well as more geometrically consistent point clouds, than the RGB-T model SEAR.

**Additional reconstruction quality comparison with SEAR.** By routing only geometry-relevant thermal cues into the RGB branch, DarkVGGT consistently surpasses SEAR, with the advantage most evident in extremely dark indoor scenes.

Methodology

**Overview of the DarkVGGT framework.** DarkVGGT factorizes thermal embeddings into emissive and reflective components using a high-pass (HP) filter and projection branches. Building on this factorization, geometry-shared thermal routing (GSTR) separates shared and private thermal tokens, and selectively injects geometry-relevant thermal content into the RGB stream.
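The two stages described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the moving-average high-pass filter, the linear projection matrices (`w_emissive`, `w_reflective`, `w_shared`, `w_private`), the sigmoid gate `w_gate`, and the choice to feed the low-frequency part to the emissive branch and the high-frequency part to the reflective branch are all assumptions made for illustration. The filter also runs along a flat token axis rather than over a 2D feature map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_frequencies(tokens, kernel=3):
    """Split (N, D) tokens into low-pass and high-pass parts along the token axis."""
    pad = kernel // 2
    padded = np.pad(tokens, ((pad, pad), (0, 0)), mode="edge")
    # Moving average = low-pass; the remainder is the high-pass residual.
    low = np.stack([padded[i:i + kernel].mean(axis=0) for i in range(tokens.shape[0])])
    return low, tokens - low

def factorize_thermal(thermal, w_emissive, w_reflective):
    """Hypothetical factorization: one projection branch per frequency band."""
    low, high = split_frequencies(thermal)
    emissive = low @ w_emissive        # assumed: smooth, emissive-dominant cues
    reflective = high @ w_reflective   # assumed: sparse reflective residuals
    return emissive, reflective

def route_to_rgb(rgb, emissive, w_shared, w_private, w_gate):
    """Hypothetical routing: only gated shared tokens are injected into RGB."""
    shared = emissive @ w_shared       # modality-invariant (shared) tokens
    private = emissive @ w_private     # thermal-specific tokens, kept out of RGB
    gate = sigmoid(shared @ w_gate)    # per-token reliability weight in (0, 1)
    fused = rgb + gate * shared        # selective residual injection
    return fused, private, gate

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
rgb = rng.normal(size=(n_tokens, dim))
thermal = rng.normal(size=(n_tokens, dim))
w_e, w_r, w_s, w_p = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(4))
w_g = rng.normal(size=(dim, 1)) * 0.1

emissive, reflective = factorize_thermal(thermal, w_e, w_r)
fused, private, gate = route_to_rgb(rgb, emissive, w_s, w_p, w_g)
print(fused.shape, gate.shape)
```

In this sketch the private tokens never touch the RGB stream, and the per-token gate lets the RGB tokens pass through nearly unchanged wherever the thermal signal is judged unreliable.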

Quantitative Results

**Depth and camera pose estimation performance on dark scenes.** Our method consistently outperforms both the RGB-T fusion baseline SEAR and RGB-only methods, including DUSt3R, MASt3R, Dark3R, and DepthAnything3, across low-visibility benchmarks.
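This page does not state which depth metrics are reported. As a point of reference, the sketch below computes two metrics commonly used for depth evaluation, absolute relative error (AbsRel) and the δ < 1.25 inlier ratio. The median scale alignment, the `gt > 0` validity mask, and the function name `depth_metrics` are assumptions for illustration, not the paper's evaluation protocol.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Return (AbsRel, delta<1.25) over pixels with valid ground truth."""
    mask = gt > 0                                   # assumed: 0 marks missing depth
    pred = np.clip(pred[mask], eps, None)
    gt = gt[mask]
    # Assumed median alignment to remove the global scale ambiguity.
    pred = pred * (np.median(gt) / np.median(pred))
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = float(np.mean(ratio < 1.25))
    return abs_rel, delta1

gt = np.array([[1.0, 2.0], [0.0, 4.0]])    # one invalid pixel
pred = np.array([[2.0, 4.0], [5.0, 8.0]])  # correct up to a factor of 2
abs_rel, delta1 = depth_metrics(pred, gt)
print(abs_rel, delta1)
```

Lower AbsRel and higher δ < 1.25 indicate better depth estimates.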

**Depth and camera pose estimation performance on well-lit scenes.** The proposed thermal-to-RGB feature fusion modules route only geometry-relevant information from the thermal stream, which minimizes performance degradation on well-lit scenes while improving 3D geometry estimation under dark conditions.

**Ablation study.** Effectiveness of each proposed module and computational cost for RGB-T multimodal nighttime 3D geometry estimation.
