DarkVGGT: Seeing Through Darkness Using Thermal Geometry without Daylight Tax

¹University of Minnesota    ²Texas A&M University    ³Stanford University
Under Review
*Corresponding Author
While VGGT fails in dark scenes due to degraded RGB cues, DarkVGGT leverages thermal geometry to produce reliable 3D reconstruction.

Abstract

Recent feed-forward 3D reconstruction methods have demonstrated strong performance and flexibility in efficient end-to-end scene geometry estimation from image streams. However, their reliance on visible-light appearance makes them vulnerable in dark and low-visibility environments, where RGB cues are severely degraded and geometric evidence becomes ambiguous. To address this challenge, we propose DarkVGGT, an RGB-T feed-forward geometry framework that uses physics-aware thermal modeling for robust 3D estimation in low-light scenes. DarkVGGT introduces two complementary modules. First, physics-inspired thermal factorization extracts emissive-dominant, geometry-consistent thermal cues while isolating sparse reflective residuals that may introduce geometric ambiguity. Second, geometry-shared thermal routing isolates modality-invariant geometric structures from thermal-specific patterns, selectively injecting reliability-aware structural guidance into the RGB stream. Together, these components enable accurate thermal-informed geometry estimation under degraded RGB conditions while largely preserving performance in well-lit environments. Experiments on low-visibility RGB-T benchmarks demonstrate consistent improvements in both depth and camera pose estimation over existing feed-forward geometry baselines.

Qualitative Visualization

Qualitative comparison of nighttime 3D geometry estimation. DarkVGGT outperforms the RGB-only baselines (Dark3R, VGGT) and, compared with the RGB-T model SEAR, produces camera poses that align more closely with the ground truth and point clouds that are more geometrically consistent.
Further reconstruction quality comparison between SEAR and DarkVGGT. By routing only geometry-relevant thermal cues into the RGB branch, DarkVGGT consistently surpasses SEAR, with the advantage most evident in extremely dark indoor scenes.

Methodology

Overview of the DarkVGGT framework. DarkVGGT factorizes thermal embeddings into emissive and reflective components using a high-pass (HP) filter and projection branches. Building on this factorization, geometry-shared thermal routing (GSTR) separates shared and private thermal tokens, and selectively injects geometry-relevant thermal content into the RGB stream.
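The two steps in this overview can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the box-blur low-pass filter, the linear projections, and the sigmoid gate are all assumptions chosen for illustration, and the assignment of the high-pass residual to the reflective branch is a guess based on the description of reflective residuals as sparse.

```python
import numpy as np


def box_blur(tokens, k=3):
    """Low-pass an (H, W, C) token grid with a k x k mean filter (edge-padded)."""
    h, w, _ = tokens.shape
    pad = k // 2
    padded = np.pad(tokens, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(tokens.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def factorize_thermal(tokens, w_emis, w_refl, k=3):
    """Hypothetical factorization: smooth part -> emissive, high-pass part -> reflective."""
    low = box_blur(tokens, k)
    high = tokens - low            # high-pass (HP) residual
    emissive = low @ w_emis        # assumed linear projection branch
    reflective = high @ w_refl     # assumed linear projection branch
    return emissive, reflective


def route_shared(emissive, rgb, w_shared, w_gate):
    """Hypothetical routing: split shared/private tokens, gate shared ones into RGB."""
    shared = emissive @ w_shared   # assumed modality-invariant part
    private = emissive - shared    # thermal-specific remainder, not injected
    logits = np.concatenate([rgb, shared], axis=-1) @ w_gate   # (H, W, 1)
    gate = 1.0 / (1.0 + np.exp(-logits))                       # per-token weight in (0, 1)
    fused = rgb + gate * shared    # residual injection into the RGB stream
    return fused, shared, private, gate


# Usage on random tokens; all weights here are placeholders, not learned values.
rng = np.random.default_rng(0)
H, W, C = 4, 5, 8
thermal = rng.normal(size=(H, W, C))
rgb = rng.normal(size=(H, W, C))
emissive, reflective = factorize_thermal(thermal, np.eye(C), np.eye(C))
fused, shared, private, gate = route_shared(
    emissive, rgb, 0.5 * np.eye(C), rng.normal(size=(2 * C, 1)))
print(fused.shape, float(gate.min()), float(gate.max()))
```

In this sketch the private tokens never reach the RGB stream, and the gate can drive the injected signal toward zero for individual tokens, which is one simple way to leave RGB features largely untouched where thermal guidance is judged unreliable.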

Quantitative Results

Depth and camera pose estimation performance on dark scenes. Our method consistently outperforms both the RGB-T fusion baseline SEAR and RGB-only methods including DUSt3R, MASt3R, Dark3R, and DepthAnything3 across low-visibility benchmarks.
Depth and camera pose estimation performance on well-lit scenes. The proposed thermal-to-RGB feature fusion modules selectively route only geometry-relevant information from the thermal stream, minimizing performance degradation on well-lit scenes while improving 3D geometry estimation under dark conditions.
Ablation study. Effectiveness of each proposed module and computational cost for RGB-T multimodal nighttime 3D geometry estimation.
