Grounding Keypoint Descriptors into 3D-Gaussian Splatting for Visual Localization in Dynamic Indoor/Outdoor Environments
https://doi.org/10.17586/0021-3454-2025-68-9-781-791
Abstract
Robust visual localization in real-world conditions remains a challenging task, particularly in the presence of dynamic objects and transient distractors. While neural scene representations such as 3D Gaussian Splatting (3DGS) and NeRF offer compact encodings of scene geometry and appearance, their reliance on photometric consistency makes them sensitive to violations of the static-world assumption. In this work, we present a robust visual localization framework that leverages 3DGS with a semantic-aware masking strategy to improve accuracy in dynamic scenes. Our approach extends GSplatLoc, a two-stage pipeline: the first stage integrates dense, lightweight keypoint descriptors from the XFeat network into the 3DGS representation, enabling efficient 2D-3D matching for coarse pose estimation. To mitigate the impact of dynamic distractors, we incorporate semantic masks generated by a classifier built on a pre-trained diffusion model, excluding inconsistent regions during 3D modeling. In the second stage, the initial pose is refined using a rendering-based photometric alignment loss. Experiments on both indoor and outdoor dynamic benchmarks demonstrate that our method outperforms the baseline in challenging dynamic environments.
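The coarse stage described above hinges on matching 2D query keypoint descriptors against descriptors grounded into the 3D Gaussians. As an illustration only (not the authors' implementation), the core step can be sketched as mutual nearest-neighbor matching over cosine similarity; the resulting 2D-3D correspondences would then feed a standard PnP solver. The function name and toy data below are hypothetical.

```python
import numpy as np

def match_2d3d(query_desc, gauss_desc):
    """Mutual nearest-neighbor matching between query keypoint
    descriptors (N x D) and descriptors attached to 3D Gaussians (M x D).
    Returns (keypoint_index, gaussian_index) pairs of mutual matches."""
    # Cosine similarity via dot products of L2-normalized rows
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    g = gauss_desc / np.linalg.norm(gauss_desc, axis=1, keepdims=True)
    sim = q @ g.T
    nn_q = sim.argmax(axis=1)  # best Gaussian for each keypoint
    nn_g = sim.argmax(axis=0)  # best keypoint for each Gaussian
    # Keep only pairs that are each other's nearest neighbor
    return [(i, int(j)) for i, j in enumerate(nn_q) if nn_g[j] == i]

# Toy data: the same 4 random descriptors on both sides, shuffled,
# so the recovered matches should invert the permutation.
rng = np.random.default_rng(0)
desc = rng.normal(size=(4, 8))
perm = np.array([2, 0, 3, 1])
matches = match_2d3d(desc, desc[perm])
print(matches)  # [(0, 1), (1, 3), (2, 0), (3, 2)]
```

In a full pipeline these correspondences, together with the Gaussians' 3D centers, would be passed to a RANSAC-based PnP solver for the coarse pose before the photometric refinement stage.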
Keywords
About the Authors
M. Mohrat
Russian Federation
Malik Mohrat — Post-Graduate Student; Faculty of Control Systems and Robotics
St. Petersburg
G. K. Sidorov
Russian Federation
Gennady K. Sidorov — Graduate Student; Faculty of Control Systems and Robotics
St. Petersburg
D. D. Gridusov
Russian Federation
Denis D. Gridusov — Bachelor Student; Faculty of Control Systems and Robotics
St. Petersburg
S. A. Kolyubin
Russian Federation
Sergey A. Kolyubin — Dr. Sci., Professor; ITMO University, Faculty of Control Systems and Robotics
St. Petersburg
References
1. Dong Z., Zhang G., Jia J., and Bao H. IEEE 12th International Conference on Computer Vision, Sep. 2009, pp. 1538–1545, DOI: 10.1109/ICCV.2009.5459273.
2. Heng L. et al. International Conference on Robotics and Automation (ICRA), May 2019, pp. 4695–4702, DOI: 10.1109/ICRA.2019.8793949.
3. Mildenhall B., Srinivasan P.P., Tancik M., Barron J.T., Ramamoorthi R., and Ng R. Commun. ACM, 2022, no. 1(65), pp. 99–106, DOI: 10.1145/3503250.
4. Kerbl B., Kopanas G., Leimkühler T., and Drettakis G. ACM Trans Graph, 2023, no. 4(42), pp. 139.
5. Sabour S., Vora S., Duckworth D., Krasin I., Fleet D.J., and Tagliasacchi A. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 20626–20636, http://openaccess.thecvf.com/content/CVPR2023/html/Sabour_RobustNeRF_Ignoring_Distractors_With_Robust_Losses_CVPR_2023_paper.html.
6. Tang L., Jia M., Wang Q., Phoo C.P., and Hariharan B. Adv. Neural Inf. Process. Syst., 2023, vol. 36, pp. 1363–1389.
7. Martin-Brualla R., Radwan N., Sajjadi M.S., Barron J.T., Dosovitskiy A., and Duckworth D. Proceedings of the IEEE/ CVF conference on computer vision and pattern recognition, 2021, pp. 7210–7219, https://openaccess.thecvf.com/content/CVPR2021/html/Martin-Brualla_NeRF_in_the_Wild_Neural_Radiance_Fields_for_Unconstrained_Photo_CVPR_2021_paper.html?ref=labelbox.ghost.io.
8. Ren W., Zhu Z., Sun B., Chen J., Pollefeys M., and Peng S. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8931–8940, https://openaccess.thecvf.com/content/CVPR2024/html/Ren_NeRF_On-the-go_Exploiting_Uncertainty_for_Distractor-free_NeRFs_in_the_Wild_CVPR_2024_paper.html.
9. Oquab M. et al. arXiv: arXiv:2304.07193, Feb. 02, 2024, DOI: 10.48550/arXiv.2304.07193.
10. Dahmani H., Bennehar M., Piasco N., Roldão L., and Tsishkou D. Computer Vision — ECCV 2024, Lecture Notes in Computer Science, Cham, Springer Nature Switzerland, 2025, vol. 15134, pp. 325–340, DOI: 10.1007/978-3-031-73116-7_19.
11. Zhang D., Wang C., Wang W., Li P., Qin M., and Wang H. Computer Vision — ECCV 2024, Lecture Notes in Computer Science, Cham, Springer Nature Switzerland, 2025, vol. 15134, pp. 341–359, DOI: 10.1007/978-3-031-73116-7_20.
12. Wang Y., Wang J., and Qi Y. arXiv: arXiv:2406.02407, Jun. 04, 2024, DOI: 10.48550/arXiv.2406.02407.
13. Zhou Q., Maximov M., Litany O., and Leal-Taixé L. Computer Vision — ECCV 2024, Lecture Notes in Computer Science, Cham, Springer Nature Switzerland, 2025, vol. 15082, pp. 108–127, DOI: 10.1007/978-3-031-72691-0_7.
14. Sabour S. et al. ACM Trans. Graph., 2025, no. 2(44), pp. 1–11, DOI: 10.1145/3727143.
15. Chen S., Li X., Wang Z., and Prisacariu V.A. Computer Vision — ECCV 2022, Lecture Notes in Computer Science, Cham, Springer Nature Switzerland, 2022, vol. 13670, pp. 1–17, DOI: 10.1007/978-3-031-20080-9_1.
16. Chen S. et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20987–20996, http://openaccess.thecvf.com/content/CVPR2024/html/Chen_Neural_Refinement_for_Absolute_Pose_Regression_with_Feature_Synthesis_CVPR_2024_paper.html.
17. Yen-Chen L., Florence P., Barron J.T., Rodriguez A., Isola P., and Lin T.-Y. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2021, pp. 1323–1330, https://ieeexplore.ieee.org/abstract/document/9636708/.
18. Kobayashi S., Matsumoto E., and Sitzmann V. Adv. Neural Inf. Process. Syst., 2022, vol. 35, pp. 23311–23330.
19. Tschernezki V., Laina I., Larlus D., and Vedaldi A. International Conference on 3D Vision (3DV), IEEE, 2022, pp. 443–453, https://ieeexplore.ieee.org/abstract/document/10044452/.
20. Zhao B., Yang L., Mao M., Bao H., and Cui Z. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 7450–7459, https://ojs.aaai.org/index.php/AAAI/article/view/28576.
21. Sun Y. et al. arXiv: arXiv:2312.09031, Mar. 20, 2024, DOI: 10.48550/arXiv.2312.09031.
22. Botashev K., Pyatov V., Ferrer G., and Lefkimmiatis S. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 5664–5671, https://ieeexplore.ieee.org/abstract/document/10801919/.
23. DeTone D., Malisiewicz T., and Rabinovich A. Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 224–236, https://openaccess.thecvf.com/content_cvpr_2018_workshops/w9/html/DeTone_SuperPoint_Self-Supervised_Interest_CVPR_2018_paper.html.
24. Dusmanu M. et al. Proceedings of the ieee/cvf conference on computer vision and pattern recognition, 2019, pp. 8092–8101, http://openaccess.thecvf.com/content_CVPR_2019/html/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.html.
25. Revaud J., De Souza C., Humenberger M., and Weinzaepfel P. Adv. Neural Inf. Process. Syst., 2019, vol. 32, https://proceedings.neurips.cc/paper/2019/hash/3198dfd0aef271d22f7bcddd6f12f5cb-Abstract.html.
26. Lindenberger P., Sarlin P.-E., and Pollefeys M. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17627–17638, http://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html.
27. Sun J., Shen Z., Wang Y., Bao H., and Zhou X. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8922–8931, http://openaccess.thecvf.com/content/CVPR2021/html/Sun_LoFTR_Detector-Free_Local_Feature_Matching_With_Transformers_CVPR_2021_paper.html.
28. Potje G., Cadar F., Araujo A., Martins R., and Nascimento E.R. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2682–2691, http://openaccess.thecvf.com/content/CVPR2024/html/Potje_XFeat_Accelerated_Features_for_Lightweight_Image_Matching_CVPR_2024_paper.html.
29. Lindenberger P., Sarlin P.-E., Larsson V., and Pollefeys M. Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5987–5997, http://openaccess.thecvf.com/content/ICCV2021/html/Lindenberger_Pixel-Perfect_Structure-From-Motion_With_Featuremetric_Refinement_ICCV_2021_paper.html.
30. Sidorov G., Mohrat M., Gridusov D., Rakhimov R., and Kolyubin S. arXiv: arXiv:2409.16502, Mar. 20, 2025, DOI: 10.48550/arXiv.2409.16502.
31. Zhou S. et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21676–21685, http://openaccess.thecvf.com/content/CVPR2024/html/Zhou_Feature_3DGS_Supercharging_3D_Gaussian_Splatting_to_Enable_Distilled_Feature_CVPR_2024_paper.html.
32. Liu H.-T.D., Williams F., Jacobson A., Fidler S., and Litany O. Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings, Vancouver BC Canada: ACM, Aug. 2022, pp. 1–13, DOI: 10.1145/3528233.3530713.
33. Shavit Y., Ferens R., and Keller Y. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2733–2742, http://openaccess.thecvf.com/content/ICCV2021/html/Shavit_Learning_Multi-Scene_Absolute_Pose_Regression_With_Transformers_ICCV_2021_paper.html.
For citations:
Mohrat M., Sidorov G.K., Gridusov D.D., Kolyubin S.A. Grounding Keypoint Descriptors into 3D-Gaussian Splatting for Visual Localization in Dynamic Indoor/Outdoor Environments. Journal of Instrument Engineering. 2025;68(9):781-791. (In Russ.) https://doi.org/10.17586/0021-3454-2025-68-9-781-791