Stylization results on the Tanks and Temples dataset.
Overview of Styl3R. Our model comprises a structure branch and an appearance branch, each predicting a different subset of Gaussian attributes. The structure branch encodes sparse, unposed images with a shared content encoder, then feeds the resulting tokens into per-view structure decoders that share information across views; structure-related attributes are regressed from the decoder outputs. The appearance branch encodes a style image into tokens, which attend to the content tokens from all views in a stylization decoder, and the resulting blended tokens predict Gaussian colors. Alternatively, feeding a content image in place of the style image recovers the original colors, so the same model performs either stylization or reconstruction.
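To make the two-branch layout concrete, below is a minimal PyTorch sketch. Every module name, dimension, and head layout is an illustrative assumption rather than the authors' implementation: it predicts one Gaussian per image patch (the real model is far denser), uses a single cross-attention block per decoder, and lets the content tokens query the style tokens so that each content location receives a blended feature (the attention direction in Styl3R may differ).

import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """Pre-norm cross-attention: `q` tokens attend to a separate `kv` token set."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        kv_n = self.norm_kv(kv)
        h, _ = self.attn(self.norm_q(q), kv_n, kv_n)
        q = q + h
        return q + self.mlp(self.norm_mlp(q))


class Styl3RSketch(nn.Module):
    """Hypothetical two-branch layout; dims, depths, and heads are illustrative."""

    def __init__(self, dim: int = 256, patch: int = 16):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.content_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 8, 4 * dim, batch_first=True), num_layers=6
        )
        self.style_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 8, 4 * dim, batch_first=True), num_layers=2
        )
        self.structure_decoder = CrossAttentionBlock(dim)    # cross-view information sharing
        self.stylization_decoder = CrossAttentionBlock(dim)  # blends content with style
        # One Gaussian per patch for brevity: mean (3) + scale (3) + rotation (4) + opacity (1).
        self.structure_head = nn.Linear(dim, 11)
        self.color_head = nn.Linear(dim, 3)

    def forward(self, views: torch.Tensor, style: torch.Tensor):
        B, V, _, H, W = views.shape
        # Shared content encoder over every (unposed) input view.
        tokens = self.patchify(views.flatten(0, 1)).flatten(2).transpose(1, 2)
        tokens = self.content_encoder(tokens).view(B, V, -1, tokens.shape[-1])
        pooled = tokens.reshape(B, -1, tokens.shape[-1])     # all views' tokens as one kv pool
        # Per-view structure decoding, attending to tokens from all views.
        struct = torch.stack(
            [self.structure_decoder(tokens[:, v], pooled) for v in range(V)], dim=1
        )
        gaussians = self.structure_head(struct)              # (B, V, N, 11)
        # Appearance branch: content tokens query the encoded style tokens.
        s = self.style_encoder(self.patchify(style).flatten(2).transpose(1, 2))
        blended = torch.stack(
            [self.stylization_decoder(tokens[:, v], s) for v in range(V)], dim=1
        )
        colors = torch.sigmoid(self.color_head(blended))     # (B, V, N, 3)
        return gaussians, colors


model = Styl3RSketch()
views = torch.randn(1, 2, 3, 224, 224)   # two unposed content views
style = torch.randn(1, 3, 224, 224)      # one style image
gaussians, colors = model(views, style)  # pass a content view as `style` to reconstruct instead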
We present qualitative comparisons with state-of-the-art models on the Tanks and Temples dataset.
More scene-style combinations on out-of-domain data.
The Train scene from the Tanks and Temples dataset.
The Room scene from the NeRF LLFF dataset.
Stylization results interpolated between three styles.
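One simple way to realize such interpolation, assuming style images are reduced to token embeddings as in the sketch above, is a convex combination of the encoded style tokens before they enter the stylization decoder; whether Styl3R blends tokens, features, or outputs in exactly this way is an assumption here. The hypothetical helper below blends three style token sets with barycentric weights:

import torch

def blend_styles(style_tokens, weights):
    """Convex combination of style token sets, each of shape (B, N, D).

    Illustrative helper, not the authors' code; `weights` must sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-6, "weights must be barycentric"
    return sum(w * t for w, t in zip(weights, style_tokens))

# e.g. blended = blend_styles([s1, s2, s3], [0.5, 0.3, 0.2]); pass `blended`
# to the stylization decoder in place of a single style's tokens.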
@misc{wang2025styl3rinstant3dstylized,
  title={Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles},
  author={Peng Wang and Xiang Liu and Peidong Liu},
  year={2025},
  eprint={2505.21060},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.21060},
}