Image2Garment: Simulation-ready Garment Generation from a Single Image

Stanford · Google
Method Teaser

Our method generates simulation-ready garments from a single image. We obtain both the garment geometry, represented as a 3D mesh, and the physical parameters required for simulation. With these, we can animate the garment in a simulator, where it interacts with its environment realistically. Our prediction is optimization-free and computed in seconds.

Abstract

Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed nature of this problem. Prior methods either require multi-view capture and expensive differentiable simulation or predict only garment geometry without the material properties required for realistic simulation.

We propose a feed-forward framework that sidesteps these limitations by first fine-tuning a vision-language model to infer material composition and fabric attributes from real images, and then training a lightweight predictor that maps these attributes to the corresponding physical fabric parameters using a small dataset of material-physics measurements.

Our approach introduces two new datasets (FTAG and T2P) and delivers simulation-ready garments from a single image without iterative optimization. Experiments show that our estimator achieves superior accuracy in material composition estimation and fabric attribute prediction; passing these predictions through our physics parameter estimator further yields higher-fidelity simulations than state-of-the-art image-to-garment methods.

Method

Method Overview

We first estimate the garment geometry using an off-the-shelf fine-tuned vision-language model. Then, we use a second vision-language model, trained to predict the material properties of the garment. Finally, a lightweight predictor maps the estimated material attributes to the physical parameters of a simulator.
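
The three stages compose into a single feed-forward pass. Below is a minimal sketch in Python; GeometryVLM, MaterialVLM, AttributeToPhysicsMLP, and all dimensions and parameter choices are hypothetical placeholders for illustration, not the released implementation:

    import torch
    import torch.nn as nn

    class AttributeToPhysicsMLP(nn.Module):
        """Hypothetical lightweight predictor: fabric attributes -> physics parameters.

        Input:  a vector encoding material composition (e.g., cotton/polyester
                fractions) and fabric attributes (e.g., weight, stretchiness).
        Output: parameters a cloth simulator consumes, e.g. density,
                stretch stiffness, bending stiffness, friction.
        """
        def __init__(self, attr_dim: int, phys_dim: int = 4, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(attr_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, phys_dim),
            )

        def forward(self, attrs: torch.Tensor) -> torch.Tensor:
            return self.net(attrs)

    def image_to_simulation_ready_garment(image, geometry_vlm, material_vlm, predictor):
        """Single feed-forward pass: no iterative optimization."""
        mesh = geometry_vlm(image)    # stage 1: garment geometry as a 3D mesh
        attrs = material_vlm(image)   # stage 2: material composition + fabric attributes
        physics = predictor(attrs)    # stage 3: attributes -> simulator parameters
        return mesh, physics

Because the final stage is a small feed-forward predictor rather than a differentiable-simulation loop, inference amounts to two model calls and one forward pass, which is what keeps prediction in the seconds range.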

Baseline Comparisons (Simulations)

[Comparison grid: Input Frame · GarmentRecovery* [1] · AIpparel* [2] · ChatGarment* [3] · Image2Garment (ours) · Ground Truth]

GarmentRecovery*, AIpparel*, and ChatGarment* use randomly sampled physical parameters. GarmentRecovery* is evaluated with its only published checkpoint.

Image2Garment uses our estimated physics parameters and the same garment geometry as ChatGarment; a sketch of this evaluation setup follows below.
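
A minimal sketch of the protocol above, in Python. The parameter names, ranges, and the simulate() interface are illustrative assumptions, not the paper's actual configuration:

    import random

    # Illustrative parameter ranges; the paper's exact sampling ranges are not given here.
    PARAM_RANGES = {
        "density":           (0.05, 0.60),  # areal density, kg/m^2
        "stretch_stiffness": (1e3, 1e5),
        "bending_stiffness": (1e-6, 1e-3),
        "friction":          (0.0, 1.0),
    }

    def sample_random_params() -> dict:
        """Baseline protocol: draw each physical parameter uniformly at random."""
        return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

    def simulate(mesh, params):
        """Placeholder for a cloth-simulator call (e.g., an XPBD or FEM solver)."""
        raise NotImplementedError

    # Baselines: ChatGarment geometry driven by sample_random_params().
    # Ours:      the same geometry driven by parameters estimated from the image.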

In-the-Wild Comparisons

[Comparison grid: ChatGarment* [3] · Image2Garment (ours) · Ground Truth]

Ablation Study

[Ablation grid: Random Params · Our Estimated Params · Ground Truth]

References

[1] Ren Li, et al. "Single View Garment Reconstruction Using Diffusion Mapping via Pattern Coordinates." ACM SIGGRAPH 2025 Conference Papers, 2025.
[2] Kiyohiro Nakayama, et al. "AIpparel: A Multimodal Foundation Model for Digital Garments." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[3] Siyuan Bian, et al. "ChatGarment: Garment Estimation, Generation and Editing via Large Language Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

BibTeX

BibTeX entry will be added once available.