"Guiding Text-to-Image Diffusion Model Towards Grounded Generation. (arXiv:2301.05221v1 [cs.CV])": Enhances a pre-trained text-to-image diffusion model so that it simultaneously generates images and segmentation masks for the visual entities described in the text prompt.
Paper: http://arxiv.org/abs/2301.05221
Code: No code in linked repo (yet)
#AI #CV #NewPaper #DeepLearning #MachineLearning