A collaborative approach to image generation: PASTA

Google Research introduces PASTA, a reinforcement learning agent that refines text-to-image output over multiple turns of interaction with a user by learning their unique preferences. This process is made possible by a novel user simulation technique.

You have a perfect image in your mind. You enter a prompt, hit generate, and the result is close to what you were thinking, but not quite right. You try refining the prompt, adding more detail, but you can’t seem to bridge the gap between your idea and the final image.

This is a common experience. While text-to-image (T2I) models are incredibly powerful, they often struggle to capture the nuance and specificity of an individual’s unique creative intent given just a single prompt. What if we could turn image generation into a collaborative conversation?

With the same starting prompt, “An image of happiness”, PASTA produces dramatically different results for two distinct user types (User Type A and User Type B), showcasing its ability to adapt to an individual’s unique creative style. For example, the result for Type A corresponds to a prompt like “Abstract happy faces, Art Deco inspired geometric shapes, muted jewel-toned background.”

More on the Google Research Blog