
Is that an artifact of the training data? Where are all these original images with that cartoony look that it was trained on?


A large part of deviantart.com would fit that description. There are also a lot of cartoony or CG images in communities dedicated to fanart. Another component in there is probably the overly polished and clean look of stock images, like the front page results of shutterstock.

"Typical" AI images are this blend of the popular image styles of the internet. You always have a bit of digital drawing + cartoon image + oversaturated stock image + 3d render mixed in. Models trained on just one of these work quite well, but for a generalist model this blend of styles is an issue


> There are also a lot of cartoony or CG images in communities dedicated to fanart.

Asian artists don't color this way though; those neon oversaturated colors are a Western style.

(This is one of the easiest ways to spot a fake-anime Western TV show: the colors are bad. The other is that the action scenes have no impact, because the creators aren't any good at planning them.)


Wild speculation: video game engines. You want your model to understand what a car looks like from all angles, but it’s expensive to get photos of real cars from all angles, so instead you render a car model in UE5, generating hundreds of pictures of it, from many different angles, in many different colors and styles.
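
For a concrete picture, here's roughly what that pipeline could look like, sketched with Blender's Python API (bpy) as a stand-in for UE5, which is scripted differently. The object name, orbit parameters, and output paths are all made up:

    import math
    import bpy

    scene = bpy.context.scene
    camera = scene.camera
    target = bpy.data.objects["Car"]        # assumed object name in the .blend file
    radius, height, n_views = 8.0, 2.0, 36  # orbit parameters (made up)

    # Keep the camera pointed at the object while it orbits.
    track = camera.constraints.new(type='TRACK_TO')
    track.target = target
    track.track_axis = 'TRACK_NEGATIVE_Z'
    track.up_axis = 'UP_Y'

    for i in range(n_views):
        angle = 2 * math.pi * i / n_views
        # Move the camera around a circle at fixed radius and height.
        camera.location = (radius * math.cos(angle),
                           radius * math.sin(angle),
                           height)
        scene.render.filepath = f"/tmp/car_{i:03d}.png"
        bpy.ops.render.render(write_still=True)

Vary materials, lighting, and backgrounds across runs and you get thousands of consistent views per asset for almost nothing compared to a photo shoot.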


I've heard this is downstream of human feedback. If you ask someone which picture is better, they'll tend to pick the more saturated option. If you're doing post-training with humans, you'll bake that bias into your model.
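
A toy illustration of that feedback loop (not anyone's actual pipeline): if the "rater" below compares two images and always picks the more saturated one, any preference dataset it produces is correlated with saturation, and a reward model fit on that data learns to reward saturation. Uses Pillow and NumPy; the rater is a deliberate caricature:

    import numpy as np
    from PIL import Image

    def mean_saturation(path: str) -> float:
        hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float32)
        return float(hsv[..., 1].mean())  # S channel, 0..255

    def biased_rater(path_a: str, path_b: str) -> str:
        # Caricature of the claimed bias: prefer the more saturated image.
        # Preference labels collected this way make "more saturated" look
        # like "better" to whatever reward model is trained on them.
        sa, sb = mean_saturation(path_a), mean_saturation(path_b)
        return path_a if sa >= sb else path_b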


Ever since Midjourney popularized it, image generation models are often post-trained on more "aesthetic" subsets of images to give them a more fantasy look. It also helps obscure some of the AI's imperfections.
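
The curation step itself is simple in outline. A hypothetical sketch, with a placeholder scorer standing in for a real aesthetic predictor (LAION-Aesthetics, for example, uses a small linear head on CLIP image embeddings trained on human ratings):

    import random
    from pathlib import Path

    def aesthetic_score(path: Path) -> float:
        # Placeholder scorer, not a real library call. Real pipelines use
        # a learned model of human "aesthetic" ratings.
        return random.random()

    def curate(image_dir: str, keep_fraction: float = 0.05) -> list[Path]:
        # Rank every image by predicted aesthetics and keep only the top
        # slice as the fine-tuning set.
        paths = sorted(Path(image_dir).glob("*.jpg"))
        ranked = sorted(paths, key=aesthetic_score, reverse=True)
        return ranked[: max(1, int(len(ranked) * keep_fraction))]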


Either that, or they are padding out their training data with scads of relatively inexpensive-to-produce 3D-rendered images.


It's largely an artifact of classifier-free guidance used in diffusion models. It makes the image generation more closely follow the prompt but also makes everything look more saturated and extreme.
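
Concretely, CFG runs the denoiser twice per step, once conditioned on the prompt and once on an "empty" prompt, then extrapolates past the conditional prediction. A minimal sketch, assuming `unet` is any callable that predicts noise; the dummy model at the end is only there so the snippet runs:

    import torch

    def cfg_noise_pred(unet, x_t, t, prompt_emb, null_emb, guidance_scale=7.5):
        # Two denoiser passes: with and without the prompt.
        eps_uncond = unet(x_t, t, null_emb)
        eps_cond = unet(x_t, t, prompt_emb)
        # Extrapolate past the conditional prediction, away from the
        # unconditional one. At scale > 1 this exaggerates everything the
        # prompt implies, which in practice shows up as boosted saturation
        # and contrast.
        return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # Dummy "model" so the snippet runs; a real unet predicts noise.
    fake_unet = lambda x, t, cond: 0.1 * x + cond.mean()
    x_t = torch.randn(1, 4, 64, 64)
    eps = cfg_noise_pred(fake_unet, x_t, 10, torch.ones(1, 8), torch.zeros(1, 8))
    print(eps.shape)  # torch.Size([1, 4, 64, 64])

Typical guidance scales sit well above 1, so every sampling step pushes the image a little further toward the "extreme" reading of the prompt.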



