
Well, there are two types of models: CLIP models and diffusion models. With VoC, Disco, and similar latent diffusion tools, you pick multiple CLIP models and a single diffusion model. The CLIP models are the big, gigabyte-sized ones such as the ViT and RN (ResNet) variants, and you can use CLIP search engines that search the LAION datasets to get a rough idea of what will happen when you use those words in your prompts: https://rom1504.github.io/clip-retrieval
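
To give a rough feel for what a CLIP search engine like the one above does: text and images are embedded into the same vector space, and results are ranked by cosine similarity to the query. Here is a toy sketch of just the ranking step. The 3-number vectors, the captions, and the names `cosine`, `rank_captions`, and `TOY_INDEX` are all made up for illustration; real CLIP embeddings have hundreds of dimensions and come from the model itself, and this is not how clip-retrieval is actually implemented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(query_vec, index):
    """Return the index keys sorted from most to least similar to the query."""
    return sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)

# Hypothetical stand-ins for image embeddings, keyed by a caption.
TOY_INDEX = {
    "castle at sunset": [0.9, 0.1, 0.0],
    "cat on a sofa": [0.0, 0.2, 0.9],
    "ruined fortress": [0.7, 0.6, 0.1],
}

# Hypothetical stand-in for the embedding of a text prompt.
query = [1.0, 0.2, 0.0]

print(rank_captions(query, TOY_INDEX))
```

The images that rank highest for your words are a hint at what those words "mean" to that CLIP model, which is why browsing the search results before writing a prompt is useful.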

I will otherwise refer you to the "Bible" of latent diffusion: https://sweet-hall-e72.notion.site/A-Traveler-s-Guide-to-the...

Whatever isn't covered in there is probably in the Disco Diffusion cheatsheet: https://botbox.dev/disco-diffusion-cheatsheet/

There are tons of resources out there, and it's a nonstop learning and experimenting process to try to achieve what you want.



Thanks again. I just got my first image out, and it ended up being a complete failure. :) I'll keep experimenting / learning.


Welcome to the party! My first image was also a total failure; it can only get better from here ;) Prepare to spend a lot of time reading before you start to make sense of things.



