Apologies if I have a bad tag here; I can change it.
Thus far I have “dabbled” with Stable Diffusion, Whisper, and DeepSpeech. I’m opening this thread for idea generation on my side: what additional models could I add to my toolbox? What references have you read to learn? Etc. I feel like learning AI is like peeling an onion, and for a new entrant there is just too much information.
For example, one thing I’ve never understood: how do people generate images of TV characters (whether real or animated)? Does that come down to training? As it stands, text-to-image interests me the most, based on the amazing things I’ve seen online.
Aside from y’all’s thoughts on the above, I need to explore a way to do character recognition in a video: capture screenshots, then perhaps gather a large enough sample for training (a rough sketch of the frame-grabbing step is below), but that may be worth its own thread.
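For the frame-grabbing part, something like this is probably where I’d start (a minimal sketch, assuming opencv-python is installed; the video path, output folder, and sampling interval are just placeholders):

```python
import os
import cv2  # opencv-python

# Save one frame every N frames from a video, as raw material for a training set.
VIDEO_PATH = "episode01.mp4"   # hypothetical input file
OUT_DIR = "frames"
FRAME_EVERY = 120              # arbitrary interval (~5 s at 24 fps)

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:                 # end of video or read error
        break
    if idx % FRAME_EVERY == 0:
        cv2.imwrite(os.path.join(OUT_DIR, f"frame_{idx:06d}.jpg"), frame)
        saved += 1
    idx += 1
cap.release()
print(f"saved {saved} frames")
```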
Please, talk about yourself, what you’ve done, and what your experiences have been. I’m here to listen (read).
Best,
WWED
PS:
Providing my hardware below in case there is a hardware constraint.
Tl;dr: large labeled datasets of examples. You’ll notice that for some prompts Stable Diffusion doesn’t know the subject or isn’t able to generate viable images. @rv6502 was posting some of their Stable Diffusion generations on the L1 community Discord.
I cannot recommend https://course.fast.ai/ enough if you really want to understand every part of deep learning/machine learning. It is a proper course.
Concerning generating images based on a character (real or fictitious), you might want to read up on textual inversion, hypernetworks, and LoRA. Automatic1111’s documentation is slightly better than before.
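Outside the Automatic1111 UI, applying an already-trained LoRA with Hugging Face’s diffusers library looks roughly like this (a sketch, not a drop-in recipe: the base model, LoRA path, and trigger token are placeholders, and it assumes a CUDA GPU and a diffusers version with `load_lora_weights`):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a base Stable Diffusion checkpoint, then apply character-specific LoRA weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # base model (placeholder choice)
    torch_dtype=torch.float16,
).to("cuda")

# LoRA weights trained on the character you want -- path is hypothetical.
pipe.load_lora_weights("./loras/my_character_lora")

# The trigger word/token depends on how the LoRA was trained.
image = pipe(
    "portrait of my_character on a beach, detailed",
    num_inference_steps=30,
).images[0]
image.save("character.png")
```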
In terms of real-world applications, I wrote a small Keras-based REST endpoint that I can push images to and get back an enumeration of the items in them (a very simple yet useful tool for investigation), and an auto-transcriber based on Whisper.
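Not my exact code, but a minimal sketch of that kind of endpoint, assuming Flask, TensorFlow/Keras, and Pillow are installed and using a stock ImageNet classifier (MobileNetV2) as a stand-in for whatever model you actually serve:

```python
import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions,
)

app = Flask(__name__)
model = MobileNetV2(weights="imagenet")  # stand-in model, loaded once at startup

@app.route("/classify", methods=["POST"])
def classify():
    # Expect the image as a multipart form field named "image" (my assumption).
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    img = img.resize((224, 224))
    batch = preprocess_input(np.expand_dims(np.array(img, dtype=np.float32), 0))
    preds = decode_predictions(model.predict(batch), top=5)[0]
    return jsonify([{"label": label, "score": float(score)}
                    for (_, label, score) in preds])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The Whisper side can be as small as this (openai-whisper package; the model size and file path are placeholders):

```python
import whisper

model = whisper.load_model("base")        # model size is a placeholder
result = model.transcribe("meeting.mp3")  # path is hypothetical
print(result["text"])
```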