
How does your avatar look when enhanced by AI?


Danielle Atheria

I was able to get ControlNet working for a short time, but I only managed to get a few images before I started getting more error messages.

The original image:

Snapshot_120.png.dc9c760b71f9cf8d44c9509939e4b305.png

A watercolor prompt:

Beautiful_Lighting__Watercolor_S0_St50_G7.5(1).jpeg.c615c4972ef486db1f9dd03f310881f9.jpeg

Here is a realistic prompt below:

Beautiful_Lighting__Realistic_S2315265894_St50_G1.1.jpeg.a896726b86a445a8358f53d202c0700c.jpeg

I had to use a LoRA to ensure the faces did not turn out funky. The watercolor image had a higher guidance scale of 7.5, which is why you see a different outfit; I find that if you lower it, the outfit changes less. Both of these were done with 50 steps, using CyberRealistic as the model. I would have liked to do more fine touch-up work and add more steps, but my video card is older and it takes forever to render, especially with ControlNet.
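For anyone who wants to reproduce this kind of setup outside Easy Diffusion, a minimal sketch with the Hugging Face diffusers library might look like the following. It assumes locally downloaded CyberRealistic checkpoint and face LoRA files; the filenames below are placeholders, not the actual files used above.

```python
# Rough sketch: SD 1.5 img2img with a community checkpoint plus a LoRA, via diffusers.
# The .safetensors filenames are placeholders, not the exact files used in the post.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_single_file(
    "cyberrealistic.safetensors",            # placeholder: a CyberRealistic checkpoint file
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights(".", weight_name="face_detail.safetensors")  # placeholder LoRA file

init = load_image("Snapshot_120.png").convert("RGB")  # the source snapshot
result = pipe(
    prompt="beautiful lighting, watercolor",
    image=init,
    num_inference_steps=50,   # 50 steps, as in the post
    guidance_scale=7.5,       # higher guidance follows the prompt more, so the outfit drifts more
).images[0]
result.save("watercolor.jpg")
```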


2 minutes ago, Istelathis said:

I was able to get ControlNet working for a short time, but I only managed to get a few images before I started getting more error messages.


Interesting how in the watercolor, your breasts were replaced with a tasteful blouse!


2 minutes ago, Marianne Little said:

And not just that. The cap disappeared, and the hair changed. I also think the head has a slight tilt forward, and the eyes down.

It's like one of those "Spot the Differences" challenges! I was thinking it was the AI making editorial changes because "watercolors must be just so, to be proper; one must maintain standards of decorum!"


29 minutes ago, Love Zhaoying said:

It's like one of those "Spot the Differences" challenges! I was thinking it was the AI making editorial changes because "watercolors must be just so, to be proper; one must maintain standards of decorum!"

It has totally changed the avatar. Maybe there are no caps or crop tops in whatever the AI is using as reference for "watercolor"?


1 hour ago, Marianne Little said:

And not just that. The cap disappeared, and the hair changed. I also think the head has a slight tilt forward, and the eyes down.

 

1 hour ago, Love Zhaoying said:

It's like one of those "Spot the Differences" challenges! I was thinking it was the AI making editorial changes because "watercolors must be just so, to be proper; one must maintain standards of decorum!"

From the experimentation I have done, it has to do with the guidance scale as well as the prompt strength in Easy Diffusion, at least in my experience. Here is one using img2img with a guidance scale of 7.5 and a prompt strength of 0.55, and a prompt for a woman holding a cat. It's not a great picture, but it uses the same original image I posted above. I ran this one with 100 steps trying to get the cat's legs to form, but alas, they decided to remain hidden.

extra.jpeg.774847c9a4a41bdd5a053828f58ceaf9.jpeg

 

 

Another one using the same img2img, with the prompt strength raised to 0.99 and the guidance scale set to 15:

image.jpeg.a4b7410f30d16dc5b5f716279411622d.jpeg
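The two variations above differ only in guidance scale and prompt strength. In diffusers terms those map roughly to guidance_scale and strength on the img2img pipeline; a minimal sketch of the same comparison, assuming the base SD 1.5 checkpoint and a placeholder source image, could look like this:

```python
# Rough sketch: the same source image run twice, varying only guidance scale and
# prompt strength (called "strength" in diffusers). Checkpoint and filenames assumed.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed base checkpoint
).to("cuda")

init = load_image("Snapshot_120.png").convert("RGB")
prompt = "a woman holding a cat"

# Lower strength keeps most of the original image, so new elements (the cat) struggle to form.
soft = pipe(prompt, image=init, strength=0.55, guidance_scale=7.5,
            num_inference_steps=100).images[0]

# Near-maximum strength almost repaints the image, so the composition drifts far from the source.
hard = pipe(prompt, image=init, strength=0.99, guidance_scale=15,
            num_inference_steps=100).images[0]

soft.save("gs7-5_ps0-55.jpg")
hard.save("gs15_ps0-99.jpg")
```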

 

 


1 hour ago, Marianne Little said:

It has totally changed the avatar. Maybe there are no caps or crop tops in whatever the AI is using as reference for "watercolor"?

I totally agree with you. AI, as we all know, is only as good as whatever has been programmed into it, and it would make sense for it to produce a more demure image. I guess if we were all using the same software, we'd start to see more similarities, like that fun software a couple of years ago (Toon Me) where you'd upload a photo of your avatar (or real-life self) and be given four different cartoon versions.

It all seems very clever, and must have taken plenty of man (woman/person) hours to program. It's hard to go back into Second Life itself and see myself without all the make-up, as it were, after creating an AI representation of my avatar. 


If I may - a few words on the various settings and what they do:

CFG or Guidance Scale - Governs how much you let the AI off the leash. Low CFG values mean it all but ignores your prompt, while high CFG means it will adhere strictly to the prompt but neglect what it knows about image composition and the like. A value of 6.5-7 is usually ideal unless you explicitly want it to go wild or stay strict.

Steps and Sampler - These need to be configured in unison. A sampler is what turns noise into the resulting image. Each sampler needs at least a certain number of steps to work, but anything past that is just pointlessly blowing GPU power. The exception is so-called ancestral samplers (usually marked with an "a" in their name, like "Euler a"); these can run forever, continuously adding new noise. As a rule of thumb: most will do fine with 25 steps. DDIM will work with 10; Euler is older and may need 80 or so.

Denoise or Denoising - How much the image is meant to change from the source image you give it, ranging from 0 to 1.0. A value of 1 will all but throw out the source image, and a 0 will make no changes at all. This is the setting you will need to find a decent balance with.
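For readers working in code rather than a GUI, these three knobs map directly onto the diffusers img2img API: guidance_scale is the CFG value, the scheduler is the sampler, and strength is the denoising value. A minimal sketch, assuming the base SD 1.5 checkpoint and a placeholder source image:

```python
# Rough sketch: CFG, sampler/steps and denoising as they appear in the diffusers img2img API.
import torch
from diffusers import StableDiffusionImg2ImgPipeline, DDIMScheduler
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed checkpoint
).to("cuda")

# Sampler: swap the scheduler on an existing pipeline (DDIM gets by with few steps).
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="portrait, soft lighting",
    image=load_image("source.png").convert("RGB"),  # placeholder source image
    num_inference_steps=25,   # steps: most samplers do fine around 25
    guidance_scale=7.0,       # CFG: 6.5-7 is a sensible default
    strength=0.6,             # denoise: 0 changes nothing, 1 all but ignores the source
).images[0]
image.save("out.png")
```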

The choice of model is also an important one, and it helps to understand how and why things came to be. There are three major revisions of Stable Diffusion: 1.5, 2.x and SDXL. Out of these you can outright dismiss 2.x; it was a failed attempt. It tried to remove nudity from the model, and it quickly became clear that nudity was what made the model understand human anatomy. The resulting images were a mess of uncanny horrors.

If you've got a good PC, use SDXL and 1.5. If you've got a less capable PC, use 1.5. That leaves the choice of model, and here it helps to understand that there were, for the most part, three major development paths that have all converged by now: anime, based on leaked anime models (usually NovelAI); base Stable Diffusion; and... Asian girls. There was a lot of hype from Korea early on. These have all merged into one by now, just with various weights attributed to this or that and maybe some additional training in certain aspects.

Newer models generally work much better than older models. NSFW models will do better with anatomy (including hands) but need heavy counter-prompting to put on clothes. The merge with anime models means that unless you want massive tracts of land, you'll need to actually prompt OUT certain anatomical sizes. Furthermore, it helps to understand that the anime models used a different style of prompting, based on booru tags (no, that is not safe for work to google).

If you want certain clothes, you need to prompt for them.

Last but not least, keep prompt length and image resolution in mind. SD 1.5 was trained on 512x512; you can go up to 768, but anything beyond that goes kaputt. Use upscaling. SDXL was trained even more rigidly on a handful of resolutions around 1024x1024 and can also go ±256 in either direction (a bit more, but eh, find a cheat sheet yourself :P). As for prompt length: each prompt is split into one or several brackets of 75 tokens. This is important to know because it governs things pretty strongly. Let's say you've got 76 tokens in your prompt. The first 75 tokens will form one bracket, and the remaining token will form a bracket on its own. Then both brackets will be weighted equally, resulting in that one token coming on super strong. So stay below 75 tokens ideally, or look into "BREAK"ing prompts.
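A small sketch for checking whether a prompt spills past the 75-token bracket, using the standard CLIP tokenizer that SD 1.5's text encoder is built on (the example prompt is made up):

```python
# Rough sketch: count CLIP tokens to check whether a prompt spills past one 75-token bracket.
from transformers import CLIPTokenizer

# SD 1.5 uses the standard CLIP ViT-L/14 tokenizer.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "watercolor portrait of a woman wearing a cap and crop top, soft lighting, detailed face"
ids = tokenizer(prompt)["input_ids"]
n_tokens = len(ids) - 2  # drop the begin/end-of-text markers
print(f"{n_tokens} tokens ->",
      "fits in one bracket" if n_tokens <= 75 else "spills into a second, equally weighted bracket")
```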

Alright, enough of the SD crash course. It took some time to try a more direct conversion without my LoRA. This was with a denoising value of 0.6, some prompting and a lot of cherry-picking on SDXL. SDXL tends to make faces come out a bit strong on bone structure, and none of the Stable Diffusion models can deal with my eye colour in any reliable shape or form.

sltoai.thumb.png.bc91638c1ba6a3a142a1c1c3058a92f0.pngsltoaires.thumb.png.4c0f4d28b94f31e2beb7044fd175c12b.png

If you want more consistency than that with Stable Diffusion, you either want to make heavy use of ControlNets or train your own LoRA, something I've done together with a friend of mine. That's also what allows you to translate images into different styles more easily. The result above is cherry-picked; the result below is not (but could be better):

sltoaires2.thumb.png.5d73271ffe51f438fdda3c56a4bf1549.png

Footnote: Talking about bias in training data, there's a funny social one, and that's age. Most of us want to appear younger than we are as we get older, and some won't answer the age question truthfully. Long story short, the training data thinks a forty-year-old will look like an actual sixty-year-old, because so many sixty-year-olds claimed to be forty - and the forty-year-olds did the same, so you'll find them in the thirty-year-old category, and if you want thirty-year-olds, you're going to find them in the 20-25 bracket. I found that funny :D.
 

Edit:

Oh, an interesting use case for SL photography! You can push an SL screenshot through image-to-image at a mid denoise value and then take cues on lighting your scene more believably. If you compare the three shots, the biggest flaw of my SL one is the flat frontal lighting.

Heck, you can even use it as a little indicator where to manually add highlights in post.


5 hours ago, Istelathis said:

I ran this one with 100 steps trying to get the cat's legs to form, but alas, they decided to remain hidden.

I had seen some articles saying AI is bad at getting "human hands" right (number of fingers, positions, etc.), so you'd think that a cat's legs would be easier (or maybe not).


3 minutes ago, Love Zhaoying said:

I had seen some articles saying AI is bad at getting "human hands" right (number of fingers, positions, etc.), so you'd think that a cat's legs would be easier (or maybe not).

Hands need some heavy lifting to get right. It's a definite weakness, for the reason you've mentioned. It can be done, though. As for why the cat is a bit of a mess - that happens with multiple subjects in an image. You'll need to do either so-called regional prompting or inpainting. For comparison, here is a majestic chonker with the right number of legs.

00018-87301247.thumb.png.834c6d5b658cee9547b81f42000e83db.png


5 hours ago, Maitimo said:

These all look lovely but sadly I can't participate - the video instructions lost me in the first minute. Is there an easier way without all the GitHub and low-level installation?

I think the same. Tomorrow I will try to get some help, and see if they can install it for me.


img2img and ControlNet Image:

Snapshot_142.png.cd953bd1c5a3d6695cca586c400dfd90.png

Software used:

Easy Diffusion

Prompt:

Illustration, Character Design

Negative Prompt:
Naked

Extra Details:

Seed: 2220702992, Dimensions: 768x400, Sampler: unipc_tu, Inference Steps: 25, Guidance Scale: 1.1, Model: dreamshaper_8, Negative Prompt: naked, Prompt Strength: 0.55, Preserve Color Profile: true, ControlNet Model: control_v11f1p_sd15_depth
Processed 1 image in 1 minute 3 seconds

Illustration__Character_Design_S2220702992_St25_G1.1.jpeg.fa58ff55aa3efe7d4a0b030395a543b5.jpeg
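A rough equivalent of the settings above in diffusers terms might look like the sketch below, for anyone not using Easy Diffusion. It assumes the DreamShaper 8 checkpoint published on the Hugging Face Hub and a generic depth estimator for the ControlNet input, and it treats Easy Diffusion's "Prompt Strength" as the img2img strength; it is not the exact pipeline Easy Diffusion runs internally.

```python
# Rough sketch: img2img plus a depth ControlNet, approximating the settings listed above.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image
from transformers import pipeline as hf_pipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8",                   # assumed Hub mirror of dreamshaper_8
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")

init = load_image("Snapshot_142.png").convert("RGB").resize((768, 400))
depth = hf_pipeline("depth-estimation")(init)["depth"]  # depth map for the ControlNet

image = pipe(
    prompt="Illustration, Character Design",
    negative_prompt="naked",
    image=init,
    control_image=depth,
    strength=0.55,                            # "Prompt Strength" in Easy Diffusion
    guidance_scale=1.1,
    num_inference_steps=25,
    generator=torch.Generator("cuda").manual_seed(2220702992),
).images[0]
image.save("illustration_character_design.jpg")
```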

 


2 hours ago, Marianne Little said:

Did you ask for it to change so much? Only the face is a bit similar to SL face, but more realistic.

And why did it slim you? Give you pink hair? Put you in a different environment? Turn your clothes into something cosplay-Aladdin style?

I used the fantasy prompt, or whatever it's called. Style.


On 11/7/2023 at 10:49 AM, ValKalAstra said:

If I may - a few words on the various settings and what they do:


I like what you did here.


On 11/7/2023 at 12:22 PM, Maitimo said:

These all look lovely but sadly I can't participate - the video instructions lost me in the first minute. Is there an easier way without all the GitHub and low-level installation?

Oh, sorry - I kind of totally missed your post there. There are somewhat easier ways, but at the end of the day the technology is still pretty early and user-unfriendly. For local installs, some swear by Easy Diffusion, its main selling point being a "one-click installer" that takes care of the technical aspects. I can't vouch for it as I've never used it, but @Istelathis seems to be using it: https://github.com/easydiffusion/easydiffusion

But... yeah. It's still user-unfriendly. There are also some websites that aim to provide a more user-friendly alternative. I haven't kept up with them, but most offer some free generations per day and then ask you to pay; as long as you stay below the threshold, it should be fine. One example might be https://playgroundai.com/ (caveat: you need to remember to mark your session private there, and they tend to label some things differently, such as denoising being called image strength). Good enough to toy around with, though.

/edit: PlaygroundAI seems to allow 500 free images per day.

 


1 hour ago, ValKalAstra said:

Oh, sorry - I kind of totally missed your post there. There are somewhat easier ways, but at the end of the day the technology is still pretty early and user-unfriendly. For local installs, some swear by Easy Diffusion, its main selling point being a "one-click installer" that takes care of the technical aspects. I can't vouch for it as I've never used it, but @Istelathis seems to be using it: https://github.com/easydiffusion/easydiffusion

But... yeah. It's still user-unfriendly. There are also some websites that aim to provide a more user-friendly alternative. I haven't kept up with them, but most offer some free generations per day and then ask you to pay; as long as you stay below the threshold, it should be fine. One example might be https://playgroundai.com/ (caveat: you need to remember to mark your session private there, and they tend to label some things differently, such as denoising being called image strength). Good enough to toy around with, though.

/edit: PlaygroundAI seems to allow 500 free images per day.

 

Just tried PlaygroundAI, on my phone for now. I would post my results, but I need to work on my prompts for it... also, the results ended up coming out a bit X-rated, so nope, not here, lol!

Will try Easy Diffusion later. The minimum specs aren't too high; I can actually run it on my five-year-old laptop (still my main PC for SL... *sniff*). The Linux version is a huge plus for me - lower OS overhead there versus Win11. Thanks!


2 minutes ago, JeromFranzic said:

Just tried PlaygroundAI, on my phone for now. I would post my results, but I need to work on my prompts for it... also, the results ended up coming out a bit X-rated, so nope, not here, lol!

Will try Easy Diffusion later. The minimum specs aren't too high; I can actually run it on my five-year-old laptop (still my main PC for SL... *sniff*). The Linux version is a huge plus for me - lower OS overhead there versus Win11. Thanks!

Aye - that's where negative prompts come in handy. Put "nude" or "naked" in the negative prompt field (I think you need to toggle it to active on PlaygroundAI). If it still comes out undressed, you can increase the weight of those words like so: (nude:1.5).
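In code-based tools the same idea is just an extra argument; the (word:1.5) weighting syntax belongs to UIs like Easy Diffusion and AUTOMATIC1111, while the plain diffusers library takes an unweighted negative_prompt string. A minimal sketch, with the checkpoint id assumed:

```python
# Rough sketch: a plain negative prompt in diffusers. The (word:1.5) weighting syntax is a
# feature of UIs like Easy Diffusion / AUTOMATIC1111, not of the base library call below.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed checkpoint
).to("cuda")

image = pipe(
    prompt="portrait of a woman in a summer dress, watercolor",
    negative_prompt="nude, naked, nsfw",  # terms the sampler is steered away from
    guidance_scale=7.0,
    num_inference_steps=25,
).images[0]
image.save("dressed.png")
```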

 


3 minutes ago, ValKalAstra said:

Aye - that's where negative prompts come in handy. Put "nude" or "naked" in the negative prompt field (I think you need to toggle it to active on PlaygroundAI). If it still comes out undressed, you can increase the weight of those words like so: (nude:1.5).

 

Right... need to add extra negative prompts. Ok. :^)


All right then... I really need to work on my prompts, but I was able to run Easy Diffusion on my PC. I installed it on Win11 and Linux; the results below are from Linux (Ubuntu Studio LTS), and I'll test Win11 later. Running it heats up my laptop about as much as SL does, just in shorter bursts.

Original SL photo


image.thumb.jpeg.494ed773fec1849d38feb6bafa09fcc6.jpeg

Results: the first one is at 768 resolution, the second at 960. I'm limited to that by GPU memory (4 GB) and the GPU itself (a GTX 1050). The first result took 2 minutes to generate, the second a little over 4 minutes.

image.thumb.jpeg.9305a4f77410506143648477be92f1d0.jpeg

image.thumb.jpeg.74f2c3a2a850ef4f80fbb5e6d37f4232.jpeg
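Not Easy Diffusion-specific, but for anyone else squeezing Stable Diffusion onto a 4 GB card, the diffusers library exposes a few switches that trade speed for VRAM. A minimal sketch, with the checkpoint id assumed:

```python
# Rough sketch: VRAM-saving switches for small GPUs (e.g. a 4 GB GTX 1050).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed checkpoint
)
pipe.enable_attention_slicing()        # compute attention in slices: slower, much lower peak VRAM
pipe.enable_sequential_cpu_offload()   # keep weights in system RAM, stream layers to the GPU

image = pipe("portrait, golden hour lighting",
             height=512, width=512, num_inference_steps=25).images[0]
image.save("lowvram.png")
```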

