
How does your avatar look when enhanced by AI?


Danielle Atheria

7 hours ago, Bagnu said:

AI seems to have a problem with hands.

Huge problem with many of the existing AI picture generators. Keep trying. I think the training sets need lots of pictures of dance, sports, and wrestling, so the model learns more of what human bodies look like in odd positions.

I've been playing with Stable Diffusion, trying to get action shots.

feral34.thumb.png.a4f986ac9f7d3827d5c6c83c767fe803.png

Now what would it take to make an SL avatar look like that?

  • Like 8

1 hour ago, animats said:

Huge problem with many of the existing AI picture generators. Keep trying. I think the training sets need lots of pictures of dance, sports, and wrestling, so the model learns more of what human bodies look like in odd positions.

I've been playing with Stable Diffusion, trying to get action shots.

feral34.thumb.png.a4f986ac9f7d3827d5c6c83c767fe803.png

Now what would it take to make an SL avatar look like that?

I would love to know what settings, model and prompts you used in SD to get something that stunning. I have it installed locally, but I have fallen down a rabbit hole of YouTube videos and have not gotten into it much so far.

  • Like 4

2 hours ago, animats said:

Huge problem with many of the existing AI picture generators. Keep trying. I think the training sets need lots of pictures of dance, sports, and wrestling, so the model learns more of what human bodies look like in odd positions.

I've been playing with Stable Diffusion, trying to get action shots.

feral34.thumb.png.a4f986ac9f7d3827d5c6c83c767fe803.png

Now what would it take to make an SL avatar look like that?

Running your image through https://stable-diffusion.site/image-to-prompt/ to recover a prompt:

"a woman riding a motorcycle in the rain, cinematic. by leng jun, movie still 8 k, closeup portrait shot, by Raymond Han, by Fei Danxu, ross tran 8 k, beauty blade runner woman, shot on nikon z9, 8 k movie still, soaked, of taiwanese girl with tattoos, 8 0 s asian neon movie still, wet streets"

I goofed around for a couple of minutes, added "heavy rain" to the prompt, and took a screenshot using my scooter as the bike:

Initial Image / ControlNet

Snapshot_214.png.c5dd2bbb227c562e084cb93276e5e33f.png

 

a_woman_riding_a_motorcycle_in_heavy_rain__cinematic__by_leng_jun__mov_S122877413_St25_G13.8.jpeg.63da6e0fde1692e68ebd98f4413c5228.jpeg

 

Dimensions: 512x512, Sampler: dpmpp_2m_sde, Inference Steps: 25, Guidance Scale: 13.8, Model: cyberrealistic_v33, VAE: vae-ft-mse-840000-ema-pruned, Prompt Strength: 0.99, Preserve Color Profile: false, ControlNet Model: control_v11p_sd15_canny

Pushing through another 25 steps using the above settings produced this:

50steps.jpeg.ee1bb8f9144cd859d8e680c61ee3d0a4.jpeg

I had to raise the prompt strength and guidance scale quite a lot to get the background, as well as the moped, to change more significantly. I'm also still having problems upscaling, so the image is stuck at 512x512.

I think you would want to set up your avatar for the action shot first, then work around that screenshot, playing with prompts, prompt strength and guidance scale. You probably would also want an outfit (and bike) that closely resembles the image you want diffusion to produce, though that is a guess on my part.
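For anyone driving this from a script instead of Easy Diffusion's UI, roughly the same img2img-plus-canny-ControlNet setup can be reproduced with the diffusers library. A rough sketch under that assumption; the Hugging Face model ids stand in for the local cyberrealistic_v33 checkpoint, and the file names are placeholders:

```python
# Sketch: img2img + canny ControlNet with diffusers, mirroring the settings quoted above.
# Model ids are assumptions (SD 1.5 + canny ControlNet); swap in your own checkpoint.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

init = Image.open("Snapshot_214.png").convert("RGB").resize((512, 512))  # SL screenshot

# Build the edge map that control_v11p_sd15_canny expects.
edges = cv2.Canny(np.array(init), 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # stand-in for the cyberrealistic_v33 checkpoint
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

result = pipe(
    prompt="a woman riding a motorcycle in heavy rain, cinematic",
    image=init, control_image=control,
    strength=0.99,                      # "Prompt Strength" in Easy Diffusion
    guidance_scale=13.8,
    num_inference_steps=25,
).images[0]
result.save("biker_ai.jpeg")
```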

Edited by Istelathis
  • Like 7

Very nice work reversing and duplicating that.

My actual prompt was close to "Feral fit muscular aroused angry wet dirty Asian biker girl with no shirt and bare midriff riding hard in dangerous alley in heavy rain at night". Emotional adjectives do a lot with these models. That's what generated the facial expression.

How to make SL itself look this good? Once we have full PBR for skin layers, and lots of lights, we can get closer. Water droplets on skin may be possible with normals and ambient reflection in a tattoo layer. We still need subsurface scattering, the subtle effect where light going into the skin comes out nearby, a little reddish. The lack of that is why SL skin is stuck in a range from "dead" to "plastic".

Wild feral animated hair is a ways off. Needs too much compute. The other stuff is all do-able with current technology.

Just for fun, another picture in that series.

feral40.thumb.png.db5fde38b3d3f0eec9e7431d2b841cf5.png

Look at the facial expression.

Edited by animats
  • Like 6

I started with an old picture of myself (I have adjusted my appearance slightly since then, mainly by fixing some issues with my skin, but I digress).  The image I used was a close up from a larger image so it is, as you can see, a bit pixellated.

53346420604_fc7eedebb4_m.jpg

Here are some of the variations I was able to come up with:

53346417424_45372157ef.jpg

53345217712_582bcb90ac.jpg

53346097226_45e9e59db1.jpg

53346417429_2ef67fa0a7.jpg

53345217662_e13ab5c0d7.jpg

53346097216_3a0eb4be7a.jpg

 

Edited by Possum888
typo
  • Like 7

The one thing I noticed was that there are only so many times you can manipulate the images before strange things start happening, like growing a third leg or a second head.

One amusing thing was that, when I added "bikini" as a prompt or whatever they call it, my boobs doubled in size for every second image.........

  • Haha 4

6 hours ago, Possum888 said:

I started with an old picture of myself (I have adjusted my appearance slightly since then, mainly by fixing some issues with my skin, but I digress).  The image I used was a close up from a larger image so it is, as you can see, a bit pixellated.

53346420604_fc7eedebb4_m.jpg

Here are some of the variations I was able to come up with:

53346417424_45372157ef.jpg

53345217712_582bcb90ac.jpg

53346097226_45e9e59db1.jpg

53346417429_2ef67fa0a7.jpg

53345217662_e13ab5c0d7.jpg

53346097216_3a0eb4be7a.jpg

 

I wish I could understand why the AI makes certain mistakes, most often on body details, it seems to me.  It gives you crooked teeth in your smiles, adds an extra finger to your left hand in the next-to-last generated image (plus an odd lump on your right wrist), and then seems to take the extra finger away in the last image, perhaps removing one more as well.  How hard is it for software to understand "five fingers" and "five toes"?  Or is that too much math for it?  🤣

  • Like 3
  • Haha 1

Just now, Leora Greenwood said:

It gives you crooked teeth in your smiles

..because realism..?

1 minute ago, Leora Greenwood said:

, adds an extra finger to your left hand in the next to last generated image

Because AI assumes an extra finger is needed for all the pictures "shooting the bird" / "giving the finger"?

1 minute ago, Leora Greenwood said:

(plus an odd lump on your right wrist)

Because AI thinks it must be needed for "fist bumps"?

2 minutes ago, Leora Greenwood said:

How hard is it for software to understand "five fingers" and "five toes"?  Or is that too much math for it?  🤣

Because AI doesn't understand the "thumb" is one of the "five fingers"?  Or, because AI really likes polydactyly?

 

  • Haha 1

27 minutes ago, Leora Greenwood said:

I wish I could understand why the AI makes certain mistakes, most often on body details, it seems to me.  It gives you crooked teeth in your smiles, adds an extra finger to your left hand in the next-to-last generated image (plus an odd lump on your right wrist), and then seems to take the extra finger away in the last image, perhaps removing one more as well.  How hard is it for software to understand "five fingers" and "five toes"?  Or is that too much math for it?  🤣

And I thought I had weeded out the images with little oddities like that......

  • Like 4
  • Thanks 1

For now, I'm going to use a web-based upscaler.  I tried https://www.anyrec.io/image-upscaler/ on a couple of the pictures I had generated earlier and was impressed with the quality versus what I can get out of Easy Diffusion's upscaler:

50steps(1).thumb.jpeg.70c09ce9f3a3205052027502a049f8b3.jpeg

438a3e34-3213-494e-8e32-b7c6239929a0.thumb.jpg.03dd88346a5392b7cd88cafb75b3d188.jpg

 

Looking forward to being able to do this on my own computer in hopefully the near future, though. The upscaler that works in Easy Diffusion with my video card makes a mess of the face.

Edited by Istelathis
  • Like 4
  • Thanks 1

Snapshot_216.png.4bed23a8433c8faa7249bfb564a02b2c.png

Original Image / ControlNet

test3Upscaled.thumb.jpeg.85868b808b8b1bee3adad1cd83d1c6d2.jpeg

 

This one was upscaled here

I had to do a little inpainting on the original results as the face was doing a little swirl and one of the eyes was not rendered properly.  

The prompt was simply anime

 Dimensions: 512x512, Sampler: dpmpp_2m_sde, Inference Steps: 25, Guidance Scale: 1.1, Model: cyberrealistic_v33, VAE: vae-ft-mse-840000-ema-pruned, Negative Prompt: closed eye, Prompt Strength: 0.6, Preserve Color Profile: false, Strict Mask Border: false, ControlNet Model: control_v11p_sd15_canny

  • Like 5
  • Thanks 1

1 minute ago, Istelathis said:

Snapshot_216.png.4bed23a8433c8faa7249bfb564a02b2c.png

Original Image / ControlNet

test3Upscaled.thumb.jpeg.85868b808b8b1bee3adad1cd83d1c6d2.jpeg

 

This one was upscaled here

I had to do a little inpainting on the original results as the face was doing a little swirl and one of the eyes was not rendered properly.  

The prompt was simply anime

 Dimensions: 512x512, Sampler: dpmpp_2m_sde, Inference Steps: 25, Guidance Scale: 1.1, Model: cyberrealistic_v33, VAE: vae-ft-mse-840000-ema-pruned, Negative Prompt: closed eye, Prompt Strength: 0.6, Preserve Color Profile: false, Strict Mask Border: false, ControlNet Model: control_v11p_sd15_canny

Except for the missing pointy ears, I like this one! 

  • Like 2
  • Thanks 1

1 hour ago, Leora Greenwood said:

I wish I could understand why the AI makes certain mistakes, most often on body details, it seems to me.  It gives you crooked teeth in your smiles, adds an extra finger to your left hand in the next to last generated image (plus an odd lump on your right wrist) and then seems take the extra finger away in the last image, perhaps removing one more as well.  How hard is it for software to understand "five fingers" and "five toes".  Or is that too much math for it?  🤣

Any explanation I can give is going to be a gross oversimplification because it's all a lot more complex than that. However, let me give it a try. The issue is multi-faceted. First, calling it AI gives the wrong impression: it doesn't understand. To the "AI", a hand is not a hand. It has got no idea what it is. It only knows noise. To train it, it was shown how images were turned to noise and was then told to learn to predict the noise.

What we do when we make these images is tell the AI to create a noise image and then subtract the predicted noise from it. Then we let it do that for several "Steps" until we get out an image. Note: an image, not the image. Noise and randomness are inherent, and the image is not stored anywhere in the model, only what it learned from predicting the noise added to it. A model is an amalgamation of this process over several billion images across multiple epochs. It has done this again and again and again until it got pretty good at predicting what noise would be added, based on what was fed into it.

To get an image out in a direction we want, we use tokens (prompts), which are turned into vectors via embeddings and then fed into the noise predictor as guidance. Very simplified: we put a finger on the noise prediction it has learned and tell it to predict more in THAT direction.
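To make that concrete, here is a stripped-down version of that loop, built from Stable Diffusion 1.5's parts with the diffusers library. It is a sketch of the mechanism described above (prompt embeddings nudging the noise prediction, one subtraction step at a time), not a full pipeline; the model id and the parameters are assumptions:

```python
# Minimal sketch of the text-guided denoising loop described above.
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model, subfolder="vae")
scheduler = DDIMScheduler.from_pretrained(model, subfolder="scheduler")

guidance_scale, steps = 7.5, 25

def embed(texts):
    # Tokens -> embedding vectors; an empty prompt gives the unconditional baseline.
    ids = tokenizer(texts, padding="max_length",
                    max_length=tokenizer.model_max_length,
                    truncation=True, return_tensors="pt").input_ids
    with torch.no_grad():
        return text_encoder(ids)[0]

cond, uncond = embed(["portrait of a woman, cinematic"]), embed([""])

latents = torch.randn(1, 4, 64, 64)            # start from pure noise in latent space
scheduler.set_timesteps(steps)                 # the "Inference Steps" setting
latents = latents * scheduler.init_noise_sigma

for t in scheduler.timesteps:
    inp = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_uncond = unet(inp, t, encoder_hidden_states=uncond).sample
        noise_cond = unet(inp, t, encoder_hidden_states=cond).sample
    # "Guidance Scale": push the prediction further in the prompt's direction.
    noise = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
    latents = scheduler.step(noise, t, latents).prev_sample   # subtract predicted noise

with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample  # 64x64x4 -> 512x512 RGB
```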

So why does it struggle so much with some concepts and not so much with others? Three reasons off the top of my head.

  1. Complexity
  2. Quantity of available data
  3. Consumer Hardware

Complexity

If you look at guides on how to draw, you will often see human bodies broken down into simple shapes. For example:

A head becomes a circle and an oval.

An arm becomes a sphere (shoulder), cylinder (upper arm), sphere (elbow), cylinder (lower arm).

Now using just simple shapes, how would you describe a hand?

Maybe sphere (hand) going into a smaller sphere (knuckle), cylinder (index finger), sphere (2nd knuckle), cylinder (index finger continued), sphere (3rd knuckle), cone (fingertip). It's immediately more complex to describe, and some say even famous comic book artists never quite learned how to draw feet (hi Rob Liefeld <3). While the AI doesn't understand what a hand is, it does run into issues because of the complexity of hands.

This somewhat leads into 2.

Quantity of available data

Models were trained on ginormous datasets of publicly hosted images. We're talking billions upon billions of images. From here, you can do some funny experiments to figure out why certain aspects work much better in AI than others. For example, if you do an image search for "woman"... you're going to find a lot of portraits, usually in a certain type of posture. A mid-range shot is going to be rarer, and a full-body shot rarer still.

As a result, you're going to have the most success getting an image that is a portrait of a woman. Those work really well, with few mistakes. You're going to run into some struggles with mid-range shots, and with full-body shots - oh boy, good luck. Now action shots? Oh hell naw, you're in for a world of trouble now. Very little data combined with high complexity of motion and even skin deformation (stretch, compression, etc.).

Now hands. Just do an image search for hands and you're going to find hands pointing in all directions, disembodied, drawn, fingers curled or straightened, fingers locked, hands touching things. It's an incredibly complex data set that is hard to describe and doesn't have a lot of training data either. There are also follow-on issues.

For example, breast size. Look at drawings of women on the internet. If you were to say that a lot of them come out rather busty, you'd be correct. If you were to add that anime drawings especially tend to defy gravity, you'd get bonus points. Now consider that these are also part of the training data and you get the reason why the model is so dang thirsty. Version 1.5 especially, because it was blended with a leaked anime model early on. So if you want better hands, you would need a lot more images of hands in the dataset - something that quickly runs into a different problem.

Consumer Hardware

Right now, the technology needs a beast of a computer to create images. It is only just within the range of consumer computers. There's a bit of a problem with that, aside from excluding large numbers of people. The current implementation on 1.5 isn't really working at 512x512, the size of the images it was trained on; it's working in a latent space of 64x64 times 4 channels.

It's going to work out okay with portraits, because the subject matter is big enough. It'll work with buildings and other things. Now take eyes. They're going to be extremely tiny at the range of a portrait and even smaller in a mid-range shot. At some point they're not even blips anymore, and hands suffer the same fate. They're comparatively small while also being far more complex than a face and having far less training in the model, because fewer images were available.
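A quick back-of-the-envelope illustration of that squeeze, assuming SD 1.5's 8x VAE downscale; the pixel sizes here are rough guesses, not measurements:

```python
# Rough arithmetic: how much latent "room" a feature gets at SD 1.5's 8x VAE downscale.
# The pixel sizes are illustrative guesses, not measurements.
VAE_DOWNSCALE = 8

features = {
    "512x512 canvas":           512,
    "face in a portrait":       300,
    "hand in a full-body shot":  40,
    "eye in a mid-range shot":   12,
}

for name, px in features.items():
    cells = px / VAE_DOWNSCALE
    print(f"{name:26s} ~{px:3d} px across -> ~{cells:4.1f} latent cells")
# An eye ends up spanning only a latent cell or two - barely a blip for the model.
```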

You can see this quite clearly if you toy around with DALL-E 3 or Midjourney, which can throw a lot more power at the problem, using models with more training data and (probably) larger latent spaces.

That's why the AI does weird stuff. All it does is predict noise and we put a finger on the prediction. Sometimes what we want is too complex or too small in the composition to come out well.

Workarounds

Not all is lost. There ARE hands in the dataset. There ARE close-ups of eyes in the dataset. What we can do is use further guidance, higher resolutions or targeted inpainting. Further guidance is achieved by using ControlNets. These weigh on the prediction and guide it towards a certain pose or concept. It's not perfect, but much more reliable than not using it.

Higher resolution increases the space it can work and generate with, to a degree (these models have upper limits). Beyond that, use targeted inpainting. Same logic, really. If the hand is coming out too small and thus scuffed, we inpaint just the hand and then tell it to try that area again. The model then only needs to look at the hand, and can draw from the many close-up shots of hands it was trained on.
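A minimal sketch of that targeted-inpainting step with diffusers; the model id and file names are assumptions, and in A1111 or Easy Diffusion you would simply paint the mask in the UI instead:

```python
# Sketch: re-run diffusion only inside a hand-shaped mask (white = redo, black = keep).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")

image = Image.open("render.png").convert("RGB").resize((512, 512))
mask = Image.open("hand_mask.png").convert("RGB").resize((512, 512))

fixed = pipe(
    prompt="close-up of a relaxed human hand, five fingers, detailed skin",
    negative_prompt="extra fingers, deformed hand",
    image=image,
    mask_image=mask,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
fixed.save("render_fixed_hand.png")
```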

If your computer crashes out at that step, you could always cut out the part you want to retry in an image editor and put only that back into img2img, then paste the result back in when done. If you're feeling brave, A1111 has a multitude of extensions that make the VRAM issue nonexistent. There's the MultiDiffusion extension with its companion Tiled VAE; the latter lets you essentially "split" part of the process, letting you go much further than you would otherwise. There's also the Ultimate SD Upscale script, or the option of using a tile ControlNet to upscale an image. These are pretty powerful and allow you to upscale a 512x512 image to whatever resolution you want.
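To show the idea those tiled upscalers share, here is a deliberately crude sketch: enlarge the image naively, then run low-strength img2img over each 512x512 tile so VRAM use stays bounded. The real extensions overlap and blend the tiles to hide seams, which this skips; model id and file names are assumptions.

```python
# Crude sketch of the tile-and-rediffuse idea behind tiled upscalers (no seam blending).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

small = Image.open("50steps.jpeg").convert("RGB")        # the 512x512 original
big = small.resize((1024, 1024), Image.LANCZOS)          # naive 2x upscale
TILE = 512

for y in range(0, big.height, TILE):
    for x in range(0, big.width, TILE):
        box = (x, y, x + TILE, y + TILE)
        patch = big.crop(box)
        redone = pipe(prompt="photo, sharp detail", image=patch,
                      strength=0.3,       # low strength: keep composition, add detail
                      guidance_scale=7.0, num_inference_steps=25).images[0]
        big.paste(redone, box)

big.save("50steps_2x.png")
```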

Other workarounds: use other models like DALL-E 3 for a source image, then blend it with your SL image via ControlNet.

  • Like 3
  • Thanks 3

7 hours ago, ValKalAstra said:

Any explanation I can give is going to be a gross oversimplification because it's all a lot more complex than that. However, let me give it a try. The issue is multi-faceted. First, calling it AI gives the wrong impression: it doesn't understand. To the "AI", a hand is not a hand. It has got no idea what it is. It only knows noise. To train it, it was shown how images were turned to noise and was then told to learn to predict the noise.

What we do when we make these images is tell the AI to create a noise image and then subtract the predicted noise from it. Then we let it do that for several "Steps" until we get out an image. Note: an image, not the image. Noise and randomness are inherent, and the image is not stored anywhere in the model, only what it learned from predicting the noise added to it. A model is an amalgamation of this process over several billion images across multiple epochs. It has done this again and again and again until it got pretty good at predicting what noise would be added, based on what was fed into it.

To get an image out in a direction we want, we use tokens (prompts), which are turned into vectors via embeddings and then fed into the noise predictor as guidance. Very simplified: we put a finger on the noise prediction it has learned and tell it to predict more in THAT direction.

So why does it struggle so much with some concepts and not so much with others? Three reasons off the top of my head.

  1. Complexity
  2. Quantity of available data
  3. Consumer Hardware

Part of the problem is the quality of the available data. Much of this stuff comes from scraping websites. This is sometimes obvious:

sourceinfoproblem.thumb.png.ded9d867de31131d7f85e67e70e66e63.png

A copyright notice from the training data survived into the Stable Diffusion output. Oops.

The data set is strongly biased towards the stuff people post on the Internet.

What's lacking are stills of bodies in motion. What's needed are lots of clean, unblurred pictures of moving bodies. Stills from most video are not that good. It's possible to record video with no rolling shutter, no compression, no blur, very short exposure times, and every frame clean, but that's not the norm for random video.

training3.png.d92be53128d5332e2a5b20f382cd80f0.png

 

training4.thumb.png.4b24ee6028438563ba9582e3a9be2d9a.png

Stable Diffusion needs a few hundred thousand images like this in the training set.

Then, with examples to follow, it will get arms, legs and fingers in unusual positions right. Somebody is probably working on this right now. Probably somebody well-funded.

This is the part SL avatars get right, since they have a 3D body model. But they don't look as good.

  • Like 5
  • Thanks 1

On 11/6/2023 at 12:31 PM, UnilWay SpiritWeaver said:

I missed Quartz's note. Did Quartz have a 'how to get started in this' guide?

If not...:

Slight tangent for a second, but this is a guide to installing AI art tools on your own computer, and where to get training resources for them. It's about a year old but still points to the best sources and goes through some complex things (like installing Python and running a git command to download something) in very simple, step-by-step terms. As long as you follow the steps exactly and don't rush ahead, it works:

The training-models site can be used to get everything from hyper-real styles to oil painting, anime, etc. It does have a comically large amount of anime-like content, but the other things are there as well...

The tool in Stable Diffusion for making an AI image out of an SL image is 'img2img' - you put your SL image into a window on one side, put in some prompts to customize, load up some 'training models' to filter the style, and have it do the work.

I imagine there are online websites to do this also. :)

 

There is an online site for img2img. I tried it, but it didn't exactly work out right. Of course I didn't actually know what I was doing. Here are 3 pictures it created. I'd post the original, but I'm topless 😁 in it.

 

aiimage01.jpg

aiimage02.jpg

aiimage03.jpg

  • Like 3
  • Thanks 2
  • Haha 4

On 11/20/2023 at 11:22 PM, animats said:

Very nice work reversing and duplicating that.

My actual prompt was close to "Feral fit muscular aroused angry wet dirty Asian biker girl with no shirt and bare midriff riding hard in dangerous alley in heavy rain at night". Emotional adjectives do a lot with these models. That's what generated the facial expression.

How to make SL itself look this good? Once we have full PBR for skin layers, and lots of lights, we can get closer. Water droplets on skin may be possible with normals and ambient reflection in a tattoo layer. We still need subsurface scattering, the subtle effect where light going into the skin comes out nearby, a little reddish. The lack of that is why SL skin is stuck in a range from "dead" to "plastic".

Wild feral animated hair is a ways off. Needs too much compute. The other stuff is all do-able with current technology.

Just for fun, another picture in that series.

feral40.thumb.png.db5fde38b3d3f0eec9e7431d2b841cf5.png

Look at the facial expression.

That stomach! 

  • Like 2

On 11/21/2023 at 2:12 AM, Possum888 said:

I started with an old picture of myself (I have adjusted my appearance slightly since then, mainly by fixing some issues with my skin, but I digress).  The image I used was a close up from a larger image so it is, as you can see, a bit pixellated.

53346420604_fc7eedebb4_m.jpg

Here are some of the variations I was able to come up with:

53346417424_45372157ef.jpg

53345217712_582bcb90ac.jpg

53346097226_45e9e59db1.jpg

53346417429_2ef67fa0a7.jpg

53345217662_e13ab5c0d7.jpg

53346097216_3a0eb4be7a.jpg

 

That appears to be more of an image-match situation than an actual AI manipulation?! Not saying it's wrong, just an observation is all. 

  • Like 2

Original/ControlNet (ControlNet's picture brightness scaled high to get better results)

Snapshot_233.png.d6a18607bb85c951cdb1d711453ee3e6.png

Photo_S1761982474_St25_G14.8.jpeg.cb39b6756a850bf9407cfb7feafc46df.jpeg

anime_S1390389050_St25_G14.8.jpeg.e2142295c6dffd86b282b29973b219f1.jpeg

Easy Diffusion with the following settings:

Dimensions: 512x512, Sampler: dpmpp_2m_sde, Inference Steps: 25, Guidance Scale: 14.8, Model: cyberrealistic_v33, VAE: vae-ft-mse-840000-ema-pruned, Negative Prompt: closed eye, Prompt Strength: 0.6, Preserve Color Profile: false, ControlNet Model: control_v11p_sd15_canny

Photo_S1761982474_St25_G14.8.thumb.jpeg.92e3358aad7d0f8a5a4100f4178572fb.jpeg

Upscaled at AnyRec.

 

  • Like 8

On 11/23/2023 at 5:48 AM, BilliJo Aldrin said:

There is an online site for img2img. I tried it, but it didn't exactly work out right. Of course I didn't actually know what I was doing. Here are 3 pictures it created. I'd post the original, but I'm topless 😁 in it.

 

aiimage01.jpg

 

 

this is the stuff of nightmares. that nipple thing is triggering my hole phobia and... omg. j sue es price

  • Like 1
  • Haha 2

Here are a few I've done today:

Original/ControlNet

Snapshot_275.png.5a7202c6ce268659b0adaa36d6402763.png

Result

anime__head_facing_sky_S4088541526_St50_G5.jpeg.4341e23f06845c32c746bc5d38626f51.jpeg

 

Original/ControlNet

Snapshot_278.png.62d3ee1d5c2444a74d6da9e29c2cfff7.png

Result:

anime_S3870444140_St50_G5.jpeg.fac6ffba102fc6bb8de04c8f64a7be08.jpeg

 

Original/ControlNet

image.png.830f18b4ee36336c44c39404d9ef4e45.png

Results

anime_S3417048422_St25_G5.jpeg.67bf4bb2b856d698d2ed78187cb3fc18.jpeg

photograph_S1336773391_St25_G5.jpeg.2b7b8bef1ac4e8cd1dd284e805c98d8e.jpeg

 

Here is one I took yesterday

 

Snapshot_236.thumb.png.93c7e3afdedaa2b83e65bfd3c5b13982.png

That is not a nip, it is part of the dress design

oopsie.thumb.jpg.93623e9f59a01eae7a7ba97b653a22cd.jpg

Easy Diffusion wanted it to be a nipple, though. This one did not turn out as well because the source dimensions did not match the generated picture. As far as I know, Easy Diffusion does not let me use custom image sizes, so this one was squeezed a bit into one of the available presets.

another.thumb.jpeg.1d770a2d671fe3b27f3ecdb311cf7b03.jpeg

Here is another one; I liked the one above a bit more, though.

 

The first picture kind of reminds me of an exaggerated SuperStar!

giphy.gif

Edited by Istelathis
  • Like 5
  • Thanks 1
  • Haha 1
