At high CFG scales (especially >20, but often below that as well), generated images tend to have excessive and undesired levels of contrast and saturation. This is worse with some samplers than with others (from personal experience, k_euler is the best choice for avoiding it as much as possible), but it always appears, regardless of other settings, once the CFG scale is set sufficiently high.

u/Nextil asked about this a couple of weeks ago, and mentioned in a comment that he'd asked about it on the LAION Discord and was directed towards this paper, which proposes a method they call "dynamic thresholding" to avoid this effect.

I've experimented with various methods of doing this kind of normalization, including the one described in that paper. This was a process of trial and error, with a lot of failed attempts and half-baked ideas, but here are the highlights:

The "dynamic thresholding" method described in the paper works reasonably well at avoiding excessive contrast and saturation. However, the images produced with this method end up quite "grey" and washed-out, with everything compressed into a narrow dynamic range that is hard to fix even in postprocessing. My guess is that the method doesn't work quite as well for SD as it did in the paper because the latent representations SD uses are encoded differently from ordinary images: as far as I can tell, the "pixel values" of the latent representations are basically unbounded (although mostly restricted to (-3, 3)), which means that clamping the values to (-1, 1) doesn't really make sense.

I tried experimenting with intervals (-n, n) other than (-1, 1) as the target of the "dynamic thresholding". n > 2 left the excessive saturation and contrast intact, while values somewhere around 1.5 behaved much like n = 1.0.

I then calculate the q-th quantile of the data (quantile_val in the screenshot; my current choice of q is 0.975), and if this quantile is greater than a parameter max_val (1.7 being the best value of this parameter I've found so far), I divide the pixel values by quantile_val / max_val. (This parameter might need to be adjusted in some cases: if you still see excessive saturation/contrast, try max_val = 1.5, or even 1.4, instead.)

Here's an example of how the results from using the latter method look in practice, illustrated through the prompt "A bottle of vodka on an old wooden table, photo, high quality, highres, Sigma 50 mm f/2.8": generated image at scale = 8.0 (without normalization), generated image at scale = 60.0 without normalization, and generated image at scale = 60.0 with normalization.

As you can see, the image generated at scale = 60.0 while using one of these methods is quite different from the image generated at the same scale and seed but without any normalization. This is because we apply the normalization at every sampler step, which, due to SD's iterative nature, means the end result ends up quite different.

Here's another example, using the prompt "An old Danish woman" and again a CFG scale of 60.0: without normalization, with normalization.
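To make the two normalization methods concrete, here is a minimal NumPy sketch. The function names are mine, and a couple of details are assumptions on my part: the paper-style thresholding is shown with its percentile taken over absolute values (as in the paper), and for the quantile method the screenshot doesn't show whether the quantile is computed over raw or absolute latent values, so I assume absolute values here.

```python
import numpy as np

def dynamic_threshold(latents, q=0.995):
    """Paper-style dynamic thresholding (sketch): find the q-th percentile
    s of the absolute latent values; if s > 1, clamp to (-s, s) and divide
    by s so everything lands in (-1, 1)."""
    s = np.quantile(np.abs(latents), q)
    s = max(s, 1.0)  # never *expand* the range
    return np.clip(latents, -s, s) / s

def quantile_normalize(latents, q=0.975, max_val=1.7):
    """The quantile-based method described above (sketch, with my
    assumption that the quantile is over absolute values): if the q-th
    quantile exceeds max_val, rescale so it equals max_val; otherwise
    leave the latents untouched."""
    quantile_val = np.quantile(np.abs(latents), q)
    if quantile_val > max_val:
        latents = latents / (quantile_val / max_val)
    return latents
```

Note the asymmetry: `dynamic_threshold` both rescales and clamps into a fixed interval, while `quantile_normalize` only rescales (no clamping), which is what leaves SD's roughly (-3, 3) latent range intact instead of crushing it.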
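Since the normalization runs at every sampler step, even a run with the same seed diverges from the unnormalized one: each step's output feeds the next step's input. A rough sketch of where such a hook sits in a generic iterative sampler loop (the names and signature here are illustrative, not SD's actual API):

```python
import numpy as np

def sample(denoise_step, latents, n_steps, normalize_fn=None):
    """Skeleton of an iterative sampler (illustrative only).
    If normalize_fn is given, it is applied to the latents after every
    denoising step, so its effect compounds across the whole trajectory
    rather than just rescaling the final image."""
    for step in range(n_steps):
        latents = denoise_step(latents, step)
        if normalize_fn is not None:
            latents = normalize_fn(latents)
    return latents
```

This compounding is why the normalized scale = 60.0 image differs in composition from the unnormalized one, not merely in contrast.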