Project 5 - Diffusion Models

Website: https://jackie-dai.github.io/cs180-webpages/proj5/index.html

Jackie Dai - Fall 2024

Part 1

1.1 - Forward Process

Forwarding in diffusion models means taking an image and applying a certain amount of noise to it. This is achieved by the equation

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

where ᾱ_t comes from the noise schedule and larger t means more noise.
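A minimal sketch of this step in PyTorch, assuming `alphas_cumprod` is the precomputed ᾱ schedule (a 1-D tensor indexed by timestep):

```python
import torch

def forward(im, t, alphas_cumprod):
    """Noise a clean image `im` to timestep `t` (sketch; names assumed)."""
    alpha_bar = alphas_cumprod[t]            # scalar alpha-bar at step t
    eps = torch.randn_like(im)               # epsilon ~ N(0, I)
    return alpha_bar.sqrt() * im + (1 - alpha_bar).sqrt() * eps
```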

Here are the results of applying noise at t = 250, 500, and 750.

1.2 - Classical Denoising

The reverse of the forward process is denoising: recovering x_{t-1} by taking a noisy image and removing some amount of the noise. One classical method is to apply a Gaussian blur.
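A sketch of this classical baseline using torchvision; the kernel size and sigma below are illustrative, not the exact values used for the results:

```python
from torchvision.transforms.functional import gaussian_blur

def classical_denoise(noisy_im, kernel_size=7, sigma=2.0):
    # A larger kernel/sigma smooths away more noise but also
    # destroys more image detail (illustrative parameter values).
    return gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=sigma)
```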

Here are the results of applying a Gaussian blur to images noised at t = 250, 500, and 750.

1.3 - One-step Denoising

Using a pre-trained diffusion model, we can predict the noise that was added to an image and remove it in a single step.
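Rearranging the forward equation gives a one-step estimate of the clean image. A sketch, assuming a simplified `unet(x_t, t)` call that returns the predicted noise (the real pretrained model also takes a prompt embedding):

```python
def one_step_denoise(x_t, t, unet, alphas_cumprod):
    """Invert the forward equation using the model's noise prediction
    (sketch; `unet` is assumed to return an epsilon estimate)."""
    alpha_bar = alphas_cumprod[t]
    eps_hat = unet(x_t, t)                   # predicted noise
    return (x_t - (1 - alpha_bar).sqrt() * eps_hat) / alpha_bar.sqrt()
```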

1.4 - Iterative Denoising

In order to achieve better results, we use the previous method but apply it iteratively to produce a cleaner image.

To save on computation, we can skip timesteps by defining a strided_timesteps array that keeps track of the timesteps we want to use. Here I take every 10th timestep.
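A sketch of the strided schedule and the per-step update, assuming `one_step_denoise` from 1.3; the extra variance term v_σ that the full update adds at each step is omitted for brevity:

```python
# Every 10th timestep, from the noisiest (t = 990) down to t = 0.
strided_timesteps = list(range(990, -1, -10))

def iterative_denoise(x, unet, alphas_cumprod, timesteps):
    """Blend the current image with the model's clean-image estimate
    at each strided step (sketch; v_sigma omitted)."""
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]   # t_next is less noisy
        a_bar, a_bar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = a_bar / a_bar_next                   # per-step alpha
        beta = 1 - alpha
        x0_hat = one_step_denoise(x, t, unet, alphas_cumprod)
        x = (a_bar_next.sqrt() * beta / (1 - a_bar) * x0_hat
             + alpha.sqrt() * (1 - a_bar_next) / (1 - a_bar) * x)
    return x
```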

1.5 - Diffusion Model Sampling

Now, we can generate random images by passing in a completely noisy image and denoising it, conditioned on the prompt “a high quality photo”.
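Sampling then reduces to running the loop above on pure noise; a sketch, assuming the stage-1 model's 64×64 input resolution:

```python
import torch

# Pure Gaussian noise in the model's input shape; denoising it
# from the noisiest timestep yields a novel image.
x = torch.randn(1, 3, 64, 64)
sample = iterative_denoise(x, unet, alphas_cumprod, strided_timesteps)
```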

1.6 - Classifier-Free Guidance (CFG)

The results above are of low quality, but we can improve them using classifier-free guidance.

This is done by estimating both a conditional and an unconditional noise and combining them into our final noise estimate:

$$\epsilon = \epsilon_u + \gamma(\epsilon_c - \epsilon_u)$$

where γ > 1 pushes the result further in the direction of the conditional prediction.
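A sketch of the combination step; `gamma` is the guidance scale, and the value shown is illustrative:

```python
def cfg_noise(eps_cond, eps_uncond, gamma=7.0):
    # gamma = 1 recovers the plain conditional estimate; gamma > 1
    # extrapolates past it, strengthening the prompt's influence.
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```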

1.7 - Image-to-image Translation

In order for a diffusion model to recover a noisy image, it has to “make up” what to replace the noisy pixels with, based on its training and the prompt. Here, we can play around with the model by passing in an image, noising it a bit, and then seeing what the model comes up with.
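A sketch of this SDEdit-style procedure, reusing the hypothetical `forward` and `iterative_denoise` helpers from earlier sections; `i_start` controls how much noise is added and therefore how much the model invents:

```python
def sdedit(im, i_start, unet, alphas_cumprod, strided_timesteps):
    """Noise a real image to an intermediate timestep, then let the
    model denoise it back (sketch; a larger starting noise level
    leaves the model more freedom to hallucinate)."""
    t = strided_timesteps[i_start]
    x_t = forward(im, t, alphas_cumprod)     # forward process from 1.1
    return iterative_denoise(x_t, unet, alphas_cumprod,
                             strided_timesteps[i_start:])
```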

Campanile (test_img)

My ID photo

A photoshoot I did in high school

1.7.1 Editing Hand-Drawn and Web Images

We can have a bit more fun with this by using hand-drawn images.

Below are a couple more of my own drawings, followed by a web image.

1.7.2 Inpainting

This SDEdit procedure can also be applied to only a portion of our image. We can do this by masking part of the image and letting the model recover the masked region while leaving the rest of the image intact.
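A sketch of the key line inside the denoising loop, assuming `mask` is 1 where the model may synthesize and 0 where the original pixels should be kept:

```python
# After each denoising step: re-noise the original image to the
# current timestep and paste it back everywhere the mask is 0,
# so only the masked region gets synthesized.
x = mask * x + (1 - mask) * forward(x_orig, t, alphas_cumprod)
```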

1.7.3 Text-Conditional Image-to-image Translation

So far, we have been using the same prompt, “a high quality photo”. Let’s change up the prompts and see how our results change.

Prompt = “a rocket ship”

Prompt = “a man wearing a hat”

Prompt = “an oil painting of an old man”

1.8 - Visual Anagrams

A neat trick we can do with our model is to modify the algorithm a bit to produce optical illusions! To do this, we predict the noise for two different prompts, one with the image right-side up and the other with it upside down, then average the two estimates.

This allows us to create optical illusions, where the human eye perceives a different image depending on the orientation in which the image is viewed.
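A sketch of the per-step noise estimate, assuming a simplified `unet` call that takes a prompt embedding; flipping top-to-bottom stands in for turning the image upside down:

```python
import torch

def anagram_noise(x_t, t, unet, prompt1_emb, prompt2_emb):
    """Average prompt 1's noise on the upright image with prompt 2's
    noise on the flipped image (sketch; conditioning API assumed)."""
    flip = lambda im: torch.flip(im, dims=[-2])   # turn upside down
    eps1 = unet(x_t, t, prompt1_emb)
    eps2 = flip(unet(flip(x_t), t, prompt2_emb))
    return (eps1 + eps2) / 2
```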

Prompt 1 = “an oil painting of people around a campfire”

Prompt 2 = “an oil painting of an old man” (upside down)

Prompt 1 = “a photo of a hipster barista”

Prompt 2 = “a photo of a dog”

Prompt 1 = “a photo of the Amalfi coast”

Prompt 2 = “a photo of a dog”

1.9 - Hybrid Images

Similar to our frequency project, we can create hybrid images where you see one image up close and another from far away. We can do this by combining the noise estimate for one prompt at low frequencies with the noise estimate for another prompt at high frequencies.
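A sketch of the combined noise estimate, using a Gaussian blur as the low-pass filter (the kernel size and sigma are illustrative):

```python
from torchvision.transforms.functional import gaussian_blur

def hybrid_noise(eps_low, eps_high, kernel_size=33, sigma=2.0):
    """Keep the low frequencies of one prompt's noise estimate and
    the high frequencies of another's (sketch)."""
    lowpass = gaussian_blur(eps_low, kernel_size, sigma)
    highpass = eps_high - gaussian_blur(eps_high, kernel_size, sigma)
    return lowpass + highpass
```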

a lithograph of skulls (far)
a lithograph of waterfalls (close)

low_prompt = a lithograph of waterfalls

high_prompt = an oil painting of a snowy mountain village

low_prompt = a photo of the Amalfi coast

high_prompt = a lithograph of a skull

Part 2: Diffusion Models from Scratch

In this section, we will build three neural networks from scratch.

Unconditional UNet

Let’s load the MNIST dataset and add several levels of noise to it: `sigma = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`
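A sketch of the noising process on MNIST, assuming the training pair is (noisy z, clean x):

```python
import torch
from torchvision import datasets, transforms

# Load MNIST and add Gaussian noise of strength sigma:
#   z = x + sigma * eps,  eps ~ N(0, I)
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
x, _ = mnist[0]
for sigma in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    z = x + sigma * torch.randn_like(x)   # noisier as sigma grows
```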

Results of denoising

1 epoch

Input:

Noisy:

Output:

5 epochs

Input:

Noisy:

Output:

Distributed noise levels

Here, I pass in a batch of images and vary the noise levels for the model to denoise.

`sigma = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`

Training Loss

Time-conditioned UNet

To produce clearer images, we can implement iterative denoising.

We will insert two FCBlocks into our UNet architecture to feed in the timestep.
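A sketch of what such a block might look like, assuming it is two linear layers with a GELU in between, and that the normalized timestep is broadcast-added to two decoder feature maps (the exact injection points are an assumption here):

```python
import torch.nn as nn

class FCBlock(nn.Module):
    """Map a scalar condition to a channel-wise vector
    (sketch of an assumed block design)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_ch, out_ch), nn.GELU(),
            nn.Linear(out_ch, out_ch))

    def forward(self, x):
        return self.net(x)

# Inside the UNet's forward pass (assumed injection points):
#   t1 = self.fc1_t(t)[..., None, None]   # (B, C, 1, 1)
#   unflatten = unflatten + t1
#   t2 = self.fc2_t(t)[..., None, None]
#   up1 = up1 + t2
```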

Results

Images after 5 epochs of training

Images after 20 epochs of training

Training Loss

Our images look better, but we can do better!

Class-conditioned UNet

In this UNet, we add two more FCBlocks whose inputs are one-hot encodings of the class label.
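A sketch of the class conditioning, assuming one-hot vectors that are randomly zeroed so the model also learns the unconditional case (which is what makes classifier-free guidance possible at sampling time):

```python
import torch
import torch.nn.functional as F

def make_class_condition(labels, p_uncond=0.1):
    """One-hot encode digit labels and zero the condition with
    probability p_uncond (sketch; 0.1 is a typical choice)."""
    c = F.one_hot(labels, num_classes=10).float()   # (B, 10)
    keep = (torch.rand(c.shape[0], 1) >= p_uncond).float()
    return c * keep
```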

Results

Training Loss

Now our images are clear as day!