Project 1: Image Processing

Jackie Dai, CS180 Fall 2024

Background

Sergei Mikhailovich Prokudin-Gorskii traveled the Russian Empire taking color photographs. In a world without color printing, he captured each scene as three snapshots, each taken through a different color filter: a red filter, a green filter, and a blue filter. The three snapshots were recorded on a single glass plate in rapid succession, so that they could later be combined into a color image. In this project, we use modern image processing to imitate that process by aligning and stacking the three color channels of an image to produce a colorized output.

Approach

We are given images in which the three color channels (R, G, B) are stored separately. First, I sliced each image into three separate channel images (R, G, B). Next, I had to align and stack the three channels to output a fully colorized image. The hard part is the alignment, and there are many approaches to it.
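The slicing and stacking steps can be sketched as follows. This is a minimal illustration, not necessarily the code used for the results here: the function names `split_channels` and `stack_channels` are hypothetical, and the top-to-bottom channel order (blue, green, red) is an assumption about how the scanned plates are laid out.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate scan into three equal-height channels."""
    height = plate.shape[0] // 3  # ignore leftover rows so all thirds match
    # Assumed order from top to bottom: blue, green, red.
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return r, g, b

def stack_channels(r, g, b):
    """Stack three single-channel images into one H x W x 3 RGB image."""
    return np.dstack([r, g, b])

# Tiny synthetic "plate": 6 rows x 2 columns, i.e. three 2 x 2 channels.
plate = np.arange(12, dtype=float).reshape(6, 2)
r, g, b = split_channels(plate)
rgb = stack_channels(r, g, b)
print(rgb.shape)  # (2, 2, 3)
```

Stacking the channels directly like this, with no shift applied, gives the unaligned result.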

Here is an example of stacking the three color channels WITHOUT any alignment:

Cathedral.jpg

The image looks fuzzy because the color channels are misaligned.

Single-scale Alignment

Alignment starts by searching through a window of possible shifts in the x and y directions and checking how well the displaced channel matches the anchor channel (G). This method brute-forces the search by looping over a [-15, 15] pixel search window, scoring each displacement with a scoring metric, and using the best-scoring displacement to align the two images. This works fine for lower-resolution images. However, it becomes an exhaustingly slow process for higher-resolution images, which require a larger search window.

Tobolsk.jpg, r: (4, 1), b: (-3, -2), runtime: 1.7

monastery.jpg, r: (6, 1), b: (6, 0), runtime: 1.4
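The brute-force search over the [-15, 15] window can be sketched as below. This is an illustrative version under a few assumptions: the name `align_single_scale` is hypothetical, shifts are applied with `np.roll` (which wraps pixels around the edges), displacements are returned in (dy, dx) order, and a lower score is treated as a better match.

```python
import numpy as np

def ssd(a, b):
    """Root of the sum of squared differences; lower is better."""
    return np.sqrt(np.sum((a - b) ** 2))

def align_single_scale(channel, anchor, radius=15, score=ssd):
    """Return the (dy, dx) shift of `channel` that best matches `anchor`."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Circularly shift the channel and compare it with the anchor.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, anchor)
            if s < best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift

# Synthetic check: displace a random image, then recover the displacement.
rng = np.random.default_rng(0)
anchor = rng.random((40, 40))
channel = np.roll(anchor, (-3, 2), axis=(0, 1))
shift = align_single_scale(channel, anchor)
print(shift)  # (3, -2)
```

The cost grows with both the number of pixels and the square of the search radius, which is why this approach does not scale to large images.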

I tried three different scoring metrics to find the one that produced the most satisfying image:

  1. Sum of squared differences (SSD)
  2. Normalized cross correlation (NCC)
  3. Structural similarity index (SSIM)

Ultimately, I found that SSD produced the most satisfying images, so I used it for the rest of the experiments.

SSD: sqrt(sum(sum((img1 - img2)^2)))
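For reference, two of the metrics can be written in a few lines of NumPy. This is a sketch rather than the exact code behind the results: `ssd` follows the formula above, while `ncc` uses one common definition (the dot product of the two images after scaling each to unit length, with no mean subtraction), which is an assumption. SSIM is omitted because it is more involved to implement.

```python
import numpy as np

def ssd(img1, img2):
    """Root of the sum of squared differences; lower means a better match."""
    return np.sqrt(np.sum((img1 - img2) ** 2))

def ncc(img1, img2):
    """Normalized cross correlation; higher means a better match.

    Assumes neither image is all zeros (otherwise the norm is 0).
    """
    return np.sum(img1 * img2) / (np.linalg.norm(img1) * np.linalg.norm(img2))

img = np.array([[0.0, 1.0], [2.0, 3.0]])
other = np.array([[0.0, 1.0], [2.0, 5.0]])  # differs by 2 in one pixel

print(ssd(img, img))      # 0.0
print(ssd(img, other))    # 2.0
print(ncc(img, 2 * img))  # approximately 1.0, NCC ignores uniform scaling
```

Note that the two metrics point in opposite directions: a search loop has to minimize SSD but maximize NCC.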

Next, I implemented a crop function and cropped the borders of all my channels by a factor of 0.1 before performing the alignments. This way I can get rid of the visual artifacts on the borders left behind by the displacements.

cathedral.jpg, r: (7, 0), b: (-1, 1), runtime: 1.7
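A crop function along these lines could look like the sketch below. The name `crop_border` is hypothetical, and it assumes that "a factor of 0.1" means trimming 10% of the height and width from each side of the channel.

```python
import numpy as np

def crop_border(img, fraction=0.1):
    """Trim `fraction` of the height and width from each side of the image."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]

channel = np.zeros((100, 50))
cropped = crop_border(channel)
print(cropped.shape)  # (80, 40)
```

Cropping before scoring also keeps the dark plate borders from dominating the metric.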

The Problem

When I moved on to the larger images in the dataset, the single-scale implementation was not going to cut it. There were far too many pixels to check, and the images required too large a search window. church.tif took 3 minutes to load and had unsatisfying results.

Pyramid Search

One solution to the long processing times is an image pyramid. This method repeatedly downscales the image by a factor of 2 and runs the naive alignment algorithm on the lowest resolution to find the best displacement vector. That displacement vector is then used to align the image at the next higher resolution, and the process repeats recursively until we reach the original image size. To save computation, the search window becomes smaller at each recursive step as the resolution increases. The alignment vector found at each resolution is passed up the recursive stack and added to the final alignment. Each vector passed up the stack also has to be scaled by a factor of 2, because it was computed on an image downscaled by 2 relative to the level that receives it.
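The recursion described above can be sketched as follows. This is a simplified illustration, not the exact implementation behind the results below. The names `search`, `downscale`, and `align_pyramid` and the parameters `min_size`, `coarse_radius`, and `fine_radius` are all hypothetical. Downscaling is done by averaging 2 x 2 blocks, and instead of a window that shrinks gradually, the sketch uses one large window at the coarsest level and one small fixed window at every finer level.

```python
import numpy as np

def ssd(a, b):
    """Root of the sum of squared differences; lower is better."""
    return np.sqrt(np.sum((a - b) ** 2))

def search(channel, anchor, center, radius):
    """Score every shift within `radius` of `center`; return the best (dy, dx)."""
    best_shift, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            s = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), anchor)
            if s < best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift

def downscale(img):
    """Halve the resolution by averaging 2 x 2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, anchor, min_size=32, coarse_radius=15, fine_radius=2):
    """Coarse-to-fine alignment; returns (dy, dx) at this level's resolution."""
    if min(channel.shape) <= min_size:
        # Base case: full brute-force search on the smallest image.
        return search(channel, anchor, (0, 0), coarse_radius)
    coarse = align_pyramid(downscale(channel), downscale(anchor),
                           min_size, coarse_radius, fine_radius)
    # A shift found at half resolution covers twice as many pixels here.
    estimate = (2 * coarse[0], 2 * coarse[1])
    # Refine the scaled estimate with a small local search.
    return search(channel, anchor, estimate, fine_radius)

# Synthetic check: 128 x 128 image, two downscaling steps.
rng = np.random.default_rng(0)
anchor = rng.random((128, 128))
channel = np.roll(anchor, (-12, 8), axis=(0, 1))
shift = align_pyramid(channel, anchor)
print(shift)  # (12, -8)
```

Searching around the doubled estimate is equivalent to first shifting the image by that estimate and then adding a small correction. Only the coarsest level pays for a wide window, so the total cost stays close to that of a few small searches at full resolution.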



Here are the results:

Church.tif, r: (33, -8), b: (-25, -4), runtime: 7.6

Harvesters.tif, r: (65, -3), b: (-59, -16), runtime: 8.3
Icon.tif, r: (48, 5), b: (-41, -17), runtime: 8.6
Lady.tif, r: (61, 3), b: (-51, -9), runtime: 9.3

Melons.tif, r: (93, 3), b: (-81, -10), runtime: 9.3
Onion_church.tif, r: (57, 10), b: (-51, -26), runtime: 10.6

Sculpture.tif, r: (57, 10), b: (-51, -26), runtime: 9.5
self_portrait.tif, r: (107, -16), b: (-33, 11), runtime: 10.14

three_generations.tif, r: (58, -3), b: (-53, -14), runtime: 8.025
train.tif, r: (43, 27), b: (-42, -5), runtime: 9.3