Originally posted by biz-engineer In principle yes, I agree. But how can images be aligned with sub-pixel precision without knowing in advance how much shift was between frames?
---------- Post added 07-04-20 at 17:35 ----------
I get that. What I don't get is how frames can be aligned when they don't contain the same information and no information other than the pixels is provided. If the camera moves by 0.5 pixel per frame, you know the shift and can use it for alignment; but if the shift between frames is random, you would seem to need the underlying high-res image to realign the frames. What I mean is: if you sample a sine wave (2D: amplitude, time) with two clocks offset by a known delay, you can combine the samples as if the sampling frequency were doubled. You can also sample the sine wave with two clocks and a random delay between them and still rebuild the oversampled sine wave, because you already know it is a sine wave. But in the case of an image, you don't know what the image is; you've only got the samples.
Actually, you don't need to know the underlying high res image to realign to the nearest pixel and then also estimate subpixel offset between the frames.
One approach is to cross-correlate the image pair at different offsets and find the offset that maximizes the cross-correlation. That gets you the nearest integer-pixel alignment. Next, you analyze the shape of the peak of the cross-correlation. Consider the following three examples of how different sub-pixel offsets change the shape of the cross-correlation curve in predictable ways:
1) If the images were perfectly aligned, you'd see cross-correlation values that go low, medium, high, medium, low for the offsets around the aligned value. The shape of the cross-correlation curve would be symmetric about the central value.
2) If the images were offset by exactly half a pixel, you'd see cross-correlation values that go low, medium-low, medium-high, medium-high, medium-low, low for the offsets around the aligned value. The shape of the cross-correlation curve would be symmetric about the midpoint between the two central values.
3) If the images were offset by a quarter pixel, you'd see cross-correlation values that go low, medium-low, medium-high, medium, low for the offsets around the aligned value. The shape of the cross-correlation curve would be asymmetric, biased toward the closer offset by an amount related to the actual fractional offset.
In any case, if you fit the three highest values of the cross-correlation curve with a quadratic (a parabola) and solve for the location of its peak, that location is a decent estimate of the sub-pixel offset.
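As a rough sketch of that parabola-fit idea in Python/NumPy (a 1-D toy: the scene, its two frequencies, and the 0.3-pixel offset are all made-up values; real frames would be 2-D and noisy):

```python
import numpy as np

# Two frames of the same band-limited periodic 1-D "scene", sampled with a
# sub-pixel offset between them (0.3 px is a hypothetical value).
true_shift = 0.3
n = 256
x = np.arange(n)
scene = lambda t: (np.sin(2 * np.pi * 6 * t / n)
                   + 0.5 * np.sin(2 * np.pi * 17 * t / n + 1.0))
frame_a = scene(x)
frame_b = scene(x + true_shift)

# Circular cross-correlation at integer lags; the argmax gives the
# nearest whole-pixel alignment.
lags = np.arange(-5, 6)
scores = np.array([np.dot(frame_a, np.roll(frame_b, k)) for k in lags])
best = int(np.argmax(scores))

# Fit a parabola through the peak and its two neighbours; the vertex of
# that parabola refines the integer lag to a sub-pixel estimate.
y0, y1, y2 = scores[best - 1], scores[best], scores[best + 1]
estimate = lags[best] + 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
print(round(estimate, 3))  # lands close to the 0.3 px true offset
```

Note the estimate is computed purely from the samples of the two frames; nothing about the underlying scene is assumed beyond local smoothness of the correlation peak.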
Another approach finds the integer alignment as in the first approach, then takes the FFT of the two images and calculates the sub-pixel offset from the differences in the phases of the FFTs.
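A minimal sketch of that phase idea, again as a 1-D toy with a made-up scene and a hypothetical 0.3-pixel offset (a translation multiplies the spectrum by a linear phase ramp, so the ramp's slope gives the shift):

```python
import numpy as np

# Two frames of the same band-limited periodic 1-D scene, offset by a
# sub-pixel amount (0.3 px is a hypothetical value).
true_shift = 0.3
n = 256
x = np.arange(n)
scene = lambda t: (np.sin(2 * np.pi * 6 * t / n)
                   + 0.5 * np.sin(2 * np.pi * 17 * t / n + 1.0))
frame_a = scene(x)
frame_b = scene(x + true_shift)

# For a pure shift, the cross-spectrum's phase is a linear ramp in
# frequency: angle = 2*pi*f*shift/n.
fa = np.fft.rfft(frame_a)
fb = np.fft.rfft(frame_b)
cross = fb * np.conj(fa)

# Fit the slope of the phase ramp, weighting by magnitude so empty or
# noisy frequency bins contribute little.
freqs = np.arange(len(cross))
weights = np.abs(cross)
phases = np.angle(cross)
slope = np.sum(weights * freqs * phases) / np.sum(weights * freqs**2)
estimate = slope * n / (2.0 * np.pi)
print(round(estimate, 3))  # recovers the 0.3 px offset
```

In a real pipeline you'd do the integer alignment first so the residual shift is under half a pixel, keeping the phase ramp from wrapping at high frequencies.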
There are other approaches in the literature.
The deeper point is that fractional-pixel offsets do create measurable differences between the frames and those measurable differences can be used to estimate the fractional-pixel offset.
P.S. Back in the late 1980s, I developed the algorithms for using image data (and a third approach) to directly estimate the perspective equations (2-D offset, scale, rotation, and 2-D keystoning) that related overlapping frames to each other for panoramic tiling. The typical accuracies of those estimates were about 0.2 pixels.