Automatic Image Mosaicing Algorithms

Topic Description

Image alignment, or mosaicing, is a widespread technique for fusing several views of the same scene into a single image, known as a mosaic, with greater spatial and tonal extent and resolution. The fusion process presupposes an earlier stage, called image registration, which determines the spatial and photometric relations among the views by exploiting correspondences within their overlapping regions.

Registration and mosaicing of images were practiced long before the age of digital computers. Shortly after the photographic process was developed in 1839, the use of photographs for topographical mapping was demonstrated. Images acquired from hilltops or balloons with calibrated equipment were pieced together by hand. The need for mosaicing grew further when satellites started sending pictures back to Earth. Improvements in computer technology then became a natural motivation to develop automated, sound and general-purpose computational techniques.


Image registration is concerned with extracting coherent, coordinated information from a collection of images of the same subject matter, coming either from a video stream or from sets of individual stills. In particular, the registration process entails accurately determining the inter-frame spatial and photometric relationships. When the images do not exhibit parallax effects, knowledge of these relationships enables the creation of images with greater spatial extent and/or augmented detail, known as mosaics.

Traditional mosaicing algorithms focus on globally consistent alignment through a large-scale optimization procedure that considers all images simultaneously. These global approaches take advantage of the spatial, as well as temporal, adjacency (topology) of the sequence, but they must operate off-line since they require all the images to be known in advance.

On-line algorithms allow sequential computation but often require additional assumptions (e.g. information about scene geometry or camera calibration) or non-standard hardware (orientation sensors, optical filters) to speed up the process, which makes them special-purpose. Moreover, such approaches implicitly assume temporal adjacency only and are therefore not globally consistent.

In contrast, this research addresses sequential real-time methods that are image-based, and thus general-purpose, and “globally” consistent.

Pairwise alignment (P2P)
Our two-stage registration

Our algorithm, SeqRT-Mosaic [2], features a two-stage spatial registration procedure. The first stage, a feature-based pairwise registration, computes an initial frame-to-frame alignment. The second stage is a refinement method conceived to bound the cumulative drift errors introduced by pairwise placement: it anchors each frame to the already constructed mosaic in a frame-to-mosaic (F2M) fashion, which yields a form of globally consistent alignment. Multiple objects are allowed to move in the scene, since they are recognized as "motion outliers" (with respect to camera motion) by an iterative version of RANSAC.
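The idea of treating moving objects as motion outliers can be illustrated with a minimal RANSAC sketch. It is not the SeqRT-Mosaic implementation: it assumes a translation-only camera motion model (a real mosaicing system would normally estimate a richer transformation), uses plain rather than iterative RANSAC, and the function name, threshold and synthetic matches are all hypothetical.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Estimate the dominant 2D translation between two frames with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) point correspondences.
    Returns ((dx, dy), inlier_indices). Translation-only model: an
    assumption made to keep the sketch short.
    """
    if not matches:
        raise ValueError("at least one match is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: matches whose offset agrees with the hypothesis.
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if (bx - ax - dx) ** 2 + (by - ay - dy) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set: the least-squares translation is the mean offset.
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers


# Synthetic example: static background shifted by the camera motion (5, -3),
# plus three matches on an object that moves independently.
background = [((x, y), (x + 5.0, y - 3.0))
              for x, y in [(10, 10), (50, 20), (80, 60), (30, 90), (120, 40), (70, 15)]]
moving = [((x, y), (x + 20.0, y + 10.0))
          for x, y in [(60, 60), (62, 64), (65, 61)]]
matches = background + moving

(dx, dy), inliers = ransac_translation(matches)
motion_outliers = [i for i in range(len(matches)) if i not in inliers]
print("camera motion:", (dx, dy))
print("motion outliers:", motion_outliers)
```

The matches on the moving object never join the largest consensus set, so they do not bias the estimated camera motion. The same principle applies when the model is a full projective transformation, at the cost of a larger minimal sample.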

Tonal registration accounts for photometric misalignments induced by changing lighting conditions and exposures. The device automatically adjusts its exposure to provide an optimally exposed image in both dark and bright areas of the scene. As a result, corresponding pixels may look different across images. The histogram specification approach provides a fast way to compute an Intensity Mapping Function (IMF) that matches the color of corresponding pixels in different images.
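Histogram specification can be sketched in a few lines. The example below is only an illustration under stated assumptions: it works on a single 8-bit channel, takes the pixels of the overlapping region as flat lists, and uses hypothetical function names. How the actual algorithm treats color channels is not specified here.

```python
def cumulative_counts(pixels, levels=256):
    """Cumulative histogram (as integer counts) of a flat list of intensities."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    running, cumulative = 0, []
    for count in hist:
        running += count
        cumulative.append(running)
    return cumulative


def intensity_mapping_function(source, reference, levels=256):
    """Histogram specification: build a lookup table (IMF) that maps each
    source level to the lowest reference level whose cumulative frequency
    is at least that of the source level."""
    src_cum = cumulative_counts(source, levels)
    ref_cum = cumulative_counts(reference, levels)
    n_src, n_ref = len(source), len(reference)
    imf, j = [], 0
    for i in range(levels):
        # Both cumulative histograms are non-decreasing, so j only moves forward.
        # Cross-multiplication compares the two fractions without floating point.
        while j < levels - 1 and ref_cum[j] * n_src < src_cum[i] * n_ref:
            j += 1
        imf.append(j)
    return imf


# Synthetic overlap region: the source frame is an under-exposed
# version of the reference frame.
reference = [40, 80, 120, 160, 200, 240]
source = [20, 40, 60, 80, 100, 120]

imf = intensity_mapping_function(source, reference)
corrected = [imf[p] for p in source]
print(corrected)
```

Because the IMF is a lookup table with one entry per intensity level, applying it to a whole frame costs one table access per pixel, which is consistent with the speed requirement of an on-line method.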

Frame 386
Frame 392
Color artifacts due to automatic exposure adjustments
Smooth mosaic using our exposure compensation algorithm


The following figures show mosaics created under very different conditions (on-line/off-line mode, indoor/outdoor, presence of moving objects, different reprojection manifolds) to support the claim that the algorithm is image-based and general-purpose.

Panoramic image from an uncalibrated footage sequence
Mosaic from hand-held camera sequence
Wide area mosaic using a visual surveillance Pan-Tilt-Zoom camera
360° spherical panorama

Research Directions

  • Optical distortions: the pin-hole camera does not exist in nature; lens distortions need to be handled explicitly to achieve better registration results.
  • Extended dynamic range: while color normalization has proved effective in dealing with exposure changes, it does not reflect the underlying physical phenomena. The scene has a much wider dynamic range than cameras can capture, so let us reconstruct it as well.
  • View graph topology: pairwise registration is imprecise and global registration is intractable. Knowing the right subset of adjacencies among views plays an important role in this tradeoff and definitely deserves further investigation.
  • Gold standard method: it is time to integrate the best of all methods into a comprehensive approach that pursues optimal reconstruction by taking into account as many as possible of the effects involved in the image formation process. The effectiveness of the algorithm will be assessed quantitatively according to a rigorous evaluation methodology [1].


[1] P. Azzari and L. Di Stefano, Performance Evaluation Methodology for Image Mosaicing Algorithms, CV-Lab Technical Report TR-2007-10, University of Bologna, October 2007.
[2] P. Azzari, General purpose real-time image mosaicing, appeared in the poster session of ICVSS, July 2007.