This 3D model includes alignment errors... and we know how to fix them. Our objective is to develop an automatic 3D model creation system, and we know from experience that the errors will get smaller as our calibration process is refined. Below is a description of how this model was made using images from Proto-4F of our 8-camera 3D-360 scanner.
A 3D model requires images from multiple perspectives, so for this model we scanned from 4 different locations: two scans from a high perspective with the scanner cameras at 6 feet, and two low scans with the scanner 3 feet above the floor. Once the scans were completed (all of the pictures had been taken and downloaded), the images from the 4 scans were processed using our automatic 3D reconstruction software. This processing resulted in 4 "point clouds" of 3D data: one point cloud for each scan. Next, the 4 point clouds were aligned with each other to create a single "point cloud" of, in this case, 20 million points.
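The merge step can be pictured with a small sketch. It assumes the pose of each scan (a rotation about the vertical axis plus a translation) is already known. The function names and the pose format are hypothetical, and finding the poses is the hard part that this sketch skips.

```python
import math

def transform_points(points, yaw_degrees, translation):
    """Rotate points about the vertical (z) axis, then translate them."""
    yaw = math.radians(yaw_degrees)
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

def merge_scans(scans):
    """Combine per-scan point clouds into one cloud in a shared frame.

    scans: list of (points, yaw_degrees, translation) tuples.
    The poses are assumed inputs here, not something this code estimates.
    """
    merged = []
    for points, yaw_degrees, translation in scans:
        merged.extend(transform_points(points, yaw_degrees, translation))
    return merged

# Two toy "scans" of one point each: one low, one rotated and raised.
merged = merge_scans([
    ([(1.0, 0.0, 0.0)], 0.0, (0.0, 0.0, 0.0)),
    ([(1.0, 0.0, 0.0)], 90.0, (0.0, 0.0, 3.0)),
])
print(len(merged))
```

Any error in an assumed pose moves every point of that scan, which is one way the alignment errors between scans show up in the final model.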
Point clouds are a precise but inefficient way to format and store 3D data. Point clouds for 3D data can be compared to the BMP format for 2D images. Just as compressed JPEGs are about 10x more efficient than uncompressed BMPs for storing 2D images, triangular meshes are a more efficient way to store 3D data than uncompressed point clouds. Meshes are efficient because a group of 3 points for a single triangle can replace thousands (or millions) of points if the points are in a plane. Decades of work from people around the world have resulted in mature procedures to generate meshes from point clouds. Our current meshing routine turned the 400 MB "point cloud" of 20,000,000 points into a 20 MB mesh of 24,000 triangles. In the future we will use more efficient meshing procedures that produce better meshes with even fewer triangles.
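The numbers above can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping: it assumes the quoted sizes are decimal megabytes, and the variable names are ours.

```python
# Figures quoted in the text.
POINTS = 20_000_000
TRIANGLES = 24_000
# Assumption: "MB" is read as decimal megabytes (10**6 bytes).
POINT_CLOUD_BYTES = 400 * 10**6
MESH_BYTES = 20 * 10**6

bytes_per_point = POINT_CLOUD_BYTES / POINTS      # storage cost of one point
size_ratio = POINT_CLOUD_BYTES / MESH_BYTES       # how much smaller the mesh file is
points_per_triangle = POINTS / TRIANGLES          # points replaced by each triangle

print(bytes_per_point)              # 20.0
print(size_ratio)                   # 20.0
print(round(points_per_triangle))   # 833
```

So each triangle stands in for roughly 800 points on average, and the mesh file is about 20 times smaller than the point cloud.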
After meshing we have a 3D model of the area that was scanned, but at this point the mesh is not photorealistic. We make the model photorealistic by "projecting" the original color images taken during the scanning process onto the mesh. This automatic process is called "texture projection," and when it is done well it results in a photorealistic 3D model.
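To give a feel for what "projecting" means, here is a minimal sketch that assumes an ideal pinhole camera with no lens distortion, sitting at the origin and looking down the +z axis. The function names, the focal length in pixels, and the image center are illustrative assumptions, not values from our scanner.

```python
def project_vertex(vertex, focal_px, cx, cy):
    """Project a 3D mesh vertex (camera coordinates) to pixel coordinates.

    Assumes an ideal pinhole camera; returns None for points behind it.
    """
    x, y, z = vertex
    if z <= 0:
        return None
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return (u, v)

def sample_color(image, u, v):
    """Nearest-pixel color lookup; image is a list of rows of colors."""
    col, row = round(u), round(v)
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None  # the vertex falls outside this photo

# A vertex 2 m in front of the camera, and the same vertex misplaced by 1 cm.
good = project_vertex((0.50, 0.0, 2.0), focal_px=1000, cx=640, cy=480)
off = project_vertex((0.51, 0.0, 2.0), focal_px=1000, cx=640, cy=480)
print(good, off)
```

With these made-up numbers, a 1 cm misplacement at 2 m shifts the texture lookup by about 5 pixels, which is why small alignment errors are visible in the finished model.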
Texture projection works very well when everything is correctly aligned and registered, but alignment errors can rapidly build on each other and produce errors that make a model look bad. The alignment errors in this process come from several different sources in the calibration/scanning/processing pipeline:
- Lens distortion correction errors inside each camera
- Alignment errors between the left and right camera in each of the 4 pairs of cameras
- Alignment errors between each of the 4 pairs of cameras
- Alignment errors between the 4 scans
These are all well-defined problems that we are working on. We could proceed slowly and reduce the errors by recalibrating the existing Proto-4F 3D-360 camera system. This approach would take weeks and it could cut the errors in half a few times, but it cannot correct the built-in limitations of our current lenses and calibration facility.
Another option is to build on our two-plus years of experience with the Proto-4x family and design a new Proto-5x series. The new design will have more lenses, higher resolution sensors, faster processors (ARM/AMD Fusion/Tegra/FPGA/other?), and it will be calibrated with a 10x larger "calibration bunker." I am currently working on Proto-5x designs, and a key characteristic may be to increase the number of cameras from the current 8 to 32, or even as many as 100. A large array of inexpensive lenses can cost less and outperform a small number of expensive lenses. The trick is to design a manufacturable and inexpensive array of sensors, lenses and processors. While a design with up to 100 cameras may sound extravagant, remember that the fly's eyes have over 1,000 lenses:
Because Proto-5x will require the design, layout, fabrication and testing of a new camera/processor board, this approach will take at least four months. Software porting, calibration, and testing could add another 4 to 8 months to the process. Depending on the final design, the Proto-5x family could reduce the errors by a factor of 10 or more.
The Prototype-4.x family of 3D-360s is based on a camera that we have been developing for over a year. While several areas of enhancement are still left to be implemented, the new camera is ready to be compared against the Canon 5D. Prototype-3 used eight Canon 5Ds, and the new camera in Prototype-4 needs to meet or exceed the 5D's performance.
One significant difference between our camera and the Canon 5D is that the 5D (like most other color cameras) uses tiny color filters arranged in a Bayer pattern on top of the individual pixels inside the camera. While the 5D has 12 million pixels, only 3 million are RED, 6 million are GREEN, and 3 million are BLUE. Our camera is arguably a 15 million pixel sensor because it cycles through three large filters with the 5 million pixel monochrome sensor to produce 5 million RED pixels, 5 million GREEN pixels, and 5 million BLUE pixels. Our camera is immune to color artifacts caused by the Bayer patterns, but taking a picture takes three times longer because the filters must be rotated into place between shots. Fortunately, our system automatically changes between filters in less than one second. In the future we may want to add filters for other parts of the spectrum including infrared (IR) and ultraviolet.
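The pixel counts above, and the idea of building a color image from three filtered monochrome exposures, can be sketched as follows. This is an illustration only: the function names are ours, images are plain lists of rows, and the real pipeline also has to handle registration between the three shots.

```python
def bayer_channel_counts(total_pixels):
    """Pixels per color on a Bayer sensor: each 2x2 tile has 1 red, 2 green, 1 blue."""
    return {
        "red": total_pixels // 4,
        "green": total_pixels // 2,
        "blue": total_pixels // 4,
    }

def merge_filtered_exposures(red, green, blue):
    """Stack three same-sized monochrome frames into rows of (r, g, b) tuples."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("all three exposures must have the same number of rows")
    return [list(zip(r_row, g_row, b_row))
            for r_row, g_row, b_row in zip(red, green, blue)]

print(bayer_channel_counts(12_000_000))
# {'red': 3000000, 'green': 6000000, 'blue': 3000000}

# Three tiny 1x2 exposures, one per filter.
rgb = merge_filtered_exposures([[1, 2]], [[3, 4]], [[5, 6]])
print(rgb)
# [[(1, 3, 5), (2, 4, 6)]]
```

Every output pixel gets a measured value for all three colors, so no neighboring pixels have to be interpolated the way a Bayer sensor requires.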
The purpose of this test is to compare the color reproduction, noise, and Bayer pattern artifacts between the two cameras. The 5D has a 14mm Canon lens, and the FOV is similar to our custom lens. Here is the test procedure:
1) Take a picture with each camera in RAW mode
2) Use minimal automatic processing on each image. For the 3D-360, Photoshop was used for color balance and sharpening. For the Canon 5D, the image was processed with DxO.
3) Compare the cropped images at actual size and zoomed to 600%
Here are the results:
Above is the shot from the Prototype-4 camera,
And below is the shot from the Canon 5D.
The two shots show that our camera compares well to the Canon 5D. A slight BLUE halo is visible to the left of some objects, but this may be caused by a dirty or warped Wratten filter.
Below is a zoomed comparison of the areas in the GREEN circles.
Close inspection shows that the 3D-360 camera has less noise and fewer Bayer pattern artifacts, but the 5D seems a little sharper. The difference in sharpness could be related to the dynamic range of the two images. The raw 3D-360 image covers a linear range of 24 bits, but the 5D covers a smaller range of only 12 bits. We use a combination of linear and logarithmic curves to squeeze the 24 bits per pixel per color channel down to 16 bits per pixel per channel. To improve contrast we may reduce our range from 24 bits to 22 bits.
I am pleased with this early test, and we are currently implementing upgrades that should make the difference even more dramatic.
This is the first color image produced by the new camera & lens combination. The bilinear rectification routine that we completed last week was automatically applied to correct chromatic aberration. In the future bicubic interpolation will make the image even sharper. The original 16-bit image had levels and curves adjusted in Photoshop, and the result was converted to the 8-bit JPG below.
Stereo reconstruction works by identifying similar features within two images, and we will use any technique that enhances small features. As a first step in our stereo reconstruction pipeline we currently use bilinear interpolation to rectify/dewarp images. While bilinear interpolation is easy to code and does a good job, there are many other types of interpolation worth considering. The two images below have been modified with bicubic interpolation and bilinear interpolation. The results confirm that bicubic is sharper, so we will eventually migrate to bicubic interpolation.
Wikipedia has some more examples.
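For readers who have not seen it written out, here is a minimal sketch of bilinear interpolation on a single-channel image. It is a textbook version, not our rectification routine, and the function name is hypothetical. Rectification asks for pixel values at fractional coordinates, and this is the lookup that answers the question.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at fractional (x, y).

    Blends the four surrounding pixels, weighted by distance.
    Coordinates outside the image are clamped to the border.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

image = [[0, 10],
         [20, 30]]
print(bilinear_sample(image, 0.5, 0.5))  # 15.0
```

Bicubic interpolation follows the same idea but blends a 4x4 neighborhood with a smoother weighting, which is where the extra sharpness comes from at the cost of more arithmetic per pixel.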
We spent the last year designing and building a camera and software that can capture images with pixels that are 16 bits deep. It isn't easy to view these images since most tools expect 8-bit images, so the following routine is used to squeeze the 65,536 values in the 16-bit image down to the 256 values of an 8-bit image. There are thousands of ways to compress a 16-bit image, and this approach is specifically for our machine vision/stereoscopic needs.
This approach to compressing pixel intensities is based on the octave relationship, and it is similar to the way a piano's keys represent a wide range of frequencies. Each "octave" in this case is light intensity that is either twice as bright or half as bright as its neighboring octave. Each octave of light intensity is broken into 20 steps, and this is similar to the 12 keys (steps) in each octave of a piano keyboard. Below is a table and chart that illustrate the conversion from 16-bit images to 8-bit images. Each red dot in the chart represents an octave, and there are 20 steps inside each octave. The approach outlined here allows an 8-bit image to evenly cover 12 octaves (12 x 20 = 240 of the 256 available values): almost the full dynamic range of a 16-bit image.
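The octave idea can be sketched in a few lines. The 20 steps per octave and the 12 octaves come from the description above. Everything else is an assumption made for illustration: that the 12 octaves are the brightest ones (16-bit values from 16 up to 65,535), that darker values map to 0, and the function and constant names.

```python
import math

STEPS_PER_OCTAVE = 20                          # from the text
OCTAVES = 12                                   # from the text
MAX_LEVEL = STEPS_PER_OCTAVE * OCTAVES - 1     # 239, which fits in 8 bits
# Assumption: cover the brightest 12 of the 16 octaves in a 16-bit value.
DARKEST_INPUT = 2 ** (16 - OCTAVES)            # 16

def octave_compress(value_16bit):
    """Map a linear 16-bit intensity to an 8-bit level on a log2 scale."""
    if value_16bit < DARKEST_INPUT:
        return 0  # assumption: everything below the covered range is black
    octaves_above_floor = math.log2(value_16bit / DARKEST_INPUT)
    level = int(octaves_above_floor * STEPS_PER_OCTAVE)
    return min(level, MAX_LEVEL)

# Each doubling of brightness moves the output up by exactly 20 levels.
for value in (16, 32, 64, 1024, 65535):
    print(value, octave_compress(value))
```

Under these assumptions the inputs 16, 32, and 64 land on levels 0, 20, and 40, and the brightest input lands on 239, so shadows get as many output levels per octave as highlights do.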
This curve will probably be modified many times with different numbers of divisions per octave, but the basic approach will stay the same. Below is an example of an original 16-bit linear image, and an 8-bit version of the same image after application of the above logarithmic curve. The pictures are not pretty, but they illustrate how details can be pulled from the shadows. The 16-bit linear image is on the left, and the curve-adjusted 8-bit image is on the right.
The image at the right allows you to see the details in the shadows (notice the wires in the upper right) as well as details in the bright areas. An image editing program could be used to manually adjust brightness and extract details from the 16-bit image, but the curve described here can do a good job automatically.
Next post: Rectification.