This post compares our latest 3D results with results from November 2012 (3 months ago).
Our 3D models are generated by processing pairs of 2D images, and the same 2D images that were processed for the November post have been processed again. The only difference between the two 3D models is that the new version was created using more sophisticated sub-pixel processing routines.
To compare the models we use the 3D program Scanalyze from Stanford. The models can be viewed with realistic coloring, but it is easier to compare them if they are given "false colors". In the video below, Scanalyze is used to display the latest 3D model (GREEN) and the older 3D model (RED). For the comparison we zoom into a part of the model that should be flat, and then we study the points in each model associated with a line across this flat region. If the line is flat then the model is accurate, but any deviations from a straight line represent errors.
[embedplusvideo height="480" width="640" editlink="http://bit.ly/1alfdjS" standard="http://www.youtube.com/v/yumkXHgAniA?fs=1&hd=1" vars="ytid=yumkXHgAniA&width=640&height=480&start=&stop=&rs=w&hd=1&autoplay=0&react=0&chapters=&notes=" id="ep9229" /]
To evaluate the relative error of the two approaches we calculate the Standard Deviation (STDEV) of 750 points in each model that should define a straight line. The results below show that the errors in the new model have a STDEV of 0.75, and this is less than half of the November results with a STDEV of 1.7.
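For readers who want to try a similar measurement, here is a minimal Python sketch. It assumes the sampled points are available as positions along the line (`xs`) and heights (`ys`), fits a least-squares straight line, and reports the standard deviation of the deviations from that line. The function name `line_residual_stdev` and the line-fitting step are assumptions made for illustration; the post only states that the STDEV of 750 points was calculated.

```python
import math


def line_residual_stdev(xs, ys):
    """Return the population standard deviation of the residuals
    left after fitting a least-squares straight line to (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("xs and ys must have the same length (at least 2)")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Least-squares line: y = slope * x + intercept
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals of a least-squares fit with an intercept average to zero,
    # so their root-mean-square equals their population standard deviation.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / n)
```

A perfectly flat (or perfectly tilted) line gives a result near zero, and a noisier line gives a larger value. With the figures reported here, 1.7 / 0.75 is roughly 2.3, which matches the "over 2x" improvement described in the next paragraph.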
It is nice to see that the GREEN line is over 2x better (flatter) than the RED line, but we were hoping for an even larger improvement. Unfortunately we must accept the fact that better software can help reduce errors, but software cannot completely overcome the small errors that our current calibration "bakes" into the 2D images. The correct way to fix the problem is to bring sub-pixel precision to the rectification process of the original 2D images. We expect a much larger error reduction after implementing the new calibration/rectification process.
We spent the summer of 2012 enhancing our 3D scanner. The 3D scan below with 3.5 million points shows that the system can now produce high-resolution 3D models. Some improvement was the result of integrating code from the open source projects Point Cloud Library & OpenCV, but the largest improvement came from camera recalibration.
Why did we need to recalibrate?
Over the last year it seems that 4 of the 8 image sensor boards in our prototype had shifted vertically by up to 5 pixels since the last calibration. Because stereo cameras need to be calibrated to within at least 1/2 pixel, a 5-pixel error is completely unacceptable. The 5-pixel shift represents a very small mechanical change. The pixels on our image sensor are 2.2 microns per side, so a 5-pixel error is a shift of only 11 microns: less than the diameter of one human hair!
The quality of our 3D models improved significantly once we corrected the problem by shifting our images up or down the appropriate number of pixels.
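The correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a plain list of pixel rows, the shift is a whole number of pixels, and the names `shift_vertical` and `pixels_to_microns` are hypothetical rather than taken from our scanner software.

```python
PIXEL_PITCH_MICRONS = 2.2  # sensor pixel size quoted in this post


def pixels_to_microns(pixels, pitch=PIXEL_PITCH_MICRONS):
    """Convert a shift measured in pixels to a physical shift in microns."""
    return pixels * pitch


def shift_vertical(image, dy, fill=0):
    """Return a copy of `image` (a list of rows) moved down by `dy` rows.

    A negative `dy` moves the image up. Rows that scroll in from outside
    the original image are filled with `fill`.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    shifted = []
    for row in range(height):
        src = row - dy  # source row that lands on this output row
        if 0 <= src < height:
            shifted.append(list(image[src]))
        else:
            shifted.append([fill] * width)
    return shifted
```

For example, `pixels_to_microns(5)` reproduces the 11 micron figure, and `shift_vertical(image, -5)` moves an image up by 5 rows to compensate for a sensor that has drifted down by the same amount.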
Why wasn't this shift discovered sooner?
Our stereoscopic camera system had been producing good results, so we incorrectly assumed that the cameras were still calibrated. Because we trusted the calibration, we spent the summer carefully reviewing everything else in the system. During our search we optimized the code to improve 3D reconstruction speed and quality, but certain problems remained. It wasn't until this September that we identified and fixed the calibration problem.
This experience has demonstrated the robustness of our 3D scanning approach which uses both passive pixel matching and pattern projection. Before we fixed the calibration errors, the passive pixel-matching part of our scanning process was effectively disabled. Our robust pattern projection is the only reason that we were able to produce usable 3D models from such a poor calibration. Now that we have both good calibration and solid pattern projection, our results are the best ever.
Next Steps
There are still some loose ends from this summer's work that we want to tie up by the end of the year. These last few tweaks will improve 3D accuracy and reduce or eliminate the distortion in surfaces that should be flat.
Finally, we have also gained a valuable insight for the next design. The new system will be designed to maintain camera rigidity/stability to within about 1/10 micron. This is about 100x better than the current prototype. We plan to finalize the new system design and begin construction in 2013.
The cameras are finally calibrated, and the communications and power systems are installed and working. Now I can finally begin producing scans to test and fine tune the software.
Today I scanned part of the lab, and the animated GIF illustrates the 3D nature of the scan. When producing a 3D model, multiple perspectives must be captured to fill in occlusions (blind spots). For this model, three scans from different locations were merged to produce a point cloud. The GIF consists of 7 different screenshots of the point cloud. While there are still occlusions, many have been filled. For example, notice that you can see both above and below the table.
The original 32-bit software that we use to turn pictures into 3D models is almost 5 years old, and it runs on 32-bit Windows XP. The old software often crashes when processing high-resolution images because the 2GB memory limit isn't enough to process the gigabytes of data that our scanner can quickly produce. Today's scan was made on a computer running 64-bit Windows 7, and we are currently replacing the old 32-bit software with more advanced 64-bit code. The new software runs much faster in 64-bit mode because it can keep temporary files in RAM instead of writing them to and reading them from a slow disk. Even using a Solid State Drive (SSD) wastes minutes of unnecessary processing.
COMING UP: Much better scans processed by SketchUp & posted into Google Earth.
The Prototype-4.x family of 3D-360s is based on a camera that we have been developing for over a year. While several areas of enhancement are still left to be implemented, the new camera is ready to be compared against the Canon 5D. Prototype-3 used eight Canon 5Ds, and the new camera in Prototype-4 needs to meet or exceed the 5D's performance.
One significant difference between our camera and the Canon 5D is that the 5D (like most other color cameras) uses tiny color filters arranged in a Bayer pattern on top of the individual pixels inside the camera. While the 5D has 12 million pixels, only 3 million are RED, 6 million are GREEN, and 3 million are BLUE. Our camera is arguably a 15 million pixel sensor because it cycles through three large filters with the 5 million pixel monochrome sensor to produce 5 million RED pixels, 5 million GREEN pixels, and 5 million BLUE pixels. Our camera is immune to color artifacts caused by the Bayer pattern, but taking a picture takes three times longer because the filters must be rotated into place between shots. Fortunately our system automatically changes between filters in less than one second. In the future we may want to add filters for other parts of the spectrum including infrared (IR) and ultraviolet (UV).
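The difference between the two approaches can be illustrated with a short Python sketch. It is not our camera software: the function names are hypothetical, the exposures are assumed to be plain lists of rows of equal size, and the Bayer split assumes the usual one RED, two GREEN, one BLUE arrangement per 2x2 block.

```python
def bayer_channel_counts(total_pixels):
    """Split a Bayer sensor's pixel count into per-color counts,
    assuming each 2x2 block holds 1 RED, 2 GREEN and 1 BLUE pixel."""
    quarter = total_pixels // 4
    return {"red": quarter, "green": total_pixels - 2 * quarter, "blue": quarter}


def merge_filtered_exposures(red, green, blue):
    """Combine three monochrome exposures (one per color filter) into a
    single image whose pixels are (r, g, b) tuples.

    Every pixel gets a measured value for all three colors, so no
    interpolation between neighboring pixels is needed.
    """
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("exposures must have the same number of rows")
    merged = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("exposures must have the same row length")
        merged.append(list(zip(r_row, g_row, b_row)))
    return merged
```

Calling `bayer_channel_counts(12_000_000)` gives the 3 / 6 / 3 million split quoted above, while merging three 5 million pixel exposures yields 5 million measured values per color.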
The purpose of this test is to compare the color reproduction, noise, and Bayer pattern artifacts between the two cameras. The 5D has a 14mm Canon lens, and the FOV is similar to our custom lens. Here is the test procedure:
1) Take a picture with each camera in RAW mode.
2) Use minimal automatic processing on each image. For the 3D-360, Photoshop was used for color balance and sharpening. For the Canon 5D, the image was processed with DxO.
3) Compare the cropped images at actual size and zoomed to 600%.
Here are the results:
Above is the shot from the Prototype-4 camera,
And below is the shot from the Canon 5D.
The two shots show that our camera compares well to the Canon 5D. A slight BLUE halo is visible to the left of some objects, but this may be caused by a dirty or warped Wratten filter.
Below is a zoomed comparison of the areas inside the GREEN circles.
Close inspection shows that the 3D-360 camera has less noise and fewer Bayer pattern artifacts, but the 5D seems a little sharper. The difference in sharpness could be related to the dynamic range of the two images. The raw 3D-360 image covers a linear range of 24 bits, but the 5D covers a smaller range of only 12 bits. We use a combination of linear and logarithmic curves to squeeze the 24 bits per pixel per color channel down to 16 bits per pixel per channel. To improve contrast we may reduce our range from 24 bits to 22 bits.
I am pleased with this early test, and we are currently implementing upgrades that should make the difference even more dramatic.
We spent the last year designing and building a camera and software that can capture images with pixels that are 16 bits deep. It isn't easy to view these images since most tools expect 8-bit images, so the following routine is used to squeeze the 65,536 values in the 16-bit image down to the 256 values of an 8-bit image. There are thousands of ways to compress a 16-bit image, and this approach is specifically for our machine vision/stereoscopic needs.
This approach to compressing pixel intensities is based on the octave relationship, and it is similar to the way a piano's keys represent a wide range of frequencies. Each "octave" in this case is a range of light intensity that is either twice as bright or half as bright as its neighboring octave. Each octave of light intensity is broken into 20 steps, and this is similar to the 12 keys (steps) in each octave of a piano keyboard. Below is a table and chart that illustrate the conversion from 16-bit images to 8-bit images. Each red dot in the chart represents an octave, and there are 20 steps inside each octave. The approach outlined here allows an 8-bit image to evenly cover 12 octaves (12 x 20 = 240 of the 256 available values): almost the full dynamic range of a 16-bit image.
This curve will probably be modified many times with different numbers of divisions per octave, but the basic approach will stay the same. Below is an example of an original 16-bit linear image, and an 8-bit version of the same image after application of the above logarithmic curve. The pictures are not pretty, but they illustrate how details can be pulled from the shadows. The 16-bit linear image is on the left, and the curve-adjusted 8-bit image is on the right.
The image at the right allows you to see the details in the shadows (notice the wires in the upper right) as well as details in the bright areas. An image editing program could be used to manually adjust brightness and extract details from the 16-bit image, but the curve described here can do a good job automatically.
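A rough Python sketch of the octave curve is shown here for readers who want to experiment. It is an approximation under assumptions, not the exact table from our software: the 12 octaves are assumed to be the brightest 12 of the 16 octaves in a 16-bit value (so the curve starts at an intensity of 16), everything darker is mapped to 0, and the name `compress_16_to_8` is hypothetical.

```python
import math

FULL_SCALE_16BIT = 65536  # number of distinct values in a 16-bit pixel


def compress_16_to_8(value, steps_per_octave=20, octaves=12):
    """Map a 16-bit linear intensity to an 8-bit code on a log2 curve.

    Each doubling of intensity (one octave) advances the output by
    `steps_per_octave` codes. With the defaults, 12 octaves x 20 steps
    uses codes 0..240 of the 256 available.
    """
    # Assumed starting point: the darkest intensity covered by the curve.
    floor = FULL_SCALE_16BIT / 2 ** octaves  # 16.0 with the defaults
    if value <= floor:
        return 0
    code = round(steps_per_octave * math.log2(value / floor))
    return min(code, 255)  # guard for other parameter choices


def compress_image(image, steps_per_octave=20, octaves=12):
    """Apply the curve to every pixel of an image given as a list of rows."""
    return [[compress_16_to_8(v, steps_per_octave, octaves) for v in row]
            for row in image]
```

With these assumptions an intensity of 32 maps to code 20 and an intensity of 64 maps to code 40, so each octave gets the same number of output steps whether it lies in the shadows or the highlights. Changing `steps_per_octave` gives the other divisions per octave mentioned above.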
Next post: Rectification.