This year’s enhancements to the image processing routines in our stereo scanning software has improved processing speed and 3D model accuracy. Comparisons between our current results and those from 9 months ago show that we have reduced the magnitude of one type of geometric error in our 3D scans by a factor of 2 to 4, and we project that future software and hardware enhancements will allow us to cut the noise in half at least 5 more times. Finding/developing a benchmark to clearly reflect these results has been tricky.
In the previous post we compared scans by superimposing them on each other and then comparing the non-linearity of flat surfaces. Because each surface should be flat, any deviation from a straight line represents a scanning error. We used standard deviation analysis to determine that our improvements had cut the error in half for this specific test, but that one number doesn’t tell the whole story. What other metrics and ratios should we use to judge the quality of the 3D scans that our scanner produces?
Until we come up with a more useful metric to quantify the relative quality, we will use human perception to evaluate the quality of scans. The video below shows the results of our last 9 months of software enhancement.
This post compares our latest 3D results with results from November 2012 (3 months ago).
Our 3D models are generated by processing pairs of 2D images, and the same 2D images that were processed in the November post have been processed again. The only difference between the two 3D models is that new version was created using more sophisticated sub-pixel processing routines.
To compare the models we use the 3D program Scanalyze from Stanford. The models can be viewed with realistic coloring, but it is easier to compare them if they are given “false colors” In the video below, Scanalyze is used to display the latest 3D model (GREEN) and the older 3D model (RED). For the comparison we zoom into a part of the model that should be flat, and then we study the points in each model associated with line across this flat region. If the line is flat then the model is accurate, but any deviations from a straight line represent errors.
To evaluate the relative error of the two approaches we calculate the Standard Deviation (STDEV) of 750 points in each model that should define a straight line. The results below show that the errors in the new model have a STDEV of 0.75, and this is less than half of the November results with a STDEV of 1.7.
It is nice to see that the GREEN line is over 2x better (flatter) than the RED line, but we were hoping for an even larger improvement. Unfortunately we must accept the fact that better software can help reduce errors, but software cannot completely overcome the small errors that our current calibration “bakes” into the 2D images. The correct way to fix the problem is to bring sub-pixel precision to the rectification process of the original 2D images. We expect a much larger error reduction after implementing the new calibration/rectification process.
We spent the summer of 2012 enhancing our 3D scanner. The 3D scan below with 3.5 million points shows that the system can now produce high-resolution 3D models. Some improvement was the result of integrating code from the open source projects Point Cloud Library & OpenCV, but the largest improvement came from camera recalibration.
(direct YouTube link)
Why did we need to recalibrate?
Over the last year it seems that 4 of the 8 image sensor boards in our prototype had vertically shifted up to 5 pixels since the last calibration. Because stereo cameras need to be calibrated to within at least 1/2 pixel, a 5 pixel error is completely unacceptable. The 5-pixel shift represents a very small mechanical change. The pixels on our image sensor are 2.2 microns per side, so a 5-pixel error is a shift of only 11 microns: less than the diameter of one human hair!
The quality of our 3D models improved significantly once we corrected the problem by shifting our images up or down the appropriate number of pixels.
Why wasn’t this shift discovered sooner?
Our stereoscopic camera system had been producing good results, so we incorrectly assumed that the cameras were still calibrated. Because we trusted the calibration, we spent the summer carefully reviewing everything else in the system. During our search we optimized the code to improve 3D reconstruction speed and quality, but certain problems remained. It wasn’t until this September that we identified and fixed the calibration problem.
This experience has demonstrated the robustness of our 3D scanning approach which uses both passive pixel matching and pattern projection. Before we fixed the calibration errors, the passive pixel-matching part of our scanning process was effectively disabled. Our robust pattern projection is the only reason that we were able to produce usable 3D models from such a poor calibration. Now that we have both good calibration and solid pattern projection, our results are the best ever.
There are still some loose ends from this summer’s work that we want to tie up by the end of the year. These last few tweaks will improve 3D accuracy and reduce or eliminate the distortion in surfaces that should be flat.
Finally, we have also gained a valuable insight for the next design. The new system will be designed to maintain camera rigidity/stability to within about 1/10 micron. This is about 100x better than the current prototype. We plan to finalize the new system design and begin construction in 2013.
Making 3D models is time consuming. Recent programs like Google’s SketchUp (it’s free) have simplified the process of making digital 3D models, but SketchUp is definitely not automatic.
To make a 3D model look photorealistic, real world pictures can be “projected” onto a SketchUp model. While this technique can add realism, SketchUp is still a manual approach that can take hours, weeks, or even months to produce good results.
Many in the 3D and animation world would like an automatic process that can produce 3D models from a series of 2D pictures. Our goal is to create a system that automatically produces photorealistic digital 3D models that can be processed in existing 3D programs like 3D Studio Max, GeoMagic, or SketchUp.
The Microsoft Photosynth project can automatically create 3D-like effects (some call it 2.5D) by automatically processing 10s to 100s of 2D images. While this process is automatic, it does not produce a 3D model that can be used by other programs.
Garbage in…….. Garbage out.
A challenge for Photosynth and other automatic stitching/panoramic approaches is that they often use regular uncalibrated cameras. While this is convenient, it forces the programs to analyze each camera image to determine the field of view and other essential lens/camera characteristics: the cameras are essentially calibrated during processing. Precisely calibrating a camera is challenging in a lab setting, so it is reasonable to expect that on-the-fly calibration results will not be very precise. Any errors in the camera calibration step will build on each other and cause problems later in the process. While calibration problems cause annoying alignment errors in panoramic 2D & 2.5D images, they cause unacceptable distortion in 3D models. Here is a list of variables that must be determined before using a 2D image to create an accurate 3D model:
Camera Variables that must be determined for Precise Stereoscopic 3D Reconstruction
- The exact center of the image sensor behind the lens: sensors are normally a few pixels off-center
- Camera Horizontal & Vertical Field of View to within 1/100 degree
- Camera lens distortion correction variables: Pincushion, barrel, radial.
- Camera horizontal orientation (0.00 to 360.00 degrees) to within 1/100 of a degree
- Camera vertical orientation (tilt, roll) to within 1/100 degree
- Camera location for each shot: X, Y, and Z coordinates to within one millimeter
- Camera dynamic range and gamma
The quality of a 3D model is limited by the quality of the 2D pictures used to make it. Here’s how we calibrate our camera system:
1) Design and build a calibration routine/facility to determine the key camera variables.
2) Design and build a system of cameras that can be easily calibrated.
The important point is that the camera system and the calibration system need to be built for each other: they literally fit together like a lock and key. As we see it, a calibrated system produces “clean” images that simplify and speed up the 3D reconstruction process. Our current 8-camera system (Proto-4F) has been designed to produce sets of calibrated images, and these images are used to automatically produce 3D models.
This 3D model includes alignment errors……. and we know how to fix them. Our objective is to develop an automatic 3D model creation system, and we know from experience that the errors will get smaller as our calibration process is refined. Below is a description of how this model was made using images from Proto-4F of our 8-camera 3D-360 scanner.
A 3D model requires images from multiple perspectives, so for this model we scanned from 4 different locations: two scans from a high perspective with the scanner cameras at 6 feet, and two low scans with the scanner 3 feet above the floor. Once the scans were completed (all of the pictures have been taken and downloaded) the images from the 4 scans were processed using our automatic 3D reconstruction software. This processing resulted in 4 “point clouds” of 3D data: one point cloud for each scan. Next the 4 point clouds were aligned with each other to create a single “point cloud” of, in this case, 20 million points.
Point clouds are a precise, but inefficient way to format and store 3D data. Point clouds for 3D data can be compared to the BMP format for 2D images. Just as compressed JPEGs are about 10x more efficient than uncompressed BMPs for storing 2D images, triangular meshes are a more efficient way to store 3D data than uncompressed point clouds. Meshes are efficient because a group of 3 points for a single triangle can replace thousands (or millions) of points if the points are in a plane. Decades of work from people around the world has resulted in mature procedures to generate meshes from point clouds. Our current meshing routine turned the 400 Mbyte “point cloud” of 20,000,000 points into a 20MB mesh of 24,000 triangles. In the future we will use more efficient meshing procedures that produce better meshes with even fewer triangles.
After meshing we have a 3D model of the area that was scanned, but at this point the mesh is not photorealistic. We make the model photorealistic by “projecting” the original color images taken during the scanning process onto the mesh. This automatic process is called “texture projection,” and when it is done well it results in a photorealistic 3D model.
Texture projection works very well when everything is correctly aligned and registered, but alignment errors can rapidly build on each other and produce errors that make a model look bad. The alignment errors in this process come from several different sources in the calibration/scanning/processing pipeline:
- Lens distortion correction errors inside each camera
- Alignment errors between the left and right camera in each of the 4 pairs of cameras
- Alignment errors between each of the 4 pairs of cameras
- Alignment errors between the 4 scans
These are all well defined problems that we are working on. We could proceed slowly and reduce the errors by recalibrating the existing Proto-4F 3D-360 camera system. This approach would take weeks and it could cut the errors in half a few times, but it cannot correct the built-in limitations of our current lenses and calibration facility.
Another option is to build on our two plus years of experience with the Proto-4x family and design a new Proto-5x series. The new design will have more lenses, higher resolution sensors, faster processors (ARM/AMD Fusion/Tegra/FPGA/other?), and it will be calibrated with a 10x larger “calibration bunker.” I am currently working on Proto-5x designs, and a key characteristic may be to increase the number of cameras from the current 8 to 32, or even as many as 100. A large array of inexpensive lenses can cost less and outperform a small number of expensive lenses. The trick is to design a manufacturable and and inexpensive array of sensors, lenses and processors. While a design with up to 100 camera may sound extravagant, remember that the fly’s eyes have over 1,000 lenses:
Because Proto-5x will require the design, layout, fabrication and testing of a new camera/processor board, this approach will take at least four months. Software porting, calibration, and testing could add another 4 to 8 months to the process. Depending on the final design, the Proto-5x family could reduce the errors by a factor of 10 or more.