The Steuart Systems photorealistic 3D scanner uses hardware & software to produce 3D models, but how good is our software? We used the ETH3D benchmark to evaluate it. We were hoping to land in the top 10% of the 116 entries from around the world, and the March 31, 2022 benchmark result put us at #1. Some other group will claim the top spot eventually, but for now the ranking shows that our approach is world-class.
Good software is nice, but camera hardware is our main strength. Over the years our tests have shown that the high-quality, low-noise HDR images from our camera hardware amplify the power of whatever software we use. Low-noise 3D content looks better, and it is easier to compress, distribute and view on the web.
Stay tuned. Over the next few weeks we will post results as we tune our software and our array of 32 cameras.
We have steadily improved our scanning results over the last 6 weeks by modifying hardware, writing new software, and tuning over a dozen variables. The video below demonstrates the effect of our enhanced noise reduction:
Low noise in 3D models is important for two reasons:
– Low-noise 3D looks better.
– Low-noise 3D models are easier to compress & display. In many cases smoothing should allow us to reduce a scan to less than 1% of the original size.
Noise reduction & smoothing have been around for decades, but there is a delicate balance between appropriate smoothing and over-smoothing, which can make objects look like jelly beans. Our past experience with generic smoothing routines has been disappointing because they often round edges & eliminate important details.
Why Our Smoothing Is Better Than Other Options
Instead of applying generic smoothing filters after the 3D data has been created, we apply smoothing during the creation of the 3D data. We can achieve an optimal level of smoothness because our smoothing software has intimate knowledge of the scanner hardware and configuration. Stereo scanners like ours can be accurate to a fraction of a millimeter up close, but precision falls off as the distance from the scanner increases. Our smoothing routines use this fact to smooth our 3D data with more finesse.
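The idea can be sketched in a few lines. In a stereo system, the expected depth error grows roughly with the square of the distance, so smoothing can be applied aggressively far from the scanner and sparingly up close. The sketch below is a simplified illustration of that principle, not our production code; the focal length, baseline, and noise-floor values are placeholder assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def expected_depth_sigma(z, focal_px, baseline_m, disparity_sigma_px=0.25):
    """Standard stereo error model: depth uncertainty grows with z^2.
    All parameter values here are illustrative, not our calibration."""
    return (z ** 2) * disparity_sigma_px / (focal_px * baseline_m)

def depth_aware_smooth(depth_map, focal_px=2400.0, baseline_m=0.12):
    """Blend each depth sample toward its local neighborhood mean,
    trusting nearby (precise) samples more than distant (noisy) ones."""
    local_mean = uniform_filter(depth_map, size=5)
    sigma = expected_depth_sigma(depth_map, focal_px, baseline_m)
    w = sigma / (sigma + 0.001)  # ~0 near the scanner, approaching 1 far away
    return (1.0 - w) * depth_map + w * local_mean
```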
This post compares our latest 3D results with results from November 2012 (3 months ago).
Our 3D models are generated by processing pairs of 2D images, and the same 2D images that were processed in the November post have been processed again. The only difference between the two 3D models is that the new version was created using more sophisticated sub-pixel processing routines.
To compare the models we use the 3D program Scanalyze from Stanford. The models can be viewed with realistic coloring, but it is easier to compare them if they are given “false colors.” In the video below, Scanalyze is used to display the latest 3D model (GREEN) and the older 3D model (RED). For the comparison we zoom into a part of the model that should be flat, and then we study the points in each model associated with a line across this flat region. If the line is flat then the model is accurate, but any deviations from a straight line represent errors.
To evaluate the relative error of the two approaches we calculate the Standard Deviation (STDEV) of 750 points in each model that should define a straight line. The results below show that the errors in the new model have a STDEV of 0.75, less than half of the November result with its STDEV of 1.7.
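For readers who want to reproduce the comparison with their own data, the calculation boils down to fitting a line through the samples and measuring the scatter of the residuals. The few lines below are a generic sketch of that idea; the function name and the NumPy line fit are ours for illustration, not the exact Scanalyze workflow.

```python
import numpy as np

def flatness_error(samples):
    """samples: N x 2 array of (x, z) points taken across a region that
    should be flat.  Returns the standard deviation of the residuals
    after a best-fit line is removed; a perfectly flat scan scores 0."""
    x, z = samples[:, 0], samples[:, 1]
    slope, intercept = np.polyfit(x, z, deg=1)
    residuals = z - (slope * x + intercept)
    return residuals.std()

# Roughly 750 samples per model were used in the comparison above:
# flatness_error(green_samples)  ->  ~0.75
# flatness_error(red_samples)    ->  ~1.7
```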
It is nice to see that the GREEN line is over 2x better (flatter) than the RED line, but we were hoping for an even larger improvement. Unfortunately, better software can only reduce the errors so far: it cannot completely overcome the small errors that our current calibration “bakes” into the 2D images. The correct way to fix the problem is to bring sub-pixel precision to the rectification process of the original 2D images. We expect a much larger error reduction after implementing the new calibration/rectification process.
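To make the idea concrete: rectification resamples each original 2D image with a per-pixel map, and storing that map with floating-point (fractional-pixel) coordinates is what preserves sub-pixel information. The snippet below is a generic OpenCV illustration of the concept; the intrinsics, distortion coefficients, and file name are placeholders rather than our actual calibration, and this is not necessarily the pipeline we will ship.

```python
import cv2
import numpy as np

# Placeholder calibration for one camera of a stereo pair.
K = np.array([[2400.0, 0.0, 960.0],
              [0.0, 2400.0, 600.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.12, 0.05, 0.0, 0.0, 0.0])  # illustrative distortion
R_rect = np.eye(3)        # rectifying rotation from stereo calibration
P_rect = K.copy()         # new projection matrix after rectification

img = cv2.imread("left.png")
h, w = img.shape[:2]

# CV_32FC1 maps hold fractional pixel coordinates, so every output pixel
# is resampled at a sub-pixel location instead of snapping to integers.
map_x, map_y = cv2.initUndistortRectifyMap(K, dist, R_rect, P_rect,
                                           (w, h), cv2.CV_32FC1)
rectified = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_CUBIC)
```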
This 3D model includes alignment errors… and we know how to fix them. Our objective is to develop an automatic 3D model creation system, and we know from experience that the errors will get smaller as our calibration process is refined. Below is a description of how this model was made using images from Proto-4F of our 8-camera 3D-360 scanner.
A 3D model requires images from multiple perspectives, so for this model we scanned from 4 different locations: two scans from a high perspective with the scanner cameras at 6 feet, and two low scans with the scanner 3 feet above the floor. Once the scans were completed (all of the pictures had been taken and downloaded), the images from the 4 scans were processed using our automatic 3D reconstruction software. This processing resulted in 4 “point clouds” of 3D data: one point cloud for each scan. Next, the 4 point clouds were aligned with each other to create a single “point cloud” of, in this case, 20 million points.
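For readers curious what the alignment step looks like in practice, pairwise registration of scans is a well-studied problem. The sketch below uses the open-source Open3D library and point-to-plane ICP purely as an illustration; the file names and the 2 cm correspondence threshold are assumptions, it presumes the scans are already roughly pre-aligned, and it is not a description of our exact routine.

```python
import open3d as o3d

# One point cloud per scan position (hypothetical file names).
scans = [o3d.io.read_point_cloud(f"scan_{i}.ply") for i in range(4)]
for scan in scans:
    scan.estimate_normals()

merged = scans[0]
for scan in scans[1:]:
    # Refine the (rough) initial alignment with point-to-plane ICP.
    result = o3d.pipelines.registration.registration_icp(
        scan, merged,
        max_correspondence_distance=0.02,  # 2 cm, illustrative
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    scan.transform(result.transformation)
    merged += scan

o3d.io.write_point_cloud("merged_scan.ply", merged)
```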
Point clouds are a precise but inefficient way to format and store 3D data. Point clouds for 3D data can be compared to the BMP format for 2D images. Just as compressed JPEGs are about 10x more efficient than uncompressed BMPs for storing 2D images, triangular meshes are a more efficient way to store 3D data than uncompressed point clouds. Meshes are efficient because a group of 3 points for a single triangle can replace thousands (or millions) of points if the points are in a plane. Decades of work from people around the world has resulted in mature procedures to generate meshes from point clouds. Our current meshing routine turned the 400 MB “point cloud” of 20,000,000 points into a 20 MB mesh of 24,000 triangles. In the future we will use more efficient meshing procedures that produce better meshes with even fewer triangles.
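As a rough illustration of what such a meshing step can look like with off-the-shelf tools, the sketch below runs Poisson surface reconstruction followed by quadric decimation in the open-source Open3D library. The file names and the depth/triangle-count parameters are placeholder assumptions, and this is not the meshing routine we actually use.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("merged_scan.ply")   # hypothetical merged cloud
pcd.estimate_normals()

# Turn the point cloud into a triangle mesh.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Decimate: flat regions collapse into a few large triangles, which is
# where most of the size savings over the raw point cloud comes from.
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=24000)
o3d.io.write_triangle_mesh("scan_mesh.ply", mesh)
```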
After meshing we have a 3D model of the area that was scanned, but at this point the mesh is not photorealistic. We make the model photorealistic by “projecting” the original color images taken during the scanning process onto the mesh. This automatic process is called “texture projection,” and when it is done well it results in a photorealistic 3D model.
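Under the hood, texture projection is the pinhole camera model applied from mesh to photo: each mesh vertex (or texture sample) is projected into a calibrated photo and picks up the color found there. The sketch below shows only that core projection step with hypothetical names, and deliberately ignores the occlusion testing and blending between overlapping photos that a real texture projector needs.

```python
import numpy as np

def project_vertices(vertices, K, R, t):
    """Project mesh vertices (N x 3, world coordinates) into one photo.
    K: 3x3 camera intrinsics; R, t: world-to-camera rotation/translation."""
    cam = R @ vertices.T + t.reshape(3, 1)   # world frame -> camera frame
    uv = K @ cam                             # camera frame -> image plane
    uv = uv[:2] / uv[2]                      # perspective divide
    return uv.T                              # N x 2 pixel coordinates

def sample_colors(image, uv):
    """Nearest-pixel color lookup for each projected vertex."""
    h, w = image.shape[:2]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return image[v, u]
```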
Texture projection works very well when everything is correctly aligned and registered, but alignment errors can rapidly build on each other and produce errors that make a model look bad. The alignment errors in this process come from several different sources in the calibration/scanning/processing pipeline:
– Lens distortion correction errors inside each camera
– Alignment errors between the left and right camera in each of the 4 pairs of cameras
– Alignment errors between each of the 4 pairs of cameras
– Alignment errors between the 4 scans
These are all well-defined problems that we are working on. We could proceed slowly and reduce the errors by recalibrating the existing Proto-4F 3D-360 camera system. This approach would take weeks and it could cut the errors in half a few times, but it cannot correct the built-in limitations of our current lenses and calibration facility.
Another option is to build on our two-plus years of experience with the Proto-4x family and design a new Proto-5x series. The new design will have more lenses, higher resolution sensors, faster processors (ARM/AMD Fusion/Tegra/FPGA/other?), and it will be calibrated with a 10x larger “calibration bunker.” I am currently working on Proto-5x designs, and a key characteristic may be to increase the number of cameras from the current 8 to 32, or even as many as 100. A large array of inexpensive lenses can cost less and outperform a small number of expensive lenses. The trick is to design a manufacturable and inexpensive array of sensors, lenses and processors. While a design with up to 100 cameras may sound extravagant, remember that a fly’s eyes have over 1,000 lenses:
Because Proto-5x will require the design, layout, fabrication and testing of a new camera/processor board, this approach will take at least four months. Software porting, calibration, and testing could add another 4 to 8 months to the process. Depending on the final design, the Proto-5x family could reduce the errors by a factor of 10 or more.
The cameras are finally calibrated, and the communications and power systems are installed and working. Now I can begin producing scans to test and fine-tune the software.
Today I scanned part of the lab, and the animated GIF illustrates the 3D nature of the scan. When producing a 3D model, multiple perspectives must be captured to fill in occlusions (blind spots). For this model, three scans from different locations were merged to produce a point cloud. The GIF consists of 7 different screenshots of the point cloud. While there are still occlusions, many have been filled. For example, notice that you can see both above and below the table.
The original 32-bit software that we use to turn pictures into 3D models is almost 5 years old, and it runs on 32-bit Windows XP. The old software often crashes when processing high resolution images because the 2GB memory limit isn’t enough to process the gigabytes of data that our scanner can quickly produce. Today’s scan was made on a computer running 64-bit Windows 7, and we are currently replacing the old 32-bit software with more advanced 64-bit code. The new software runs much faster in 64-bit mode because it can keep temporary files in RAM instead of writing them to and reading them from a slow disk. Even with a Solid State Drive (SSD), that disk traffic wastes minutes of unnecessary processing.
COMING UP: Much better scans processed by SketchUp & posted into Google Earth.