Simple Structured Vision

The vision part of the web bot page mentioned in the latter part of the description has been confusing to people, in part because everything today is "MORE POWER" and "I'll just use a 3D camera!", and that seems great until they get buried in data and can't react in time as a result.

This is simple, small, efficient, smart and comes from a day when you didn't have massive compute or ultra high resolution. And things still had to work. I was introduced to it near the turn of the century (LOL) by a discussion on the PICList:
http://www.piclist.com/techref/io/sensor/pos/opticalranging.htm

Structured light vision techniques greatly simplify the process of triangulating the position of a point in space. Using structured light techniques, one can do real-time navigation without a lot of horsepower. This is because one is able to project a beam with known coordinates, which fixes at least one aspect of the geometry in front of the camera, so the triangulation equations are much simpler and less computationally expensive.

Structured light techniques are far superior to sonar for range and shape measurement, and potentially cheaper. They can map out the region in front of the camera quickly and accurately, with quite good angular resolution. They are also better than IR sensors because they are far more reliable, higher resolution, and easier to debug, since the projected beam can be seen.

The basic idea with structured light vision is to project a light beam of known geometry onto a scene and then use a video camera to observe how it is distorted by objects. Using simple geometric formulas, one can reconstruct the shape of those objects.

There are several optical attachments to laser diodes which produce wide planes of laser light without any moving parts. Additionally, one does not have to use a visible laser to perform the scan. Most cameras will detect infrared laser beams quite well (some cameras are more sensitive to infrared light).

Basic Idea

Here we see a self-scanning laser diode projecting a beam onto the region in front of it; above is the camera.

This arrangement would produce the following images:

The first image is what the camera sees with the laser on. The second image is what the camera sees with the laser off. The third image is the mathematical difference of the two. By subtracting the first two images, only the difference between them is left: the laser projection. Finally, one may want to filter the difference image so that the scanned beam appears no thicker than one pixel. This will aid in map building.
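
Here is a minimal sketch of that subtraction and thinning step in Python, assuming the two frames arrive as same-sized 8-bit grayscale NumPy arrays; the function names and the threshold value are illustrative, not part of the original page:

     import numpy as np

     def laser_difference(frame_on, frame_off, threshold=30):
         # Subtract the laser-off frame from the laser-on frame in a signed
         # type so darker pixels can't wrap around, then keep only pixels
         # that brightened noticeably: ideally just the laser projection.
         diff = frame_on.astype(np.int16) - frame_off.astype(np.int16)
         return np.where(diff > threshold, diff, 0).astype(np.uint8)

     def thin_to_one_pixel(diff):
         # For each column, keep only the brightest pixel of the laser stripe
         # so the scanned beam appears no thicker than one pixel.
         thinned = np.zeros_like(diff)
         rows = diff.argmax(axis=0)               # brightest row in each column
         cols = np.arange(diff.shape[1])
         hit = diff[rows, cols] > 0               # skip columns with no laser at all
         thinned[rows[hit], cols[hit]] = 255
         return thinned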

Scene Geometry and Range Calculation

Now that we have a bitmap image of the laser scan on the objects (and knowing the geometry of the scanner and laser), we can reconstruct the geometry of the scene. Consider the simplified form of the laser projection shown below, where f is the distance from the lens to the image plane, h is the vertical offset between the camera and the laser plane, y is the position of a laser pixel on the image plane, and r is the range to the object the beam struck.

Using similar triangles, we see that y / f = h / ( r + f ). Solving for r, we arrive at r = ( f * h ) / y - f.

Since we know f and h, we can determine the range, r, as a function of y.
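
As a purely illustrative example (numbers invented for the arithmetic, all lengths in the same units): with f = 4 mm, h = 100 mm, and a laser pixel at y = 0.2 mm on the image plane, r = ( 4 * 100 ) / 0.2 - 4 = 1996 mm, or roughly two meters. The closer the object, the larger y becomes and the smaller r comes out.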

Software Implementation

We now merely have to scan the difference bitmap and calculate the range of any pixels that result from the laser projection.

     For y = 1 to Bitmap.YMax
          For x = -Bitmap.XMax to Bitmap.XMax
               If Bitmap.Pixel(x,y) != 0 then
                    ry = f*h/y - f
                    rx = x * (f+ry)/f
                    rz = -h

                    { Now do something with the knowledge        }
                    { that a pixel maps to a particular position }
                    { relative to the camera. Perhaps convert    }
                    { to absolute coordinates and build a map    }
               end if
          next x
     next y
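
The same loop as a runnable Python sketch, under a few assumptions not stated above: the one-pixel-thin difference image is a NumPy array, f, h, and the pixel pitch are all expressed in the same length unit, and (cy, cx) is the pixel where the optical axis crosses the image.

     import numpy as np

     def pixels_to_points(thinned, f, h, pixel_pitch, cy, cx):
         # Convert laser pixels in the thinned difference image into
         # (rx, ry, rz) positions relative to the camera using r = f*h/y - f.
         points = []
         rows, cols = np.nonzero(thinned)
         for row, col in zip(rows, cols):
             # Image-plane coordinates, measured from the optical axis,
             # in the same length unit as f and h.
             y = (row - cy) * pixel_pitch
             x = (col - cx) * pixel_pitch
             if y <= 0:
                 continue              # the laser line should sit below the optical axis
             ry = f * h / y - f        # range out from the camera
             rx = x * (f + ry) / f     # lateral offset, by similar triangles
             rz = -h                   # the laser plane lies h below the camera
             points.append((rx, ry, rz))
         return points

Whatever consumes the returned points plays the role of the "do something" step in the pseudocode: convert to absolute coordinates and build a map, or simply react to any point whose ry falls below a minimum safe range.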

Reality

Of course, the issue with this is that it assumes a perfect pair of pictures, with no motion between the frames. In fact, a mobile robot will be moving (it's right there in the name) and the two images will be shifted slightly. But this really doesn't cause as much of a problem as one might think, and in fact the shift is valuable on its own.

The difference between the frames will be larger when pixels shift because the robot, or objects around it, are moving. Objects moving around the robot SHOULD be seen! We want the robot to react to things that might run into it or interact with it.

Motion caused by the robot falls into two areas: Rotational shifts and Perspective shifts.

Perspective Shifts are caused when the robot moves forward or back and objects in the camera's view change size. This is actually very useful! Think about an object far from the robot. Its pixels will change little, if at all. The robot won't react to it. Now think about an object very close to the robot. It will change in a large way, indicating that the object is close. Which is exactly what you want! Especially, think about an object that is near, but off to the right side of the robot. This object will cause a large shift in pixels, indicating a close, large, dangerous object. The robot should then turn away from it, to the left, thereby avoiding the collision.

Rotational Shifts are caused when the robot turns, and are probably the least useful and the biggest problem. However, they can be reduced by sensing optical flow and digitally removing the rotation from the second frame. But in practice, this is largely unnecessary: if the "laser" line is horizontal, and is the primary illumination, it won't change much as it shifts side to side. Bright objects in the distance will typically sit higher up in the image and can be ignored by simply reducing the weight of the reaction to pixels higher in the image. Bright objects up close should cause a reaction, since they are "noticed" during the turn.
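
One way to sketch that weighting in Python (illustrative only; the linear weights, the "nothing close" floor, and the steering rule are assumptions, not part of the original method). It assumes the usual image convention that row 0 is the top of the frame, scores each half of the difference image with lower pixels counting more heavily, and turns away from whichever side scores worse:

     import numpy as np

     def steer_from_difference(diff):
         # Weight each row so pixels near the bottom of the image (close to
         # the robot) count heavily and pixels near the top count very little.
         height, width = diff.shape
         row_weight = np.linspace(0.0, 1.0, height)           # top row 0, bottom row 1
         weighted = diff.astype(np.float32) * row_weight[:, None]

         left_threat = weighted[:, : width // 2].sum()
         right_threat = weighted[:, width // 2 :].sum()

         # Turn away from the side with the larger weighted difference.
         if max(left_threat, right_threat) < 1.0:             # arbitrary "nothing close" floor
             return "straight"
         return "turn right" if left_threat > right_threat else "turn left"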

See also: