
Embedded vision system for real-time applications


Pierre-François Rüedi and Eric Grenet

1 March 2007

Moving some processing functions into the sensors can aid robust, low-power, and low-cost operation.

There is an increasing demand for low-power, low-cost, real-time vision systems able to perform reliable analysis of a visual scene, especially in environments where the lighting is not controlled. In the automotive industry, for instance, there are many potential applications, including lane-departure warning, seat-occupancy detection, blind-spot monitoring, and pedestrian detection. But embedding a vision system in a vehicle involves multiple constraints. First, the automotive industry has stringent cost requirements. Second, a vision system in a moving vehicle experiences sudden changes in illumination level and a wide intra-scene dynamic range, which imposes severe constraints on the sensor characteristics and the optical design. Finally, the diversity of environments and situations, together with the need for a fast reaction time, makes algorithm development a challenging part of the work.

Our approach to meeting these requirements is to move part of the image processing into the sensor itself. This allows the extraction of robust image features, independent of illumination level and variation, and limits data transmission to the features required for a given task. The vision sensors developed at CSEM1,2 compute the contrast magnitude and direction of local image features at the pixel level by taking spatial derivatives at each pixel. These derivatives are multiplied by a global steering function varying in time, resulting in a sinusoidal signal whose amplitude and phase represent, respectively, the contrast magnitude and direction.
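
To make the principle concrete, the following is a minimal software emulation of this steering-filter readout. It is a sketch only: in the sensor the computation is performed by analog circuitry at the pixel level, and all function and variable names below are our own illustrative choices.

  import numpy as np

  def contrast_from_steering(image, n_phases=16):
      """Emulate the steering-function readout on a gray-level image."""
      # Local spatial derivatives at each pixel (finite differences here).
      iy, ix = np.gradient(image.astype(float))

      # Multiply the derivatives by a global steering function that varies in
      # time: s(theta) = Ix*cos(theta) + Iy*sin(theta) is sinusoidal in theta.
      thetas = np.linspace(0.0, 2.0 * np.pi, n_phases, endpoint=False)
      s = (ix[None] * np.cos(thetas)[:, None, None]
           + iy[None] * np.sin(thetas)[:, None, None])

      # Demodulate the sinusoid: its amplitude gives the gradient magnitude
      # and its phase gives the gradient (contrast) direction.
      a = (2.0 / n_phases) * (s * np.cos(thetas)[:, None, None]).sum(axis=0)
      b = (2.0 / n_phases) * (s * np.sin(thetas)[:, None, None]).sum(axis=0)
      magnitude = np.hypot(a, b)      # equals sqrt(Ix^2 + Iy^2)
      direction = np.arctan2(b, a)    # angle of the local gradient
      return magnitude, direction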

The contrast representation derived in the vision sensor is equivalent to normalizing the spatial gradient magnitude by the local intensity. Unlike the spatial gradient, the contrast representation does not depend on illumination strength, which is a considerable advantage when interpreting scenes. Furthermore, information is dispatched in decreasing order of contrast magnitude, prioritizing pixels where the contrast magnitude is strong: these are usually sparse in natural images.3 This mechanism reduces the amount of data dispatched by the sensor. Figure 1 illustrates the high intra-scene dynamic range of the vision sensor and its insensitivity to illumination changes.
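
The two ideas in this paragraph, normalization by local intensity and dispatch in decreasing order of contrast, can be sketched in a few lines building on the emulation above. The function name, the epsilon term, and the 5% data budget are illustrative assumptions rather than the sensor's actual parameters.

  import numpy as np

  def dispatch_by_contrast(image, magnitude, direction, budget=0.05, eps=1e-6):
      """Return the highest-contrast pixels first, up to a fixed data budget."""
      # Contrast = spatial gradient magnitude normalized by local intensity,
      # which makes the measure largely independent of illumination strength.
      contrast = magnitude / (image.astype(float) + eps)

      # Dispatch pixels in decreasing order of contrast magnitude and stop
      # once the data budget is reached (strong-contrast pixels are sparse).
      order = np.argsort(contrast, axis=None)[::-1]
      n_sent = int(budget * contrast.size)
      rows, cols = np.unravel_index(order[:n_sent], contrast.shape)
      return list(zip(rows.tolist(), cols.tolist(),
                      contrast[rows, cols], direction[rows, cols]))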


Figure 1. Gray-level image (top left) and contrast-magnitude representation (top right) close to a tunnel exit. Middle left: contrast-magnitude representation with a transition between a sunny area and a shadowed area across a pedestrian crossing. Middle right: contrast-direction representation, color-encoded, on a road at night with cars coming in the opposite direction. Bottom row: contrast magnitude with the sun entering the field of view.

A compact, low-power platform called Devise has been developed to demonstrate the efficiency of this approach for implementing low-power, real-time vision systems. The platform, shown in Figure 2, embeds a vision sensor,2 a Blackfin BF533 processor, memory, and communication interfaces. An Ethernet interface enables easy connection to a PC, allowing visualization of raw data in real time and easing the development and debugging of new algorithms. Once an application has been developed and migrated to the Blackfin processor, a low-data-rate radio-frequency link is available that can be used, for instance, to communicate between different nodes in a network of such platforms.


Figure 2. Left: the platform components: sensor board, processing board, battery, optics, and case. Right: the vision system mounted in a car behind the rear-view mirror for live lane-departure warnings.

Over the last few years, we have made a continuing effort to develop software that exploits the contrast information delivered by our vision sensors to analyze visual scenes in natural environments. Development has focused on two areas: automotive, as mentioned previously, and surveillance. The main function of our 'driver assistant' algorithm, for instance, is to detect the road markings so that the position of the vehicle on the road is known at all times and the driver can be warned if they leave their lane unintentionally. Each road marking, consisting of two edges with high contrast magnitude and opposite contrast directions, is detected and tracked in a restricted area that is continuously adapted to the last detected position. Continuous and dashed markings are differentiated. The vanishing point is extracted, and its variations give useful gyroscopic information (tilt and yaw angles). A Kalman filter supervises the system and makes the detection robust (e.g., when markings are temporarily missing). The system also estimates the road curvature, by fitting the marking points to a clothoid model, and the illumination level, allowing it to control the headlights appropriately.
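
As an illustration of the curvature-estimation step, the sketch below fits detected marking points to a clothoid road model under the usual small-angle approximation, in which the lateral offset y of a marking at look-ahead distance x is approximately y0 + phi*x + (c0/2)*x^2 + (c1/6)*x^3 (y0: lateral position, phi: heading angle, c0: curvature, c1: curvature rate). This is a generic least-squares formulation, not CSEM's implementation, and all names are assumptions made for the example.

  import numpy as np

  def fit_clothoid(x, y):
      """Least-squares clothoid fit: x = distance ahead, y = lateral offset."""
      x = np.asarray(x, dtype=float)
      y = np.asarray(y, dtype=float)
      # Design matrix for y(x) ~ y0 + phi*x + (c0/2)*x^2 + (c1/6)*x^3.
      A = np.column_stack([np.ones_like(x), x, x**2 / 2.0, x**3 / 6.0])
      y0, phi, c0, c1 = np.linalg.lstsq(A, y, rcond=None)[0]
      return y0, phi, c0, c1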

This algorithm, implemented on the Blackfin processor, works robustly at 25 frames per second in varying conditions such as night, sun in the field of view, and roads with poor-quality markings. For demonstration purposes, detection results (marking position and type, road curvature, light level, etc.) are sent via the low-data-rate radio-frequency link to a cellular phone that displays a synthetic view of the road in real time (see Figure 3).


Figure 3. Various road situations and their related symbolic representations: a single-lane curve (top left) and a multi-lane curve (top middle) by day, a lane departure in a tunnel (bottom middle), and a countryside road with a single marking by night (bottom left). Right: the real-time display and warning on a cell phone.

This work demonstrates that moving some of the image processing into the sensor itself is an effective way to implement real-time, low-power, low-cost vision systems able to function robustly in uncontrolled environments.4




Authors

Pierre-François Rüedi
CSEM S.A.

Eric Grenet
CSEM S.A.


References
  1. M. Barbaro, P.-Y. Burgi, A. Mortara, P. Nussbaum and F. Heitger, A 100×100 pixel silicon retina for gradient extraction with steering filter capabilities and temporal output coding, IEEE J. Solid-State Circuits 37 (2), pp. 160-172, 2002.

  2. P.-F. Rüedi, P. Heim, F. Kaess, E. Grenet, F. Heitger, P.-Y. Burgi, S. Gyger and P. Nussbaum, A 128×128 pixel 120dB dynamic range vision sensor chip for image contrast and orientation extraction, IEEE J. Solid-State Circuits 38 (12), Dec. 2003.

  3. D. J. Field, What is the goal of sensory coding?, Neural Computation 6, pp. 559-601, 1994.

  4. E. Grenet, Embedded high dynamic vision system for real-time driving assistance, TRANSFAC '06, p. 120, San Sebastián, Spain, October 2006.


 
DOI:  10.2417/1200703.0047



