Welcome to The Neuromorphic Engineer
Brain-inspired auditory processor
The Korean Brain Neuroinformatics Research Program has two goals: to understand information processing mechanisms in biological brains and to develop intelligent machines with human-like functions based on these mechanisms. We are now developing an integrated hardware and software platform for brain-like intelligent systems called the Artificial Brain. It has two microphones, two cameras, and one speaker; looks like a human head; and has the functions of vision, audition, inference, and behavior (see Figure 1).
The sensory modules receive audio and video signals from the environment and perform source localization, signal enhancement, feature extraction, and user recognition in the forward path. In the backward path, top-down attention is applied, greatly improving recognition of noisy real-world speech and occluded visual patterns. This path also influences the fusion of audio and visual signals for lip-reading.
The inference module has a recurrent architecture with internal states to implement human-like emotion and self-esteem. We would also like the Artificial Brain eventually to perform user modeling and active learning, and to ask the right questions of the right people, and of other Artificial Brains.
In addition to head motion, the output module generates human-like behavior with synthesized speech and facial expressions of 'machine emotion'. It also provides computer-based services for users.
The Artificial Brain may be trained to work on specific applications, and the OfficeMate is our choice of application test-bed. Similar to office secretaries, the OfficeMate will help users perform tasks such as scheduling, making telephone calls, data searching, and document preparation.
The auditory processor
The auditory module consists of feature-extraction, binaural, and attention models, all inspired by the human auditory pathway. The feature-extraction model is based on a cochlear filter bank, a zero-crossing detector, and a nonlinearity. The filter bank consists of many bandpass filters whose center frequencies are spaced uniformly on a logarithmic scale. Zero-crossing time intervals are used to estimate frequency characteristics robustly in noisy speech. The logarithmic nonlinearity provides a wide dynamic range and robustness to additive noise, while time-frequency masking may suppress weaker signals that are likely to be noise.
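The filter-bank spacing and the zero-crossing idea can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the function names and parameters are our own:

```python
import numpy as np

def center_frequencies(f_low, f_high, n_bands):
    """Center frequencies spaced uniformly on a logarithmic scale,
    as in the cochlear filter bank described above."""
    return np.geomspace(f_low, f_high, n_bands)

def zero_crossing_frequency(signal, fs):
    """Estimate the dominant frequency from the mean zero-crossing
    interval: a sinusoid crosses zero twice per period."""
    signs = np.signbit(signal)
    crossings = np.nonzero(signs[1:] != signs[:-1])[0]
    intervals = np.diff(crossings) / fs   # seconds between crossings
    return 1.0 / (2.0 * intervals.mean())

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
tone = np.sin(2 * np.pi * 440 * t)
print(round(zero_crossing_frequency(tone, fs)))  # ~440 Hz
```

Because zero-crossing intervals depend only on when the waveform changes sign, not on its amplitude, the estimate degrades gracefully under additive noise.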
The binaural model estimates interaural time delay from zero-crossing times for noise robustness. The binaural processing algorithm has also been extended to handle multiple sound sources and room acoustics with multipath reverberation. The convolutive independent-component-analysis algorithm we developed successfully separates multiple speech sources using linear or cochlear filter banks.1
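The core of interaural-time-delay estimation can be sketched as follows. The model above derives the delay from zero-crossing times; this sketch uses plain cross-correlation as a simpler stand-in, with hypothetical signal parameters:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time delay by cross-correlation.
    A positive result means the right channel lags the left,
    i.e. the source is closer to the left microphone."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)
    return lag / fs

# Synthetic check: delay a noise burst by 8 samples (0.5 ms at 16 kHz).
fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(800)
delay = 8
left = src
right = np.concatenate([np.zeros(delay), src[:-delay]])
print(estimate_itd(left, right, fs) * fs)  # recovered delay in samples
```

The estimated delay maps to a direction of arrival through the microphone spacing and the speed of sound, which is what makes source localization possible with only two microphones.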
A simple but efficient top-down attention model has been developed with a multilayer perceptron classifier for pattern recognition systems. In this model, an attention cue may be generated either from the classified output or from an external source. From the attended output class, the model estimates the corresponding attended input pattern. This is done by adjusting an attention gain coefficient for each input neuron using the error-backpropagation algorithm. The attention gains of unattended input features may become very small, while those of attended features remain close to 1. Once a pattern is classified, attention may shift to find the remaining patterns.2
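The gain-adaptation step can be illustrated with a toy network. This is a minimal sketch of the idea, not the authors' implementation: the classifier weights stay frozen and only the per-input attention gains are updated by gradient descent, using random weights as a stand-in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.standard_normal((2, 4))   # stand-in "trained" weights: 2 classes, 4 features
x = rng.standard_normal(4)        # input pattern
gains = np.ones(4)                # attention gains start at 1

attended = 0                      # class selected by the attention cue
target = np.eye(2)[attended]

for _ in range(500):
    y = sigmoid(W @ (gains * x))
    err = y - target
    # Backpropagate the output error to the gains only; W stays fixed.
    gains -= 0.5 * (W.T @ (err * y * (1.0 - y))) * x

print(sigmoid(W @ (gains * x)))   # output now favors the attended class
```

Features whose gains shrink toward zero are effectively ignored, which is how the model suppresses the parts of a noisy or occluded input that do not support the attended class.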
To provide intensive computing power, we developed a special chip for real-time applications. The system-on-a-chip consists of circuit blocks for analog-to-digital conversion, nonlinear speech-feature extraction, a programmable processor for the recognition system, and digital-to-analog conversion. The extended binaural processing model was also implemented using field-programmable gate arrays and tested on a board with two microphones and five speakers (see Figure 2). The two microphones receive six audio signals, and the chip and board demonstrated substantial signal enhancement: the final signal-to-noise ratio was about 19dB, and the enhancement 18dB.3
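As a sanity check on those figures: SNR in decibels is ten times the base-10 logarithm of the signal-to-noise power ratio, so an 18dB enhancement reaching a final 19dB implies an input SNR of roughly 1dB. A minimal helper, with hypothetical example numbers:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * np.log10(np.mean(signal**2) / np.mean(noise**2))

# Hypothetical example: a 10x power ratio corresponds to 10 dB.
sig = np.full(1000, np.sqrt(10.0))
noise = np.ones(1000)
print(round(snr_db(sig, noise), 2))  # 10.0
```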
In the early 21st century, intelligent machines will help humans as if they were friends or family members, and provide services for human prosperity. Intelligence to machines, and freedom to mankind!
Tell us what to cover!
If you'd like to write an article or know of someone else who is doing relevant and interesting stuff, let us know. E-mail the editor and suggest the subject for the article and, if you're suggesting someone else's work, tell us their name, affiliation, and e-mail.