Design and Implementation of an Embedded Speech Recognition Module

In Electronic Information Category: E | on April 22, 2011

Service robots exist to serve people, so users need a more convenient, natural, and humane way to interact with a robot than operating a keyboard and buttons. Hearing-based human-computer interaction (HCI) is an important direction in this field. Current mainstream speech recognition technology is based on statistical models. However, training such models is algorithmically complex and computationally heavy, so it is usually done on an industrial PC (IPC), desktop PC, or laptop, which limits where the technology can be used. Embedded voice interaction has therefore become a hot research topic.

Compared with PC-based speech recognition systems, embedded speech recognition systems are limited in computing speed and memory capacity, but they offer small size, low power consumption, high reliability, low cost, and flexible installation. This makes them especially suitable for smart homes, robots, and consumer electronics.

1 Overall Scheme and Module Structure

The basic principle of speech recognition is shown in Figure 1. Speech recognition consists of two phases: training and recognition. In both phases, the input speech must first be preprocessed and its features extracted. In the training phase, the user supplies a number of training utterances; preprocessing and feature extraction yield feature vector parameters, and feature modeling of these parameters builds the reference model library. In the recognition phase, the feature vector parameters of the input speech are compared with the reference models in the library using a similarity measure, and the reference model most similar to the input is output as the recognition result.
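The recognition phase described above can be sketched as a nearest-template search. This is only an illustration of the principle, not the algorithm used inside any particular chip: the feature dimension, the `RefModel` type, the function names, and the choice of squared Euclidean distance as the similarity measure are all assumptions made for the example.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define FEATURE_DIM 4   /* hypothetical feature-vector length */

/* One entry of the reference model library built during training. */
typedef struct {
    int   label;                  /* identifier of the trained utterance */
    float features[FEATURE_DIM];  /* feature vector stored at training time */
} RefModel;

/* Squared Euclidean distance: a smaller value means a closer match. */
static float sq_distance(const float *a, const float *b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < FEATURE_DIM; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Recognition phase: return the label of the reference model closest to
 * the input feature vector, or -1 if the library is empty. */
int recognize(const RefModel *library, size_t count, const float *input)
{
    int   best_label = -1;
    float best_dist  = FLT_MAX;

    assert(input != NULL);
    for (size_t i = 0; i < count; i++) {
        float d = sq_distance(library[i].features, input);
        if (d < best_dist) {
            best_dist  = d;
            best_label = library[i].label;
        }
    }
    return best_label;
}
```

Training, in this simplified picture, amounts to filling the `RefModel` array; a practical system would use a statistical model rather than a single stored vector per utterance.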

Figure 1. Basic principle of speech recognition

Existing speech recognition technology can be divided by recognition target into speaker-dependent (specific-person) and speaker-independent (non-specific-person) recognition. Speaker-dependent recognition targets one specific person. Speaker-independent recognition targets most users and generally requires collecting recordings from many speakers and training on them to achieve a high recognition rate.

There are two main ways to build an embedded voice interaction system on existing technology: call a speech development kit directly on the embedded processor, or attach an external speech chip to the embedded processor. The first approach is computationally heavy, consumes a large share of processor resources, and has a long development cycle. The second is comparatively simple: the designer only needs to attend to the interface between the speech chip and the microprocessor. The structure is simple and easy to build, the computational burden on the microprocessor is greatly reduced, reliability improves, and the development cycle is shorter.

Speech recognition technology has developed rapidly in China and abroad. Representative PC products include iFLYTEK's InterReco2.0, Pattek's ASR3.0, and SinoVoice's jASRv5.5. Representative embedded products include the Sunplus (Lingyang) SPCE061A, ICRoute's LD332X, and the WS-117 from Shanghai Huazhen Electronics.

The scheme in this article uses an embedded microprocessor as its core, with a peripheral speaker-independent speech recognition chip and related circuitry. The speech recognition chip is ICRoute's LD3320.

2 Hardware Design

As shown in Figure 2, the hardware consists of the main controller and the speech recognition core. Speech enters the speech recognition part, which processes it and transfers the resulting data to the main controller over a parallel interface. After processing, the main controller sends command data to the USART, which can be used to attach serial peripherals such as a speech synthesis module.

Figure 2. Hardware circuit

2.1 Speech Recognition Circuit

Figure 3 shows the schematic of the speech recognition part, designed with reference to the LD3320 data sheet released by ICRoute. The LD3320 integrates a fast and stable recognition algorithm on chip. It needs no external Flash or RAM and no prior user training or recording to perform speaker-independent speech recognition, and its recognition accuracy is high.

Figure 3. Schematic of the speech recognition part

In the figure, the LD3320 is connected directly to the STM32F103C8T6 in parallel mode, with 1 kΩ pull-up resistors on the lines. A0 indicates whether the bus carries an address segment or a data segment. The control signals, the reset signal, and the interrupt return signal INTB are connected directly to the STM32F103C8T6 with 10 kΩ pull-up resistors to aid system stability. The LD3320 shares an external 8 MHz clock with the STM32F103C8T6. Light-emitting diodes D1 and D2 indicate power-on reset. MBS (pin 12) provides the microphone bias; an RC circuit on this pin ensures that a floating voltage is output to the microphone.
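The role of A0 on the parallel interface can be illustrated by encoding a register write as two bus cycles, an address segment followed by a data segment. This is a host-side sketch only: the `BusCycle` type and `ld_encode_write` are hypothetical names, and the polarity used here (A0 high for address, A0 low for data) is an assumption that should be checked against the LD3320 data sheet before driving real pins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cycle on the parallel bus: the A0 level and the byte on the data lines. */
typedef struct {
    uint8_t a0;    /* 1 = address segment, 0 = data segment (assumed polarity) */
    uint8_t byte;  /* value driven onto the eight parallel lines */
} BusCycle;

/* Fill `out` with the two cycles that write `value` to register `reg`.
 * Returns the number of cycles produced (always 2). */
size_t ld_encode_write(uint8_t reg, uint8_t value, BusCycle out[2])
{
    assert(out != NULL);
    out[0].a0   = 1;      /* first cycle: tell the chip which register */
    out[0].byte = reg;
    out[1].a0   = 0;      /* second cycle: deliver the register contents */
    out[1].byte = value;
    return 2;
}
```

On the target, each `BusCycle` would be turned into GPIO operations on the STM32F103C8T6 together with the chip-select and write strobes, which are omitted here.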

2.2 Main Controller Circuit

The main controller is ST's STM32F103C8T6. The chip is based on the ARM Cortex-M3 32-bit RISC core, runs at up to 72 MHz, and has built-in high-speed memory (64 KB of Flash and 20 KB of SRAM), a rich set of enhanced I/O ports, and peripherals connected to two APB buses. The STM32 family combines high performance, real-time capability, low power, and low-voltage operation with a high level of integration and ease of development, raising the performance and efficiency of 32-bit MCUs.

3 Software System Design

Software system design comprises three parts: porting the embedded operating system µC/OS-II to the main control unit, programming the LD3320 for speech recognition, and designing the dialogue management unit.

3.1 Porting the Embedded Operating System µC/OS-II

µC/OS-II is an open-source, portable, ROMable, scalable, preemptive real-time multitasking operating system. It is designed specifically for embedded applications, and most of its code is written in C. It is efficient, has a small footprint, and offers excellent real-time performance and scalability; the minimal kernel can be as small as about 2 KB. The task is a central concept in µC/OS-II, and because the kernel is preemptive, the assignment of task priorities is crucial. Following a hierarchical, modular design, the tasks of the whole system are divided as listed in Table 1.

Table 1. Task and priority plan of the main control system

In Table 1, apart from OSTaskStat and OSTaskIdle, which are provided by the system, the other seven tasks are created by the user. App_TaskStart is the first task of the system: it initializes the system clock and the underlying devices, creates all events and the other user tasks, and monitors system status. App_TaskSR performs speech recognition. App_TaskCmd analyzes the dialogue set, executes commands, and sends them out through USART1. App_TaskCom is the peripheral expansion task: it sends commands or data out through USART2 to control expansion devices such as speech synthesis equipment. App_TaskUpdate parses the commands received on USART1 and updates the dialogue set data. App_TaskPB is the key-scan task: it monitors three independent keys and distinguishes short presses from long presses. App_TaskLED drives four LEDs that indicate the current working state.

3.2 Speech Recognition Program Design

The speech recognition program was developed with reference to the LD332X development manual and works in interrupt mode. The workflow is: general initialization, initialization for speech recognition, writing the recognition list, starting recognition, and responding to the interrupt.

General initialization and speech recognition initialization. Initialization mainly performs the soft reset, mode setting, clock frequency setting, and FIFO setting.

Writing the recognition list. Each recognition entry corresponds to a specific identification number (1 byte). Identification numbers may repeat and need not be consecutive, but each must be less than 256 (00H to FFH). The chip supports up to 50 recognition entries. Each entry is written in standard Mandarin pinyin (lower case), with a single space between the pinyin of adjacent characters. This article assigns consecutive, distinct identification numbers to the entries; Table 2 gives a simple example.
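The list rules above lend themselves to a check on the main controller before anything is written to the chip. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 79-byte limit is the pinyin string limit given in Section 3.3, and the check only looks at character class and spacing, not at whether each syllable is valid pinyin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES   50  /* list capacity stated in the text */
#define MAX_ENTRY_LEN 79  /* pinyin string limit in bytes (Section 3.3) */

/* True if s contains only lower-case letters, with exactly one space
 * between syllables and no leading or trailing space. */
bool entry_is_valid(const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len == 0 || len > MAX_ENTRY_LEN) return false;
    if (s[0] == ' ' || s[len - 1] == ' ') return false;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ') {
            /* The last byte is not a space, so s[i + 1] is in range. */
            if (s[i + 1] == ' ') return false;
        } else if (s[i] < 'a' || s[i] > 'z') {
            return false;
        }
    }
    return true;
}

/* True if the whole list fits the chip and every entry is well formed. */
bool list_is_valid(const char *const *entries, size_t count)
{
    if (count > MAX_ENTRIES) return false;
    for (size_t i = 0; i < count; i++) {
        if (entries[i] == NULL || !entry_is_valid(entries[i])) return false;
    }
    return true;
}
```

Because the identification number occupies one byte, storing it in a `uint8_t` already enforces the "less than 256" rule.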

Table 2. Example recognition list

Starting recognition. Once several related registers are set, speech recognition can start; Figure 4 shows the process. The ADC channel is the microphone input channel, and the ADC gain is the microphone volume. The gain can be set from 00H to 7FH, with 40H to 6FH recommended. A larger value means a higher volume and a more sensitive recognition start, but may cause more false recognitions. A smaller value means a lower volume, so the speaker must be close to the microphone to start recognition; the advantage is that distant interfering sounds trigger no response. In this article the gain is set to 43H.

Figure 4. Process for starting recognition

Responding to the interrupt. If the microphone picks up sound, an interrupt signal is generated whether or not a normal recognition result is obtained. The interrupt handler then reads the result from the registers. Register BA gives the number of candidate answers, and register C5 holds the highest-scoring answer, which is most likely the correct one.
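The result-reading step can be sketched as follows. The register addresses BA and C5 come from the text; everything else is an assumption for illustration: the `ReadRegFn` hook stands in for the real parallel-bus read routine, a candidate count of zero is taken to mean "nothing recognized", and `fake_regs` / `fake_read` exist only so the logic can be exercised on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_CANDIDATE_COUNT 0xBA  /* number of candidate answers */
#define REG_BEST_ANSWER     0xC5  /* highest-scoring answer */

/* Hypothetical hook: reads one LD3320 register over the parallel bus. */
typedef uint8_t (*ReadRegFn)(uint8_t reg);

/* Called from the interrupt handling path. Returns the value of the
 * best answer, or -1 when the chip reports no candidates. */
int asr_best_result(ReadRegFn read_reg)
{
    assert(read_reg != NULL);
    if (read_reg(REG_CANDIDATE_COUNT) == 0) {
        return -1;
    }
    return read_reg(REG_BEST_ANSWER);
}

/* Host-side stand-in for the chip's register file, for testing only. */
static uint8_t fake_regs[256];

static uint8_t fake_read(uint8_t reg)
{
    return fake_regs[reg];
}
```

In a µC/OS-II design the interrupt service routine would typically only signal a task, which then calls a function like `asr_best_result` outside interrupt context.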

3.3 Dialogue Management Unit Design

To make dialogue management easier, this article designs a dialogue management unit that stores the statements to be recognized and the commands waiting to be executed in the main controller, implemented as two-dimensional arrays. The LD3320 can be loaded with up to 50 candidate recognition statements at a time. Each statement can be a word, a phrase, or a short sentence of no more than 10 Chinese characters, or a pinyin string of no more than 79 bytes. On this basis, the dialogue management arrays are designed as listed in Table 3.

Table 3. Dialogue management unit arrays

The behavior array stores the numbers of the behaviors to be executed. The 50 recognition statements correspond to 50 instruction sets. Each instruction can contain up to six behaviors, which can be treated as parallel steps, and combining several behaviors allows more complex tasks to be completed.
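One possible layout for these two-dimensional arrays is sketched below. The dimensions (50 statements, 79-byte pinyin strings, six behaviors per instruction) follow the text, but the array names, the `NO_BEHAVIOR` marker for unused slots, and the accessor functions are assumptions, since the exact arrangement of Table 3 is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STATEMENTS 50  /* candidate statements per load */
#define MAX_PINYIN_LEN 79  /* bytes per pinyin string */
#define MAX_BEHAVIORS  6   /* behaviors per instruction */
#define NO_BEHAVIOR    0   /* hypothetical marker for an unused slot */

/* Row i of each array belongs to recognition statement i. */
static char    statements[MAX_STATEMENTS][MAX_PINYIN_LEN + 1];
static uint8_t behaviors[MAX_STATEMENTS][MAX_BEHAVIORS];

/* Store a statement and its behavior numbers.
 * Returns 0 on success, -1 if any argument is out of range. */
int dialog_set(uint8_t id, const char *pinyin,
               const uint8_t *acts, size_t n_acts)
{
    if (id >= MAX_STATEMENTS || pinyin == NULL) return -1;
    if (strlen(pinyin) > MAX_PINYIN_LEN) return -1;
    if (n_acts > MAX_BEHAVIORS || (n_acts > 0 && acts == NULL)) return -1;

    strcpy(statements[id], pinyin);
    memset(behaviors[id], NO_BEHAVIOR, MAX_BEHAVIORS);
    if (n_acts > 0) {
        memcpy(behaviors[id], acts, n_acts);
    }
    return 0;
}

/* Return the six behavior slots for a recognized statement. */
const uint8_t *dialog_behaviors(uint8_t id)
{
    assert(id < MAX_STATEMENTS);
    return behaviors[id];
}
```

With this layout the two arrays together occupy 50 × 80 + 50 × 6 = 4300 bytes, which fits comfortably in the 20 KB of SRAM on the STM32F103C8T6.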

4 Performance Tests and Applications

To verify the recognition rate, stability, and response time of the speech recognition module, the module was tested in two environments: a quiet home and a noisy hospital. Eight voice commands were used, and each command was tested 10 times, giving 80 trials per speaker in each environment. The number of successful recognitions was recorded. The results are listed in Table 4.

Table 4. Test results

Three speakers took part in the tests: speaker 1 was female, and speakers 2 and 3 were male. The data in the table show that the speaker-independent recognition rate exceeds 90% in the home environment and reaches 82.5% or more in the noisy hospital environment. Recognition rate: the rate is lower in a noisy environment than in a quiet one. Stability: in the quiet environment the system is stable, and a command spoken at most twice gets the correct response; in the noisy environment stability decreases, and some commands must be spoken three or more times before the module recognizes them. Real-time performance: in the quiet environment the response time is generally under 1 s, while in the noisy environment it is somewhat longer.
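The percentages follow directly from the trial counts: with 8 commands tested 10 times each, one speaker in one environment contributes 80 trials, so a rate of 82.5% corresponds to 66 successes out of 80. A small helper makes the arithmetic explicit; the function name is hypothetical and the success counts used with it are illustrative, not taken from Table 4.

```c
#include <assert.h>

#define COMMANDS            8   /* voice commands in the test set */
#define TRIALS_PER_COMMAND 10   /* repetitions of each command */

/* Recognition rate in percent for one speaker in one environment. */
double recognition_rate(int successes)
{
    const int total = COMMANDS * TRIALS_PER_COMMAND;  /* 80 trials */

    assert(successes >= 0 && successes <= total);
    return 100.0 * successes / total;
}
```

For example, `recognition_rate(66)` evaluates to 82.5 and `recognition_rate(72)` to 90.0.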


This article presents the design and implementation of an STM32-based embedded speech recognition module and describes the hardware and software of each component unit in detail. Extensive experiments and practical use show that the module is stable, achieves a high recognition rate, resists noise interference, and is simple to use. The module is practical and can be widely applied in intelligent service robots, smart homes, consumer electronics, and other fields.
