Fault-tolerant real-time embedded system design

In Electronic Information, Category: F | December 31, 2010

Research on fault-tolerant real-time systems focuses on two aspects: ① improving real-time scheduling algorithms so that real-time tasks produce correct output before their prescribed deadlines, both in normal operation and when errors occur; ② porting the redundancy-based fault-tolerance strategies previously applied to computer systems to real-time systems.

In computer systems with hardware fault tolerance, 65% of failures originate in software and only 8% in hardware. Software fault tolerance is therefore a key determinant of computer system reliability. To ensure that mission-critical tasks still complete within their prescribed deadlines and output correct results when the hardware or software suffers a transient or permanent failure, this article presents a dual-machine (dual-processor) fault-tolerant architecture for real-time embedded systems. The architecture is multiprocessor-based: it provides communication between the machines and seamlessly integrates fault-tolerant design at the hardware, operating-system, and application levels, with the aim of improving overall system reliability.

1 Fault-tolerant real-time system architecture

Figure 1 shows the hardware structure of the fault-tolerant system model. The dual-machine system combines features of loosely coupled and tightly coupled multiprocessor architectures: the processors communicate through separate interconnection channels, and fault tolerance is provided by combining hardware fault tolerance with software fault tolerance.

Figure 1 Fault-tolerant system model

Machines A and B each have their own external control logic and peripherals, so the two machines do not compete for resources, which increases overall system stability. The price, of course, is additional hardware. The comparison and inconsistency detection circuit implements arbitration: it judges the operating condition of machines A and B from the self-test signals that each machine sends periodically.

The two-machine system operates as follows:

① If machines A and B are both running normally, A acts as the main machine and B as the backup. A's results are the system output. Whenever A reaches a checkpoint it sends a log to B, and B updates its log list.

② If A is normal and B fails, A's results remain the system output. B's fault status is reported to A, and B is reset.

③ If A fails and B is running correctly, a switchover takes place: B reschedules the backup tasks and B's results become the system output. A is reset, and A then updates its log at checkpoints so that the state of the backup tasks stays consistent.
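The three cases above can be read as a small decision table. The following is a minimal sketch of that table, assuming each machine's periodic self-test signal has already been reduced to a boolean. The names `Role` and `arbitrate` are hypothetical, and the case where both machines fail is not covered by the article, so its handling here is an assumption.

```python
from enum import Enum


class Role(Enum):
    MAIN = "main"      # results are used as the system output
    BACKUP = "backup"  # receives checkpoint logs, ready to take over
    RESET = "reset"    # faulty machine, being reset


def arbitrate(a_ok: bool, b_ok: bool):
    """Return (output_source, role_of_A, role_of_B) from the self-test results."""
    if a_ok and b_ok:
        # Case 1: both normal, A is main and B is backup.
        return "A", Role.MAIN, Role.BACKUP
    if a_ok and not b_ok:
        # Case 2: B failed, A keeps driving the output and B is reset.
        return "A", Role.MAIN, Role.RESET
    if b_ok:
        # Case 3: A failed, switch over to B and reset A.
        return "B", Role.RESET, Role.MAIN
    # Assumption (not in the article): with both machines faulty there is
    # no valid output and both are reset.
    return None, Role.RESET, Role.RESET
```

For example, `arbitrate(False, True)` returns `("B", Role.RESET, Role.MAIN)`, which corresponds to the switchover in case ③.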

2 Software design and implementation

Figure 2 shows the software model. It combines a layered architecture with a modular structure for embedded real-time systems, seamlessly integrating fault-tolerant design at the hardware, operating-system, and application-software levels. The layered structure bridges the gap between hardware and software and improves the flexibility and portability of the system. Each layer of the model can be regarded as a relatively independent system, and within each layer the functionality is divided into functional blocks.

Figure 2 Fault Tolerant System Software Architecture

The system has a symmetrical structure to support fault tolerance. From the bottom up, each node is divided into three main parts: MCFT (Multiprocessor Communication for Fault Tolerance), the RTOS system-level fault-tolerance components, and the task-level dynamic redundancy component.

2.1 Fault-tolerant multi-machine communication module (MCFT)

An MCFT layer is added between the operating system and the hardware. MCFT is part of the BSP (Board Support Package). As a hardware abstraction layer, it gives the operating system a unified interface and improves portability. Fault-tolerant tasks use the functions MCFT provides to transfer logs, keeping the state and data of critical tasks consistent between the main and backup systems. MCFT hides the details of the underlying communication, so the system implementation is independent of the connection medium.

MCFT manages a number of data packets, which are sent and received between the nodes. The packets are structured as follows: [packet structure not reproduced in the source]

2.2 RTOS system-level fault tolerance components

The RTOS system-level fault-tolerance components comprise the kernel-level fault-tolerance support component, the system self-diagnostic component, and the main/backup switchover support component.

(1) Kernel-level fault-tolerance support component

To support communication between the operating-system level and the application level, each node keeps two object tables: a local task table and a fault-tolerant task table. The local task table differs from node to node and contains all tasks created on that node. The fault-tolerant task table contains all fault-tolerant tasks in the system and is identical on all nodes. To keep this table consistent across nodes, every creation, deletion, or similar operation on a fault-tolerant object on one node must be notified to the backup node. Checkpointing and log transfer keep the state and data of the backup tasks on the backup system consistent with the main system. Once the main machine fails, the system switches over automatically: the backup machine readies the backup tasks, reschedules them using the real-time task scheduling policy, and takes over as the main machine.
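As an illustration of the two tables, here is a minimal sketch in which a direct method call stands in for the MCFT channel. The class `Node`, its methods, and the dictionary-based log are all hypothetical. The article does not give the table layout or the notification format.

```python
class Node:
    """One machine of the pair, with a local task table and a fault-tolerant task table."""

    def __init__(self, name):
        self.name = name
        self.peer = None           # the other node, set by link()
        self.local_tasks = set()   # differs per node
        self.ft_tasks = {}         # task name -> last checkpoint log, same on all nodes

    def create_task(self, task, fault_tolerant=False):
        self.local_tasks.add(task)
        if fault_tolerant:
            self.ft_tasks[task] = None
            self._notify("create", task)

    def delete_task(self, task):
        self.local_tasks.discard(task)
        if task in self.ft_tasks:
            del self.ft_tasks[task]
            self._notify("delete", task)

    def checkpoint(self, task, state):
        """Record a checkpoint log for a fault-tolerant task and pass it on."""
        if task not in self.ft_tasks:
            raise KeyError(f"{task} is not a fault-tolerant task")
        self.ft_tasks[task] = dict(state)
        self._notify("log", task, state)

    def _notify(self, op, task, state=None):
        # Stand-in for sending a packet over the inter-machine channel.
        if self.peer is not None:
            self.peer._on_notify(op, task, state)

    def _on_notify(self, op, task, state):
        if op == "create":
            self.ft_tasks[task] = None
        elif op == "delete":
            self.ft_tasks.pop(task, None)
        elif op == "log":
            self.ft_tasks[task] = dict(state)


def link(a, b):
    """Connect two nodes so that each notifies the other."""
    a.peer, b.peer = b, a
```

After `link(main, backup)`, a call such as `main.create_task("nav", fault_tolerant=True)` followed by `main.checkpoint("nav", {"step": 3})` leaves `backup.ft_tasks` equal to `main.ft_tasks`, while tasks created without `fault_tolerant=True` stay local to the node that created them.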

(2) System self-diagnostic component

As shown in Figure 3, system self-diagnosis is used to detect system-level faults, while task-level checks detect application-level faults.

Self-diagnosis is divided into stages: a system start-up test stage and a periodic self-test stage. Start-up diagnosis is triggered automatically by a timed main/backup switchover or by a main-machine failure. In the periodic self-test stage, peripherals and communication ports are tested periodically according to system requirements. Each stage comprises several functional blocks corresponding to the equipment, including CPU, interrupt response, serial port, timer, discrete I/O, and RAM self-diagnosis.

Because comparing results is a step that every transaction in a real-time system must go through, task-level fault detection is placed in the result-checking stage.

(3) Main/backup switchover support component

The arbitration detection circuit provides a "watchdog" to monitor each of the main and backup machines. While a machine is working normally, a task running on its CPU periodically applies a reset signal to the watchdog, so the watchdog counter never overflows. When the CPU fails, the watchdog outputs a trigger signal and a discrete alarm, and the system automatically switches to the backup machine.
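The watchdog behaviour can be sketched as a small simulation driven by logical ticks rather than real time. This is only a software model of what the article describes as a hardware circuit. The names `Watchdog` and `simulate`, the tick granularity, and the timeout value are assumptions.

```python
class Watchdog:
    """Counter that trips if it is not reset within timeout_ticks ticks."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = 0
        self.tripped = False

    def kick(self):
        """Reset signal applied periodically by a task on a healthy CPU."""
        if not self.tripped:
            self.counter = 0

    def tick(self):
        """Advance one tick and return True once the counter has overflowed."""
        if not self.tripped:
            self.counter += 1
            if self.counter >= self.timeout_ticks:
                self.tripped = True  # trigger signal: alarm and switchover
        return self.tripped


def simulate(ticks, kick_period, fail_at=None, timeout_ticks=3):
    """Return the tick at which switchover is triggered, or None if it never is.

    The CPU kicks the watchdog every kick_period ticks until it fails at
    tick fail_at (None means it never fails).
    """
    wd = Watchdog(timeout_ticks)
    for t in range(1, ticks + 1):
        cpu_alive = fail_at is None or t < fail_at
        if cpu_alive and t % kick_period == 0:
            wd.kick()
        if wd.tick():
            return t
    return None
```

With a kick every 2 ticks and a timeout of 3 ticks, `simulate(20, 2)` returns `None`, while `simulate(20, 2, fail_at=7)` returns `8`: the last kick happens at tick 6 and the counter overflows two ticks later.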

Figure 3 Main process flow

2.3 Task-level dynamic redundancy

In a real-time multitasking system there is another software redundancy method: task-level dynamic redundancy. It is a method for recovering from transient faults in real-time systems.

In a real-time multitasking environment, this method takes full advantage of the functions provided by the operating system: each basic task is given a backup task as redundancy, and the backup tasks are scheduled fault-tolerantly, which has an effect similar to retry or rollback recovery. Checkpointing and log transfer keep the state of the main and backup systems consistent and make error recovery possible, though at a higher cost.

Depending on the application and its real-time requirements, the following measures are used:

① The application is divided into multiple tasks in the form of processes, run in order from 1 to n. A checkpoint is set at the end of each task and the log is passed on.

② Tasks are assigned priorities in advance according to the application's requirements, so that they share processor time as needed to achieve real-time processing.

③ A backup is prepared for each basic task and stored in memory. Backup tasks are normally not created and do not occupy system resources; they are activated only when needed. A backup task has a higher priority than its corresponding basic task, so once created it immediately preempts execution. In a sense, this is a retry or rollback procedure.

④ A backup task used for recovery may be identical to the original task, or it may use an alternative algorithm.

The following algorithm generates a fault-tolerant schedule for each task in order to achieve task redundancy:

If the test still fails after the backup task has been executed Nmax times, a permanent fault is assumed to have occurred and the system raises an alarm. Nmax is a threshold determined by the real-time requirements.
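The article's own algorithm listing is not available, so the following is only a minimal sketch of the retry behaviour described in the text: run the basic task, check the result, and on failure activate the backup task from the last checkpoint up to Nmax times. The names `run_with_backup`, `PermanentFault`, and `acceptance_test` are hypothetical, and priorities and preemption are not modelled.

```python
class PermanentFault(Exception):
    """Raised when the backup task has failed n_max times (system alarm)."""


def run_with_backup(primary, backup, acceptance_test, n_max, checkpoint):
    """Run primary, then retry with backup from the checkpoint on failure.

    Returns (result, retries_used). Each run gets its own copy of the
    checkpointed state, so a failed run cannot corrupt the next attempt.
    """
    result = primary(dict(checkpoint))
    if acceptance_test(result):
        return result, 0
    for attempt in range(1, n_max + 1):
        # The backup may be the same code as the primary or an alternative
        # algorithm; here it is simply another callable.
        result = backup(dict(checkpoint))
        if acceptance_test(result):
            return result, attempt
    raise PermanentFault(f"backup task failed {n_max} times")
```

A transient fault that clears after a couple of attempts is absorbed by the retries, while a fault that persists through all Nmax backup runs surfaces as `PermanentFault`, which stands in for the system alarm.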

3 Reliability analysis

Taking dual-machine switchover into account (including the switchover success rate, the switchover time, and the time to switch over again after a fault is identified), the steady-state availability of the complete fault-tolerant system is given by:

[availability formula not reproduced in the source]

where λ is the mean failure rate; β is the fault diagnosis rate, the reciprocal of the mean diagnosis time; μ is the mean repair rate, the reciprocal of the mean time to repair; α is the switchover rate, the reciprocal of the mean switchover time; C is the fault classification rate; the re-switchover rate (also denoted α in the source) is the reciprocal of the re-switchover time (the time to restart the duplex system); and D is the switchover success rate.

For a symmetrical two-machine system, a typical calculation gives an availability of 99.99995%.
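As a rough point of comparison only, the sketch below uses a textbook simplification rather than the article's expression: a single repairable unit has steady-state availability μ / (λ + μ), and two independent units with ideal, instantaneous switchover are unavailable only when both are down. It ignores diagnosis time, switchover time, and the rates β, α, C, and D. The function names and the example rates are hypothetical.

```python
def unit_availability(failure_rate, repair_rate):
    """Steady-state availability of one repairable unit: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def ideal_duplex_availability(failure_rate, repair_rate):
    """Availability of two independent units with perfect switchover.

    The pair is down only when both units are down at the same time.
    """
    unavailability = 1.0 - unit_availability(failure_rate, repair_rate)
    return 1.0 - unavailability ** 2


# Hypothetical rates: one failure per 1000 hours, one-hour mean repair time.
single = unit_availability(1e-3, 1.0)
duplex = ideal_duplex_availability(1e-3, 1.0)
```

Under these assumed rates the duplex pair is more available than a single unit. A model that includes imperfect fault classification and non-zero switchover time, as the article's analysis does, would give a lower figure than this ideal bound.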

4 Conclusion

As real-time systems find more and more applications in safety-critical fields, reliability has become an important measure of a system's merit. Traditional fault-tolerant real-time systems address only one aspect of the system's fault-tolerance needs. To ensure that the system still completes its operations within the time limits and outputs correct results when the hardware or software suffers a transient or permanent failure, this article presents a complete solution combining software and hardware that meets the requirements of hard real-time behaviour, high reliability, and uninterrupted service. The scheme is applicable to RTEMS in high-reliability applications.
