Interactive Consistency in Distributed Embedded Systems

In Electronic Information Category: T | on March 18, 2011

In many applications the safety of people or equipment is at stake. As safety requirements rise, a device or system is expected to remain safe even when one of its components or its control unit fails, that is, to have the fail-safe property. A system is composed of subsystems. When a subsystem stops working in a controlled way after a fault (fail-silent mode), the system as a whole is still faulty, because the subsystem no longer provides its scheduled service, and this may cause the whole system to fail. Safety must therefore be analyzed globally, from the highest level. In a car, for example, the brake system as a whole cannot achieve safety by being fail-silent. It must keep working after a fault (fail-operational mode), or at least keep working with reduced performance (fail-degraded mode).

A single-component architecture (hardware and software) cannot continue to provide service after a fault, so it cannot meet fail-operational or fail-degraded requirements. A redundant architecture with backups is needed, in which each backup can take over most or all of the service of the failed part and keep the system running. For backups to take over from one another, they must have the same view of the working state (the system inputs, the outputs, and who should or should not output). This common view can be established by exchanging information under a protocol, and is known as interactive consistency.

For some controlled objects, the control functions can be mutually redundant. Take the brakes on the four wheels of a vehicle: when the braking subsystem of one wheel (the actuator or its controller) fails, adjusting the brake force on the other wheels can give the overall brake system fail-operational or fail-degraded behavior. Each individual wheel then only needs to be fail-silent. Clearly, lowering the dependability requirement on a single wheel greatly reduces cost. In this case, interactive consistency is needed between the wheel controllers.

A research report from Infineon and Delphi Automotive [1] compared the hardware cost of such a distributed, mutually redundant braking system with that of a centrally controlled redundant one. It concluded that the cost can be reduced substantially and stated that redundant centralized brake controllers will be phased out, which is a warning for China's automotive electronics industry. However, the report did not mention the option of centralized hardware that also exploits the mutual redundancy of the controlled objects (the cost difference between these two options would not be very large). Because the communication differs, interactive consistency differs as well, and together with other factors the advantages and disadvantages of each remain to be studied. At the very least, the distributed mutually redundant brake system is one option. Reference [1] outlines the control scheme but does not discuss technical details or the protocol adopted. This article uses the theory of interactive consistency to analyze problems that may arise when implementing such applications.

1 SM Algorithm

Interactive consistency has been studied for 30 years and is known as the Byzantine Generals Problem. There are two versions of the original literature [2, 3]; the 1980 article is widely cited but acknowledged to be hard to read [4]. The original discussion assumed point-to-point communication. This article builds on the author's understanding of reference [3], extends it to bus communication, and introduces the author's own views along the way. Reference [3] states that in a redundant system, "all non-faulty nodes should use the same input (so as to produce the same output); if the input unit is non-faulty, its input value should be used (so as to produce the correct output)."

Reference [3] gives two algorithms: the Oral Message algorithm (OM) and the Signed Message algorithm (SM). To tolerate m faults, OM needs 3m+1 nodes and m+1 rounds of message passing, while SM needs m+2 nodes and m+1 rounds of message passing. The two algorithms differ greatly in principle and in properties. OM relies on slave nodes relaying messages and voting to determine the input; when the vote cannot decide, a predefined default input is taken. When the master node has a Byzantine fault and wrong values outnumber correct ones, the slave nodes still reach a common view, but that view is not correct. SM relies on step-by-step forwarding with signature checks. It can detect a fault in any node (including the master node), and a node needs only one correct reception. Because it performs well and needs fewer slave nodes, SM deserves further exploration. The SM approach is described below for the case of bus communication.

For n = m+2 nodes that need to exchange data consistently, the problem can be decomposed: each node takes a turn as master node and sends its message to the other nodes, and the SM algorithm is executed for each turn.

Each communication frame contains two parts: the data d and a signature a over d. According to reference [3], a faulty node must not be able to forge a signature, the signature of each node must be different, and the signature of each datum must be different. The author believes this requirement may be relaxed in industrial applications, as detailed later.

Communication proceeds in rounds as follows:

Round 1: the master node sends its own data and signature, (d: a0);

Round 2: each slave node forwards the frame received in round 1 together with its own signature, ((d: a0): aj), where j = 1, ..., n-1;

Subsequent rounds: each slave node forwards the frames received in the previous round together with its own signature, ((...((d: a0): aj)...): ar), where j, ..., r ∈ {1, ..., n-1} and j ≠ ... ≠ r; that is, a slave node that has already forwarded this content does not forward it again.

Because the bus broadcasts rather than communicating point to point, traffic only needs to be counted in distinct frames: N = 1 + (n-1) + (n-1)^2 + .... The total number of communication rounds is m+1.

Each slave node keeps a candidate set, choice, which is initialized to empty: choice = {}. The set is updated after the m+1 rounds of communication have ended. On update, the validity of the signatures is checked first. Only if all signatures of a frame are valid is its d added to choice, and a value already in choice is not added again. With point-to-point communication as in reference [3], choice can hold several elements when the master node is faulty. On a bus, the master node computes the signature only once per communication, so with the approach described later in this section choice holds at most one element (the true value, or nothing).
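The round structure and the choice update can be sketched for m = 1 (two rounds, n = 3 nodes) on a broadcast bus. This is a minimal illustration rather than the algorithm of reference [3] verbatim: the CRC-based `sign` function, the byte layout of a frame, and the `bad_rx` fault model (a slave whose round-1 reception is corrupted without the frame checksum catching it) are all assumptions made for the example.

```python
import zlib

SIG_LEN = 4  # bytes per signature in this sketch

def sign(node_id: int, payload: bytes) -> bytes:
    """Stand-in signature: CRC-32 over the signer's id and the payload."""
    return zlib.crc32(bytes([node_id]) + payload).to_bytes(SIG_LEN, "big")

def wrap(node_id: int, payload: bytes) -> bytes:
    """Build (payload : a_node) by appending the signer's signature."""
    return payload + sign(node_id, payload)

def unwrap(node_id: int, frame):
    """Return the payload if the outermost signature is valid, else None."""
    if frame is None or len(frame) <= SIG_LEN:
        return None
    payload, sig = frame[:-SIG_LEN], frame[-SIG_LEN:]
    return payload if sig == sign(node_id, payload) else None

def run_sm_m1(n: int, d: bytes, bad_rx=frozenset()):
    """Run SM for m = 1 with node 0 as master; return each slave's choice set."""
    f0 = wrap(0, d)                                   # round 1: (d : a0)
    rx1 = {}
    for j in range(1, n):
        frame = f0
        if j in bad_rx:                               # hypothetical undetected error
            frame = bytes([frame[0] ^ 0xFF]) + frame[1:]
        rx1[j] = frame
    fwd = {j: wrap(j, rx1[j]) for j in range(1, n)}   # round 2: ((d : a0) : aj)

    choice = {}
    for p in range(1, n):                             # update after m + 1 rounds
        candidates = [rx1[p]] + [unwrap(j, f) for j, f in fwd.items() if j != p]
        values = (unwrap(0, c) for c in candidates)
        choice[p] = {v for v in values if v is not None}
    return choice

result = run_sm_m1(3, b"\x12\x34", bad_rx={1})
```

In this run slave 1 misreads the master's frame in round 1 but still obtains the value from slave 2's forwarded frame, so both slaves end with the same single-element choice.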

Reference [3] proves that all non-faulty slave nodes obtain the same choice under the following assumptions:

A1: a message that is sent is always delivered correctly;

A2: each node knows who the sender is;

A3: the absence of a message can be detected;

A4: a signature cannot be forged, and a forgery can be detected;

any slave node can check whether a signature is wrong.

For the SM algorithm, a communication error that slips through as an undetected frame error counts as one fault and must be included in the number of faults the design is meant to tolerate.

Reference [3] gives an example of a signature method. The data d is signed with key ki to obtain the signature a: a = (ki * d) mod p, where p is a power of 2 and ki is an odd number less than p. The receiving node verifies with another key ki^-1: d = (ki^-1 * a) mod p. The two keys satisfy (ki * ki^-1) mod p = 1. With this method, the probability that a faulty node produces a valid forgery is 1/p. Such a scheme can only be forged by someone who knows the encryption method. Reference [3] holds that more demanding applications should adopt cryptographic methods.
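The multiplicative signature above can be tried out directly. In the sketch below the modulus size and the key value are arbitrary choices made for illustration, and the three-argument `pow` with exponent -1 (Python 3.8 or later) is used to compute the inverse key.

```python
P = 2 ** 16             # modulus: a power of 2 (illustrative size)
K = 40503               # signing key: odd and less than P (hypothetical value)
K_INV = pow(K, -1, P)   # verification key, satisfies (K * K_INV) % P == 1

def sign(d: int) -> int:
    """a = (k * d) mod p"""
    return (K * d) % P

def verify(d: int, a: int) -> bool:
    """Accept only if d == (k^-1 * a) mod p."""
    return d == (K_INV * a) % P

a = sign(1234)
```

Because K is odd, it is invertible modulo a power of 2, which is why the scheme requires an odd key. The data value must be smaller than P for verification to recover it.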

In industrial applications, wrong data from a faulty node arises from causes such as electromagnetic interference rather than deliberate forgery, and protection against human attacks such as hacking should be left to other measures. A relatively simple, commonly used CRC checksum can therefore serve as the signature. Note that this CRC is computed over the application data and must not be confused with the checksum of the communication frame. What matters in a redundant system is the consistency of the application data, and application data can be corrupted while it is transferred between the MCU and the communication controller; the frame CRC does not cover this fault. For example, when an application sends the same data over both channels of FlexRay, the process that writes the output buffers is shared. If one of the writes is disturbed, the application data and the application CRC (signature) no longer match, and the receiver can detect that the transmitted application data is wrong and discard it.
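A minimal sketch of the application-level CRC idea follows. The 16-bit CRC from Python's `binascii.crc_hqx`, the two-byte payload, and the `write_buffer` helper that models a disturbed MCU-to-controller write are assumptions for illustration; the article does not prescribe a particular CRC.

```python
import binascii

def app_sign(data: bytes) -> bytes:
    """Append a 16-bit CRC computed over the application data only."""
    crc = binascii.crc_hqx(data, 0xFFFF)
    return data + crc.to_bytes(2, "big")

def app_check(buffer: bytes):
    """Return the application data if its CRC matches, else None."""
    if len(buffer) < 3:
        return None
    data, crc = buffer[:-2], buffer[-2:]
    ok = binascii.crc_hqx(data, 0xFFFF).to_bytes(2, "big") == crc
    return data if ok else None

def write_buffer(signed: bytes, disturbed: bool) -> bytes:
    """Model the write into a controller buffer; a disturbance flips one bit."""
    if not disturbed:
        return signed
    return bytes([signed[0] ^ 0x01]) + signed[1:]

signed = app_sign(b"\x03\xE8")                       # signed once by the application
channel_a = write_buffer(signed, disturbed=False)    # clean copy
channel_b = write_buffer(signed, disturbed=True)     # copy hit by interference
```

The communication controller would compute its frame checksum over whatever is in the buffer, so the frame on channel B would still look valid on the bus. Only the application CRC lets the receiver reject it.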

With broadcast communication on a bus, interference conditions differ from node to node, so nodes may receive different frames. If an erroneous frame goes undetected at some nodes, the sending node has in effect sent different values to different nodes, which in point-to-point terms is a Byzantine fault. The forwarding process also involves transfers between the MCU and the communication controller, where errors can occur, so forwarded data likewise carries an added signature. For example, node p receives the frame (((d: a0): aj): ai) in round 3 and checks whether signature ai is correct. If it is wrong, an error occurred in the forwarding from node j to node i. If there is no error, node p goes on to check signature aj of ((d: a0): aj). If it is wrong, an error occurred in the forwarding from node 0 to node j. If there is still no error, it checks signature a0 of (d: a0). If it is wrong, an error occurred in the transfer between the MCU and the communication controller of node 0, or in node 0's computation of the signature.
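The outside-in check described above can be sketched as a loop that peels one signature layer at a time. The frame layout here (a signer-id byte followed by a 4-byte CRC-32 for each layer) is a hypothetical encoding chosen for the example. The function only reports which signer's layer fails first; mapping that layer to a physical cause follows the reasoning in the paragraph above.

```python
import zlib

LAYER = 5  # one signer-id byte plus a 4-byte signature

def add_signature(frame: bytes, node_id: int) -> bytes:
    """Return (frame : a_node): the frame, the signer id, and its signature."""
    sig = zlib.crc32(bytes([node_id]) + frame).to_bytes(4, "big")
    return frame + bytes([node_id]) + sig

def first_bad_layer(frame: bytes, data_len: int):
    """Check signatures from the outermost inward.

    Returns the id of the first signer whose signature fails, or None if
    every layer down to the data is valid.
    """
    if len(frame) < data_len or (len(frame) - data_len) % LAYER:
        raise ValueError("frame length does not match the layer layout")
    while len(frame) > data_len:
        covered, node_id, sig = frame[:-LAYER], frame[-LAYER], frame[-4:]
        if zlib.crc32(bytes([node_id]) + covered).to_bytes(4, "big") != sig:
            return node_id
        frame = covered
    return None

d = b"\x10\x20"
f1 = add_signature(add_signature(d, 0), 1)      # ((d : a0) : a1)
clean = add_signature(f1, 2)                    # (((d : a0) : a1) : a2)
damaged = bytes([f1[0] ^ 0x80]) + f1[1:]        # corrupted after node 1 signed
suspect = add_signature(damaged, 2)
```

Here `first_bad_layer(clean, 2)` finds no fault, while `first_bad_layer(suspect, 2)` stops at the layer signed by node 1, because node 2 signed data that had already been damaged.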

Data that fails the signature check is not submitted to choice. If the master node is error-free and there are m+1 slave nodes, then at least one slave node received the frame correctly in round 1, and faulty nodes among the others in later rounds cannot affect the choice that its forwarding produces. An interesting property of the SM algorithm emerges here: if a slave node suffers only a transient fault, then as long as error-free nodes forward to it, it still has a chance of obtaining the same choice.

Because the bus broadcasts, the master node receives the forwarded copies of its own frame and can therefore test itself. If all forwarded copies fail this self-test, further measures can be taken: for example, the node signs the data again and rewrites it to the communication controller, or it enters fail-silent mode immediately.

The SM algorithm assumes that lost frames can be detected, which relies on an additional timeout unit. Once the time window closes, each node updates choice from the frames it has received.

When one run of the SM algorithm ends, a new run begins and choice can be re-initialized. If the order in which nodes send data is fixed, and each node presets a local timer at the moment the preceding node forwards, then a node that fails to forward is detected when the scheduled window expires. Either the end of the preceding node's forwarding or the timer can be used to trigger the next node's forwarding.
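A small sketch of the window check, using arrival times relative to the start of the window instead of a real timer. The function name, the millisecond units, and the example numbers are assumptions for illustration only.

```python
def close_window(expected: set, arrivals: dict, deadline_ms: float):
    """Split the expected forwarders into those heard in time and those missing.

    arrivals maps a node id to the arrival time of its forwarded frame,
    in milliseconds from the start of the window.
    """
    on_time = {n for n, t in arrivals.items() if n in expected and t <= deadline_ms}
    missing = expected - on_time
    return on_time, missing

# Node 2 never forwards and node 3 forwards after the window has closed.
on_time, missing = close_window({1, 2, 3}, {1: 0.4, 3: 1.9}, deadline_ms=1.5)
```

Frames from the nodes in `on_time` are the ones used to update choice; the nodes in `missing` are treated as having failed to forward in this run.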

2 Mutual Redundancy

By executing the SM algorithm, all error-free nodes, and nodes with only a transient fault, obtain the same input values sent by the other nodes. The input value of a faulty node is recognized as wrong by all nodes (including the faulty node itself) and is not used in further calculation. The faulty node then enters fail-silent mode, and the remaining correct nodes start the algorithm that redistributes the brake torque. When the exchanged inputs are error-free, each node uses a similar control algorithm to compute the control output (brake torque) for every wheel. The SM algorithm is then used again to exchange these results, so that every node holds the same output values from every node.

A node's computed output may be wrong because of interference (such as EMI or thermal shock). Additional conditions (for example, a failed brake torque sensor or an MCU self-test error) may also lead to the conclusion that a node's output is abnormal. The SM algorithm delivers such an erroneous output value consistently to every node.

Each wheel's brake torque thus has results from 4 redundant nodes, which form a two-dimensional matrix. The task is to identify which brake torque in the matrix is wrong and which node may have a permanent fault. Because the results are computed from analog quantities, and because redundant systems emphasize diversity among backups to avoid common-mode faults, even the same algorithm may give slightly different results. A result is therefore judged wrong only when it differs from the others by more than a set limit.

In a node with a serious fault, all outputs may be wrong; in a node with only a transient fault, a single result may be wrong. Since all correct nodes apply the same rule to the same input data, they reach the same identification.

If only one braking torque computed by a node is wrong, the node can be regarded as having suffered transient interference. The braking torque of each wheel can then be determined by majority or by average, and each node applies the corresponding torque at its own wheel.

If all the brake torques computed by a node are wrong, that node is judged to have a serious fault that may be permanent. All correct nodes then recompute their output with the braking torque compensation algorithm, so as to reduce the yaw and disturbance of the car.
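The identification rules above can be sketched as a check on the matrix of exchanged outputs. The choice of the median as the reference value, the tolerance, and the "suspect" label for a node with more than one but not all results wrong are assumptions made for this example; the text only defines the one-wrong and all-wrong cases.

```python
from statistics import mean, median

def judge_outputs(torques, tol):
    """torques[i][w] is the brake torque node i computed for wheel w.

    Returns (status per node, agreed torque per wheel).
    """
    n_wheels = len(torques[0])
    wrong = [0] * len(torques)                 # wrong results per node
    agreed = []
    for w in range(n_wheels):
        column = [row[w] for row in torques]
        ref = median(column)                   # reference value for this wheel
        good = []
        for i, v in enumerate(column):
            if abs(v - ref) > tol:
                wrong[i] += 1                  # differs by more than the limit
            else:
                good.append(v)
        agreed.append(mean(good) if good else ref)
    status = []
    for count in wrong:
        if count == 0:
            status.append("ok")
        elif count == n_wheels:
            status.append("permanent")         # every result wrong
        elif count == 1:
            status.append("transient")         # a single result wrong
        else:
            status.append("suspect")           # case not defined in the text
    return status, agreed

outputs = [
    [100.0, 100.0, 80.0, 80.0],
    [100.4,  99.8, 80.2, 79.9],
    [ 99.7, 150.0, 79.8, 80.1],   # one disturbed result for the second wheel
    [100.1, 100.2, 80.1, 80.0],
]
status, agreed = judge_outputs(outputs, tol=2.0)
```

Every correct node runs the same function on the same matrix, so all of them label the third node as transient and use the same averaged torque for each wheel.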

A broken communication link is a serious fault. The correct nodes, running the SM algorithm, detect the faulty node by timeout. The node with the broken link finds through the SM algorithm that all other nodes appear to be wrong, and it should then conclude that it is itself the faulty one. The brake torque at its wheel should take a default value, and this default is the same value that the other, correct nodes assume for that wheel in their compensation algorithm.
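The self-diagnosis rule for a broken link can be sketched as follows. The function name, the `heard_ok` map, and the default torque value are hypothetical and serve only to illustrate the decision.

```python
DEFAULT_TORQUE = 0.0   # assumed default; the other nodes must assume the same value

def select_torque(computed: float, heard_ok: dict) -> float:
    """Pick the torque to apply at this node's own wheel.

    heard_ok maps every other node id to True if a valid, timely frame
    from that node was received in this run of the SM algorithm.
    """
    everyone_silent = bool(heard_ok) and not any(heard_ok.values())
    if everyone_silent:
        # All other nodes look wrong, so this node judges itself faulty.
        return DEFAULT_TORQUE
    return computed

isolated = select_torque(85.0, {1: False, 2: False, 3: False, 4: False})
connected = select_torque(85.0, {1: True, 2: True, 3: False, 4: True})
```

The rule works only because both sides agree on the default in advance: the isolated node applies it, and the correct nodes plug the same value into their compensation algorithm.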

Clearly, as long as each slave node is guaranteed to be fail-silent after a failure, redistributing the vehicle's brake torque ensures the fail-safe property. The controller hardware is greatly simplified. The initial software cost rises, but it can be amortized, so the overall cost falls.

3 Feasibility of CAN in Such Applications

3.1 Bandwidth

Consider a mutually redundant system made up of four wheel nodes and one command node (which transmits commands from the brake pedal or other systems, while wheel speed signals are forwarded by the wheel nodes). To tolerate one fault, the SM algorithm needs two rounds of communication. Exchanging one input datum from each of the 5 nodes takes 25 frames in total, and exchanging one output datum from each of the 4 nodes takes 16 frames in total. If coordination is required once every 5 ms, transmitting 41 frames is very tight for CAN; this is the often-cited bandwidth limitation of CAN. The assumption here is that the redundant system tolerates only one fault within 5 ms. Given the cost savings of a mutually redundant brake system, setting up a dedicated CAN bus to solve the bandwidth problem is worthwhile.

In fact, the data transmitted by the SM algorithm overlap considerably; in this example a node receives the same datum 5 times. If the faulty slave node has only a transient fault, it is bound to receive the correct value. Under the SM algorithm, tolerating m faults requires m+1 slave nodes, so although the mutually redundant system has 5 nodes, not every slave node needs to forward. For example, if 3 slave nodes forward the input, 36 frames are sent every 5 ms, which eases the bandwidth bottleneck. If 2 slave nodes forward, the load becomes 27 frames every 5 ms.
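The frame counts quoted in this section follow from one frame per master plus one per forwarding slave, for each node that takes a turn as master. The sketch below reproduces them; the bus-time helper is included only to make the load tangible, and its frame length of 135 bits and bit rate of 1 Mbit/s are assumed values, not figures from the article.

```python
def exchange_frames(nodes: int, max_forwarders=None) -> int:
    """Frames for one two-round exchange (m = 1) in which every node is master once."""
    slaves = nodes - 1
    forwarders = slaves if max_forwarders is None else min(slaves, max_forwarders)
    return nodes * (1 + forwarders)

def cycle_frames(max_forwarders=None) -> int:
    """Input exchange among 5 nodes plus output exchange among 4 wheel nodes."""
    return exchange_frames(5, max_forwarders) + exchange_frames(4, max_forwarders)

def bus_time_ms(frames: int, bits_per_frame: int, bitrate_bps: int) -> float:
    """Time the frames occupy on the bus, ignoring arbitration and retransmission."""
    return frames * bits_per_frame / bitrate_bps * 1000.0

all_forward = cycle_frames()        # every slave forwards
three_forward = cycle_frames(3)     # at most 3 slaves forward
two_forward = cycle_frames(2)       # at most 2 slaves forward
assumed_time = bus_time_ms(all_forward, bits_per_frame=135, bitrate_bps=1_000_000)
```

Under the assumed frame length and bit rate, the full exchange occupies roughly the whole 5 ms cycle, which is consistent with the remark that 41 frames is very tight, while limiting the forwarders to 2 brings the count down to 27 frames.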

3.2 Uncertainty

One view holds that only time-triggered communication protocols can meet high reliability requirements. This view is one-sided. In this example, if all the messages that the redundant nodes need to send each other are given high priority, they will not suffer long delays from other frames on the bus, and as long as bandwidth allows they will be served. The order in which these messages arrive does not affect the SM algorithm. More broadly, as long as triggering events correspond one-to-one with time, CAN can also implement time-triggered functions. The advantage of contention-based sending on CAN is that bandwidth a faulty node does not use becomes available to other nodes as soon as possible; the timer that the SM algorithm needs only has to be set long enough.

3.3 Undetected Frame Errors

According to the author's research, the effect of the CAN bit-stuffing rule on the CRC makes the undetected frame error rate higher than the figure given in the Bosch CAN 2.0 specification. With the signatures of the SM algorithm, however, this problem is eliminated or eased. When the signature is the CRC proposed in this article, the undetected error rate of the signature can be analyzed and improved with the same methods used for undetected CAN frame errors.

3.4 Fault Tolerance

The SM algorithm assumes that a message that is sent is always delivered correctly, and that errors in communication can be detected and excluded. This requires some form of error correction or redundancy. Automatic retransmission in CAN is a good error-correction measure, but it only handles transient interference. Failures of the physical channel, such as an open or short circuit, call for fault-tolerant CAN transceivers conforming to ISO 11898-3. The bandwidth of such transceivers is lower: 125 kbps as standard, and up to 250 kbps for the better MAX3054. If the control cycle of the mutually redundant braking system is 20 ms (that is, one fault is tolerated within 20 ms; reference [5] cites an allowable time of 50 ms for by-wire control), then CAN that tolerates physical failures can meet the bandwidth requirement of the SM algorithm. In a probabilistic sense, the extra bandwidth consumed by retransmission is not significant.

4 Summary

Distributed mutually redundant systems are characterized by using redistribution to achieve the fail-safe property. Not every system can use this method. But ensuring data consistency in a distributed system is very important in its own right. For example, the same data may be used by control systems with different purposes, and these systems, designed separately, assume by default that the data are consistent. If there are inconsistencies, the interaction of these systems is hard to predict. The SM algorithm therefore has practical significance.

The SM algorithm can detect faults in the transfer between a node's MCU and its communication controller, as well as Byzantine faults in communication, which is a very important feature. We find that the two functions of the SM algorithm, signing and forwarding, also act as an acknowledgment between sender and receivers, so that the master node can reach agreement with the other nodes; this makes fault tolerance much easier to achieve. Generally speaking, Byzantine faults are hard to detect. When a mutual-backup architecture suffers a Byzantine fault on its input, it cannot determine which unit is wrong, cannot become fail-silent, and may even produce conflicting outputs. With the SM algorithm, a mutual-backup architecture achieves consistent input, and by borrowing some computing power from other nodes in the system it can achieve output consistency equivalent to that of triple backup. This has enormous economic significance.

This article has analyzed how the SM algorithm can be implemented over bus communication and examined some of the issues in applying it to CAN. CAN is a mature, low-cost technology, and trying to extend its range of application is natural. For mutually redundant braking systems, CAN can still be used. Reference [3] mentioned a time-triggered protocol that was under development in 2002; it now appears that this may be the FlexRay protocol. There is no doubt that FlexRay has a large bandwidth advantage, but it still needs deeper study. For example, its clock synchronization depends on level transitions of the analog signal on the cable, and a glitch may shift the position of a transition and thus undermine the clock basis of the protocol.

The SM algorithm needs to forward messages with signatures, which requires the MCU to be involved. On FlexRay or CAN this must be done in a higher-layer protocol or in middleware. The software adds intermediate steps, which costs time and increases jitter and exposure to interference. Overall efficiency is not high, which is not ideal. Dedicated hardware would be best, and this is worth exploring.
