2010-05-05

http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext

Let’s start with a discussion of what causes errors in databases. The following is at least a partial list:

1) Application errors. The application performed one or more incorrect updates. Generally, this is not discovered for minutes to hours thereafter. The database must be backed up to a point before the offending transaction(s), and subsequent activity redone.

2) Repeatable DBMS errors. The DBMS crashed at a processing node. Executing the same transaction on a processing node with a replica will cause the backup to crash. These errors have been termed Bohr bugs [2].

3) Unrepeatable DBMS errors. The database crashed, but a replica is likely to be ok. These are often caused by weird corner cases dealing with asynchronous operations, and have been termed Heisenbugs [2].

4) Operating system errors. The OS crashed at a node, generating the “blue screen of death.”

5) A hardware failure in a local cluster. These include memory failures, disk failures, etc. Generally, these cause a “panic stop” by the OS or the DBMS. However, sometimes these failures appear as Heisenbugs.

6) A network partition in a local cluster. The LAN failed and the nodes can no longer all communicate with each other.

7) A disaster. The local cluster is wiped out by a flood, earthquake, etc. The cluster no longer exists.

8) A network failure in the WAN connecting clusters together. The WAN failed and clusters can no longer all communicate with each other.
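The difference between cases 2 and 3 can be sketched in a few lines of Python. This is only a toy model, not anything from the original article: the `Node` class, the `"poison"` transaction, the `crash_once` flag, and `execute_with_failover` are all hypothetical names made up for illustration. A Bohr bug is modelled as a crash that depends only on the input, and a Heisenbug as a crash that happens once on one node and does not recur elsewhere.

```python
class NodeCrash(Exception):
    """Raised when a simulated DBMS node crashes while running a transaction."""


class Node:
    def __init__(self, name, crash_once=False):
        self.name = name
        # Hypothetical stand-in for a timing-dependent Heisenbug:
        # the node crashes on its next transaction only.
        self.crash_once = crash_once

    def execute(self, txn):
        # Bohr bug (assumed trigger): the crash depends only on the input,
        # so every node running the same code fails on it.
        if txn == "poison":
            raise NodeCrash(f"{self.name}: repeatable crash on {txn}")
        # Heisenbug: a one-off crash tied to this node's state.
        if self.crash_once:
            self.crash_once = False
            raise NodeCrash(f"{self.name}: unrepeatable crash on {txn}")
        return f"{self.name} committed {txn}"


def execute_with_failover(txn, primary, replica):
    """Run txn on the primary; on a crash, retry it once on the replica."""
    try:
        return primary.execute(txn)
    except NodeCrash:
        # If the replica also crashes, the exception propagates to the caller.
        return replica.execute(txn)
```

In this sketch, failing over to the replica rescues the transaction in the Heisenbug case, while the `"poison"` transaction takes down the replica as well, which matches the behaviour described in items 2 and 3.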

 

A classic eight-way classification, which even covers earthquakes and floods…
