
related sites
Cloud Computing
Google and I.B.M. Join in ‘Cloud Computing’ Research
http://www.nytimes.com/2007/10/08/technology/08cloud.html
Cluster Computing and MapReduce
http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
Data-Intensive SuperComputing (DISC)
http://www.cs.cmu.edu/~bryant/pubdir/cmu-cs-07-128.pdf
Hadoop
http://hadoop.apache.org/
Scalable Computing with Hadoop
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
[GArchitecture,2008] Google Architecture. http://highscalability.com/google-architecture.
Google Code University, http://code.google.com/edu/parallel/index.html
MapReduce course at MIT, http://mr.iap.2008.googlepages.com/home
Course on Mass Data Processing Technology on Large-Scale Clusters at Tsinghua Univ.,
http://net.pku.edu.cn/~course/cs402/resource/mdp_tsinghua/readings.htm
Introduction to Distributed System Design, http://code.google.com/edu/parallel/dsd-tutorial.html
Introduction to Parallel Programming and MapReduce,
http://code.google.com/edu/parallel/mapreduce-tutorial.html
Figure 3: What the final multi-node cluster will look like.
a map function that generates values and associated keys from each document,
a reduction function that describes how all the data matching each possible key should be combined.
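The two functions above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop's or Google's API: the names map_fn, reduce_fn and run_mapreduce, the word-count task, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit one (key, value) pair per word found in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: combine all values that were emitted for the same key."""
    return key, sum(values)


def run_mapreduce(documents, map_fn, reduce_fn):
    """Toy driver: map every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)  # the "shuffle" step, done in memory
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


# Hypothetical input documents.
docs = {
    "d1": "the cloud is a cluster",
    "d2": "the cluster runs hadoop",
}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

In a real cluster the map calls and the per-key reduce calls would run on different machines; the grouping step here stands in for the distributed shuffle.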
http://highscalability.com/tags/bigtable
Greg Linden points to a new Google article, “MapReduce: Simplified Data Processing on Large Clusters”. Some interesting stats: 100k MapReduce jobs are executed each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual-processor with gigabit Ethernet and 4-8 GB of memory.
Running Hadoop On Ubuntu Linux (Single-Node Cluster)
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
James Governor’s take on how to tell if something isn’t cloud computing.
If you peel back the label and it says “Grid” or “OGSA” underneath… it’s not a cloud.
If you need to send a 40-page requirements document to the vendor then… it is not a cloud.
If you can’t buy it on your personal credit card… it is not a cloud.
If they are trying to sell you hardware… it’s not a cloud.
If there is no API… it’s not a cloud.
If you need to rearchitect your systems for it… it’s not a cloud.
If it takes more than ten minutes to provision… it’s not a cloud.
If you can’t deprovision in less than ten minutes… it’s not a cloud.
If you know where the machines are… it’s not a cloud.
If there is a consultant in the room… it’s not a cloud.
If you need to specify the number of machines you want upfront… it’s not a cloud.
If it only runs one operating system… it’s not a cloud.
If you can’t connect to it from your own machine… it’s not a cloud.
If you need to install software to use it… it’s not a cloud.
If you own all the hardware… it’s not a cloud.
—-
Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure. Imagine asking people, "If the probability of something happening is one in 10^13, how often would it happen?" Common sense would be to answer, "Never." That is an infinitely large number in human terms. But if you ask a physicist, she would say, "All the time. In a cubic foot of air, those things happen all the time."
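To put a number on the physicist's answer, here is a rough back-of-the-envelope calculation. The cluster size and per-machine operation rate are made-up figures chosen only for illustration; the quote itself gives no such numbers. With these assumptions the cluster performs about 10^13 operations per second, so a one-in-10^13 event is expected roughly once per second.

```python
import math

# Assumed figures, for illustration only (not taken from the quote).
P_EVENT = 1e-13        # probability of the rare event per operation
MACHINES = 10_000      # hypothetical cluster size
OPS_PER_SECOND = 1e9   # hypothetical operations per machine per second


def expected_events(p, trials):
    """Expected number of occurrences over `trials` independent operations."""
    return p * trials


def prob_at_least_one(p, trials):
    """1 - (1 - p)**trials, computed in a numerically stable way for tiny p."""
    return -math.expm1(trials * math.log1p(-p))


ops_per_second = MACHINES * OPS_PER_SECOND
ops_per_day = ops_per_second * 86_400

per_day = expected_events(P_EVENT, ops_per_day)
p_one_second = prob_at_least_one(P_EVENT, ops_per_second)

print(f"expected events per day: {per_day:.0f}")
print(f"chance of at least one event in any given second: {p_one_second:.3f}")
```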
———
Everyone, when they first build a distributed system, makes the following eight assumptions. These are so well-known in this field that they are commonly referred to as the "8 Fallacies".
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing 
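Taking the first fallacy seriously usually means wrapping remote calls in some retry logic. The sketch below shows one common approach, retrying with exponential backoff and jitter. TransientError, call_with_retries and make_flaky are hypothetical names invented for this example, and the "remote call" is simulated locally so the snippet runs without a network.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error standing in for a timeout or a dropped connection."""


def call_with_retries(operation, attempts=5, base_delay=0.01, max_delay=1.0,
                      sleep=time.sleep):
    """Call operation(); on TransientError, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))


def make_flaky(failures_before_success):
    """Build a simulated remote call that fails a fixed number of times first."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated network failure")
        return "ok"

    return operation, state


op, state = make_flaky(2)
result = call_with_retries(op)
print(result, "after", state["calls"], "calls")
```

Retries are only safe when the operation is idempotent; a call that may already have taken effect on the far side needs deduplication or some other safeguard as well.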






