2008-06-28
 

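A rainy, overcast day, good for staying home and sleeping, but I got up at 7 a.m. anyway and spent an hour reading assorted news, blogs, and forums.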
 
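Picked up yesterday's train of thought and kept studying MapReduce. I still don't really understand how map and fold are used, because there are too few examples to compare against. Questions and answers on the topic exist, but I don't want to read through them yet, and I have no examples for testing whether my own understanding is correct. A dilemma. Still, I can keep moving for now: using map belongs to the later, experimental stage, so I'll come back to it once the details of this whole part are sorted out, and maybe then I'll understand what those parameters mean.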
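A tiny example to experiment with, as a sketch only: it uses Python's built-in map and functools.reduce (Python's name for fold), and the list of numbers is made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to every element independently.
squares = list(map(lambda x: x * x, numbers))

# fold (reduce): combine all elements into a single value, starting from an
# initial accumulator (0 here) and applying the function left to right.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

The parameters are the function to apply, the sequence, and (for fold) the starting value of the accumulator.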
 
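Basically, what I've taken away from these few days of study:
Google's home-grown Google File System uses 64 MB chunks and a one-master, many-slaves layout; of course, some of the slaves stand ready to take over as a backup master at any time.
With MapReduce, data mining across the many slaves can find the needed content more quickly.
If this is combined with IPTV, the directions that look usable right now are:
1. The jump part: find the node that holds the needed data faster, and therefore find the data itself faster.
2. The recover part: likewise, the goal is to find the needed data faster.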
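To make the 64 MB chunk idea concrete, here is a minimal sketch of how a byte offset in a file maps to a chunk. The function names locate and chunk_count are hypothetical, made up for this note, and are not part of any GFS or Hadoop API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size noted above


def locate(byte_offset):
    """Return (chunk index, offset inside that chunk) for a byte offset."""
    return divmod(byte_offset, CHUNK_SIZE)


def chunk_count(file_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


print(locate(200 * 1024 * 1024))         # (3, 8388608)
print(chunk_count(1024 * 1024 * 1024))   # 16
```

Finding "the node that holds the data" then comes down to looking up which machine stores that chunk index, which is the part the master keeps track of.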
 
As for Hadoop's streaming part, I'll see after today's study what other strengths it has that I can put to use.
2008-06-27

Reduce/Fold (higher-order function)

 

Map (higher-order function)

 

 

update

GHC/Using rules

http://www.haskell.org/haskellwiki/GHC/Using_Rules

related sites


Google and I.B.M. Join in ‘Cloud Computing’ Research
http://www.nytimes.com/2007/10/08/technology/08cloud.html
Cluster Computing and MapReduce
http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html 
Data-Intensive SuperComputing (DISC)
http://www.cs.cmu.edu/~bryant/pubdir/cmu-cs-07-128.pdf 
Hadoop
http://hadoop.apache.org/
Scalable Computing with Hadoop
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf

[GArchitecture,2008] Google Architecture. http://highscalability.com/google-architecture.

Google Code University, http://code.google.com/edu/parallel/index.html
The course of MapReduce at MIT, http://mr.iap.2008.googlepages.com/home
The course of Mass Data Processing Technology on Large Scale Clusters at Tsinghua Univ.,
http://net.pku.edu.cn/~course/cs402/resource/mdp_tsinghua/readings.htm
Introduction to Distributed System Design, http://code.google.com/edu/parallel/dsd-tutorial.html
Introduction to Parallel Programming and MapReduce,
http://code.google.com/edu/parallel/mapreduce-tutorial.html

 

Figure 3: How the final multi-node cluster will look like.


 

Distributed File System
  Google File System
  Hadoop Distributed File System
    demonstrated on clusters with 2000 nodes
  Kosmos Distributed File System
Programming Model
  Functional Language
  MapReduce, Hadoop MapReduce
Column oriented database
  BigTable, HadoopBase, HyperTable
    HyperTable, 28M rows of data inserted at a per-node write rate of 7mb/sec.
 
__________
 

a map function that generates values and associated keys from each document,

a reduction function that describes how all the data matching each possible key should be combined.
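The two functions described above can be sketched as a single-process word count. This is only an in-memory illustration of the programming model, not how Hadoop or Google's system is implemented; map_fn, reduce_fn, run_mapreduce, and the sample documents are all made-up names and data.

```python
from collections import defaultdict


def map_fn(document):
    """Emit (key, value) pairs: one (word, 1) per word in the document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Combine all values that share a key; here, sum the counts."""
    return key, sum(values)


def run_mapreduce(documents):
    # Shuffle step: group every emitted value under its key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


docs = ["the cat sat", "the dog sat down"]
counts = run_mapreduce(docs)
print(counts)
```

In a real cluster the map calls run on many machines at once and the grouping happens over the network, but the shape of the two user-supplied functions stays the same.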

http://highscalability.com/tags/bigtable

Greg Linden points to a new Google article MapReduce: simplified data processing on large clusters. Some interesting stats: 100k MapReduce jobs are executed each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.

Running Hadoop On Ubuntu Linux (Single-Node Cluster)

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29

Running Hadoop On Ubuntu Linux (Multi-Node Cluster)

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

2008-06-26

James Governor’s take on how to tell if something isn’t cloud computing.

If you peel back the label and it says “Grid” or “OGSA” underneath… it's not a cloud.
If you need to send a 40 page requirements document to the vendor then… it is not a cloud.
If you can’t buy it on your personal credit card… it is not a cloud.
If they are trying to sell you hardware… it's not a cloud.
If there is no API… it's not a cloud.
If you need to rearchitect your systems for it… it's not a cloud.
If it takes more than ten minutes to provision… it's not a cloud.
If you can’t deprovision in less than ten minutes… it's not a cloud.
If you know where the machines are… it's not a cloud.
If there is a consultant in the room… it's not a cloud.
If you need to specify the number of machines you want upfront… it's not a cloud.
If it only runs one operating system… it's not a cloud.
If you can’t connect to it from your own machine… it's not a cloud.
If you need to install software to use it… it's not a cloud.
If you own all the hardware… it's not a cloud.

  —-

Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure. Imagine asking people, "If the probability of something happening is one in 10^13, how often would it happen?" Common sense would be to answer, "Never." That is an infinitely large number in human terms. But if you ask a physicist, she would say, "All the time. In a cubic foot of air, those things happen all the time."
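A quick back-of-the-envelope check of that point, as a sketch: the machine count and per-machine operation rate below are invented round numbers, chosen only to show how fast a one-in-10^13 event shows up at scale.

```python
ODDS = 10**13                  # the event happens once per 10^13 operations
MACHINES = 10_000              # hypothetical cluster size
OPS_PER_MACHINE = 10**9        # hypothetical operations per second per machine
SECONDS_PER_DAY = 86_400

ops_per_second = MACHINES * OPS_PER_MACHINE

# Expected number of occurrences = number of trials / odds.
expected_per_day = ops_per_second * SECONDS_PER_DAY // ODDS

print(expected_per_day)  # 86400
```

Under these assumed numbers the "never" event is expected about once per second.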

———

Everyone, when they first build a distributed system, makes the following eight assumptions. These are so well-known in this field that they are commonly referred to as the "8 Fallacies".

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

Java JDK 5.0 Study Notes

http://book.csdn.net/bookfiles/135/

The title is just a lead-in. I have no musical sense whatsoever and no idea who is who; I just listen to whatever comes along.
 
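I listened to the new Coldplay album for two weeks straight and felt every song was good, but of course anything gets tiresome when played every day, so I changed things up... I went straight to BT and grabbed Coldplay-15CD, all 15 CDs, only to find that just a handful of songs are real classics. So I gave up on Coldplay and asked friends to recommend new music.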
 
 
Friends' recommendations:
 
Leona_Lewis-SpiritF (this woman has a really great voice)
 
 
—————
 R.E.M.-Accelerate  (this one is pretty good too)
 
……
btw, while I'm at it, I also recommend: timbaland_presents_one_republic_-_apologize
 
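I kept the mp3 version for myself.
I originally had the MV version too… Sadly, in my impatience, BT and everything else that hammers a hard disk took their toll: after an I/O error popped up, the 36 GB disk simply could not be read anymore, and after a reboot the system was extremely slow. With no other option I tossed the disk into a drawer, and the MV went with it. Of course the MV isn't as good as the mp3 anyway, since watching gets in the way of listening.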