2008-10-31

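Korean-raised beef... 20,000 per serving

Ordinary grilled pork from a place on the street is under 3,000 per serving

It tasted pretty good, which is exactly why I couldn't get grilled pork down the next time I had it... grilled pork is just too awful -_-! Greasy, full of fat, and the lean meat turns tough as soon as it's grilled

The boiled peanuts there were really good too. Everyone saw me eating so many, and I said these were my favorite snack when I was little; young peanuts, boiled, are great to eat

The 된장탕 (doenjang-tang, a soup of fermented soybean paste) that came later was also good. To my surprise there were chunks of beef in the soup; it tasted good and the beef chunks were very good too...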

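At dinner I had no energy at all and didn't want to say a word. A whole week of staying up like this is really more than I can take. Sigh, but who told me to study computer science? Look around: how many people here don't stay up all night...

In the afternoon I listed out the November schedule and found there's no less to do than in October. October was mainly two papers weighing on me; in November I figure there has to be at least one paper, plus another, even more outrageous documentation job. There's also the book I got halfway through in September that I need to keep reading, and a few books that were in the October plan but that I had no time for because I was busy with the papers.

In the afternoon I also got LaTeX out again to refamiliarize myself; I hadn't used it in so long I'd nearly forgotten it. I plan to switch to LaTeX in November for writing the paper and that outrageous document. LaTeX is a pain for Chinese, but English output is a real pleasure

It's the weekend again, and next month's work has to start tomorrow. Basically every usable hour of the day is booked solid. Another full November, only I can't push as hard as I did this past week. A normal 12 hours a day I can still guarantee; I just need to find some way to raise my daytime efficiency, otherwise it really is a waste of time. Keep going~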

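While I'm at it, two song recommendations

<Paper Tiger> and <Coast Off> from Helios's album Eingya

I listened to them all last night and all day today. The style is sort of dance-like. Thinking back on last night makes me laugh: listening to these two songs, working two computers with two keyboards (the two computers sitting on two different desks, of course), I ran back and forth pressing Enter and y to clone systems, and I'm not sure whether I was dancing or cloning systems -_-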


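Last time: Hadoop User Group Meeting (Oct. Meeting)

I got halfway through watching it when the network dropped and I couldn't continue. After a busy half month, I'm picking it up again now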

———-

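Continuing

Next comes a comparison of the three approaches, shown as diagrams, which makes it very intuitive:

First is Repartition Join, the traditional way to join: a log block -> local scan -> sort -> shuffle

Next is replicated join, which has two parts: the user data is first built into a hash table, while a log block gets a local scan and is then hash-probed against that hash table (how is that actually done???)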
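On the "how is the hash probe done" question, here is a minimal single-process sketch of build-then-probe, as I understand the general technique. It is not the speakers' implementation; the record fields (`id`, `attr`, `user_id`, `url`) are made up for illustration.

```python
def replicated_join(users, log_block):
    """Join one log block against a user table small enough to fit in memory."""
    # Build phase: index the (replicated) user table by id in a hash table.
    user_table = {user["id"]: user["attr"] for user in users}

    joined = []
    # Probe phase: scan the log block once and look each record up by its key.
    for record in log_block:
        attr = user_table.get(record["user_id"])
        if attr is not None:  # log records with no matching user are dropped
            joined.append({**record, "attr": attr})
    return joined


# Hypothetical sample data.
users = [{"id": 1, "attr": "kr"}, {"id": 2, "attr": "cn"}]
log_block = [
    {"user_id": 2, "url": "/a"},
    {"user_id": 9, "url": "/b"},
    {"user_id": 1, "url": "/c"},
]
print(replicated_join(users, log_block))
```

The probe is just a dictionary lookup per log record, so no sort or shuffle is needed as long as the whole user table fits in each mapper's memory.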

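Then semijoin takes the stage: a log block -> local scan 1 and local scan 2; scan 1 is combined with the user table and a hash table is built from the result, and then scan 2 is hash-probed against it (same question: how is that done???)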
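My reading of the semijoin diagram, sketched in a single process: the first scan only collects the user ids the log actually references, the user table is cut down to those ids before the hash table is built, and the second scan probes it. This is an assumption about what the slide meant, and the field names are hypothetical.

```python
def semijoin(users, log_block):
    """Two-scan join that only keeps users the log actually references."""
    # Scan 1: collect the distinct user ids that appear in the log.
    referenced_ids = {record["user_id"] for record in log_block}

    # Combine with the user table: keep referenced users only, then build
    # the hash table from that (much smaller) subset.
    user_table = {
        user["id"]: user["attr"] for user in users if user["id"] in referenced_ids
    }

    # Scan 2: probe the hash table with every log record.
    return [
        {**record, "attr": user_table[record["user_id"]]}
        for record in log_block
        if record["user_id"] in user_table
    ]


# Hypothetical sample data: five users, but the log references only one of them.
users = [{"id": i, "attr": f"attr{i}"} for i in range(1, 6)]
log_block = [{"user_id": 2, "url": "/a"}, {"user_id": 9, "url": "/b"}]
print(semijoin(users, log_block))
```

When the log references only 0.1% to 10% of users, as in the experiment, the hash table built this way is far smaller than one over the full user table, which would explain why this variant does well.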

Then the experiment section

User data: each user has a unique id; from 50k to 500m records, 100 bytes/record; pull a 5-byte attribute out of each record

Log data: 100GB (1 billion records); N-to-1 match, referencing 0.1%, 1%, 10% of users. User IDs Zipf-distributed (s=0.5)

Environment: 10-node cluster, 1Gb switch; 8 cores, 16GB memory, 8 disks; 8 Mappers and Reducers concurrently per node; 128MB block size.

The results showed how strongly semijoin performs...

It looks like the meeting is a series...

Right after that came the second talk

———————————————————————————

Jaql [Jea k o l ]->pipes

Unix pipes for the JSON model

Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata

IBM Almaden Research Center  http://code.google.com/jaql  http://jaql.org

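Looks like two of them are Chinese, impressive~ It seems to be data mining?? But from a quick look it should be a similar kind of data retrieval -_-!! You really do need different retrieval methods tailored to particular data structures,

An example was given: A query is a pipeline

Source -> operator -> operator -> sink

$people = file..;

$greetings = file,..;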

—————-One Map Job——————

$people    ->   filter $.type = 'friendly' -> map {hello: $.name} -> write $greetings;

read input       find friendly people             keep just name           write output

Operations listed in natural order vs last operation first
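To make the "natural order vs last operation first" point concrete for myself, here is a rough Python analogue of the pipeline. It is not Jaql, only a sketch; the `pipe` helper and the sample records are my own inventions.

```python
from functools import reduce

# Hypothetical input standing in for $people.
people = [
    {"name": "Ann", "type": "friendly"},
    {"name": "Bob", "type": "grumpy"},
    {"name": "Cho", "type": "friendly"},
]


def friendly(records):
    """filter: keep only the friendly people."""
    return (r for r in records if r["type"] == "friendly")


def greet(records):
    """map: keep just the name, under the key 'hello'."""
    return ({"hello": r["name"]} for r in records)


def pipe(source, *stages):
    """Feed source through each stage in the order the stages are listed."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Plain nested calls read inside-out: the last operation comes first.
nested = list(greet(friendly(people)))

# With the helper, stages are listed in natural order, like source -> operator -> sink.
greetings = pipe(people, friendly, greet, list)
print(greetings)
```

Both forms compute the same thing; the pipe form just reads in the order the data flows.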

———————————————-
Partition

Partition one or more inputs. Send each individual partition through a sub-pipe. Merge the results.

$people

-> filter by $.birthdate < date('1990-01-01')

-> partition by $t = $.type   // partition the older people by type

    |- aggregate {type: $t, n: count($)} -|; // aggregate per partition
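The filter, partition and per-partition count can be mimicked in a few lines of Python. This is only an analogy to show what the query computes, with invented sample data.

```python
from collections import Counter
from datetime import date

# Hypothetical input standing in for $people.
people = [
    {"name": "Ann", "type": "friendly", "birthdate": date(1985, 3, 1)},
    {"name": "Bob", "type": "grumpy", "birthdate": date(1979, 7, 9)},
    {"name": "Cho", "type": "friendly", "birthdate": date(1988, 11, 20)},
    {"name": "Dee", "type": "friendly", "birthdate": date(1995, 1, 5)},
]

# filter: keep people born before 1990-01-01.
older = [p for p in people if p["birthdate"] < date(1990, 1, 1)]

# partition by type, then aggregate a count per partition.
counts = Counter(p["type"] for p in older)

# merge: one {type, n} record per partition.
result = [{"type": t, "n": n} for t, n in sorted(counts.items())]
print(result)
```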

Per partition sub-pipe

$people  -> partition by $.type      |- sort by $.rating       ->   top 100 -> myBestMatches($,3) -|

      partition people by type // sort partition by rating // keep just the first 100 in partition // find best matches per partition
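A sketch of the per-partition sub-pipe idea in Python: group by type, sort each group by rating, keep the first N. I am assuming highest rating first, which the slide does not say, and I leave out the final myBestMatches step because the talk did not show what it does.

```python
from collections import defaultdict


def top_n_per_type(people, n):
    """Partition by type, sort each partition by rating, keep the first n."""
    partitions = defaultdict(list)
    for person in people:
        partitions[person["type"]].append(person)

    # Assumption: "sort by rating" means highest rating first.
    return {
        type_: sorted(members, key=lambda p: p["rating"], reverse=True)[:n]
        for type_, members in partitions.items()
    }


# Hypothetical sample data.
people = [
    {"name": "Ann", "type": "friendly", "rating": 4.5},
    {"name": "Bob", "type": "grumpy", "rating": 3.0},
    {"name": "Cho", "type": "friendly", "rating": 4.9},
    {"name": "Dee", "type": "friendly", "rating": 2.1},
]
top = top_n_per_type(people, 2)
print({t: [p["name"] for p in members] for t, members in top.items()})
```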

Basically it is derived from Unix pipe commands, borrowing some MapReduce concepts to build something functionally similar, hmm

————————————

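Third topic: Experiences Moving A PB Data-center     Sriram Rao  http://www.linkedin.com/pub/0/324/120 (this person's page)...

Crazy. With this Indian-accented English I basically can't tell what he's saying

There wasn't much technical content. He only said that his KFS transfers faster than HDFS, for example where HDFS needs 6 hours his KFS needs only 3...

He did explain the basic time costs on the backup servers quite clearly: going from 2 to 6 replicas took 20 hours, while 6 to 7 took only 3 hours?? He also told plenty of funny stories, like shutting everything down to cool for an hour before moving the machines, and like 3 machines getting wrecked. The audience kept laughing, so I guess the questions after his talk won't be too sharp... heh

After listening for 30-plus minutes I could follow a little. Too scary... I'm just not used to this accent, hmm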


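I've found that my daytime efficiency is really low. Maybe that's actually why Americans are more developed? The hours when people are most productive all fall in the American daytime? We're asleep during our productive hours, so we get nothing done??

So I stayed up all night again: 3 hours looking for a way to clone, and an hour figuring out a trick to get around the school's rule that you can only get online after installing a patch. The school's monitoring patch has no Linux version... so, hmm, either sulk or think up a trick. Of course I soon thought up a trick for getting online

As for the cloning method, see Clonezilla-Clone System

Of course, for the first 3 hours I was playing with "How to clone your bootable Ubuntu install to another drive" and was misled by it... that damned dd couldn't finish copying one system in half an hour, which is no match for Clonezilla's 2 minutes per system

Another nice benefit of cloning a system: before, I had to set up ssh public keys and private keys, and with 15 machines that means copying keys back and forth for roughly C(15,2) pairs, hmm, which by my count is 15 x 14 = 210 one-way cp operations so that every machine holds every other machine's public key. That could easily take a whole day, and I'd be sure to lose track of which machine had or hadn't been set up with which... Now it's fine: the keys are all identical, so as long as ssh localhost works on the original Linux install once the ssh server is installed, ssh between the cloned machines on the network basically has no problems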
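For the record, the key-copy count for a full mesh is easy to check. This sketch assumes every machine has its own key pair and needs its public key copied to every other machine; the slave1..slave15 names follow my hosts plan.

```python
from itertools import permutations
from math import comb

machines = [f"slave{i}" for i in range(1, 16)]  # 15 machines

# Unordered pairs of machines that must trust each other: C(15, 2).
pairs = comb(len(machines), 2)

# One-way copies: each machine's public key goes to each of the other 14.
copies = list(permutations(machines, 2))

print(pairs)        # 105
print(len(copies))  # 210
```

So by hand it is a couple of hundred copies, while a cloned image with a shared key needs none.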

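In the evening I took apart every machine in the lab and pulled out all the hard disks, and my fingers really hurt from it. There are 15 machines with one disk each: 15 removals, 15 installs, another 15 removals, another 15 installs, 60 in total... power cable + data cable... truly insane. If people had seen this during the day they would have freaked out. I doubt many people in this Korean computer science department would take computers apart like this for fun; they're the crazy ones, tossing a machine as soon as it breaks and waiting for the professor to replace it...

It took the whole night. The basic process: before cloning I wrote up a plan and had already edited hosts, in the pattern slave1 192.168.0.1 and so on, and then stuck a label with the last number of the IP address on each case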
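The hosts entries follow such a regular pattern that they can be generated. A small sketch, assuming the names run slave1 to slave15 on 192.168.0.x, and writing each line in the address-first order that /etc/hosts expects:

```python
def hosts_lines(count, prefix="slave", network="192.168.0."):
    """Return /etc/hosts-style lines: IP address first, then the host name."""
    return [f"{network}{i}\t{prefix}{i}" for i in range(1, count + 1)]


lines = hosts_lines(15)
print("\n".join(lines))
```

Since the same block goes into the image before cloning, every cloned machine ends up with an identical hosts file.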

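The one regret is that Hadoop's site configuration file (hadoop-site.xml) isn't set up yet; next time we do a lab session I'll teach the others to do it together

That's all for now. Once I've tested things thoroughly, if there are no problems, I'll write up tonight's detailed method and process for configuring 15 machines into one cluster

I won't go to sleep just yet; I'm going to upgrade my Ubuntu machine to 8.10... I upgraded to the beta once before and nearly lost my mind... the final release should be better. The GNOME desktop in the beta was gorgeous, though... let's see whether this one differs from the beta

Tonight, thanks to the send-off for the guy in our lab who is going to Australia for language study, there's a free meal again. Yesterday a professor from civil engineering treated us to sashimi; today our own professor is taking us for barbecue... hmm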


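The real reason for not sleeping tonight is to get Ubuntu 8.04, plus the environment settings Hadoop needs, installed on all 15 machines in the lab

I just tried that dd again and hit the same problem as this afternoon: it takes forever and ever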

Replace with the correct paths for your drives if they differ. It’s going to take a while, so grab a book or start up a movie. Maybe go to bed.

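From those few sentences... you can tell just how slow it is

Clonezilla is really impressive: it needs only 2 minutes to back up a copy of the system, and it repairs grub automatically... basically a foolproof operation that saves both time and effort... I only wish I had found it sooner, but there is still enough time, hmm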

http://www.clonezilla.org/


Better to just ignore this post

Clonezilla – Clone System Ubuntu

Read that one instead

I spent the whole afternoon on this thing

Of course, the tutorials online just use dd if=/dev/sda of=/dev/sdb, but after every copy I got

dd: reading `/dev/scd1': Input/output error
1053920+0 records in
1053920+0 records out
this kind of error, and when I then attached the disk to another machine and booted it, I got GRUB error 18 or error 2
So, helplessly, I had no way to fix it...
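One thing the dd output does tell me is how far the copy got before the I/O error. A quick calculation, assuming dd ran with its default 512-byte block size (no bs= was given); the helper name is my own.

```python
import re

DD_DEFAULT_BLOCK_SIZE = 512  # bytes; dd's default when no bs= is given


def bytes_copied(dd_output, block_size=DD_DEFAULT_BLOCK_SIZE):
    """Estimate bytes written from dd's 'N+M records out' summary line."""
    match = re.search(r"(\d+)\+(\d+) records out", dd_output)
    if match is None:
        raise ValueError("no 'records out' line found")
    full_blocks = int(match.group(1))
    # Partial blocks (the number after '+') have unknown sizes, so they are ignored.
    return full_blocks * block_size


output = "1053920+0 records in\n1053920+0 records out\n"
copied = bytes_copied(output)
print(copied, round(copied / 2**20, 1))  # 539607040 bytes, about 514.6 MiB
```

Roughly 515 MiB is far less than a whole system disk, which suggests the copy stopped early, so it is no surprise the result would not boot.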
Just now I read through a tutorial online again, carefully this time

If you’ve ever wanted to completely clone your Ubuntu install, with all of the tweaks, files you’ve downloaded and changes you’ve made to it, there’s a fairly simple way to do this. This is great if you want a complete backup, or if you’re looking to move your system to a newer (read: bigger, faster, stronger) hard drive or even just to clone your install to other machines with the same hardware.

We’ll be using the terminal (Applications-> Accessories-> Terminal) and the dd command to do this. You’ll also need to have your second disk up and running when we get going. You can either have it installed and mounted internally or use an external disk enclosure and USB or Firewire. (Note: Doing this via USB 1 will be excruciatingly slow!)

You’ll also want to either be cloning your hard drive to one of the exact same size, or if you have a larger disk, make a partition of the same size on it and clone to that. Then, use an Ubuntu liveCD to change the partition size (System-> Administration-> Partition Editor). Lastly, you’ll need a LiveCD.

On to the good stuff. Got both disks plugged in? Good! Now you’ll need to figure out which disk you are copying from and which disk you are copying to. In your terminal, type:

df -h

Look first for the partition that’s mounted at root, or ‘/’. Here’s what my root partition looks like.

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 71G 46G 22G 68% /

If you’re using a SATA drive it will appear like that. IDE should be /dev/hda1. See that slash below the Mounted on? That’s the root drive.

Now you’ve got to locate the drive you’re copying to. The same df -h command will do the trick. Look for another disk mounted on /dev/****. If you’re not sure what you’re looking for, first run the df -h command without your second disk mounted. Then plug the 2nd disk in (be sure to shut down if you’re doing this inside your machine and not via USB or FireWire) and run the df -h command again. The newest partition that appears is what you’re looking for!

So if your current root partition is /dev/sda1 and the partition you’re going to copy to is /dev/sdb1 (a USB mounted drive) here’s the command you’ll need to type in your terminal:

sudo dd if=/dev/sda1 of=/dev/sdb1

Replace with the correct paths for your drives if they differ. It’s going to take a while, so grab a book or start up a movie. Maybe go to bed.

Once it’s complete, you’ve got yourself a brand new copy of your current Ubuntu install. You’re not quite done yet though. Now you’ve got to install Grub on your new disk so you can boot from it. Make sure your new disk is attached to your machine and your old disk is unplugged and boot into the Ubuntu LiveCD.

Once your machine boots up, open up a terminal session and type:

sudo grub

Grub will launch and give you the grub> prompt. Here, type:

find /boot/grub/stage1

You should see something come back that looks like (hd0,0). Jot that down, you’ll need it in a second.

Now, still in the grub> prompt, type:

root (hd0,0)

You’ll put in whatever result you got above; it may be different than (hd0,0).

Once that completes, type:

setup (hd0)

Even if you got a result that differs from (hd0,0) above.

Type:

quit

And you’re out of grub. Restart your machine, removing the LiveCD and you should be up and running on your new hard drive. You may also encounter a problem on your first boot where the system will try to scan your hard drive for bad sectors. If that fails, you’ll find yourself in a root terminal session. Just type:

fsck

Let the disk check finish and you should be good to go.

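This one is really good. The basic principle is the same, except that when using dd you boot from a Live CD, rather than doing what I did before: booting straight into the installed system and using the system to copy itself... that way some things may be running and can't be copied. The tutorial also says that once the copy is done you still have to fix the grub setup on the new disk... no wonder: I never set that up, so of course it failed -_-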


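I had planned on another 12-hour sleep tonight like last night's, but on the walk home a pile of ideas came up again, so I had no choice but to dash back to the lab

Tonight I plan to put together a 실습계획서 (a practicum plan). Of course this isn't 实习 in the Chinese sense of an internship; the Korean 실습 means something more like hands-on practice

The cloud computing course my professor is teaching this semester was really opened for my sake, so of course I have to look out for the professor too

In class last week the professor said the cloud computing textbook has too little technical content; it's all introductions to basic web 2.0 applications, things like Google Docs, plus calendars, time management and so on...

Which book exactly? http://www.douban.com/subject/3120765/

Cloud Computing

That's the one. I strongly recommend not buying it: a 20-odd-dollar book that is simply one long advertisement... I wonder whether the author took money from those websites...

OK, back to the point: on with the course report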

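I had already worked out the draft in Chinese on the way; translated, it reads:

I. Objective

       Applying the MapReduce framework on the Hadoop platform

II. Method

     1. Hands-on sessions in the computer lab

     2. The lab currently has 15 machines and the class has 7 students, so they can be split into 3 groups, each using anywhere from 3 to 5 machines

III. Content

     1. Hadoop environment setup

     2. MapReduce applications

       wordcount / pi calculation / k-means algorithm / matrix computation

     3. Working with Nutch Search

IV. Expected outcomes / goals

    1. Understand how MapReduce works

    2. Understand how to use MapReduce

    3. Understand Software as a Service (SaaS)

    4. Individual project

V. Preparation beforehand

    1. Introduction to DISC (Data-Intensive Supercomputing)

    2. Basic introduction to MapReduce and its application areas
        Natural Language Processing (NLP), Machine Learning, Data Mining, matrix computation, and so on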
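To show the class what the first exercise boils down to before touching Hadoop, here is a toy single-process wordcount laid out as map, shuffle and reduce. It is a sketch of the idea only and does not use Hadoop's API; all function names are my own.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def wordcount(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


print(wordcount(["hello hadoop", "Hello MapReduce"]))
```

On a real cluster the map calls run on different machines and the shuffle moves data over the network, but the three roles stay the same.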

 

Korean version:

Cloud Computing 수업실습계획서

一、목적

hadoop 의 MapReduce framework 위한 Algorithms 실습

二、실습방법

1.전산실에 컴퓨터 이용

2.전산실에 15대컴퓨터있음

수업구성원 7명 3팀으로 실습을 진행함

三、실습내용

1.hadoop 환경설정

2.MapReduce 실행

a. wordcount

b. Pi 계산

c. k-means algorithms 실행

d. Matrix 계산

등..

3. Nutch 사용방법 (a Opensource for search engine)

四、실습목표

1.MapReduce 원리 이해

2.MapReduce 사용방법 이해

3.Software as a Service (SaaS) 이해

4. 개인 Project

五、실습준비

1.DISC(Data-Intensive Supercomputing) 소개

2.MapReduce 사용 Item: 자연언어처리,Machine Learning, Data Mining, Data Clustering 등
