COMP5349_2014 Semester1_week4_lecture_04_MapReduce_Hadoop

Outline

• Foundations: Functional Programming
• MapReduce Programming Model
• Hadoop Framework

Master Operation

The master stores the state of each map and reduce task. It receives the locations of intermediate files and pushes them to the reduce tasks incrementally.
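The bookkeeping described above can be illustrated with a minimal sketch. It is not the actual MapReduce or Hadoop master: the class `Master`, the three task states, and the method names are hypothetical, and the file locations are plain strings chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    # Assumed set of states; the slide only says the master stores task state.
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Master:
    num_reduce_tasks: int
    map_state: dict = field(default_factory=dict)     # map task id -> TaskState
    reduce_state: dict = field(default_factory=dict)  # reduce task id -> TaskState
    # reduce task id -> list of (map task id, file location) pushed so far
    reduce_inbox: dict = field(default_factory=dict)

    def __post_init__(self):
        for r in range(self.num_reduce_tasks):
            self.reduce_state[r] = TaskState.IDLE
            self.reduce_inbox[r] = []

    def add_map_task(self, map_id):
        self.map_state[map_id] = TaskState.IDLE

    def map_started(self, map_id):
        self.map_state[map_id] = TaskState.IN_PROGRESS

    def map_completed(self, map_id, locations):
        """Record a finished map task and push its file locations.

        locations maps a region (reduce task) index to the location of the
        intermediate file this map task wrote for that region.
        """
        self.map_state[map_id] = TaskState.COMPLETED
        # Push incrementally: reducers learn about each file as soon as the
        # map task that produced it finishes, not when all maps are done.
        for region, location in locations.items():
            self.reduce_inbox[region].append((map_id, location))
```

Each call to `map_completed` extends the reducers' inboxes straight away, which is the "incremental" part: a reduce task can start fetching files before the last map task has finished.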

Data Locality

Splits 0 and 1 are located on the same worker machine, so two map tasks are assigned to this worker. Input data is read locally! One GFS chunk may correspond to one or more splits.

Diagram from the CACM version of the original MapReduce paper
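The locality preference can be sketched as a greedy two-pass assignment. This is an illustrative simplification, not the scheduler used by MapReduce or Hadoop: the function name `assign_map_tasks`, the slot model (one list entry per free map slot), and the worker names are assumptions.

```python
def assign_map_tasks(split_locations, free_slots):
    """Assign splits to workers, preferring workers that hold the data.

    split_locations: dict mapping split id -> set of workers storing the split.
    free_slots: list of worker names, one entry per free map slot.
    Returns a dict mapping split id -> (worker, is_local).
    """
    free = list(free_slots)
    assignment = {}

    # Pass 1: give each split a slot on a worker that stores it (local read).
    for split, holders in split_locations.items():
        for worker in free:
            if worker in holders:
                assignment[split] = (worker, True)
                free.remove(worker)
                break  # stop iterating right after mutating the list

    # Pass 2: remaining splits take any free slot and read over the network.
    for split in split_locations:
        if split not in assignment and free:
            assignment[split] = (free.pop(0), False)

    return assignment


example = assign_map_tasks(
    {"split0": {"workerA"}, "split1": {"workerA"}, "split2": {"workerB"}},
    ["workerA", "workerA", "workerC"],
)
```

In this example `split0` and `split1` both land on `workerA` and are read locally, mirroring the slide, while `split2` falls back to a remote read on `workerC` because `workerB` has no free slot.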

COMP5349 "Cloud Computing" - 2014 (U. Röhm)

Diagram from the original slides by Jeff Dean and Sanjay Ghemawat

Partition Function

The partition function assigns every map output key to one of R regions. In this case R = 2: k2, k4 and k5 are partitioned to region 1, while k1 and k3 are partitioned to region 2. The default partition function is hashing, e.g.

hash("key") mod R
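A minimal sketch of hash partitioning follows. The helper names `partition` and `partition_map_output` are hypothetical, and `zlib.crc32` stands in for the hash function only because Python's built-in `hash()` for strings varies between runs. Regions are numbered from 0 here, whereas the slide numbers them from 1.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_regions: int) -> int:
    """Return the region index for a key: hash(key) mod R."""
    # crc32 is deterministic across processes, so every mapper sends the
    # same key to the same region.
    return zlib.crc32(key.encode("utf-8")) % num_regions


def partition_map_output(pairs, num_regions):
    """Group (key, value) pairs from one mapper by their target region."""
    regions = defaultdict(list)
    for key, value in pairs:
        regions[partition(key, num_regions)].append((key, value))
    return dict(regions)


map_output = [("k1", 1), ("k2", 1), ("k3", 1), ("k4", 1), ("k5", 1), ("k1", 1)]
by_region = partition_map_output(map_output, num_regions=2)
```

Because the region depends only on the key, all values for one key end up at the same reduce task regardless of which mapper emitted them. Which keys share a region depends on the hash function, so the grouping will generally differ from the one in the slide's diagram.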

Diagram from Tom White, Hadoop: The Definitive Guide, O’Reilly, 2009, page 154

Communication Between Mappers and Reducers

Diagram from Tom White, Hadoop: The Definitive Guide, O’Reilly, 2009, page 163
