Spark is slower than Java MapReduce -- Scala big bulls please advise
Programming - 葵花宝典
v*r
1
Spark beginner here, trying out the buzz tech.
Input: a 200 GB uncompressed data file stored in HDFS.
Cluster: 37 worker nodes, each with 24 cores.
Java MapReduce: 6-8 minutes.
Spark: 37 minutes, in two ~18-minute stages.
"Lightning fast cluster computing, 100x faster" ???!!!!
Big bulls please advise!
# sortMapper sorts the values for each key, then iterates over the grouped values
text = sc.textFile(input, 1776)  # 24 cores * 37 nodes * 2
text.map(mapper) \
    .filter(lambda x: x is not None) \
    .groupByKey() \
    .map(sortMapper) \
    .filter(lambda x: x[1] != []) \
    .saveAsTextFile(output)
sc.textFile and saveAsTextFile are very slow.
Configuration as follows:
conf = SparkConf() \
    .set("spark.executor.memory", "24g") \
    .set("spark.driver.memory", "16g") \
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
N*n
2
It's "lightening fast" only when it's in-memory, otherwise there's
no magic here.
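The oft-quoted 100x figure comes from iterative jobs that cache a dataset and reuse it across passes; a single-pass group-and-write like the one above reads from and writes to HDFS just like MapReduce does. A minimal sketch of the workload Spark is actually pitched at, with hypothetical some_update and combine functions:

from pyspark import StorageLevel

# Pay the HDFS read once; subsequent passes run against executor memory.
cached = (sc.textFile(input)
            .map(mapper)
            .persist(StorageLevel.MEMORY_ONLY))

for i in range(10):  # hypothetical iterative algorithm
    result = cached.map(some_update).reduce(combine)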
w*m
3
Did it shuffle?

★ Sent from iPhone App: ChineseWeb 8.7

【Quoting v*****r's post】
: Spark beginner here, trying out the buzz tech.
: Input: a 200 GB uncompressed data file stored in HDFS.
: Cluster: 37 worker nodes, each with 24 cores.
: Java MapReduce: 6-8 minutes.
: Spark: 37 minutes, in two ~18-minute stages.
: "Lightning fast cluster computing, 100x faster" ???!!!!
: Big bulls please advise!
: # sortMapper sorts the values for each key, then iterates over the grouped values
: text = sc.textFile(input, 1776)  # 24 cores * 37 nodes * 2
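To answer that directly: groupByKey always forces a full shuffle, and the lineage printout makes the boundary visible. A quick sketch (toDebugString may return bytes depending on the PySpark version):

pipeline = (sc.textFile(input, 1776)
              .map(mapper)
              .filter(lambda x: x is not None)
              .groupByKey())

# A ShuffledRDD shows up in the lineage: every surviving record is
# serialized, sent over the network, and deserialized before sortMapper.
print(pipeline.toDebugString())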

b*l
4
No. Worker mem is 24g x 37 nodes (~888 GB), about 4x the data size.

【Quoting w********m's post】
: Did it shuffle?
:
: ★ Sent from iPhone App: ChineseWeb 8.7

b*l
5
Even with disk reads and writes, it shouldn't be slower than Java MR, right?

【Quoting N********n's post】
: It's "lightning fast" only when it's in-memory; otherwise there's
: no magic here.
