Spark is slower than Java MapReduce -- Scala big bulls please advise
Programming - 葵花宝典
v*r
1
Spark beginner here, trying out the buzz tech.
Input: a 200 GB uncompressed data file stored in HDFS.
Cluster: 37 worker nodes, each with 24 cores.
Java MapReduce: 6-8 minutes.
Spark: 37 minutes, in two ~18-minute stages.
"Lightning fast cluster computing, 100x faster" ???!!!!
Big bulls please advise!
# sortMapper sorts the values for each key, then iterates over the grouped values
text = sc.textFile(input, 1776)  # 24 cores * 37 nodes * 2
text.map(mapper) \
    .filter(lambda x: x is not None) \
    .groupByKey() \
    .map(sortMapper) \
    .filter(lambda x: x[1] != []) \
    .saveAsTextFile(output)
sc.textFile and saveAsTextFile are very slow.
Configuration as follows:
conf = SparkConf() \
    .set("spark.executor.memory", "24g") \
    .set("spark.driver.memory", "16g") \
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
N*n
2
It's "lightening fast" only when it's in-memory, otherwise there's
no magic here.
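The oft-quoted 100x figure comes from iterative jobs that cache a dataset and reuse it across passes; a single-pass group-and-write like the one above reads from and writes to HDFS just like MapReduce does. A minimal sketch of the workload Spark is actually pitched at, with hypothetical some_update and combine functions:

from pyspark import StorageLevel

# Pay the HDFS read once; subsequent passes run against executor memory.
cached = (sc.textFile(input)
            .map(mapper)
            .persist(StorageLevel.MEMORY_ONLY))

for i in range(10):  # hypothetical iterative algorithm
    result = cached.map(some_update).reduce(combine)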
w*m
3
Did it shuffle?

★ Sent from iPhone App: ChineseWeb 8.7

【Quoting v*****r's post】
: Spark beginner here, trying out the buzz tech.
: Input: a 200 GB uncompressed data file stored in HDFS.
: Cluster: 37 worker nodes, each with 24 cores.
: Java MapReduce: 6-8 minutes.
: Spark: 37 minutes, in two ~18-minute stages.
: "Lightning fast cluster computing, 100x faster" ???!!!!
: Big bulls please advise!
: # sortMapper sorts the values for each key, then iterates over the grouped values
: text = sc.textFile(input, 1776)  # 24 cores * 37 nodes * 2
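To answer that directly: groupByKey always forces a full shuffle, and the lineage printout makes the boundary visible. A quick sketch (toDebugString may return bytes depending on the PySpark version):

pipeline = (sc.textFile(input, 1776)
              .map(mapper)
              .filter(lambda x: x is not None)
              .groupByKey())

# A ShuffledRDD shows up in the lineage: every surviving record is
# serialized, sent over the network, and deserialized before sortMapper.
print(pipeline.toDebugString())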

b*l
4
No. Worker mem is 24g x 37 nodes (~888 GB), about 4x the data size.

【Quoting w********m's post】
: Did it shuffle?
:
: ★ Sent from iPhone App: ChineseWeb 8.7

b*l
5
Even with disk reads and writes, it shouldn't be slower than Java MR, right?

【Quoting N********n's post】
: It's "lightning fast" only when it's in-memory; otherwise there's
: no magic here.
