Today's gap is very interesting! - MITBBS (未名空间) historical archive


Today's gap is very interesting! # Stock

c*7 2010-07-13 07:07

Post 1

PM me please, thanks.
I get 30 and will give 15 back to you.
We can talk by phone before the referral.

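l*n 2010-07-13 07:07

Post 2

Two interviewers have already tested me on this topic... As a new grad I'm at a loss, I just don't understand it...
The data set is heavily skewed: one map task emits so many keys that a single reduce task can't handle them. What do you do?
How should the partition function be written? One interviewer also asked me to write a combiner... I don't know how to write one, what do I do...
I got a phone screen from some oddball company, and right off the bat they asked me, a new grad, two system design questions... I want to cough up blood...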

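f*z 2010-07-13 07:07

Post 3

When will the gap be filled?
If not today, then tomorrow.
Only after it is filled can the price rise again.
The good news from the ER (earnings report) played out in a single day, so what happens next?
If it simply gaps down at the open tomorrow, the bulls are finished.
If it gets filled today, that's actually a good thing.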

c*7 2010-07-13 07:07

Post 4

Only valid for people who have never opened a checking account; SSN required.

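i*6 2010-07-13 07:07

Post 5

The question is obviously incomplete, so there's no way to answer it. Are those question marks one question or several? Can you restate the original question in full?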

g*a 2010-07-13 07:07

Post 6

This is ER season. Today is a sign of a bull market. Last ER season was sell-on-the-news.

z*a 2010-07-13 07:07

Post 7

Hi,
Could you refer me to open this account?
Thank you

[Quoting c*********7:]

: PM me please, thanks.
: I get 30 and will give 15 back to you.
: We can talk by phone before the referral.

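f*y 2010-07-13 07:07

Post 8

"The data set is heavily skewed": stated precisely, this means "there are too many records with the same key".
If that's the case, it isn't a partitioning problem, because records with the same key all have to end up in one reducer no matter how you write the partition function.
So how do you solve it?
1. A combiner is indeed one approach. Taking word count as an example, (hello, 1), (hello, 1), (hello, 1) can be merged into a single (hello, 3).
2. A combiner can't be used everywhere. Where the values can't be merged, you can't use one. Then what?
There's really no good fix. The skew may come from a flaw in your original design; maybe it can be solved with multiple rounds of MapReduce, but that's getting off topic.
The OP has never worked on this before, so asking you to design it really is a bit much.
Keep at it, good luck!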
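
The combiner idea from point 1 above can be sketched in plain Python. This is only a single-process simulation, not the Hadoop API; the function names (`map_words`, `combine`, `reduce_counts`) and the sample input are made up for illustration. It shows how summing counts on the map side shrinks what gets shuffled to the reducer.

```python
from collections import Counter, defaultdict

def map_words(line):
    """Map step: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum the counts per key locally, before the shuffle."""
    local = Counter()
    for key, count in pairs:
        local[key] += count
    return list(local.items())

def reduce_counts(pairs):
    """Reduce step: sum the (already partially summed) counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Two hypothetical input splits, each handled by its own map task.
splits = ["hello hello hello world", "hello world"]

shuffled = []
for split in splits:
    # Without the combiner, 6 pairs would be shuffled; with it, only 4.
    shuffled.extend(combine(map_words(split)))

result = reduce_counts(shuffled)
```

This only works because addition is associative and commutative, so partial sums can be merged in any order. That is the restriction point 2 refers to.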

c*7 2010-07-13 07:07

Post 9

PMed

[Quoting z******a:]

:
: Hi,
: Could you refer me to open this account?
: Thank you

k*0 2010-07-13 07:07

Post 10

Use two map-reduce jobs, the first job does a partial aggregation, then use
a second reduce job to do a final aggregation. This is a typical problem.
Also, you can check the Hive system design, which deals with this problem by
using two map-reduce jobs.
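
One common way to get the partial aggregation is to "salt" the key: the first job appends a random suffix so that records for one hot key spread over several reducers, and the second job strips the suffix and merges the partial results. The sketch below simulates that in plain Python. It is an assumption about how the two jobs could be arranged, not a description of Hive's actual implementation, and `NUM_SALTS` and all function names are hypothetical.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical: how many ways each key is split in job 1

def stage1_map(key, value, rng):
    """Job 1 map: attach a random salt so one hot key spreads over reducers."""
    return ((key, rng.randrange(NUM_SALTS)), value)

def reduce_sum(pairs):
    """Reducer used by both jobs: sum the values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def stage2_map(salted_key, partial):
    """Job 2 map: drop the salt so the partial sums meet under the real key."""
    key, _salt = salted_key
    return (key, partial)

rng = random.Random(0)
# Heavily skewed input: key "k" dominates.
records = [("k", 1)] * 1000 + [("other", 1)] * 10

salted = [stage1_map(k, v, rng) for k, v in records]
partials = reduce_sum(salted)  # job 1: at most NUM_SALTS partial sums per key
final = dict(reduce_sum(stage2_map(sk, p) for sk, p in partials))  # job 2
```

In job 2 the reducer for "k" sees at most `NUM_SALTS` values instead of 1000. Like a combiner, this relies on the aggregation being associative and commutative.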

b*5 2010-07-13 07:07

Post 11

How does one do partial aggregation? Let's say the map function emits something like a gazillion records with key "k", so the data skews heavily on key "k", and one reducer gets a gazillion elements and can't handle them.
How do you do partial aggregation on those gazillion records with key "k"? Aggregate half of them first? How do you aggregate half of them? Where does the other half go?
I googled "mapreduce data skew" a bit, and some papers seem to call for a custom partitioner that first estimates the distribution of the keys coming out of the map...
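
A rough sketch of that "estimate the distribution, then partition" idea, in plain Python rather than the Hadoop `Partitioner` API. It assumes a sample of map output keys is available up front; `build_partition_table` and `partition` are made-up names, and the greedy heaviest-key-first assignment is just one simple strategy, not the method of any particular paper.

```python
import heapq
import zlib
from collections import Counter

def build_partition_table(sample_keys, num_reducers):
    """Assign sampled keys to reducers, heaviest key first, always to the
    reducer with the smallest estimated load so far."""
    counts = Counter(sample_keys)
    heap = [(0, r) for r in range(num_reducers)]  # (estimated load, reducer id)
    heapq.heapify(heap)
    table = {}
    for key, count in counts.most_common():
        load, reducer = heapq.heappop(heap)
        table[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return table

def partition(key, table, num_reducers):
    """Partition function: use the table for sampled keys, and fall back to
    a stable hash for keys that never appeared in the sample."""
    if key in table:
        return table[key]
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Hypothetical sample of map output keys.
sample = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + ["d"] * 10
table = build_partition_table(sample, num_reducers=2)
```

This evens out the load when several keys are moderately heavy. It does not split a single key, so one overwhelmingly hot key still lands on one reducer.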


[Quoting k********0:]

: Use two map-reduce jobs, the first job does a partial aggregation, then use
: a second reduce job to do a final aggregation. This is a typical problem.
: Also, you can check the Hive system design, which deals with this problem by
: using two map-reduce jobs.