Cloudera Interview Experience (Phone Screens + Onsite) - MITBBS (未名空间) Historical Archive


Cloudera Interview Experience (Phone Screens + Onsite) # JobHunting - 待字闺中

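r*y 2015-11-04 08:11

Post 1

This is my second onsite with them; hopefully it goes well this time.
The team I interviewed with maintains Hadoop and data internally.
First phone screen: the hiring manager. Purely a chat about my resume. It went well, so I moved on to the next step.
Second phone screen: again about my resume and relevant work experience. Mostly asked about my understanding of open-source projects, especially their Impala. Avro, Thrift, NiFi, and HBase also came up.
Next was an OA on Codility. Not hard, three problems. I ran out of time on the third, and the second had a small bug that I fixed before submitting.
Then came the onsite, one hour per round.
Onsite round 1: a very senior engineer. Again went through my resume in detail, and I had to draw the architecture of my previous project on the whiteboard. When it was my turn to ask questions at the end, I asked whether Cloudera handles this the same way, and the interviewer said their design is very similar.
Onsite round 2: the manager of the larger group. We discussed HDFS in detail, plus the design of real-time data ingestion into HDFS. This round mainly tested system design and familiarity with open-source projects.
Onsite round 3: an engineer who had just moved from ops to dev. Mostly asked about all aspects of Linux; I admitted I only know basic operations. Then came more Hadoop design questions, which I answered fairly well. At the end we talked about his experience moving from ops to dev, and it was a pleasant chat.
Onsite round 4: a senior engineer who joined the company early on. All coding: first write a map, then write a reduce, then use the two functions to write average, i.e. compute the mean. Then: what if the average is computed with multiple threads? I answered with ExecutorService, thread pools, and the like. Except for the multithreading part, which I wrote as pseudocode, everything was Java on the whiteboard.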
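As a rough illustration of the round-4 coding question (write a map, write a reduce, then build average from the two), here is a minimal Java sketch. It is not the OP's whiteboard answer: the class name `MapReduceAverage`, the method signatures, and the choice to carry a (sum, count) pair through the reduce step are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical class name; the thread does not show the actual whiteboard code.
class MapReduceAverage {
    // map: apply f to every element and collect the results in a new list.
    static <T, R> List<R> map(List<T> input, Function<T, R> f) {
        List<R> out = new ArrayList<>(input.size());
        for (T item : input) {
            out.add(f.apply(item));
        }
        return out;
    }

    // reduce: fold the list into a single value, starting from identity.
    static <T> T reduce(List<T> input, T identity, BinaryOperator<T> combine) {
        T acc = identity;
        for (T item : input) {
            acc = combine.apply(acc, item);
        }
        return acc;
    }

    // average: map each number to a {sum, count} pair, then reduce by adding pairs.
    static double average(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        List<double[]> pairs = map(values, v -> new double[] {v, 1.0});
        double[] total = reduce(pairs, new double[] {0.0, 0.0},
                (a, b) -> new double[] {a[0] + b[0], a[1] + b[1]});
        return total[0] / total[1];
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1.0, 2.0, 3.0))); // prints 2.0
    }
}
```

Carrying a (sum, count) pair rather than a running mean keeps the reduce step associative, so partial results can be combined in any grouping.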
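Overall impression: they ask few algorithm questions, and those are not hard, but you must prepare multithreading, especially concurrency and parallelism in Java. System design and knowledge of open-source frameworks matter a lot; they ask about them heavily. I have been onsite with them twice, and both times the conversations were enjoyable and felt like a good fit. Hoping for a good result this time; posting this write-up to build up some good karma.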
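The multithreaded follow-up from the fourth onsite round (ExecutorService, thread pool) might look roughly like the sketch below. The OP only wrote pseudocode for that part, so the class name `ParallelAverage`, the chunking scheme, and the error handling are assumptions for illustration, not a record of the actual answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class name; a sketch, not the interview answer.
class ParallelAverage {
    // Splits the array into chunks, sums each chunk on a fixed thread pool,
    // then adds up the partial sums on the calling thread.
    static double average(double[] values, int threads) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (values.length + threads - 1) / threads; // ceiling division
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int start = 0; start < values.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, values.length);
                tasks.add(() -> {
                    double sum = 0.0;
                    for (int i = from; i < to; i++) {
                        sum += values[i];
                    }
                    return sum;
                });
            }
            double total = 0.0;
            // invokeAll blocks until every chunk task has finished.
            for (Future<Double> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total / values.length;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while averaging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(average(new double[] {1.0, 2.0, 3.0, 4.0}, 2)); // prints 2.5
    }
}
```

Each task touches only its own slice of the array and returns its own partial sum, so no locking is needed; the only shared step is the final addition on the calling thread.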

f*b 2015-11-04 08:11

Post 2

What is your background, OP?

t*t 2015-11-04 08:11

Post 3

Damn, reading these interview write-ups now, I realize I don't know any of this. I'd fail every round if I went.

b*5 2015-11-04 08:11

Post 4

I know all of this, damn it... and after the OA they just ignored me.

a*u 2015-11-04 08:11

Post 5

I also have my second onsite next Monday. They actually make you write MapReduce on the spot? Damn. Move on.

b*5 2015-11-04 08:11

Post 6

Is writing a MapReduce function in Java really that hard??!

[a***u wrote:]

: I also have my second onsite next Monday. They actually make you write MapReduce on the spot? Damn. Move on.

a*u 2015-11-04 08:11

Post 7

It's probably not hard; I just haven't written one myself.

[b**********5 wrote:]

: Is writing a MapReduce function in Java really that hard??!

j*8 2015-11-04 08:11

Post 8

Too hard
for someone like me who has never used it

[b**********5 wrote:]

: Is writing a MapReduce function in Java really that hard??!

e*a 2015-11-04 08:11

Post 9

Did you answer all the questions correctly?

b*5 2015-11-04 08:11

Post 10

Calculate the average using Pig:
assuming input.txt is '\n'-delimited, something like
1.0
2.0
3.0
myinput = LOAD 'input.txt' AS (A:double); -- (1.0)(2.0)(3.0)
grouped = GROUP myinput ALL; -- (all,{(1.0),(2.0),(3.0)})
avg = FOREACH grouped GENERATE AVG(myinput.A);

[j*****8 wrote:]

: Too hard
: for someone like me who has never used it

n*3 2015-11-04 08:11

Post 11

Everyone uses Spark now; who writes MapReduce anymore?

[b**********5 wrote:]

: Calculate the average using Pig:
: assuming input.txt is '\n'-delimited, something like
: 1.0
: 2.0
: 3.0
: myinput = LOAD 'input.txt' AS (A:double); -- (1.0)(2.0)(3.0)
: grouped = GROUP myinput ALL; -- (all,{(1.0),(2.0),(3.0)})
: avg = FOREACH grouped GENERATE AVG(myinput.A);

n*3 2015-11-04 08:11

Post 12

Everyone uses Spark now; who writes MapReduce anymore?

[b**********5 wrote:]

b*5 2015-11-04 08:11

Post 13

No MapReduce, no Spark...
There is a lot in common between these big data technologies...
When I was at a Spark tutorial thingy and the speaker talked about how
Spark distributes jobs across the cluster, I was like, isn't it the same thing as
Storm? You've got Nimbus serving as the master, giving tasks to different
workers, and the workers spin up a thread to execute the subtask...
And then you read about Cassandra and its topology-aware replication
strategy, and I'm like, isn't it similar to HDFS rack-aware replication as
well?
So yeah, you may get a different API, but everything is based off Bigtable and
MapReduce... that's why certain people from Google or BackType were the real
smart engineers, not east asian WSN who can solve a LeetCode problem...

[n*****3 wrote:]

: Everyone uses Spark now; who writes MapReduce anymore?

r*y 2015-11-04 08:11

Post 14

New grad, MS, just graduated this year.

[f*******b wrote:]

: What is your background, OP?

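f*b 2015-11-04 08:11

Post 15

Which of their teams did you interview with last time, OP? Do you have a write-up for that one?

[r******y wrote:]

: New grad, MS, just graduated this year.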

f*b 2015-11-04 08:11

Post 16

Which of their teams did you interview with last time, OP?

[r******y wrote:]