Newbie asking for help with a Hadoop question # DataSciences - Data Science
m*e
Post #1
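I have just started working with Hadoop and ran into a problem at work.
Every day the company scores several million accounts (behavior score). The scores are stored in
model/date/score/part-00000.
For example: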
/model/2015-03-01/score/part-00000
/model/2015-03-02/score/part-00000
/model/2015-03-03/score/part-00000
.....
Data in each file: customer_id,score
I need to get the daily scores for about 200K accounts over 6 months. Is there an easy
way to do this?
Thanks!
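
One possible approach, sketched only under assumptions: since 200K account IDs fit comfortably in memory, a map-only filter (a map-side join) may be enough. The Python below is meant as a Hadoop Streaming mapper. The file name accounts.txt (one customer_id per line, shipped to each mapper, for example with the streaming -files option) and the function names are hypothetical. It also assumes the streaming job exposes the input path in the environment variable mapreduce_map_input_file, so the date can be read from the /model/<date>/score/ directory; older versions may use a different variable name.

```python
import os
import re
import sys

# Assumed layout from the post: /model/<yyyy-mm-dd>/score/part-00000
DATE_RE = re.compile(r"/model/(\d{4}-\d{2}-\d{2})/score/")


def date_from_path(path):
    """Return the yyyy-mm-dd part of a /model/<date>/score/... path, or None."""
    match = DATE_RE.search(path)
    return match.group(1) if match else None


def load_ids(path):
    """Read the wanted customer IDs (one per line) into a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_scores(lines, wanted_ids, date):
    """Yield 'customer_id,date,score' for rows whose ID is in wanted_ids."""
    for line in lines:
        customer_id, sep, score = line.rstrip("\n").partition(",")
        if sep and customer_id in wanted_ids:
            yield f"{customer_id},{date},{score}"


def main(ids_path):
    wanted_ids = load_ids(ids_path)
    # Assumption: Hadoop Streaming sets this variable to the current input file.
    input_path = os.environ.get("mapreduce_map_input_file", "")
    date = date_from_path(input_path) or "unknown"
    for row in filter_scores(sys.stdin, wanted_ids, date):
        print(row)


# Run as: mapper.py accounts.txt (hypothetical file name)
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

With zero reducers, the job output would be just the matching rows tagged with their date. If Hive or Pig is available, loading the ID list as a small table and joining it against the score files may be an equally easy alternative.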