A big-data problem, looking for the optimal solution. # JobHunting - 待字闺中
l*z
Post 1
Given two files, an employee file and a department file, with the following structure:
employee file:
employee_id, employ_name,department_id
.
.
.
employee_id, employ_name,department_id
department file:
department_id, department_name, manager_id
.
.
.
department_id, department_name, manager_id
The task is to generate an output file:
employee_id, employ_name,manager_id,department_id,number_of_employee_in_this_department
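The simple solution is to scan the employee file once to count the number of employees in each department.
The problem is that the input files are very large and cannot all be loaded into memory. How would you implement this?
Leave MapReduce and the like out of consideration.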
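For reference, the simple scan-and-count idea can be sketched in Python as a two-pass streaming job. This is only a sketch under assumptions the post does not state: the files are comma-separated as shown, and the per-department counts plus the department_id to manager_id mapping fit in memory even though the employee file does not. The function name `build_report` and the file paths are hypothetical. If the department data were also too large for memory, this would not work as written and something like an external sort on department_id would be needed instead.

```python
import csv
from collections import Counter


def build_report(employee_path, department_path, output_path):
    """Two-pass streaming join; never holds the employee file in memory."""
    # Pass 1: stream the employee file, keeping one counter per department.
    counts = Counter()
    with open(employee_path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 3:  # skip blank or malformed lines
                continue
            counts[row[2].strip()] += 1

    # Assumption: the department_id -> manager_id mapping fits in memory.
    managers = {}
    with open(department_path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 3:
                continue
            managers[row[0].strip()] = row[2].strip()

    # Pass 2: stream the employee file again and write one output row per employee.
    with open(employee_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, skipinitialspace=True):
            if len(row) < 3:
                continue
            emp_id, name, dept = (field.strip() for field in row[:3])
            # An unknown department yields an empty manager_id.
            writer.writerow([emp_id, name, managers.get(dept, ""), dept, counts[dept]])
```

Memory use here grows with the number of departments, not the number of employees, at the cost of reading the employee file twice.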