w*p
Post #3
j*y
Post #4
Why is it that I've heard people say what they really care about is how much data you've actually processed? If the data isn't big, a simple program isn't hard to write.
The hard part is handling very large scale. So is there any big data set to work with here?
[Quoting w******p's post]
: http://jsmapreduce.com/
j*y
Post #5
That said, the site is actually quite nice; simple jobs can be run right on it.
[Quoting w******p's post]
: http://jsmapreduce.com/
s*r
Post #6
You could install Hadoop yourself.
But if you just want to test whether some simple Python/Perl mapper/reducer scripts work,
you don't need to install anything: on Linux you can test them through pipes.
See the Hadoop Streaming chapter of the elephant book (Hadoop: The Definitive Guide) for details.
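The pipe test described above can be sketched as a one-liner. This is a local stand-in for what Hadoop Streaming does (map, then shuffle/sort, then reduce); here tr/sed play the role of your mapper script and awk plays the reducer (assumption: word-count semantics with tab-separated key/value records):

```shell
# Simulate the Hadoop Streaming pipeline locally: mapper | sort | reducer.
printf 'hello world\nhello again\n' |
  tr ' ' '\n' |       # "mapper": emit one word per line
  sed 's/$/\t1/' |    # tag each word with a count of 1 -> "word<TAB>1"
  sort |              # simulates the shuffle/sort phase between map and reduce
  awk -F'\t' '{c[$1]+=$2} END {for (w in c) print w "\t" c[w]}' |  # "reducer"
  sort
# prints:
# again   1
# hello   2
# world   1
```

To test your own scripts the same way, replace the tr/sed stage with `./mapper.py` and the awk stage with `./reducer.py`.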
y*u
Post #7
If you want to practice the MapReduce algorithm, the Python script below can simulate it.
MapReduce.py

import json

class MapReduce:
    def __init__(self):
        self.intermediate = {}
        self.result = []

    def emit_intermediate(self, key, value):
        self.intermediate.setdefault(key, [])
        self.intermediate[key].append(value)

    def emit(self, value):
        self.result.append(value)

    def execute(self, data, mapper, reducer):
        for line in data:
            record = json.loads(line)
            mapper(record)
        for key in self.intermediate:
            reducer(key, self.intermediate[key])
        # jenc = json.JSONEncoder(encoding='latin-1')  # Python 2 only
        jenc = json.JSONEncoder()
        for item in self.result:
            print(jenc.encode(item))
wordcount.py

import MapReduce
import sys

"""
Word Count Example in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line

def mapper(record):
    # key: document identifier
    # value: document contents
    key = record[0]
    value = record[1]
    words = value.split()
    for w in words:
        mr.emit_intermediate(w, 1)

def reducer(key, list_of_values):
    # key: word
    # value: list of occurrence counts
    total = 0
    for v in list_of_values:
        total += v
    mr.emit((key, total))

# Do not modify below this line
# =============================

if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
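To try the script above, the input file would hold one JSON array per line, each a [doc_id, contents] pair (an assumption based on how mapper() indexes the record). The same map/shuffle/reduce flow can be sketched self-contained:

```python
import json

# Hypothetical input: each line is a JSON [doc_id, contents] pair,
# matching what mapper() above expects.
lines = ['["doc1", "hello world"]', '["doc2", "hello mapreduce"]']

# Map phase: emit (word, 1) for every word, grouped by key.
intermediate = {}
for line in lines:
    doc_id, text = json.loads(line)
    for w in text.split():
        intermediate.setdefault(w, []).append(1)

# Reduce phase: sum the counts for each word.
results = [(word, sum(counts)) for word, counts in intermediate.items()]

print(sorted(results))
# [('hello', 2), ('mapreduce', 1), ('world', 1)]
```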