Got asked this several times in interviews.
lines = LOAD 'sample.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
k*u
Post #4
er..
B*g
Post #5
-- Hive queries for Word Count
drop table if exists doc;
-- 1) create a table that holds the whole file, one line of text per row
create table doc(
  text string
) row format delimited fields terminated by '\n' stored as textfile;
-- 2) load the plain text file
-- if the file is .csv, replace '\n' with ',' in step 1 (creation of the doc table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;
-- Trick-1
-- 3) word count in a single statement
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(text, ' ')) lTable AS word GROUP BY word;
[Quoting c***z's post]
: Got asked several times in interviews.
: lines = LOAD 'sample.txt' AS (line:chararray);
: words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
: grouped = GROUP words BY word;
: wordcount = FOREACH grouped GENERATE group, COUNT(words);
: DUMP wordcount;
p*8
Post #6
The happiness I want
l*n
Post #7
Fewer and fewer people use Pig these days; Hive and Impala have become the mainstream.
[Quoting B*****g's post]
: -- Hive queries for Word Count
: drop table if exists doc;
: -- 1) create table to load whole file
: create table doc(
: text string
: ) row format delimited fields terminated by '\n' stored as textfile;
: --2) loads plain text file
: --if file is .csv then replace '\n' by ',' in step no 1 (creation of doc table)
: load data local inpath '/home/trendwise/Documents/sentiment/doc_data/
OK, Scala version:
val countTable = myText.split("\\W+").groupBy(identity).mapValues(_.length)
PS: split(" ") would work for interview purposes. Also, note the two backslashes before W.
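
To see why the PS matters, here is a minimal sketch comparing the two splits. The object name WordCountSketch, the helper names countWords and naiveCount, and the sample string are made up for illustration; they are not from the posts above. It uses map instead of mapValues only because mapValues is deprecated in Scala 2.13, and it drops the empty leading token that split("\\W+") produces when the text starts with a non-word character.

```scala
object WordCountSketch {
  // Split on runs of non-word characters, so punctuation does not stick to words.
  // filter(_.nonEmpty) drops the empty first token produced when the text
  // starts with a separator (e.g. a leading space).
  def countWords(text: String): Map[String, Int] =
    text.split("\\W+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  // The "good enough for an interview" variant: split on single spaces only.
  def naiveCount(text: String): Map[String, Int] =
    text.split(" ")
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  def main(args: Array[String]): Unit = {
    val myText = "Hello, world! Hello" // hypothetical sample input
    println(countWords(myText)) // "Hello" counted twice, "world" once
    println(naiveCount(myText)) // "Hello," and "Hello" counted as different words
  }
}
```

With the sample string, countWords treats both occurrences of Hello as the same word, while naiveCount keeps "Hello," (with the comma) and "Hello" apart. Neither version lowercases the text, so "Hello" and "hello" would still be counted separately.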