Pig word count - MITBBS (未名空间) historical archive


Pig word count # DataSciences - Data Science

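g*n 2014-10-06 07:10

#1

A relative's child wants to come to the US for high school. What can we do?
Is there any way to get a high school here to admit the child? How do we bring the child over?
I would appreciate advice from anyone with experience. Many thanks.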

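p*8 2014-10-06 07:10

#2

On April 12 it was reported that Stefanie Sun (孙燕姿) had registered her marriage on March 31. She has always kept her love life low-key and has been protective of Nadim (纳迪姆), her Dutch boyfriend of five years, but recently, out of character, she openly told the media "I'll go get married when I have time," which looked like a hint.
Nadim loves sports and has competed in triathlons. He originally worked in finance in Hong Kong and moved to a job in Singapore after he started dating Stefanie Sun. When she held a concert in Taiwan in 2009, Nadim quietly came along with her family and stayed in the same hotel.
While the public was surprised by her "lightning marriage," she had in fact already answered a question about her wedding date at an event in Taiwan last November: "Next year, I'll do it when I have time!" So she did go according to plan.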

c*z 2014-10-06 07:10

#3

Got asked several times in interviews.
lines = LOAD 'sample.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
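
For anyone who wants to see what each relation holds, here is a rough Scala sketch that mirrors the four Pig steps on an in-memory collection. It is only an illustration under assumptions: the object name and the sample lines are made up, and splitting on whitespace is a simplification of TOKENIZE, which as far as I know also splits on a few punctuation characters.

```scala
object PigStyleWordCount {
  // Hypothetical stand-in for the contents of 'sample.txt'.
  val lines: Seq[String] = Seq("a b a", "b c")

  // words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
  // Whitespace splitting only approximates TOKENIZE (assumption).
  val words: Seq[String] =
    lines.flatMap(_.split("\\s+").toSeq).filter(_.nonEmpty)

  // grouped = GROUP words BY word;
  // One entry per distinct word, holding every occurrence of it.
  val grouped: Map[String, Seq[String]] = words.groupBy(identity)

  // wordcount = FOREACH grouped GENERATE group, COUNT(words);
  val wordcount: Map[String, Int] =
    grouped.map { case (word, occurrences) => word -> occurrences.size }

  // DUMP wordcount;
  def main(args: Array[String]): Unit =
    wordcount.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
}
```

The point of the intermediate `grouped` value is the same as in Pig: grouping keeps the full bag of occurrences per key, and the count is taken over that bag in a separate step.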

k*u 2014-10-06 07:10

#4

er..

B*g 2014-10-06 07:10

#5

-- Hive queries for Word Count
drop table if exists doc;
-- 1) create a table that holds the whole file, one line per row
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
-- 2) load a plain text file
-- if the file is a .csv, replace '\n' with ',' in step 1 (creation of the doc table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;
-- Trick-1
-- 3) word count in a single statement
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(text, ' ')) lTable as word GROUP BY word;

c***z wrote:

: Got asked several times in interviews.
: lines = LOAD 'sample.txt' AS (line:chararray);
: words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
: grouped = GROUP words BY word;
: wordcount = FOREACH grouped GENERATE group, COUNT(words);
: DUMP wordcount;

p*8 2014-10-06 07:10

#6

我要的幸福 ("The Happiness I Want")

l*n 2014-10-06 07:10

#7

Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.

B*****g wrote:

: -- Hive queries for Word Count
: drop table if exists doc;
: -- 1) create table to load whole file
: create table doc(
: text string
: ) row format delimited fields terminated by '\n' stored as textfile;
: --2) loads plain text file
: --if file is .csv then in replace '\n' by ',' in step no 1 (creation of doc
: table)
: load data local inpath '/home/trendwise/Documents/sentiment/doc_data/

p*t 2014-10-06 07:10

#8

She got married....

B*g 2014-10-06 07:10

#9

SQL always wins, haha

l******n wrote:

: Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.

r*o 2014-10-06 07:10

#10

Stefanie Sun got married, wow. I really like her songs 《天黑黑》 and 《风筝》.

c*z 2014-10-06 07:10

#11

damn, I am loving Pig

M*g 2014-10-06 07:10

#12

Her accent is pretty distinctive... kind of fun.

r********o wrote:

: Stefanie Sun got married, wow. I really like her songs 《天黑黑》 and 《风筝》.

c*z 2014-10-06 07:10

#13

OK, Scala version:
val countTable = myText.split("\\W+").groupBy(identity).mapValues(_.length)
PS: split(" ") would work for interview purposes; also, there are two \ before W.
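
To make the difference between the two split patterns concrete, here is a small runnable sketch. The object name, the helper `count`, and the sample sentence are all hypothetical. It uses `map` instead of `mapValues` because, as far as I know, `mapValues` on a Map is deprecated from Scala 2.13 on; the counting logic is otherwise the same idea as the one-liner.

```scala
object SplitComparison {
  // Hypothetical sample text, not taken from the thread.
  val myText: String = "to be, or not to be"

  // Count tokens; empty strings are dropped because split can return a
  // leading "" when the text starts with a delimiter.
  def count(tokens: Array[String]): Map[String, Int] =
    tokens.filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  // Regex split on runs of non-word characters: punctuation is discarded.
  val byNonWord: Map[String, Int] = count(myText.split("\\W+"))

  // Plain split on a single space: "be," and "be" stay distinct tokens.
  val bySpace: Map[String, Int] = count(myText.split(" "))

  def main(args: Array[String]): Unit = {
    println(byNonWord)
    println(bySpace)
  }
}
```

With this input, the `\\W+` version counts "be" twice, while the single-space version treats "be," as its own word. That is fine for an interview answer but worth mentioning if asked about edge cases.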

r*o 2014-10-06 07:10

#14

I really like her. A friend of mine has met her in person and says she is tanned and very petite.

M*****g wrote:

: Her accent is pretty distinctive... kind of fun.