r*d
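Post 2
Question for everyone: are there any good methods or materials for learning Pig Latin? I am a complete beginner with Pig Latin. I searched on
Amazon and did not find any book with particularly good reviews.
Thanks!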
c*z
Post 4
You need the following things:
1. An editor. I use Sublime Text 2; the Cloudera package uses gedit.
2. A cluster with Pig installed on the edge nodes. You can use the VM in the
Cloudera package.
3. A file transfer tool to move Pig code from your local drive (if you edit locally)
to the edge node. I use WinSCP; the Cloudera package uses Hue.
4. A way to run the code on the edge node. I use PuTTY; the Cloudera package
uses Hue.
My workflow: write Pig code locally in Sublime Text 2, upload it to the edge
node with WinSCP, and run it on the edge node with PuTTY.
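For anyone working from a Unix-like terminal instead of WinSCP and PuTTY, the upload-then-run steps above can be sketched with scp and ssh. This is only a rough sketch: the host name and script name are hypothetical, and it assumes `pig` is on the PATH of the edge node for a non-interactive ssh session. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical values: replace with your own edge node and Pig script.
EDGE_NODE="user@edge-node.example.com"
SCRIPT="myscript.pig"

# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: copy the locally edited script to the home directory on the edge node
# (the role WinSCP plays in the workflow above).
run scp "$SCRIPT" "${EDGE_NODE}:"

# Step 2: log in and run the script with Pig on the edge node
# (the role PuTTY plays in the workflow above).
run ssh "$EDGE_NODE" "pig -f $SCRIPT"
```

Once the printed commands look right, running the same file with DRY_RUN=0 executes them for real.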
s*n
Post 5
In the AA era
c*z
Post 6
Also, my workflow using Scala Scalding (i.e. Scala on Hadoop):
1. Edit and compile in IntelliJ or another IDE, or edit in any text editor and
compile in a terminal (CMD on Windows).
Setting up IntelliJ is complicated and beyond what I can cover here.
2. Upload the jar file to the edge node. I use WinSCP.
Optionally, push the code to GitHub for version control.
3. Run the jar file on the edge node using PuTTY, with a specific input path,
output path and other arguments. I save the command to an .sh file for reuse (you
can also save it to a .txt file and copy and paste).
The command to run the jar is something like
hadoop jar myjar.jar packagename.functionname --input "myinputpath/part*" --output "myoutputpath" --hdfs
4. To get the output into a text file, use something like
hadoop fs -cat "myoutputpath/part*" > myresult.tsv
(I prefer .tsv over .csv because commas can appear in numbers like 133,010
and break the column split.)
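To illustrate the .tsv point, here is a small sketch that counts the fields in one record written both ways. The field name "total" is made up, and it assumes the number is written unquoted, which is where a comma delimiter goes wrong.

```shell
#!/bin/sh
# One record with two fields: a label and a number with a thousands separator.
# Both values are made up for illustration.
name="total"
value="133,010"

# Comma-delimited: the comma inside the number is read as a field boundary.
csv_fields=$(printf '%s,%s\n' "$name" "$value" | awk -F',' '{ print NF }')

# Tab-delimited: the comma is ordinary data, so the record keeps two fields.
tsv_fields=$(printf '%s\t%s\n' "$name" "$value" | awk -F'\t' '{ print NF }')

echo "csv fields: $csv_fields"   # prints "csv fields: 3"
echo "tsv fields: $tsv_fields"   # prints "tsv fields: 2"
```

A CSV writer that quotes such values would avoid the problem, but with a tab delimiter the question does not come up as long as the data contains no tabs.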
c*8
Post 7
Would things have been a bit better in Wu Zetian's era?
g*t
Post 8
Pig is really, really slow.
c*y
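Post 9
That era: "It made parents everywhere value bearing daughters over bearing sons."
And this one: Emperor Fei of Song gave his sister, Princess Shanyin, thirty male consorts.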
r*d
Post 14
Thanks for the replies, everyone!
Update: I found the book 'Programming Pig' very helpful, especially the first 6 chapters.
Picking up Pig as yet another tool is not demanding :)