Hadoop and Spark Study Summary [2014 Edition] (Repost)
# DataSciences - Data Science
o*e
1
The last time Japan did this was after the 1923 Tokyo earthquake. Japan has not
learned its lesson. It will just repeatedly blame everything on China and
attack China every time it gets into a mess of its own making.
http://www.democracynow.org/2014/1/15/shock_doctrine_in_japan_s
Quote from the full transcript:
"AMY GOODMAN: You talk about a Japanese shock doctrine.
KOICHI NAKANO: Right. The state secrecy law that was passed in December last
year, just a month ago, basically two years after the big earthquake and
tsunami and the nuclear power accident, that still continues to literally
kind of shake Japan, and in the climate of anxiety and insecurity, the
government basically is pushing in the classic sort of Naomi Klein kind of
way of shock doctrine. And for the Japanese, it is particularly worrisome
because it reminds us of what happened before the Second World War, actually,
when Tokyo was destroyed by a huge earthquake in 1923. And the peace
preservation law that eventually led to the birth of state secret police and
the brutality of the military regime was also enacted two years right after
the big earthquake that destroyed Tokyo back in the 1920s. So, the parallel
is quite spooky.
AMY GOODMAN: And the parallel now with Fukushima, what it would mean in a
cover-up of what’s been happening? And we’re not talking about the past
now—
KOICHI NAKANO: Exactly.
AMY GOODMAN: —though it’s almost three years ago.
KOICHI NAKANO: Exactly.
AMY GOODMAN: Because this is continuing to unfold.
KOICHI NAKANO: Exactly. The continuous contamination of water, of ocean, of
the soil, and the continuous danger with the spent fuels in the nuclear
reactors in Fukushima—I think there are lots of concerns, and citizens are
trying to know the truth. But I think the state secrecy law is potentially
going to make it easier for the government to cover up information."
Americans need to say no to TPP, which is just Japan's giant backdoor to
American jobs, weapons, and the freedom to start another Pacific war.
http://www.mitbbs.com/article_t/CivilSociety/8047.html
z*e
2
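[The following text is reposted from the JobHunting board]
Author: dongfeiwww (在路上), Board: JobHunting
Title: Hadoop and Spark Study Summary [2014 Edition]
Keywords: Hadoop, Big Data, Spark
Posted from: BBS 未名空间站 (Sun Aug 17 12:28:34 2014, US Eastern)
Many friends asked me to update this study guide, so I have done my best to
expand it from what I have accumulated. Are you ready? Enjoy the technical feast.
# Hadoop
The Hadoop community is still growing fast. In 2014 it released community
versions 2.3, 2.4, and 2.5, with enhancements such as Resource Manager HA,
the YARN REST API, and ACLs on HDFS: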
http://hadoop.apache.org/releases.html
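This page lists the Hadoop project members and committers. Many of them come
from Hortonworks, but quite a few Chinese engineers have joined as well.
They are the hope of the future.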
http://hadoop.apache.org/who.html
# Spark
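Spark has really shone this year. Put simply, Spark is an in-memory computing
framework (or, seen another way, an iterative, DAG-based, or streaming
computing framework).
MapReduce is often mocked for being inefficient. Spark claims up to 100x the
performance of Hadoop, with algorithm implementations only 1/10 or 1/100 the
size.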
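To make the "in-memory, iterative" point concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code, and every name in it (CountingSource, run_reloading, run_cached) is made up for illustration. It only shows the access pattern: an iterative job that re-reads its input on every pass versus one that reads once and keeps the data in memory.

```python
class CountingSource:
    """Hypothetical stand-in for an expensive read (disk, HDFS, ...)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.reads = 0  # how many times the data was fetched

    def read(self):
        self.reads += 1
        return list(self.rows)


def step(data, estimate, rate=0.5):
    """One iteration: move the estimate toward the mean of the data."""
    target = sum(data) / len(data)
    return estimate + rate * (target - estimate)


def run_reloading(source, iterations):
    """Re-read the input on every iteration (disk-bound pattern)."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(source.read(), estimate)
    return estimate


def run_cached(source, iterations):
    """Read once, then iterate over the in-memory copy."""
    data = source.read()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(data, estimate)
    return estimate
```

Both functions compute the same estimate; the only difference is how often the source is read. With N iterations the first does N reads and the second does one, which is the kind of saving the in-memory claim is about. The real speedup depends on the workload, so treat the 100x figure as the project's own claim.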
Introductions featuring Reynold, a core Spark developer:
http://www.csdn.net/article/2013-04-26/2815057-Spark-Reynold
http://www.csdn.net/article/2014-08-07/2821098-6-sparkling-feat
Spark originated at Berkeley AMPLab in 2010 and was published at HotCloud:
https://www.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
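BTW: this lab is very impressive. It works on big data and cloud computing and
has close ties with industry. For example, Twitter also taught a course at
Berkeley: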
http://blogs.ischool.berkeley.edu/i290-abdt-s12/
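They are also proud of BDAS, the Berkeley Data Analytics Stack (pronounced
"Bad Ass"): https://amplab.cs.berkeley.edu/software/
In 2013 these experts pulled people out of Berkeley AMPLab to found
Databricks, which held two summits within half a year, with 1,000 attendees.
According to the CTO, Spark's volume of new code and its activity this year
far exceed those of Hadoop itself, and the company is about to launch a
commercial product, Cloud.
Spark's core data structure: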
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
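A rough way to picture the RDD idea from that paper: a dataset does not have to store its rows, it only has to remember how it was derived (its lineage), so a lost in-memory copy can be rebuilt by recomputing. The toy class below is plain Python and is not Spark's API or implementation. ToyDataset and all of its methods are hypothetical names used only for this sketch, and it ignores partitioning and distribution entirely.

```python
class ToyDataset:
    """A lazily evaluated dataset that remembers how it was derived."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source   # base rows (root datasets only)
        self._parent = parent   # dataset this one was derived from
        self._op = op           # function applied to the parent's rows
        self._keep = False      # whether to keep computed rows in memory
        self._cache = None      # the in-memory copy, if any

    @classmethod
    def from_list(cls, rows):
        return cls(source=list(rows))

    def map(self, fn):
        # Nothing runs here; only the recipe is recorded.
        return ToyDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyDataset(parent=self,
                          op=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        """Ask for the rows to be kept in memory once they are computed."""
        self._keep = True
        return self

    def drop_cache(self):
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self._cache = None

    def _compute(self):
        if self._cache is not None:
            return self._cache
        if self._parent is None:
            rows = list(self._source)
        else:
            # Rebuild from lineage: recompute the parent, then apply the op.
            rows = self._op(self._parent._compute())
        if self._keep:
            self._cache = rows
        return rows

    def collect(self):
        """Trigger evaluation and return a copy of the rows."""
        return list(self._compute())
```

Usage, under the same assumptions: ToyDataset.from_list(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache() builds the recipe, the first collect() computes and caches it, and after drop_cache() the next collect() quietly recomputes the same rows from the lineage.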
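The latest Spark release is currently 1.0.2: https://spark.apache.org/docs/1.0.2/
There are also several subprojects, such as Spark SQL, Spark Streaming, MLlib,
and GraphX, for example: http://spark.apache.org/streaming/
Industry has shown broad interest too. In China, Taobao and Baidu have also
started using it: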
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
There are also some third-party projects built on top of Spark:
Shark - Hive and SQL on top of Spark
MLbase - Machine Learning research project on top of Spark
BlinkDB - a massively parallel, approximate query engine built on top of
Shark and Spark
GraphX - a graph processing & analytics framework on top of Spark (GraphX
has been merged into Spark 0.9)
Apache Mesos - Cluster management system that supports running Spark
Tachyon - In memory storage system that supports running Spark
Apache MRQL - A query processing and optimization system for large-scale,
distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
OpenDL - A deep learning algorithm library based on the Spark framework. It
has just kicked off.
SparkR - R frontend for Spark
Spark Job Server - REST interface for managing and submitting Spark jobs on
the same cluster
Apache Spark supports four distributed deployment modes: Amazon EC2,
standalone, Spark on Mesos, and Spark on YARN.
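In practice the deployment mode mostly shows up as the master URL you hand to Spark. The helper below is a small sketch in plain Python that builds those strings. The URL forms and default ports (spark:// on 7077, mesos:// on 5050, yarn-client) are what I recall from the Spark 1.0.x documentation, so check them against the docs for your version. The function name master_url and the host names are made up, and treating EC2 as a standalone cluster (which is what the spark-ec2 script launches) is my assumption.

```python
def master_url(mode, host=None, port=None):
    """Build a Spark master URL string for a given deployment mode.

    Hypothetical helper for illustration; URL forms follow the
    Spark 1.0.x docs as far as I remember them.
    """
    if mode in ("standalone", "ec2"):
        # Assumption: an EC2 cluster started by spark-ec2 runs the
        # standalone cluster manager, so it uses the same URL form.
        if host is None:
            raise ValueError("standalone/ec2 mode needs a master host")
        return "spark://%s:%d" % (host, port or 7077)
    if mode == "mesos":
        if host is None:
            raise ValueError("mesos mode needs a master host")
        return "mesos://%s:%d" % (host, port or 5050)
    if mode == "yarn":
        # On YARN the cluster location comes from the Hadoop
        # configuration rather than from the URL itself.
        return "yarn-client"
    raise ValueError("unknown deployment mode: %r" % (mode,))
```

For example, master_url("mesos", "mesos-master.example.com") gives a string you could pass as the --master argument of spark-submit or to a SparkConf.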
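As for getting started, you still need to read the official documentation
carefully. It covers the basics and how to set up an environment. The Summit
videos are a good resource too:
http://spark-summit.org/2014/
There are also training videos:
http://spark-summit.org/2014/training
A recap of this year's Summit: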
http://www.csdn.net/article/2014-07-17/2820713
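This year's most acclaimed demo was Databricks Cloud. It used data collected
from Twitter in real time as machine learning material and presented it in
something like an IPython notebook, with stunning visualizations. Building
the whole sampling system took only 20 minutes!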
http://databricks.com/cloud
Finally, CSDN also has a Spark column that is worth visiting often:
spark.csdn.net
l*u
3
re

last

[Quoting o**********e's post:]
: The last time Japan did this was after the 1923 Tokyo earthquake. Japan has not
: learned its lesson. It will just repeatedly blame everything on China and
: attack China every time it gets into a mess of its own making.
: http://www.democracynow.org/2014/1/15/shock_doctrine_in_japan_s
: Quote from the full transcript:
: "AMY GOODMAN: You talk about a Japanese shock doctrine.
: KOICHI NAKANO: Right. The state secrecy law that was passed in December last
: year, just a month ago, basically two years after the big earthquake and
: tsunami and the nuclear power accident, that still continues to literally
: kind of shake Japan, and in the climate of anxiety and insecurity, the

s*r
4
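Great post!

[Quoting z****e's post:]
: [The following text is reposted from the JobHunting board]
: Author: dongfeiwww (在路上), Board: JobHunting
: Title: Hadoop and Spark Study Summary [2014 Edition]
: Keywords: Hadoop, Big Data, Spark
: Posted from: BBS 未名空间站 (Sun Aug 17 12:28:34 2014, US Eastern)
: Many friends asked me to update this study guide, so I have done my best to
: expand it from what I have accumulated. Are you ready? Enjoy the technical feast.
: # Hadoop
: The Hadoop community is still growing fast. In 2014 it released community
: versions 2.3, 2.4, and 2.5, with enhancements such as Resource Manager HA,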

d*3
7
Thanks for sharing.
c*g
8
Which companies can actually use this? Is it only for big companies?
m*1
9
What is the difference between Spark and Hadoop? I only recently heard about Spark and am curious.
m*e
10
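FG don't use it, and I don't know about L. Second-tier Microsoft certainly won't use it. AWS will offer it as a platform; whether Amazon itself uses it I don't know, and even if it does, only
a handful of teams would.
I'd guess only third-tier companies will use it, not because it is good, but because the people there want to pad their resumes before changing jobs.
AMPLab's work is great for churning out academic papers and finding publishable topics, and fine for students to play with. It is still far from industrial use.

[Quoting c******g's post:]
: Which companies can actually use this? Is it only for big companies?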
n*3
11
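Spark's performance is very good.
Why do you say it is still far from industrial use? All the startups are talking about it.
FG not using it is most likely because they already have something similar.
Does G use Hadoop? Yet look how popular Hadoop is...
Spark is the future Hadoop.

[Quoting m******e's post:]
: FG don't use it, and I don't know about L. Second-tier Microsoft certainly won't use it. AWS will offer it as a platform; whether Amazon itself uses it I don't know, and even if it does, only
: a handful of teams would.
: I'd guess only third-tier companies will use it, not because it is good, but because the people there want to pad their resumes before changing jobs.
: AMPLab's work is great for churning out academic papers and finding publishable topics, and fine for students to play with. It is still far from industrial use.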

z*3
12
re
The libraries on top of Spark are the real big data.
Hadoop only amounts to a distributed file system plus CRUD.

[Quoting n*****3's post:]
: Spark's performance is very good.
: Why do you say it is still far from industrial use? All the startups are talking about it.
: FG not using it is most likely because they already have something similar.
: Does G use Hadoop? Yet look how popular Hadoop is...
: Spark is the future Hadoop.

T*u
13
Study Spark and follow the Party!