B*n
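#2
Is it because all the computation is done in memory? I watched the Databricks demo, and every cluster there had over a thousand GB of memory.
But with that much memory the computation is obviously going to be fast, so isn't this a pretty simple idea?
I'm a beginner, so please enlighten me. Thanks.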
z*g
#8
RDD can provide fault tolerance for in-memory intermediate results while
storing only a very small amount of data on persistent storage. This is
particularly useful for iterative algorithms, since every iteration produces
intermediate results. That said, when there is not enough memory, Spark
performs just like Hadoop.
[Quoting s********k's post]
: I've never really understood this RDD thing. What exactly is so great about it?
z*g
#9
There is nothing new about in-memory computing. The key point is that RDD can
achieve fault tolerance for intermediate computation results without having to
write the whole dataset back to disk.
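
To make the point concrete, here is a rough single-process sketch in plain Python. It is not Spark's API: `ToyRDD`, `from_source`, `lose_partition` and `compute_log` are all made-up names for illustration. The assumption is that the source data sits on durable storage, and each derived dataset only remembers how to rebuild a partition from its parent (its lineage). When an in-memory partition is lost, only that partition is recomputed, and nothing intermediate ever had to be written to disk.

```python
from typing import Any, Callable, Dict, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's real API)."""

    def __init__(self, num_partitions: int, compute: Callable[[int], List[Any]]):
        self.num_partitions = num_partitions
        self._compute = compute                 # lineage: how to rebuild partition i
        self._cache: Dict[int, List[Any]] = {}  # in-memory partitions, may be lost
        self.compute_log: List[int] = []        # which partitions were (re)computed

    @staticmethod
    def from_source(partitions: List[List[Any]]) -> "ToyRDD":
        # Assumption: the source data is on durable storage and can be re-read.
        return ToyRDD(len(partitions), lambda i: list(partitions[i]))

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Only the recipe is recorded: child partition i = fn over parent partition i.
        parent = self
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in parent.partition(i)])

    def partition(self, i: int) -> List[Any]:
        # Serve from memory if present, otherwise rebuild from lineage.
        if i not in self._cache:
            self.compute_log.append(i)
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that wipes one in-memory partition.
        self._cache.pop(i, None)

    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


source = [[1, 2], [3, 4], [5, 6]]
squares = ToyRDD.from_source(source).map(lambda x: x * x)

total_before = sum(squares.collect())  # computes partitions 0, 1, 2
squares.lose_partition(1)              # "crash": partition 1 disappears from memory
total_after = sum(squares.collect())   # only partition 1 is rebuilt

print(total_before, total_after)       # 91 91
print(squares.compute_log)             # [0, 1, 2, 1]
```

Real Spark does this across machines and over chains of many transformations, but the trade is the same: keep a small recipe instead of a full on-disk copy of every intermediate result.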