The big data trend comes from real-world needs, while the way of handling it really comes from BASE (as opposed to ACID).
http://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-bas
"Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three."
"It’s harder to develop software in the fault-tolerant BASE world compared to the fastidious ACID world, but Brewer’s CAP theorem says you have no choice if you want to scale up."
BASE stands for Basically Available, Soft State, Eventual Consistency. The name itself is pretty misleading, but just think of it as trading consistency for availability. ACID trades availability for consistency.
[Quoting c******o's post]
: The big data trend comes from real-world needs, while the way of handling it really comes from BASE (as opposed to ACID).
: http://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-bas
: "Eric Brewer’s CAP theorem says that if you want consistency, availability,
: and partition tolerance, you have to settle for two out of three."
: "It’s harder to develop software in the fault-tolerant BASE world compared
: to the fastidious ACID world, but Brewer’s CAP theorem says you have no
: choice if you want to scale up."
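To make "trade consistency for availability" concrete, here is a minimal sketch of eventual consistency with two toy replicas. Everything in it (the `Replica` class, last-write-wins by timestamp, `sync_from`) is an assumption made up for illustration, not how any particular store does it. Each replica always accepts writes, so a read during a partition can be stale, and the replicas only agree again after they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica (hypothetical) storing (timestamp, value) per key."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Always accept the write locally; keep it only if it is newer
        # than what we already have (last-write-wins).
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy pass: pull every entry from the other replica.
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)


a, b = Replica(), Replica()
a.write("cart", ["book"], ts=1)
b.sync_from(a)

# Partition: a keeps accepting writes, b cannot see them yet.
a.write("cart", ["book", "pen"], ts=2)
stale = b.read("cart")  # b still answers, but with the old value

# Partition heals: replicas exchange state and converge.
b.sync_from(a)
a.sync_from(b)
converged = a.read("cart") == b.read("cart")
```

An ACID-style system would instead refuse or block the read on `b` until it could guarantee the latest value, which is the availability side of the trade.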
BD is mostly used to store the internet JUNK data such as web pages, blogs, comments, thumb-up, etc. It's a big pile but has little value so BD indeed simply stores a big pile of garbage, which is why such data is unstructured to begin w/. BD is merely hype.
Saving is just one part of the puzzle; extracting useful info out of it is another. That's why it's called big data analysis, and why Hadoop et al. are burning hot. Hype or not, there's money to be made, and we are talking about a trillion-dollar business. The internet and smartphones were hype too. Just a quote from your boss SB. I don't know how many times you need to be proven wrong until you can stop your bullshit.
"Now we'll get a chance to go through this again in phones and music players. There's no chance that the iPhone is going to get any significant market share. No chance. It's a $500 subsidized item. They may make a lot of money. But if you actually take a look at the 1.3 billion phones that get sold, I'd prefer to have our software in 60% or 70% or 80% of them, than I would to have 2% or 3%, which is what Apple might get."
[Quoting N********n's post]
: BD is mostly used to store the internet JUNK data such as web pages,
: blogs, comments, thumb-up, etc. It's a big pile but has little value
: so BD indeed simply stores a big pile of garbage, which is why such
: data is unstructured to begin w/. BD is merely hype.
From what I know, the recent BD wave began with G/F/T (Google BigTable is the root of many NoSQL stores). Look at how they use it:
Google => web indexing, "My Search History", Google Earth, Google Code hosting, Orkut, YouTube, and Gmail
Facebook => Inbox Search, Instagram unit, primary big data analytical store, messages and monitoring (still mainly uses sharded MySQL though, with a lot of optimization and not much relational logic)
Twitter => analytical data (like Facebook, still mostly a customized MySQL-based store as the backend)
As you can see, Google and Facebook use NoSQL for a lot of critical things, so you cannot say it is garbage. But they also use NoSQL with a highly customized query engine layer.
I agree, though, that Big Data is really not for small companies; no small company really hits Big Data scale. Even Twitter is not really "big" enough. Only Google has really used NoSQL to a great extent, but of course BigTable is not just NoSQL, and a lot of the time it is not really used as NoSQL.
NoSQL is not a great term. Too many different things get called NoSQL, and the only trait they really share is "no SQL".
For small companies, NoSQL is often used not for big data but for "I don't need SQL".
[Quoting N********n's post]
: BD is mostly used to store the internet JUNK data such as web pages,
: blogs, comments, thumb-up, etc. It's a big pile but has little value
: so BD indeed simply stores a big pile of garbage, which is why such
: data is unstructured to begin w/. BD is merely hype.
Applications using the M$ stack typically are not big enough to take advantage of NoSQL DBs. I am not surprised that M$ people talk about NoSQL as hype, because it's something they don't understand.
[Quoting c******o's post]
: from what I know the recent BD wave begin from G/F/T (Google BigTable is the
: root of many NoSQL)
: Look at how they used it,
: Google => web indexing, "My Search History", Google Earth, Google Code
: hosting, Orkut, YouTube, and Gmail
: Facebook => Inbox Search, Instagram unit, primary big data analytical store,
: messages and monitoring (still mainly use sharded MySQL though, with a lot
: of optimization and not really use a lot of relational logic)
: Twitter =>Analytical data (like Facebook, still mostly customized MySQL
: based store as backend)
g*g
Post #81
I don't think any distributed DB will do well on count. You can always use a mixed approach though.
Real-time accurate counting is different from log aggregation; the latter has neither a real-time requirement nor a need for high consistency. Cassandra takes a lot of work to implement a distributed counter and it still has limitations. I wouldn't use it for accounting purposes.
http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf
[Quoting g*****g's post]
: Real time accurate count is different from log aggregation, the latter has
: neither real time requirement nor high consistency.
: Cassandra takes a labor to implement a distributed count and still has
: limitation, I wouldn't use it for accounting purpose.
: http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf
:
: doc
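For anyone wondering why a distributed count is awkward, here is a rough sketch of a grow-only counter where each node only increments its own slot and nodes merge by taking the per-node maximum. To be clear, this is a generic textbook-style illustration and an assumption on my part, not Cassandra's actual counter implementation; the `GCounter` class and node names are made up.

```python
class GCounter:
    """Grow-only counter sketch (hypothetical): one slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> increments seen from that node

    def increment(self, n=1):
        # A node only ever bumps its own slot, so no coordination is needed.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        # Local view of the total; may lag behind other nodes.
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max makes merging safe to repeat and reorder.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


n1, n2 = GCounter("n1"), GCounter("n2")
n1.increment(3)
n2.increment(2)
before = (n1.value(), n2.value())  # each node sees only its own writes

n1.merge(n2)
n2.merge(n1)
after = (n1.value(), n2.value())   # both converge on the same total
```

Until the merge happens, the two nodes report different totals, which is fine for log aggregation but is exactly the problem for anything like accounting that needs a real-time exact number.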
N*n
Post #88
If there's really a "trillion" dollars to be made, then Yahoo would have made it already. They've been using HADOOP since 2006, right? That's 7 years in their hands to deliver. 7 years in the tech world feels like a century. If after 7 years they still earn far less than the other tech companies, then this HADOOP thing is not as useful as hyped.
Like I said, it doesn't matter how big a pile of data Hadoop is able to store. If the data is worthless to begin with, then there's no value to mine from it. Useful data is usually structured.
[Quoting g*****g's post]
: Saving is just one part of puzzle, extracting useful info out of it is
: another,
: that's why it's called big data analysis and Hadoop et al. is burning hot.
: Hype or not, there's money to be made and we are talking about trillion
: dollar
: business. Internet and smartphone were a hype too. Just a quote from your
: boss SB. I don't know how many times you need to be proven wrong until you
: can stop your bullshit.
: "Now we'll get a chance to go through this again in phones and music players.
: There's no chance that the iPhone is going to get any significant market
A*g
Post #89
That actually sounds kind of reasonable...
[Quoting N********n's post]
: If there's really "trillion" dollar to make then Yahoo would have made
: it already. They've been using HADOOP since 2006, right? That's 7 years
: in their hand to deliver. 7 years in tech world feels like a century.
: If after 7 years they still earn far less than the other tech companies
: then this HADOOP thing is not as useful as hyped.
: Like I said it matters not how big a pile of data Hadoop is able to
: store. If the data is worthless to begin w/ then there's no value to
: mine from it. Useful data is usually structured.
You have the users, then you have the data, then big data analysis gives you extra value that couldn't be obtained otherwise. Yahoo doesn't have the users to begin with. A trillion is the size of the whole industry. And a company like Rocket Fuel has already IPO'd at a 5b valuation.
[Quoting N********n's post]
: If there's really "trillion" dollar to make then Yahoo would have made
: it already. They've been using HADOOP since 2006, right? That's 7 years
: in their hand to deliver. 7 years in tech world feels like a century.
: If after 7 years they still earn far less than the other tech companies
: then this HADOOP thing is not as useful as hyped.
: Like I said it matters not how big a pile of data Hadoop is able to
: store. If the data is worthless to begin w/ then there's no value to
: mine from it. Useful data is usually structured.
h*a
Post #93
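Whether data is useful isn't something you settle just by saying so; you have to analyze it with scientific methods. Whether the big data held by the big internet companies is useful, and how useful, is exactly what the big data field is trying to figure out. Jumping to a simple conclusion is not enough.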
[Quoting N********n's post]
: If there's really "trillion" dollar to make then Yahoo would have made
: it already. They've been using HADOOP since 2006, right? That's 7 years
: in their hand to deliver. 7 years in tech world feels like a century.
: If after 7 years they still earn far less than the other tech companies
: then this HADOOP thing is not as useful as hyped.
: Like I said it matters not how big a pile of data Hadoop is able to
: store. If the data is worthless to begin w/ then there's no value to
: mine from it. Useful data is usually structured.
l*G
Post #94
BD is laughable compared to climate forecast data.