Database startup ad~ (not a scam) - MITBBS (未名空间) Historical Archive

DataSciences - Data Science

S*7 2014-08-24 07:08

Post #1

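I work at a company called LightMiner Systems, a small startup located in
Palo Alto.
Our product is a data analytics platform similar to Vertica. Put simply, it
includes a database, a web-based workspace that can run SQL and RStudio, and
some R machine learning functions rewritten to run in parallel.
Compared with Vertica, our database runs queries faster and is somewhat
cheaper (the company currently needs sales to build use cases for its second
round of fundraising, so it is selling at a fair price). I have the company's
white paper on hand, which contains test cases comparing us with other
companies' products. From the data I have seen, we get a speedup of several
hundred times over Vertica.
Of the dozen or so people at the company, I am the only Chinese person. They
are all somewhat fond of boasting, and our outsourced sales team is entirely
the bragging type. So I believe my boss picked the test case with the largest
gap for the white paper. Although I am still quite a novice and have not
figured out many of the concepts, I will provide useful information as
objectively as I can if you need it.
If you are interested, feel free to follow up with me and I will show you the
white paper~
I do not have many friends in this field, so I would also like to take this
chance to get to know everyone and make some friends (so I can ride your
coattails).
My work email is [email protected]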

Below is the ad itself, written in English so you can forward it to others~
Our current solution is an integrated platform for big data discovery and
data science.
It includes:
1. proprietary in-memory columnar database that can handle up to a 30TB
data model in a single node
2. simple user interfaces of SQL and RStudio accessible from the web
browser
3. a library of super-charged machine learning algorithms in R
Features:
1. fast query speed – the proprietary in-memory columnar database uses a
unique combination of large RAM and PCIe NAND flash to achieve query response
times that are up to 500 times faster than comparable solutions on the market
today. There is no need to pre-optimize any queries – all ad-hoc queries run
at the same top speed.
2. large data models – RStudio and our supercharged stats algorithms can
run against 30TB data models in our high-memory single-node environment.
Returns accurate results in seconds and minutes, not hours and days –
allowing for more frequent model iterations.
3. high-performance computation setting – the algorithms have been
optimized wherever possible to run in parallel across the multi-core
architecture, and the best combination of compilers and libraries is used
to achieve the highest compute speeds.
4. fastest communication between RStudio and the database – the read/write
speed maxes out at the “limit speed” of the hardware – we achieve over 2M
IOPS on a single server.
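For readers unfamiliar with why a columnar layout tends to help analytic queries (feature 1 above), here is a toy Python sketch. It is only an illustration of the general idea, not how LightMiner's database works; the post does not describe its internals. The table, the column names, and every function name below are hypothetical. The sketch counts how many field values each layout has to touch in order to sum a single column.

```python
# Hypothetical toy data; none of these names come from the actual product.
rows = [
    {"user_id": 1, "country": "US", "amount": 20.0},
    {"user_id": 2, "country": "CN", "amount": 35.5},
    {"user_id": 3, "country": "US", "amount": 12.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_store(rows, column):
    """Row layout: each whole row is visited although one field is needed."""
    total = 0.0
    fields_touched = 0
    for row in rows:
        fields_touched += len(row)  # the full row is read
        total += row[column]
    return total, fields_touched


def sum_column_store(columns, column):
    """Column layout: only the values of the requested column are visited."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_row_store(rows, "amount")
col_total, col_touched = sum_column_store(columns, "amount")

print(row_total, row_touched)  # 68.0 9
print(col_total, col_touched)  # 68.0 3
```

Both layouts return the same sum, but the column layout touches one third of the values here because the table has three columns. On wide tables the gap grows with the number of columns the query does not need.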
Target Business Users:
1. those who perform a lot of complex SQL queries
2. those who move huge amounts of data into and out of the database
3. those who want to run predictive analytics and modeling against very
large data sets
4. R users who need to perform research experiments interactively over
several model iterations