ARRY －基本面良好的小biotech (转载) - 未名空间MITBBS历史存档

国际科技财经博客移民网络热点娱乐民生时事公众号

Redian新闻

>未名空间

>Stock

ARRY －基本面良好的小biotech (转载)

ARRY －基本面良好的小biotech (转载)# Stock

s*n2009-08-14 07:08

1 楼

Suppose you are developing a search engineer. Given a sequence of
numbers as a query string. You need to design algorithm/data structure such
that it can return whether this number sequence belongs to some famous
number sequences. For example, if the query string is “0 1 1 2 3 5”, it
might belong to a Fibonacci sequence. Note that the given sequence may not
always start from the beginning of a known number sequence.
My idea:
For those "famous" number sequences, we may cut them into smaller sequences
(e.g. three numbers each). Then save these sequences into a database (NoSQL
key-value store). When a query comes in, cut it into smaller sequences with
the same length (such 3) and then look up in the database/key-value store.
Any suggestions? Thx!

j*72009-08-14 07:08

2 楼

【以下文字转载自 pennystock 俱乐部】
发信人: jhsph07 (银杏), 信区: pennystock
标题: ARRY －基本面良好的小biotech
发信站: BBS 未名空间站 (Fri Aug 14 15:39:51 2009, 美东)
在RA, 糖尿病和Oncology area都有成功希望的药，公司有研有发，不缺钱。
ARRY-162 九月份出二期实验结果。
整盘许久，盘子小， share outstanding少，好拉。
一个药不好，可以AD.
缺点： Pipeline上的药大多为early/mid-stage的，目前没有大的partnership，等
于是还没有大药厂的背书。

g*y2009-08-14 07:08

3 楼

for each "famous" number sequence, construct a suffix tree for it?

d*o2009-08-14 07:08

4 楼

对每一个famous sequence, 建立一个suffix tree.
然后用该string 去遍历suffix tree

such
sequences
NoSQL
with

【在 s*****n 的大作中提到】

: Suppose you are developing a search engineer. Given a sequence of
: numbers as a query string. You need to design algorithm/data structure such
: that it can return whether this number sequence belongs to some famous
: number sequences. For example, if the query string is “0 1 1 2 3 5”, it
: might belong to a Fibonacci sequence. Note that the given sequence may not
: always start from the beginning of a known number sequence.
: My idea:
: For those "famous" number sequences, we may cut them into smaller sequences
: (e.g. three numbers each). Then save these sequences into a database (NoSQL
: key-value store). When a query comes in, cut it into smaller sequences with

s*n2009-08-14 07:08

5 楼

用suffix tree的话，对于这种sequence的suffix，它们不重合，比如
0 1 1 2 3 5 8 13 21 34
suffix
34
21 34
13 21 34
8 13 21 34
5 8 13 21 34
3 5 8 13 21 34
2 3 5 8 13 21 34
1 2 3 5 8 13 21 34
1 1 2 3 5 8 13 21 34
0 1 1 2 3 5 8 13 21 34

g*y2009-08-14 07:08

6 楼

那用hashtable +linked list. use linked list to store the sequence. use
hashtable to do lookup. key = number in sequence; value= pointer to the
number in the linked list.

【在 s*****n 的大作中提到】

: 用suffix tree的话，对于这种sequence的suffix，它们不重合，比如
: 0 1 1 2 3 5 8 13 21 34
: suffix
: 34
: 21 34
: 13 21 34
: 8 13 21 34
: 5 8 13 21 34
: 3 5 8 13 21 34
: 2 3 5 8 13 21 34

o*o2009-08-14 07:08

7 楼

"famous" number sequence 大都是无穷多个的，怎么办？

【在 s*****n 的大作中提到】

l*i2009-08-14 07:08

8 楼

treat the sequence as a string and use KMP to match. You have to cut the
famous sequence at some point. Usually it is not a problem as most sequences
are increasing so you know you can stop when the last integer in your
sequence is smaller than the last one in the prefix of a famous sequence.
oeis is a website to do this search quite efficiently, not sure how they
implemented it.

b*d2009-08-14 07:08

9 楼

除了上边说的suffix tree之外，
一个比较粗的想法:
很多“famous” sequence都是指数级增长，所以越界是比较快的（assume，相信几百
万个这样数之后就overflow了，那么存下来也就是几个M而已）。对于很大的数，可以
直接标记是否他们是fabonacii数or not。对比较短的数列，标记几个level的inverted
index，如上边说的3连续，4连续，5连续，6连续。6连续的话就已经是shingle了，算
得上是finger print了，查询可以非常快的。