好几天没看见新题了 - 未名空间MITBBS历史存档

国际科技财经博客移民网络热点娱乐民生时事公众号

Redian新闻

>未名空间

>JobHunting - 待字闺中

好几天没看见新题了

好几天没看见新题了# JobHunting - 待字闺中

g*e2012-10-17 07:10

1 楼

二爷赏几个吧

a*x2012-10-17 07:10

2 楼

没包子不看

【在 g**e 的大作中提到】

: 二爷赏几个吧

p*22012-10-17 07:10

3 楼

我现在啥也不会了。以后要好好跟你学习了。

【在 g**e 的大作中提到】

: 二爷赏几个吧

g*e2012-10-17 07:10

4 楼

我先来一个吧，如果有人面试碰到了请给我包子。
假设有一堆海量数据，每一条信息的内容如下

e.g.

查询条件大概是这样的:

返回结果是符合条件的人名。
限制条件，范围查询的时候,比如state和state， city和city都是相邻或相近的。
比如查询多个state，就查NY,NJ,PA。不存在查询WA, NY的情况。city, zip也是一样
数据是海量的，读操作很高(say > 10,000/sec)，写操作很低(<100/sec)
1. 如何设计一个数据结构存储这些信息，使得上述查询尽可能快？
2. 你的数据结构如何灵活对应增加删减一列或者多列信息，比如在City后面增加一个
Town, name后面加个gender？
回答用trie的统统不及格。

i*e2012-10-17 07:10

5 楼

我也来一个。如何设计facebook里面的search box的search suggestion function。
返回top 8 records。
rules defined as follows:
priority #1 : 在你的friend list 里面
priority #2 : 在你的second degree friend list 里面
priority #3 ：当前hot query search like ipad3, iphone5 之类的.
还有一个要求是不一定是exactly prefix matching. 例如type一个人的middle 或者
是last name, match 的也要出现在search box 里面。

l*82012-10-17 07:10

6 楼

我自己出个简单的题目：
Input是n (1<= n <= 10 millions)个人每天工作的开始时间和结束时间。假设每个人
每天都按照这个时间上下班，不分week day和weekend. 时间格式是24小时制，精确到
分。要求输出一天中有m (0<= m <=n) 个人同时工作的时间段。
例如：
4
9:00 17:30
7:00 16:00
15:00 1:05
19:00 5:00
2
上面给出了4个人(n=4)每天的工作开始和结束时间，要求输出有且只有2个人(m=2)在
同时工作的时间段
output:
9:00 15:00
16:00 17:30
19:00 1:05

【在 g**e 的大作中提到】

: 二爷赏几个吧

g*e2012-10-17 07:10

7 楼

construct an interval tree and search for nodes with m leaf nodes？
海量数据就map reduce，先排序然后再各个node上建子树然后查询

【在 l*********8 的大作中提到】

: 我自己出个简单的题目：
: Input是n (1<= n <= 10 millions)个人每天工作的开始时间和结束时间。假设每个人
: 每天都按照这个时间上下班，不分week day和weekend. 时间格式是24小时制，精确到
: 分。要求输出一天中有m (0<= m <=n) 个人同时工作的时间段。
: 例如：
: 4
: 9:00 17:30
: 7:00 16:00
: 15:00 1:05
: 19:00 5:00

g*e2012-10-17 07:10

8 楼

你这个跟google suggest有什么不同吗？除了priority要求
friend list和secondary friend list都是提前populate好的，hot query search可以
马上pull

。

【在 i***e 的大作中提到】

: 我也来一个。如何设计facebook里面的search box的search suggestion function。
: 返回top 8 records。
: rules defined as follows:
: priority #1 : 在你的friend list 里面
: priority #2 : 在你的second degree friend list 里面
: priority #3 ：当前hot query search like ipad3, iphone5 之类的.
: 还有一个要求是不一定是exactly prefix matching. 例如type一个人的middle 或者
: 是last name, match 的也要出现在search box 里面。
:

w*x2012-10-17 07:10

9 楼

中枪了, 一眼就想到树了, 不用树咋做?

【在 g**e 的大作中提到】

: 我先来一个吧，如果有人面试碰到了请给我包子。
: 假设有一堆海量数据，每一条信息的内容如下
:
: e.g.
:
:
:
:
: 查询条件大概是这样的:
:

i*e2012-10-17 07:10

10 楼

这个是我上周5 F onsite的时候被问道。大概的要求是如何design整个system. 从用
户的页面到的search box到后台的server call的整个流程。还有就是大数据吧。。
我想跟google search suggestion 没啥区别.

【在 g**e 的大作中提到】

: 你这个跟google suggest有什么不同吗？除了priority要求
: friend list和secondary friend list都是提前populate好的，hot query search可以
: 马上pull
:
: 。

g*e2012-10-17 07:10

11 楼

加了个限制条件

【在 w****x 的大作中提到】

:
: 中枪了, 一眼就想到树了, 不用树咋做?

c*t2012-10-17 07:10

12 楼

看着像数据库，先猜一下用B+ tree

【在 g**e 的大作中提到】

: 我先来一个吧，如果有人面试碰到了请给我包子。
: 假设有一堆海量数据，每一条信息的内容如下
:
: e.g.
:
:
:
:
: 查询条件大概是这样的:
:

c*t2012-10-17 07:10

13 楼

先猜用suffix tree,再慢慢想设计

。

【在 i***e 的大作中提到】

c*t2012-10-17 07:10

14 楼

n是10 m 数量级，干脆按分钟建一个hash table, size 是 60*24=1440，value是工作
人数，然后过一遍所有人，比如第一个9:00 17:30，就把这段时间对应分钟hashtable
value+1
最后就过一遍hashtable找出所有m人数的分钟数，再按连续合并成时间段返回. O(N)

【在 l*********8 的大作中提到】

l*a2012-10-17 07:10

15 楼

Spatial search有专门的R+tree。
http://en.wikipedia.org/wiki/R%2B_tree

【在 c********t 的大作中提到】

: 看着像数据库，先猜一下用B+ tree

l*82012-10-17 07:10

16 楼

不错！
我自己的思路被interval tree给限制了，没有想到这个方法。

hashtable

【在 c********t 的大作中提到】

: n是10 m 数量级，干脆按分钟建一个hash table, size 是 60*24=1440，value是工作
: 人数，然后过一遍所有人，比如第一个9:00 17:30，就把这段时间对应分钟hashtable
: value+1
: 最后就过一遍hashtable找出所有m人数的分钟数，再按连续合并成时间段返回. O(N)

c*t2012-10-17 07:10

17 楼

还是想不出，
我的想法是来三个trie,(friends, second degree friends, hot query) 然后每个节
点都存最popular的8 records.可是怎么能做到middel, last name match 呢？
求解释？如果有google search suggestion的解释也可以啊

。

【在 i***e 的大作中提到】

c*t2012-10-17 07:10

18 楼

学习了，这行水很深啊

【在 l*****a 的大作中提到】

: Spatial search有专门的R+tree。
: http://en.wikipedia.org/wiki/R%2B_tree

w*x2012-10-17 07:10

19 楼

答案是什么? 讲解一下?

【在 g**e 的大作中提到】

: 我先来一个吧，如果有人面试碰到了请给我包子。
: 假设有一堆海量数据，每一条信息的内容如下
:
: e.g.
:
:
:
:
: 查询条件大概是这样的:
:

c*t2012-10-17 07:10

20 楼

同求，能想到的也就是对 country, state, city, zip 的查询组合按地图顺序建立 B+
tree

【在 w****x 的大作中提到】

:
: 答案是什么? 讲解一下?

g*y2012-10-17 07:10

21 楼

用suffix tree？

【在 c********t 的大作中提到】

: 还是想不出，
: 我的想法是来三个trie,(friends, second degree friends, hot query) 然后每个节
: 点都存最popular的8 records.可是怎么能做到middel, last name match 呢？
: 求解释？如果有google search suggestion的解释也可以啊
:
: 。

e*e2012-10-17 07:10

22 楼

比如第一个9:00 17:30 需要 update hashtable value+1 510次。。。

hashtable

【在 c********t 的大作中提到】

f*42012-10-17 07:10

23 楼

倒排?

【在 g**e 的大作中提到】

: 我先来一个吧，如果有人面试碰到了请给我包子。
: 假设有一堆海量数据，每一条信息的内容如下
:
: e.g.
:
:
:
:
: 查询条件大概是这样的:
:

f*42012-10-17 07:10

24 楼

时间对应一个角度, 每个人的工作时间就是一段圆弧
这样就和clrs习题求圆内相交弦的差不多了吧, 注意处理15:00-1:05的这种

【在 l*********8 的大作中提到】

e*e2012-10-17 07:10

25 楼

Think of start and end time of each time slot as points in x axis. First
find the time slot t when the most people are working and get number of
people for it. We name this number nb;
if nb < m, return none.
if nb >= m,
start from the end of slot t, go to the right till to the last point.
When encounter a point, if this point is the end point, nb = nb - 1, if
this point is the start point, nb = nb + 1, when nb = m, mark this point as
p1, continue to the right, until nb != m, mark this point as p2, keep this
slot (p1, p2) as a record of final results.
start from the beginning of slot t, go to the left till to the first point
, apply the same logic as last step.

【在 l*********8 的大作中提到】

e*e2012-10-17 07:10

26 楼

Thank you for the sharing. It's a good one. Here I'm talking about the data
structure for name matching.
Show ugly.
prefix trie with pointers. When building the trie, don't make one entry on
firstName + lastName, instead make two entries on firstName, lastName
separately. At the end of entry, have a pointer pointing to the other (first
pointing to last name, last pointing to fist name) entry. In pointer, there
is some field marking if the name it pointing is a first or middle or last
name. So when concatenate the names at final stage, it knows where they go.
i.e. Tiger Woods.
-- (root)
...
- -
t w
| |
i o
| |
g o
| |
e d
| |
r s
| |
$pt1 $pt2
$pt1 points to w
$pt2 points to t
Just my 2 cents.

。

【在 i***e 的大作中提到】

e*e2012-10-17 07:10

27 楼

B+ tree, totally.

B+

【在 c********t 的大作中提到】

: 同求，能想到的也就是对 country, state, city, zip 的查询组合按地图顺序建立 B+
: tree