两道很有意思的面试题目，大家议一议 - 未名空间MITBBS历史存档

国际科技财经博客移民网络热点娱乐民生时事公众号

Redian新闻

>未名空间

>JobHunting - 待字闺中

两道很有意思的面试题目，大家议一议

两道很有意思的面试题目，大家议一议# JobHunting - 待字闺中

b*a2013-09-18 07:09

1 楼

A constant incoming requests, each associated with a unique key, estimate
the total amount of unique requests within a period of time. Memory cannot
hold all the keys. Don't save data on disk. Rough estimation will be fine.
这个感觉是Count-min Sketch，但是具体怎么做呢？Count-min Sketch不保存全部原始
key，
另一个也是 A constant incoming requests, 要求给出过去任意一分钟/五分钟/一小
时/一天/一个月/ 的统计数据，如number of requests, number of unique request,
etc.
这个问题更加Open, 怎么实现比较好呢？

M*r2013-09-18 07:09

2 楼

同问。第二题见过到过类似的问题，大意是用户每搜索一次产生一个cookie。
然后统计过去5分钟/一小时/一天的搜索历史。哪位大牛能讲讲思路吗？

d*n2013-09-18 07:09

3 楼

The first one:
each request should have start time/ end time, calculate how many
concurrent requests at a given timestamp, calculate the average latency of a
requests in milliseconds. so the total requests in a period of time =
concurrency * time / average latency.
Second:
Dump the data into a database and query.

,

【在 b*******a 的大作中提到】

: A constant incoming requests, each associated with a unique key, estimate
: the total amount of unique requests within a period of time. Memory cannot
: hold all the keys. Don't save data on disk. Rough estimation will be fine.
: 这个感觉是Count-min Sketch，但是具体怎么做呢？Count-min Sketch不保存全部原始
: key，
: 另一个也是 A constant incoming requests, 要求给出过去任意一分钟/五分钟/一小
: 时/一天/一个月/ 的统计数据，如number of requests, number of unique request,
: etc.
: 这个问题更加Open, 怎么实现比较好呢？

l*n2013-09-18 07:09

4 楼

第一题应该是假设不考虑request持续时间，只管对unique id进行处理，比如1,2,3,2,
4,1,5,6结果是6个。如果是如你所说的考虑起始、终止时间，怎么会有所有id不能都存
下来的问题呢？没被终止的request还在联络中，每个id都是必须存储的。
这题应该用bloomfilter，用n个hash对unique id进行校验，有一个hash的结果位是0就
算新的id，count++。个别新id因为collision会没被计入，但题目正好说了不用精确。

a

【在 d****n 的大作中提到】

: The first one:
: each request should have start time/ end time, calculate how many
: concurrent requests at a given timestamp, calculate the average latency of a
: requests in milliseconds. so the total requests in a period of time =
: concurrency * time / average latency.
: Second:
: Dump the data into a database and query.
:
: ,

j*r2013-09-18 07:09

5 楼

如何定n的大小呢？

2,

【在 l*n 的大作中提到】

: 第一题应该是假设不考虑request持续时间，只管对unique id进行处理，比如1,2,3,2,
: 4,1,5,6结果是6个。如果是如你所说的考虑起始、终止时间，怎么会有所有id不能都存
: 下来的问题呢？没被终止的request还在联络中，每个id都是必须存储的。
: 这题应该用bloomfilter，用n个hash对unique id进行校验，有一个hash的结果位是0就
: 算新的id，count++。个别新id因为collision会没被计入，但题目正好说了不用精确。
:
: a