那个快？ - 未名空间MITBBS历史存档

那个快？# Java - 爪哇娇娃

m*r2013-03-14 07:03

1 楼

我有一个40个entry的string array。现在需要implement contains。不知道直接
scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
threshold到底在哪里？

b*l2013-03-14 07:03

2 楼

折中一下，用treemap

S*C2013-03-14 07:03

3 楼

40 entries? It doesn't make any difference, don't bother.

【在 m****r 的大作中提到】

: 我有一个40个entry的string array。现在需要implement contains。不知道直接
: scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
: threshold到底在哪里？

m*r2013-03-14 07:03

4 楼

at the end of the day, it probably doesn't matter. but i was just wondering
since the issue came up.

【在 S**********C 的大作中提到】

: 40 entries? It doesn't make any difference, don't bother.

g*g2013-03-14 07:03

5 楼

Suffix tree最快，但你这数量级，是吃饱了撑的。到4M级别，差别就很大。

【在 m****r 的大作中提到】

: 我有一个40个entry的string array。现在需要implement contains。不知道直接
: scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
: threshold到底在哪里？

m*r2013-03-14 07:03

6 楼

我日。
我的问题就是，在这个数量级，那个快？
如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
反正比linear scan快。
但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

【在 g*****g 的大作中提到】

: Suffix tree最快，但你这数量级，是吃饱了撑的。到4M级别，差别就很大。

g*g2013-03-14 07:03

7 楼

40个都是suffix tree快，只不过差不了太多。

【在 m****r 的大作中提到】

: 我日。
: 我的问题就是，在这个数量级，那个快？
: 如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
: 反正比linear scan快。
: 但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

p*i2013-03-14 07:03

8 楼

答案是： depends
姐给你分析一下：
先说linear scan，这个容易
空间： one table with 40 entries. 40*sizeof （yourobject）
时间：
如果查找不成功，你需要40 个 index＋＋操作外加40个比较操作： 40 ＋＋
operations and 40 compare operations
如果成功， averagely 你需要 40/2 ＝20 个index ＋＋ operations and 20 compare
operations

【在 m****r 的大作中提到】

: 我日。
: 我的问题就是，在这个数量级，那个快？
: 如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
: 反正比linear scan快。
: 但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

g*g2013-03-14 07:03

9 楼

This is wrong. Both the hotspot JDK and OpenJDK implementation uses a native
O(nm) algorithm for contains operation. Not a Boyer–Moore string which is
O(n).
So worst case can be 40 * sizeof(search key) * average_sizeof(keys)

compare

【在 p****i 的大作中提到】

: 答案是： depends
: 姐给你分析一下：
: 先说linear scan，这个容易
: 空间： one table with 40 entries. 40*sizeof （yourobject）
: 时间：
: 如果查找不成功，你需要40 个 index＋＋操作外加40个比较操作： 40 ＋＋
: operations and 40 compare operations
: 如果成功， averagely 你需要 40/2 ＝20 个index ＋＋ operations and 20 compare
: operations

p*i2013-03-14 07:03

10 楼

然后来看hashmap
hashmap的特点就是用空间换时间
重点就是你准备用多大的空间来换你的时间呢？
而且这个也跟你的key密切相关
举个简单的例子，如果你的key是integer， range 1-100
那最简单的hash就是建立一个空间为100个entry 的hash table
直接用key作index
这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
稍微变化一下这个例子，如果你key的range是1-10000呢？
你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？
如果说，你不愿意，那你需要在hash function上下功夫，
generally speaking， hash function通常的形势是： Key part 1 * prime number A
+ Key part 2 * prime number B +...
通过认真设计hash function来达到节约一定空间的目的
但是，不是没有代价的，代价就是这个hash function的复杂度。
不考虑collision
这时候查找的cost，无论miss还是hit都是：一次hash function的运算 cost ＋一
次compare 的cost
在这个简单的例子里：通常一个hashfunction的运算cost已经超过了你的20次＋＋
and 20 次compare的cost

compare

【在 p****i 的大作中提到】

: 答案是： depends
: 姐给你分析一下：
: 先说linear scan，这个容易
: 空间： one table with 40 entries. 40*sizeof （yourobject）
: 时间：
: 如果查找不成功，你需要40 个 index＋＋操作外加40个比较操作： 40 ＋＋
: operations and 40 compare operations
: 如果成功， averagely 你需要 40/2 ＝20 个index ＋＋ operations and 20 compare
: operations

p*i2013-03-14 07:03

11 楼

我Java不熟，就是high level， theoretical的给他分析一下
而且我不太理解：你的这个说的是空间还是时间？

native
is

【在 g*****g 的大作中提到】

: This is wrong. Both the hotspot JDK and OpenJDK implementation uses a native
: O(nm) algorithm for contains operation. Not a Boyer–Moore string which is
: O(n).
: So worst case can be 40 * sizeof(search key) * average_sizeof(keys)
:
: compare

p*i2013-03-14 07:03

12 楼

上面是general speaking，
回到你的问题本身，如果object本身是string的话，
重点就是看hash string －－》 index 的cost
和compare two strings的cost了
这个对java熟的人可以具体分析下
好了，我也要问个问题了，大家帮忙看下

【在 p****i 的大作中提到】

: 然后来看hashmap
: hashmap的特点就是用空间换时间
: 重点就是你准备用多大的空间来换你的时间呢？
: 而且这个也跟你的key密切相关
: 举个简单的例子，如果你的key是integer， range 1-100
: 那最简单的hash就是建立一个空间为100个entry 的hash table
: 直接用key作index
: 这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
: 稍微变化一下这个例子，如果你key的range是1-10000呢？
: 你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？

r*s2013-03-14 07:03

13 楼

要考虑节约空间的话
用bloom filter
还是要看具体要求。

【在 p****i 的大作中提到】

: 然后来看hashmap
: hashmap的特点就是用空间换时间
: 重点就是你准备用多大的空间来换你的时间呢？
: 而且这个也跟你的key密切相关
: 举个简单的例子，如果你的key是integer， range 1-100
: 那最简单的hash就是建立一个空间为100个entry 的hash table
: 直接用key作index
: 这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
: 稍微变化一下这个例子，如果你key的range是1-10000呢？
: 你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？

w*z2013-03-14 07:03

14 楼

试一下，不就知道了？用啥hash ,存啥value 都有关系。

【在 m****r 的大作中提到】

: 我日。
: 我的问题就是，在这个数量级，那个快？
: 如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
: 反正比linear scan快。
: 但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

m*r2013-03-14 07:03

15 楼

contains() for what?

native
is

【在 g*****g 的大作中提到】

: This is wrong. Both the hotspot JDK and OpenJDK implementation uses a native
: O(nm) algorithm for contains operation. Not a Boyer–Moore string which is
: O(n).
: So worst case can be 40 * sizeof(search key) * average_sizeof(keys)
:
: compare

g*g2013-03-14 07:03

16 楼

说的是JDK里 String.contains是一个O(nm)时间复杂度的实现，不是最好的算法。

【在 m****r 的大作中提到】

: contains() for what?
:
: native
: is

m*r2013-03-14 07:03

17 楼

yes. i understand all this. but again, the question i am asking is, at how
many strings does the linear scan becomes slower.

【在 p****i 的大作中提到】

: 然后来看hashmap
: hashmap的特点就是用空间换时间
: 重点就是你准备用多大的空间来换你的时间呢？
: 而且这个也跟你的key密切相关
: 举个简单的例子，如果你的key是integer， range 1-100
: 那最简单的hash就是建立一个空间为100个entry 的hash table
: 直接用key作index
: 这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
: 稍微变化一下这个例子，如果你key的range是1-10000呢？
: 你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？

m*r2013-03-14 07:03

18 楼

why do i need contains()? i would used equals.

【在 g*****g 的大作中提到】

: 说的是JDK里 String.contains是一个O(nm)时间复杂度的实现，不是最好的算法。

g*g2013-03-14 07:03

19 楼

你自己说 "现在需要implement contains"。
完全匹配简单，hashmap性能就不错。

【在 m****r 的大作中提到】

: why do i need contains()? i would used equals.

m*r2013-03-14 07:03

20 楼

我再日。 contains 是针对这个40个string的list说的。不是per string。

【在 g*****g 的大作中提到】

: 你自己说 "现在需要implement contains"。
: 完全匹配简单，hashmap性能就不错。

m*r2013-03-14 07:03

21 楼

看来我的语言障碍很大。不适合xiecode。

【在 m****r 的大作中提到】

: 我再日。 contains 是针对这个40个string的list说的。不是per string。

p*i2013-03-14 07:03

22 楼

你去研究一下java default string hash function
然后回来update我们一下吧

how

【在 m****r 的大作中提到】

: yes. i understand all this. but again, the question i am asking is, at how
: many strings does the linear scan becomes slower.

c*e2013-03-14 07:03

23 楼

恩，我也觉得hashmap可以解决楼主的问题，既然楼主要快的那个。hashmap其实是
hashtable,就是o(1),這個速度绝对快。

【在 p****i 的大作中提到】

: 然后来看hashmap
: hashmap的特点就是用空间换时间
: 重点就是你准备用多大的空间来换你的时间呢？
: 而且这个也跟你的key密切相关
: 举个简单的例子，如果你的key是integer， range 1-100
: 那最简单的hash就是建立一个空间为100个entry 的hash table
: 直接用key作index
: 这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
: 稍微变化一下这个例子，如果你key的range是1-10000呢？
: 你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？

f*r2013-03-14 07:03

24 楼

想要知道的话写个benchmark test试试呗。:-)

F*n2013-03-14 07:03

25 楼

Depends. Usually a Map uses hash table and thus depends on how you calculate
the hashcode for a string.
Therefore, your map's performance will vary for different values in your
array because the default String.hashCode depends on the char sequence.

【在 m****r 的大作中提到】

: 我有一个40个entry的string array。现在需要implement contains。不知道直接
: scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
: threshold到底在哪里？

F*n2013-03-14 07:03

26 楼

Again, the answer is "depend on what specific strings"
{"1", "2", "3"....} is different from {"one", "two", "three"...}

how

【在 m****r 的大作中提到】

: yes. i understand all this. but again, the question i am asking is, at how
: many strings does the linear scan becomes slower.

m*r2013-03-14 07:03

27 楼

ok. what i have is a list of locale strings, such as "en_US", "nl_BE", etc.

【在 F****n 的大作中提到】

: Again, the answer is "depend on what specific strings"
: {"1", "2", "3"....} is different from {"one", "two", "three"...}
:
: how