Redian新闻
>
一道c++ 题, 找出duplicate numbers
avatar
一道c++ 题, 找出duplicate numbers# Programming - 葵花宝典
s*e
1
You are given an array of 16 bit numbers.
The array may contain few duplicates not to exceed 0.1%
of array size. Each duplicated number can occur multiple tiimes.
The goal is to write high performance function to find out those
duplicated numbers, and how many times each number occurs.
avatar
m*t
2
hash table?

【在 s*********e 的大作中提到】
: You are given an array of 16 bit numbers.
: The array may contain few duplicates not to exceed 0.1%
: of array size. Each duplicated number can occur multiple tiimes.
: The goal is to write high performance function to find out those
: duplicated numbers, and how many times each number occurs.

avatar
c*e
3
16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.

【在 m***t 的大作中提到】
: hash table?
avatar
c*e
4
well, i guess this really depends on the actual array size. if the size of
the
array is close to 2^16 then a pure hash will be it since there is very few
dups.

【在 c********e 的大作中提到】
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

avatar
j*h
5
you can also use a bitmap (8K bytes) to detect duplicates,
and use a hashtable to store the counters for each
duplicate. Since only there're only 0.1% duplicates, the hash
table should be very small.

【在 c********e 的大作中提到】
: well, i guess this really depends on the actual array size. if the size of
: the
: array is close to 2^16 then a pure hash will be it since there is very few
: dups.

avatar
m*k
6
why not directly use bitmap. Since duplicate rate <0.1%. So the maximium
duplicate occurrences will be less than 64K * 0.001 <=66. So for each number
in bitmap, I would use 7 bit to represent. So total memory consuption is
7bits * 64K =56KB. Then I can directly count the number of duplicates in
this bitmaps
In the other side, I just wonder if you use an extra Hashtable, How many
slots are you gonna use? I think it's O(n) where n is the size of the array.

you can also use a bitmap (8K bytes) to

【在 j*****h 的大作中提到】
: you can also use a bitmap (8K bytes) to detect duplicates,
: and use a hashtable to store the counters for each
: duplicate. Since only there're only 0.1% duplicates, the hash
: table should be very small.

avatar
m*k
7
for the binary search method, are you chaining an binary tree at each slot?

16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.

【在 c********e 的大作中提到】
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

avatar
c*e
8
when u insert the number into each basket, contruct a b-tree

?

【在 m******k 的大作中提到】
: for the binary search method, are you chaining an binary tree at each slot?
:
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

avatar
t*t
9
7bit还不如8bit呢.直接变成hash了.

number
array.

【在 m******k 的大作中提到】
: why not directly use bitmap. Since duplicate rate <0.1%. So the maximium
: duplicate occurrences will be less than 64K * 0.001 <=66. So for each number
: in bitmap, I would use 7 bit to represent. So total memory consuption is
: 7bits * 64K =56KB. Then I can directly count the number of duplicates in
: this bitmaps
: In the other side, I just wonder if you use an extra Hashtable, How many
: slots are you gonna use? I think it's O(n) where n is the size of the array.
:
: you can also use a bitmap (8K bytes) to

avatar
t*t
10
搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.

【在 c********e 的大作中提到】
: when u insert the number into each basket, contruct a b-tree
:
: ?

avatar
c*e
11
en, bt neh

【在 t****t 的大作中提到】
: 搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.
相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。