Old Google question: Find kth largest of sum of elements in 2 sorted arrays
f*4
1
Original post: http://www.mitbbs.com/article_t/Programming/31198701.html
Two reverse-sorted arrays A and B have been given, such that the size of A is m
and the size of B is n. You need to find the k-th largest sum (a+b) where a is
taken from A and b is taken from B, such that k < m*n.
Is there an O(n) solution? How do you convert this into a Young tableau?
Thanks
l*a
2
Why insist on mapping this onto a matrix?
What does it have to do with N?
Assume we want the smallest instead.
Obviously a[0]+b[0] is the least.
We can create a min-heap of size k.
Each time, pop the smallest, say a[i]+b[j],
then insert a[i+1]+b[j] and a[i]+b[j+1] into the heap.
Each heap adjustment takes O(lg k),
and we need to do this k times,
so the time complexity will be O(k lg k).
There is another trick here; think about it first.
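A minimal Python sketch of this heap approach (my own illustration, not the poster's code), assuming ascending arrays and a 1-indexed k; the visited set keeps the same (i, j) pair from being pushed twice, which may well be the "trick" hinted at:

import heapq

def kth_smallest_sum(a, b, k):
    # a, b: ascending sorted lists; returns the k-th smallest a[i] + b[j].
    # Min-heap of (sum, i, j); visited avoids pushing a pair twice.
    heap = [(a[0] + b[0], 0, 0)]
    visited = {(0, 0)}
    for _ in range(k - 1):
        _sum, i, j = heapq.heappop(heap)
        if i + 1 < len(a) and (i + 1, j) not in visited:
            visited.add((i + 1, j))
            heapq.heappush(heap, (a[i + 1] + b[j], i + 1, j))
        if j + 1 < len(b) and (i, j + 1) not in visited:
            visited.add((i, j + 1))
            heapq.heappush(heap, (a[i] + b[j + 1], i, j + 1))
    return heap[0][0]

print(kth_smallest_sum([1, 3, 5], [2, 4, 6], 4))  # sums in order: 3,5,5,7,... -> 7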

【Quoted from f****4's post】
: Original post: http://www.mitbbs.com/article_t/Programming/31198701.html
: Two reverse-sorted arrays A and B have been given, such that the size of A is m
: and the size of B is n. You need to find the k-th largest sum (a+b) where a is
: taken from A and b is taken from B, such that k < m*n.
: Is there an O(n) solution? How do you convert this into a Young tableau?
: Thanks

f*4
3
Worst case for k: n = m and k = n^2, so your method is O(n^2 * lg n).
What I'm asking is whether there is an O(n) solution.
Some say it can be converted into a Young tableau, which I don't quite get: for a full n*m Young tableau, is there an O(n) way to find the k-th largest number?

【Quoted from l*****a's post】
: Why insist on mapping this onto a matrix?
: What does it have to do with N?
: Assume we want the smallest instead.
: Obviously a[0]+b[0] is the least.
: We can create a min-heap of size k.
: Each time, pop the smallest, say a[i]+b[j],
: then insert a[i+1]+b[j] and a[i]+b[j+1] into the heap.
: Each heap adjustment takes O(lg k),
: and we need to do this k times,
: so the time complexity will be O(k lg k).

H*s
4
It seems O(n) is unlikely for this problem. I'd also like to know an O(n) solution. Bumping!
f*4
8
Median of Medians algorithm
Take the example on that wiki page: after 47 is found, if we want the 80th largest number,
don't we then have to look among the numbers greater than 47 for the 32nd largest
(split into columns of 5 again, recompute the medians, then take the median of medians), repeating until we find it? (A selection sketch follows the quote below.)
The 49-horse racing puzzle can be fit into this method too, but I can't prove that it minimizes the number of races...

【Quoted from j**y's post】
: http://en.wikipedia.org/wiki/Selection_algorithm
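As a side note, a minimal quickselect sketch in Python (my own illustration; it uses a random pivot rather than median-of-medians, so expected rather than worst-case linear time) shows exactly the "recurse into one side" step the question describes:

import random

def quickselect_largest(xs, k):
    # k-th largest, 1-indexed. Partition around a pivot, then recurse into
    # the side that must contain the answer -- e.g. after the pivot is found,
    # look for the (k - len(greater) - len(equal))-th largest below it.
    pivot = random.choice(xs)
    greater = [x for x in xs if x > pivot]
    equal = [x for x in xs if x == pivot]
    if k <= len(greater):
        return quickselect_largest(greater, k)
    if k <= len(greater) + len(equal):
        return pivot
    return quickselect_largest([x for x in xs if x < pivot],
                               k - len(greater) - len(equal))

print(quickselect_largest(list(range(100)), 80))  # 80th largest of 0..99 -> 20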
f*4
9
Found a correct O(N) solution:
http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=ri
This is a pretty good puzzle. You can actually find the cutoff and a
description of exactly which pairs are in the solution in less than O(N)
time, but outputting all the solutions takes O(N) time, so it doesn't help
you overall. This won't be much of a hint, but finding the cutoff point can
be done in O(sqrt(N)*log^2(N)) time. Forgive me for not writing code, but this
gets pretty nitty-gritty in spots.
I think of the problem as a two-dimensional array, S(i,j), of numbers formed
by the addition S(i,j) = A[i] + B[j]. A cutoff value C defines a cutoff
line through the array so that above the cutoff line S(i,j) >= C and below
the line S(i,j) <= C. The particular value C0 that we want gives S(i,j) >=
C0 for at least N entries in the array, and S(i,j) <= C0 for at least N(N-1)
entries in the array.
Here is a better hint, posed as a question:
Noting that the problem is to find the partition that contains the highest N
entries in the square, devise a method for describing a generic partition
containing N items that takes less than O(N) memory.
Answer:
We describe a partition as a set of "partition points" that specify the
corners of the partition, such that the partition is exactly those S(i,j),
so that for some partition point with coordinates i0,j0, i <= i0 and j <= j0
. Note, however, that the value S(sqrt(N)+1,sqrt(N)+1) cannot be in the
result, so the possible partitions are all skinny, spread along the i=1 and
j=1 sides of the array. Therefore, we divide the partition into two pieces
along the line i=j. Now both pieces of the partition have a height or width
of at most sqrt(N). Therefore, we need to allocate space for only 2*sqrt(N)
partition points.
Continuing:
Now our general aim is to find the particular cutoff C0 such that the
partition defined by C0 contains exactly N items (I'll assume all the
items have different values, but no fundamental problem is introduced
otherwise). First, note that any C0 must be less than A[1] + B[1], and must
be greater than or equal to A[sqrt(N)+1] + B[sqrt(N)+1]. We therefore do a
binary search on this range of values, to find C0.
As we do our binary search, we must determine how many items are above a
trial cutoff C. Doing this is not particularly difficult. First, we move
each partition point to the value in its row or column corresponding to the
smallest sum larger than C. After all partition points are placed (which
takes O(logN) per partition point, for O(sqrt(N)logN) total), we add up the
number of elements larger than C in the entire array. This also takes O(sqrt
(N)logN) time.
Now we just have to try this with various C until we get C0, our ideal value.
Now for a wrinkle:
My first thought was that it would take O(logN) trials of C to get our ideal
value, but that is not entirely true. The number of trials depends on the
data type of the values we are comparing, and their distribution in that
data type. Using integers, it takes a maximum of 16 steps to find a value
with precision 1. But if A[i] and B[j] are real numbers, with large values
but potentially small differences between them, it could take much longer.
So the order of the naive algorithm is O(sqrt(N)*logN*logK), where K is a
measure of the granularity of the numbers.
A refinement:
In order to have a proper binary search, so that we take O(logN) trial
values of C, we must use values that are within the array. The way to do
this is to keep track of more information. For each partition point, we will
keep three values: one value is known to be too high, one value is known to
be too low, and one value is our current guess (which will be proven too
high or too low). Putting all the partitions together, we now have something
like O(Nsqrt(N)) numbers. My initial thought was to choose a new C from the
largest row in the array (corresponding to i=1). But what if we find that
all the values in the largest row are either too large or too small? Then we
must select a value from a different row or column. We might have to repeat
this O(sqrt(N)) times, which would give us O(NlogN) performance overall.
So instead, we can use a median-of-medians approach by which we choose the
median from each potential partition point range, then choose the median of
those medians as a new trial C. This should give us fairly good performance.
Choosing the median of each set can be done in constant time, and we need
to choose O(sqrt(N)) medians, then find the median of those, which can be
done by sorting then indexing, taking O(sqrt(N)*log(sqrt(N))) = O(sqrt(N)*logN)
time, which is the same time it takes to evaluate the new C.
The second wrinkle:
But another wrinkle crops up, because a median-of-median calculation
guarantees performance by guaranteeing that a certain fraction of the total
values are above the median, and a similar fraction are below the median.
Because the potential partition point ranges have different sizes, we can't
come up with a similar guarantee for this method. So we actually fail to
ensure that we only need to choose O(logN) trial values for C.
A further refinement:
One simple way to address this shortcoming is to store the size of the
potential partition point range along with its median, sorting them as a
pair, and then after sorting select a median based on the range sizes. This
provides a guarantee of a minimum fraction of values both above and below
the median. Now we know it will take only O(logN) trial values of C to find
C0. This gives a total time of O(logN*(sqrt(N)*logN + sqrt(N)*logN)) =
O(sqrt(N)*log^2(N)) time to find C0. It then takes O(N + sqrt(N)) = O(N) to
output the values.
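To make the "count how many items are above a trial cutoff C" step concrete, here is a simple Python sketch (mine, not the poster's; a plain O(m+n) staircase sweep rather than the O(sqrt(N)*logN) partition-point update described above):

def count_pairs_above(a, b, c):
    # a, b: descending sorted lists; counts pairs (i, j) with a[i] + b[j] > c.
    # As a[i] decreases, the qualifying prefix of b only shrinks, so the
    # pointer j moves monotonically: O(len(a) + len(b)) overall.
    count, j = 0, len(b)
    for i in range(len(a)):
        while j > 0 and a[i] + b[j - 1] <= c:
            j -= 1
        if j == 0:
            break  # no b element works for this or any smaller a[i]
        count += j
    return count

print(count_pairs_above([5, 3, 1], [6, 4, 2], 7))  # sums > 7: 11, 9, 9 -> 3

Binary-searching on C with a counter like this locates the cutoff C0; with the partition-point bookkeeping described above, each trial's count drops to O(sqrt(N)*logN).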
g*i
10
With only 2 arrays there's no need for a heap, is there? Just keep 1 pointer per array. If the current largest is a[i]+b[j],
the second largest can only come from a[i]+b[j+1] or a[i+1]+b[j], so only one pointer needs to move.

【Quoted from l*****a's post】
: Why insist on mapping this onto a matrix?
: What does it have to do with N?
: Assume we want the smallest instead.
: Obviously a[0]+b[0] is the least.
: We can create a min-heap of size k.
: Each time, pop the smallest, say a[i]+b[j],
: then insert a[i+1]+b[j] and a[i]+b[j+1] into the heap.
: Each heap adjustment takes O(lg k),
: and we need to do this k times,
: so the time complexity will be O(k lg k).

f*4
11
The set of candidates for the next largest value grows linearly.


【Quoted from g*****i's post】
: With only 2 arrays there's no need for a heap, is there? Just keep 1 pointer per array. If the current largest is a[i]+b[j],
: the second largest can only come from a[i]+b[j+1] or a[i+1]+b[j], so only one pointer needs to move.

B*1
23
Could someone explain this guy's algorithm in detail? It makes my head spin; I can't tell what he's saying.
Thanks


【Quoted from f****4's post】
: Found a correct O(N) solution:
: http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=ri
: This is a pretty good puzzle. You can actually find the cutoff and a
: description of exactly which pairs are in the solution in less than O(N)
: time, but outputting all the solutions takes O(N) time, so it doesn't help
: you overall. This won't be much of a hint, but finding the cutoff point can
: be done in O(sqrt(N)*log^2(N)) time. Forgive me for not writing code, but this
: gets pretty nitty-gritty in spots.
: I think of the problem as a two-dimensional array, S(i,j), of numbers formed
: by the addition S(i,j) = A[i] + B[j]. A cutoff value C defines a cutoff

r*h
24
Such an obvious problem, why bother with a matrix? Since a and b are both sorted in descending order, just take p = ceil(k/n).
If p == 1, take the k-th element of b and add it to the largest element of a;
if p > 1, let q = k mod n and add the p-th element of a to the q-th element of b.
e*s
25
No way...
Your solution only holds when the largest element of b is smaller than the smallest element of a, doesn't it?


【Quoted from r*******h's post】
: Such an obvious problem, why bother with a matrix? Since a and b are both sorted in descending order, just take p = ceil(k/n).
: If p == 1, take the k-th element of b and add it to the largest element of a;
: if p > 1, let q = k mod n and add the p-th element of a to the q-th element of b.

h*z
26
Here is an algorithm based on the Young tableau idea.
It's written a bit verbosely. In short: maintain two indices (a,b) and (c,d) in the Young tableau, with a >= c,
b <= d; at step k, one of the two holds the k-th largest sum. Some impossible condition cases are omitted
below. In step 4, k is a constant. The loop at line 27 is O(m); swap rows and columns to make it O(n).
M(x, y)
1   return A[x] + B[y]
LARGEST(k)
1   (a, b) = (1, 1)
2   (c, d) = (1, 1)
3   (e, f) = (1, 1)
4   for i = 1 to k
5     if a = c and b = d then
6       (e, f) = (a, b)
7       if a = m and b = n then
8         // last element, stop
9       else if a = m and b < n then
10        (a, b) = (1, b + 1)
11        (c, d) = (1, d + 1)
12      else if a < m and b = n then
13        (a, b) = (a + 1, b)
14        (c, d) = (c + 1, d)
15      else
16        (a, b) = (a + 1, b)
17        if M(c + 1, d) >= M(c, d + 1) then
18          (c, d) = (c + 1, d)
19        else
20          (c, d) = (c, d + 1)
21    else if M(a, b) >= M(c, d) then
22      (e, f) = (a, b)
23      if a = m then
24        if d = b + 1 then
25          (a, b) = (c, d)
26        else
27          for j = 1 to m
28            if M(j, b + 1) < M(c, d) then
29              break
30          (a, b) = (j, b + 1)
31      else
32        (a, b) = (a + 1, b)
33    else // M(a, b) < M(c, d)
34      (e, f) = (c, d)
35      if d = n then
36        (c, d) = (c + 1, d)
37      else
38        if M(c + 1, d) >= M(1, d + 1) then
39          (c, d) = (c + 1, d)
40        else
41          (c, d) = (1, d + 1)
f*4
27
I didn't read it carefully, but maintaining 2 indices to find the k-th largest sum can't work.
As said above, the number of candidates for the k-th largest sum grows linearly.


【Quoted from h*z's post】
: Here is an algorithm based on the Young tableau idea.
: It's written a bit verbosely. In short: maintain two indices (a,b) and (c,d) in the Young tableau, with a >= c,
: b <= d; at step k, one of the two holds the k-th largest sum. Some impossible condition cases are omitted.
: In step 4, k is a constant. The loop at line 27 is O(m); swap rows and columns to make it O(n).
: M(x, y)
: 1 return A[x] + B[y]
: LARGEST(k)
: 1 (a, b) = (1, 1)
: 2 (c, d) = (1, 1)
: 3 (e, f) = (1, 1)

s*c
28
hiz's solution doesn't look right...
h*z
29
Indeed, it's wrong.
Now here's another idea:
We know each row is sorted from largest to smallest. Suppose we have already found the k-1 largest
numbers; those k-1 numbers must sit toward the upper-left corner, in the following shape
(x means settled, i.e., among the top k-1 largest):
xxxxo-
xxo---
xo----
o-----
The next largest number must come from among the o's, that is, from each row, the largest
not-yet-settled entry. There are n rows in total, so this is O(kn) = O(n) for constant k;
see the sketch below.
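A small Python sketch of this staircase idea (my own illustration, assuming descending arrays, 1-indexed k, and k <= n*m):

def kth_largest_sum(a, b, k):
    # a, b: descending sorted lists; returns the k-th largest a[i] + b[j].
    # ptr[i] is the leftmost column of row i not yet settled; the settled
    # cells form the staircase above, so the next largest sum is the best
    # of one candidate per row: an O(n) scan, repeated k times -> O(k*n).
    n, m = len(a), len(b)
    ptr = [0] * n
    val = None
    for _ in range(k):
        best = -1
        for i in range(n):
            if ptr[i] < m and (best < 0 or a[i] + b[ptr[i]] > a[best] + b[ptr[best]]):
                best = i
        val = a[best] + b[ptr[best]]
        ptr[best] += 1
    return val

print(kth_largest_sum([5, 3, 1], [6, 4, 2], 4))  # sums desc: 11, 9, 9, 7 -> 7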