g*g
#2
Do something in between: keep a "file pool".
Cap the number of open files at, say, 5000, and keep the 5000 most
recently used ones open. Track them in a queue: when the pool is full
and a new file is needed, pop the head, close it, and append the new
file at the tail. When you write to a file that is already in the
queue, remove it and re-append it at the tail.
To speed up the lookup, use a hashmap to track which files are
open.
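
A minimal sketch of the file pool described above, not a definitive implementation. It uses Python's `collections.OrderedDict`, which plays the role of the queue and the hashmap at once. The names `FilePool` and `split_by_first_column`, the `<key>.txt` naming scheme, and whitespace-separated columns are all assumptions made for illustration; the thread does not specify the delimiter or the output file names.

```python
import os
from collections import OrderedDict


class FilePool:
    """Keep at most `capacity` output files open; evict the least recently used."""

    def __init__(self, directory, capacity=5000):
        self.directory = directory
        self.capacity = capacity
        # key -> open file handle, ordered from least to most recently used
        self.open_files = OrderedDict()

    def write(self, key, line):
        handle = self.open_files.get(key)
        if handle is not None:
            # Already open: move it to the tail (most recently used).
            self.open_files.move_to_end(key)
        else:
            if len(self.open_files) >= self.capacity:
                # Pool is full: pop the head and close it.
                _, oldest = self.open_files.popitem(last=False)
                oldest.close()
            # Append mode, so a file that was evicted and is reopened
            # keeps the lines written to it earlier.
            path = os.path.join(self.directory, key + ".txt")  # assumed naming
            handle = open(path, "a", encoding="utf-8")
            self.open_files[key] = handle
        handle.write(line)

    def close(self):
        for handle in self.open_files.values():
            handle.close()
        self.open_files.clear()


def split_by_first_column(lines, directory, capacity=5000):
    """Route each line to the file named after its first column."""
    pool = FilePool(directory, capacity)
    try:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            key = line.split(None, 1)[0]  # assumes whitespace-separated columns
            pool.write(key, line if line.endswith("\n") else line + "\n")
    finally:
        pool.close()
```

Because files are opened in append mode, the output directory should be empty before a run; otherwise new lines are added to whatever is already there. The sketch also assumes every key is safe to use as a file name.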
[j*******s wrote:]
: A question: I have a large file, a txt table, that needs to be split into several files by the keyword in the first column.
: For example
j*s
#3
Great approach, thanks a lot. This stack method is excellent.
[g*****g wrote:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.
j*s
#4
Is a queue or a stack better? The keys in the first column are random, so FIFO vs. LIFO should make no difference, right?
[g*****g wrote:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.
A*o
#9
Or keep all the file names in memory,
and write to only 10k files on each pass over the raw file.
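
One way to read the multi-pass idea above, sketched under assumptions: a first pass collects the distinct keys, then each later pass rereads the input and writes only the rows whose key falls in the current batch. The function name `split_in_passes`, the `<key>.txt` naming, and whitespace-separated columns are hypothetical choices, not something the thread specifies.

```python
import os


def split_in_passes(input_path, directory, batch_size=10000):
    """Split a file by first column, opening at most `batch_size` outputs per pass.

    Returns the number of writing passes made over the input.
    """
    # First pass: collect the distinct keys, in first-seen order.
    seen = {}
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                seen.setdefault(line.split(None, 1)[0], None)
    all_keys = list(seen)

    passes = 0
    for start in range(0, len(all_keys), batch_size):
        batch = all_keys[start:start + batch_size]
        handles = {}
        try:
            for key in batch:
                path = os.path.join(directory, key + ".txt")  # assumed naming
                handles[key] = open(path, "w", encoding="utf-8")
            # Reread the whole input, keeping only rows for this batch.
            with open(input_path, encoding="utf-8") as src:
                for line in src:
                    if not line.strip():
                        continue
                    handle = handles.get(line.split(None, 1)[0])
                    if handle is not None:
                        handle.write(line if line.endswith("\n") else line + "\n")
        finally:
            for handle in handles.values():
                handle.close()
        passes += 1
    return passes
```

The trade-off against the file pool is that no file is ever reopened, but the input is read once per batch plus once to collect the keys.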
[g*****g wrote:]
: Do something in between, let's say you keep a "file pool",
: you can open a maximum of 5000, and you keep the most recent 5000
: open. Put it in a queue, pop the head out and append the new one
: at the tail when it's over 5000. When you write a file and the file
: is already in the queue, remove it and append it to the tail.
: To speed up search, you can use a hashmap to track if the files are
: open.