Redian News
How to quickly process more than ten thousand files? # Java - 爪哇娇娃
c*t
1
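The program needs to quickly read more than ten thousand files from the hard disk
and store their contents in a database.
Opening and reading the files one by one in a loop is too slow. Is there a better way?
Thanks.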
xt
2

If that is too slow, then there is nothing to be done. I can't think of a faster way.

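[Quoted from c*******t's post]
: The program needs to quickly read more than ten thousand files from the hard disk
: and store their contents in a database.
: Opening and reading the files one by one in a loop is too slow. Is there a better way?
: Thanks.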

c*t
3
Would using multiple threads make it any faster?

[Quoted from xt's post]
:
: If that is too slow, then there is nothing to be done. I can't think of a faster way.

xt
4

Maybe. Hard to say.

[Quoted from c*******t's post]
: Would using multiple threads make it any faster?
e*g
5
not likely, this is IO bound.

[Quoted from xt's post]
:
: Maybe. Hard to say.

m*t
6
If his "database" doesn't happen to be another file located
on the same hard drive, I think multithreading would improve
performance a lot. It would take some experimentation to find
the optimal number of "worker threads", though.

[Quoted from e***g's post]
: not likely, this is IO bound.
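
A minimal sketch of the worker-thread idea in this post, in Java. The class name `WorkerPoolLoader`, the method `loadAll`, and the pool sizes tried in `main` are illustrative assumptions, not anything from the OP's program. The database write is left out on purpose: the code only counts the bytes it reads, so it measures the file-reading side alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WorkerPoolLoader {

    // Reads every regular file in dir using a fixed pool of `workers` threads
    // and returns the total number of bytes read. A real loader would hand
    // each byte[] to the database layer instead of only counting it.
    static long loadAll(Path dir, int workers)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // One task per file; the pool decides which worker runs it.
            List<Future<Integer>> pending = new ArrayList<>();
            for (Path file : files) {
                pending.add(pool.submit(() -> Files.readAllBytes(file).length));
            }
            long total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // rethrows a read failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Times the same directory with a few pool sizes (the sizes are arbitrary).
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (int workers : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            long bytes = loadAll(dir, workers);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(workers + " worker(s): " + bytes + " bytes in " + millis + " ms");
        }
    }
}
```

Running it against the real directory with a few pool sizes is one way to do the experiment mentioned above. Repeated runs may be served from the OS file cache, so the first pass over a directory is usually the most representative one.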
r*s
7
The bottleneck here is the hard disk and the file system. Even if you
read the files sequentially, it won't be much different from
the multithreaded solutions, as the job is IO bound only. Multithreading
helps only if the job is IO+CPU bound.
Therefore, you might need a high-performance file system, such as
IBM GPFS ...

[Quoted from m******t's post]
: If his "database" doesn't happen to be another file located
: on the same hard drive, I think multithreading would improve
: performance a lot. It would take some experimentation to find
: the optimal number of "worker threads", though.

xt
8

A SCSI drive will be good enough to handle that.

[Quoted from r*****s's post]
: The bottleneck here is the hard disk and the file system. Even if you
: read the files sequentially, it won't be much different from
: the multithreaded solutions, as the job is IO bound only. Multithreading
: helps only if the job is IO+CPU bound.
: Therefore, you might need a high-performance file system, such as
: IBM GPFS ...

e*g
9
In that case, a typical producer/consumer setup should be enough: 2 threads
with a queue in between. More threads writing to the database would add
unnecessary concurrency control to an already busy database.

[Quoted from m******t's post]
: If his "database" doesn't happen to be another file located
: on the same hard drive, I think multithreading would improve
: performance a lot. It would take some experimentation to find
: the optimal number of "worker threads", though.
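
A rough sketch of the two-thread producer/consumer layout described in this post, in Java, assuming `java.util.concurrent.ArrayBlockingQueue` as the queue in between. `ReaderWriterPipeline`, `FakeDatabase`, and the queue capacity of 64 are hypothetical names and values chosen for illustration; `FakeDatabase` only collects the payloads in memory and stands in for whatever real database call the OP would use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReaderWriterPipeline {

    // Sentinel instance telling the consumer that no more files will arrive.
    // It is compared by identity, so an empty file is still a normal item.
    private static final byte[] END = new byte[0];

    // Hypothetical stand-in for the database: it just keeps each payload.
    static final class FakeDatabase {
        final List<byte[]> rows = new ArrayList<>();
        void insert(byte[] content) { rows.add(content); }
    }

    static void run(List<Path> files, FakeDatabase db) throws InterruptedException {
        // Bounded queue: the reader blocks when it gets 64 files ahead of the writer.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);

        // Producer: reads files from disk and queues their contents.
        Thread producer = new Thread(() -> {
            try {
                for (Path file : files) {
                    try {
                        queue.put(Files.readAllBytes(file));
                    } catch (IOException e) {
                        System.err.println("skipping " + file + ": " + e);
                    }
                }
                queue.put(END);
            } catch (InterruptedException e) {
                // Nothing interrupts the producer in this sketch; real code
                // would also need to make sure the consumer gets stopped.
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: the only thread that talks to the "database".
        Thread consumer = new Thread(() -> {
            try {
                for (byte[] item = queue.take(); item != END; item = queue.take()) {
                    db.insert(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }

    public static void main(String[] args) throws Exception {
        List<Path> files;
        try (Stream<Path> entries = Files.list(Paths.get(args.length > 0 ? args[0] : "."))) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        FakeDatabase db = new FakeDatabase();
        run(files, db);
        System.out.println("stored " + db.rows.size() + " file(s)");
    }
}
```

With a single writer thread the database sees one connection's worth of inserts, which is the point made above about not adding concurrency control to a busy database. The bounded queue keeps memory use in check when the disk is faster than the database.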

m*t
10

It's not the only bottleneck. Another potential bottleneck would
be the DB+network roundtrip. A multithreaded design would allow
the application to do DB and local I/O concurrently (again, assuming
the DB is not local).
Also, without knowing more about the details of the OP's application,
it's not unusual for some processing to happen to the data
once it's read into memory. A multithreaded design would also
allow the application to improve its CPU utilization in this case.

[Quoted from r*****s's post]
: The bottleneck here is the hard disk and the file system. Even if you
: read the files sequentially, it won't be much different from
: the multithreaded solutions, as the job is IO bound only. Multithreading
: helps only if the job is IO+CPU bound.
: Therefore, you might need a high-performance file system, such as
: IBM GPFS ...

m*t
11

Well, it depends. If the data is written to different tables,
or to different pages in the same table, most modern database
products have very sophisticated concurrency support to avoid resource
contention.

[Quoted from e***g's post]
: In that case, a typical producer/consumer setup should be enough: 2 threads
: with a queue in between. More threads writing to the database would add
: unnecessary concurrency control to an already busy database.

c*e
12

1st, on the server side, trust your database and let it do the optimization.
2nd, on the client (your) side, if networking is really the bottleneck,
asynchronous handling could be a good measure.

[Quoted from m******t's post]
:
: Well, it depends. If the data is written to different tables,
: or to different pages in the same table, most modern database
: products have very sophisticated concurrency support to avoid resource
: contention.
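
One way the client-side asynchronous handling mentioned in this post could look, sketched in Java with `CompletableFuture`. Everything named here is an assumption made for illustration: `AsyncSender`, `sendAll`, the `send` callback (which stands in for the real network call to the database), and the `maxInFlight` limit. The sketch only shows the calls overlapping instead of each one waiting for the previous roundtrip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class AsyncSender {

    // Starts one asynchronous `send` per payload, with at most maxInFlight
    // calls running at once, then waits for all of them to finish.
    // Returns the number of payloads that were sent.
    static int sendAll(List<String> payloads, Consumer<String> send, int maxInFlight) {
        ExecutorService io = Executors.newFixedThreadPool(maxInFlight);
        try {
            List<CompletableFuture<Void>> calls = payloads.stream()
                    .map(p -> CompletableFuture.runAsync(() -> send.accept(p), io))
                    .collect(Collectors.toList());
            // join() rethrows the first failure wrapped in CompletionException.
            CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
            return calls.size();
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        // Simulated roundtrip: each "send" just sleeps for 50 ms.
        Consumer<String> slowSend = payload -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<String> payloads = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            payloads.add("file-" + i);
        }
        long start = System.nanoTime();
        int sent = sendAll(payloads, slowSend, 4);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sent " + sent + " payloads in " + millis + " ms");
    }
}
```

Whether this helps depends on the roundtrip actually being the bottleneck, as the post says; if the database is the slow part, more calls in flight would mostly queue up on the server.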
