[Repost] a question on XML parser # Java - 爪哇娇娃
l*r
1
[The following text is reposted from the Programming board; the original follows]
From: laoer (You know what!), Board: Programming
Subject: a question on XML parser
Posted at: Unknown Space - 未名空间 (Tue Jun 15 22:29:14 2004) WWW-POST
Greetings,
I have several "<..>" records in one file. Right now, I first divide this
file into many strings. Each string is one XML record. Then I use the Java SAX
parser to parse it. It turns out that both the dividing and the parsing are
very slow. Is there a better way, like parsing all the records in this file in
one pass?
Thanks

x*n
2
My guess is that your file operations are what take so long.
Post the snippet where you chop the file into pieces.

z*g
3
Why are you dividing the file into strings?
SAX is a progressive (streaming) parser: it works through the input in order
rather than loading it all at once.
If you use JDOM, it will read the whole file into memory.
For large files, a SAX parser performs better than a DOM parser.
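As a rough illustration of the streaming behaviour described above, here is a minimal sketch using the JDK's built-in SAX API. The class name `SaxElementCounter` and the `records`/`record` element names are made up for the example; the thread never says what the real elements are called.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements by name as SAX events arrive.
class SaxElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    // Called once per start tag, in document order; no tree is built.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    // Parses one well-formed document from the reader and returns the tallies.
    static Map<String, Integer> count(Reader xml) throws Exception {
        SaxElementCounter handler = new SaxElementCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        // Element names here are placeholders, not the real record format.
        String xml = "<records><record/><record/><record/></records>";
        System.out.println(count(new StringReader(xml)));
    }
}
```

The handler only ever holds the running tallies, which is why memory use stays flat as the input grows. This works only when the input is a single well-formed document.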

l*r
4
Sorry, I guess I didn't describe the problem clearly. The XML file looks like
this:

<?xml ...?>
<!DOCTYPE ... "NCBI_BlastOutput.dtd">
...

<?xml ...?>
<!DOCTYPE ... "NCBI_BlastOutput.dtd">
...

<?xml ...?>
<!DOCTYPE ... "NCBI_BlastOutput.dtd">
...

w*r
5
OK, your XML file is not well-formed, so it cannot be parsed as is. My
suggestion is to write a class that first strips all the document headers and
puts all the records, well-formed, into one file (or stream).
Then all you need to do is write an XSLT stylesheet to transform the
XML into whatever format you want and load the result into your application.
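One way to read this suggestion, sketched with the JDK's regex and XSLT (`javax.xml.transform`) classes. Everything named here is an assumption for illustration: the class `MergeAndTransform`, the wrapper element `records`, and the premise that no DOCTYPE carries an internal subset in square brackets. It also works on an in-memory string, so it only suits files that fit in memory.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: strips per-record headers, wraps the rest in one root.
class MergeAndTransform {
    // Matches XML declarations and DOCTYPE declarations without an internal subset.
    private static final Pattern HEADERS =
            Pattern.compile("<\\?xml[^>]*\\?>|<!DOCTYPE[^>\\[]*>");

    // Removes every header and wraps what is left in a made-up <records> root.
    static String merge(String concatenated) {
        return "<records>" + HEADERS.matcher(concatenated).replaceAll("") + "</records>";
    }

    // Applies an XSLT stylesheet (given as text) to an XML string.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // The <r> element is a placeholder for whatever the real root is.
        String one = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE r SYSTEM \"NCBI_BlastOutput.dtd\">\n<r/>\n";
        String xslt = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">"
                + "<xsl:value-of select=\"count(/records/*)\"/>"
                + "</xsl:template></xsl:stylesheet>";
        // Prints the number of records found under the synthetic root.
        System.out.println(transform(merge(one + one), xslt));
    }
}
```

Dropping the DOCTYPE also drops any entity definitions the DTD provides, so this sketch assumes the records use only the predefined XML entities.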

z*g
6
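Reread the SAX parser sample code.
Your understanding is wrong.
A SAX parser reads each element in sequence.
Also, your XML doc seems to have a problem:
why are there so many XML declarations and DTDs? Like an HTML header, there
should be only one.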
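The point about having only one declaration can be checked directly. The sketch below (the class name `WellFormedCheck` and the `<r>` element are made up) feeds a string to the JDK SAX parser and reports whether it is accepted as a single document. A second XML declaration in the middle of the input is a fatal well-formedness error, so the concatenated input is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for checking whether text is one well-formed document.
class WellFormedCheck {
    // Returns true if the parser accepts the text, false on a parse error.
    static boolean isSingleDocument(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors, e.g. a second <?xml ...?>.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String one = "<?xml version=\"1.0\"?><r><a/></r>";
        System.out.println(isSingleDocument(one));
        System.out.println(isSingleDocument(one + "\n" + one));
    }
}
```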
l*r
7

Really? What's my mistake?

It should be no problem, because this file was created by a commercial
program. And my SAX parser works for this format.

z*g
8
It doesn't read chunk by chunk; it reads
in order, or you could say line by line.
DOM is the one where the whole file gets fed in.
Got it?

x*n
9
1. Chop your monolithic(?) file (a collection of XML documents) into a
collection of XML files, and parse them one by one.
2. Find a fast way to feed one XML document (one part of the file) to a parser,
then run a second parse for the second XML DOCUMENT (unfortunately,
it's the second part of your physical file), and so on.
Either 1 or 2.
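Option 2 could be sketched roughly as follows. This is not anyone's actual code from the thread: `MultiDocFeeder` and `CountingHandler` are made-up names, and the splitting assumes that every XML declaration starts at the beginning of a line. The `resolveEntity` override hands the parser an empty DTD so it does not go looking for NCBI_BlastOutput.dtd on every record; whether DTD loading is what made the original code slow is only a guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical feeder: reads the physical file once, parses each document in turn.
class MultiDocFeeder {

    // Example handler: counts finished documents and skips the external DTD.
    static class CountingHandler extends DefaultHandler {
        int documents = 0;

        @Override
        public void endDocument() {
            documents++;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Substitute an empty DTD instead of fetching the real one.
            return new InputSource(new StringReader(""));
        }
    }

    // Assumes each "<?xml" declaration begins a line (an assumption, not a given).
    static void parseAll(Reader in, DefaultHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BufferedReader lines = new BufferedReader(in);
        StringBuilder doc = new StringBuilder();
        String line;
        while ((line = lines.readLine()) != null) {
            if (doc.length() == 0 && line.trim().isEmpty()) {
                continue; // ignore blank lines before a document starts
            }
            if (line.startsWith("<?xml") && doc.length() > 0) {
                parseOne(parser, doc, handler); // previous document is complete
            }
            doc.append(line).append('\n');
        }
        if (doc.length() > 0) {
            parseOne(parser, doc, handler); // last document in the file
        }
    }

    // Parses the buffered document, then empties the buffer for the next one.
    private static void parseOne(SAXParser parser, StringBuilder doc, DefaultHandler handler)
            throws IOException, SAXException {
        parser.parse(new InputSource(new StringReader(doc.toString())), handler);
        doc.setLength(0);
    }

    public static void main(String[] args) throws Exception {
        // <r> and <hit> are placeholder element names.
        String one = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE r SYSTEM \"NCBI_BlastOutput.dtd\">\n<r><hit/></r>\n";
        CountingHandler handler = new CountingHandler();
        parseAll(new StringReader(one + one + one), handler);
        System.out.println(handler.documents);
    }
}
```

Only one document is buffered at a time and the same `SAXParser` is reused, so memory use depends on the largest single record rather than on the size of the file. For a real file, pass a `FileReader` (or an `InputStreamReader` with the right charset) to `parseAll`.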

w*r
10
Man, you've got no other choice: your XML doc has multiple XML declaration
headers.
What can you expect from the parser? Magic? NO! All you can do is
design your own 'feeder' for the parser: skip the declaration part and feed
the records to the parser. Both 1 and 2 will work; it just depends on how big
your file is. If it's millions and millions of records, I suggest 2; if it's a
small number of records, 1 is OK.

x*n
11
Do you expect him to take on a big project?
// BTW, I'm not the one doing this project.
