A simple algorithm question # Java - 爪哇娇娃
g*y
1
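Given an HTML string, how do I find the end of a <div>?
The intuitive approach: set counter = 1 and search for "<div" and "</div>"; on "<div", counter++; on "</div>", counter--; when counter reaches zero, that is the end.
Is there a simpler, more efficient way? Or does Java's own library allow more concise code?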
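
For reference, the counter idea above could be sketched roughly as follows. This is only a minimal sketch, assuming the markup is well formed and that no "<div" text appears inside comments, CDATA, or attribute values. The class name DivEndFinder and the method findDivEnd are made up for illustration. The counter starts at 0 here because the scan includes the opening tag itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivEndFinder {
    // Matches an opening <div ...> or a closing </div>; group(1) is "/" for a closing tag.
    private static final Pattern DIV_TAG =
            Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the index just past the </div> that closes the <div> starting at openIndex,
     * or -1 if no matching close is found (so the caller never loops forever).
     */
    static int findDivEnd(String html, int openIndex) {
        Matcher m = DIV_TAG.matcher(html);
        m.region(openIndex, html.length());
        int depth = 0;
        while (m.find()) {
            if (m.group().endsWith("/>")) {
                continue;                 // self-closing <div/> opens and closes at once
            }
            if (m.group(1).isEmpty()) {
                depth++;                  // opening tag
            } else {
                depth--;                  // closing tag
            }
            if (depth == 0) {
                return m.end();           // the div opened at openIndex is now closed
            }
            if (depth < 0) {
                return -1;                // a close appeared before any open
            }
        }
        return -1;                        // ran out of input while still nested
    }

    public static void main(String[] args) {
        String html = "<p><div id=\"a\">x<div>y</div>z</div>tail</p>";
        int start = html.indexOf("<div");
        int end = findDivEnd(html, start);
        System.out.println(html.substring(start, end));
    }
}
```

The loop is driven by Matcher.find() rather than by the counter, so it always terminates even when the divs are unbalanced.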
m*t
2

Use one of those html parsing libraries - jericho, htmlunit, etc.

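[Quoting g**********y:]
: Given an HTML string, how do I find the end of a <div>?
: The intuitive approach: set counter = 1 and search for "<div" and "</div>"; on "<div", counter++; on "</div>", counter--; when counter reaches zero, that is the end.
: Is there a simpler, more efficient way? Or does Java's own library allow more concise code?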

S*a
3
What do you mean by "how to find"?
Java has regular expression utilities.

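[Quoting g**********y:]
: Given an HTML string, how do I find the end of a <div>?
: The intuitive approach: set counter = 1 and search for "<div" and "</div>"; on "<div", counter++; on "</div>", counter--; when counter reaches zero, that is the end.
: Is there a simpler, more efficient way? Or does Java's own library allow more concise code?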

g*y
4
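Those libraries are too heavy. For simple problems I am less and less inclined to use
other people's libraries; an unexpected upgrade can introduce some baffling bugs.
I should go read their source code, though; their approach might be instructive.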

[Quoting m******t:]
:
: Use one of those html parsing libraries - jericho, htmlunit, etc.

g*y
5
How would you write a regular expression for recursive (nested) divs?

[Quoting S********a:]
: What do you mean by "how to find"?
: Java has regular expression utilities.

g*g
6
If you are sure your html string is well formed, the way you propose
(search by regex and use a counter) is pretty efficient.
It's when you need to tolerate some errors and reject others that you'll find
a third-party library very helpful.
HTML in the wild is often not well formed, so don't be surprised to see
malformed markup.
And don't be surprised to find that the divs don't match up exactly.
If you have code like
while(counter > 0) {
....
}
You may loop forever in production; just keep that in mind.

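[Quoting g**********y:]
: Those libraries are too heavy. For simple problems I am less and less inclined to use
: other people's libraries; an unexpected upgrade can introduce some baffling bugs.
: I should go read their source code, though; their approach might be instructive.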

c*t
7
Right. This is not something regex can solve; it is a context-free grammar.
Actually, writing your own HTML parser isn't too hard either :P

[Quoting g**********y:]
: How would you write a regular expression for recursive (nested) divs?
g*y
8
For my specific problem, the input is just a bunch of well-formed XHTML. I
used an XML parser to manage content according to the user's role, then realized
that the XML parser is too heavy. All I need is to remove some <div> under some
condition. That's how the problem came up.

[Quoting g*****g:]
: If you are sure your html string is well formed, the way you propose
: (search by regex and use a counter) is pretty efficient.
: It's when you try to ignore/not ignore certain errors you'll find out
: 3rd party library is very helpful.
: HTML in the wild is often not well formed, so don't be surprised to see
: malformed markup.
: And don't be surprised to find out the divs are not matching exactly.
: If you have code like
: while(counter > 0) {
: ....
: }
F*n
9
Then you should iterate directly through the character sequence. It's not a very
complicated issue.
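
A rough sketch of what iterating over the characters directly might look like for the "remove a div" case. It assumes well-formed, lowercase XHTML with no '>' inside attribute values, comments, or CDATA. DivRemover, removeDivAt and isTag are hypothetical names, not from any library.

```java
class DivRemover {
    /**
     * Removes the <div>...</div> block whose opening tag starts at openIndex.
     * Returns the input unchanged if openIndex is not at a <div> or the block never closes.
     */
    static String removeDivAt(String xhtml, int openIndex) {
        if (openIndex < 0 || !xhtml.startsWith("<", openIndex)
                || !isTag(xhtml, openIndex + 1, "div")) {
            return xhtml;
        }
        int depth = 0;
        int i = openIndex;
        while (true) {
            int lt = xhtml.indexOf('<', i);
            int gt = (lt < 0) ? -1 : xhtml.indexOf('>', lt);
            if (gt < 0) {
                return xhtml;             // input ended before the block closed
            }
            boolean selfClosing = xhtml.charAt(gt - 1) == '/';
            if (isTag(xhtml, lt + 1, "div") && !selfClosing) {
                depth++;                  // nested (or first) opening div
            } else if (isTag(xhtml, lt + 1, "/div")) {
                depth--;                  // closing div
            }
            if (depth == 0) {
                // Either the matching </div> or a self-closing <div/> at openIndex.
                return xhtml.substring(0, openIndex) + xhtml.substring(gt + 1);
            }
            i = gt + 1;                   // always advances, so the loop terminates
        }
    }

    // True if s has the tag name at pos, followed by '>', '/', or whitespace.
    private static boolean isTag(String s, int pos, String name) {
        int end = pos + name.length();
        if (end >= s.length() || !s.regionMatches(pos, name, 0, name.length())) {
            return false;
        }
        char c = s.charAt(end);
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        String page = "<body><div class=\"admin\"><div>inner</div></div><p>keep</p></body>";
        System.out.println(removeDivAt(page, page.indexOf("<div")));
    }
}
```

Because the scan position only moves forward and the method returns as soon as '<' or '>' can no longer be found, unbalanced input leaves the string unchanged instead of hanging.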


[Quoting g**********y:]
: For my specific problem, the input is just a bunch of well-formed XHTML. I
: used an XML parser to manage content according to the user's role, then realized
: that the XML parser is too heavy. All I need is to remove some <div> under some
: condition. That's how the problem came up.
c*t
10
If it's well formed, try XPath.
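
A minimal sketch of the XPath route using only the JDK's built-in javax.xml classes. It assumes the input parses as XML, has no DOCTYPE and no default namespace. The class XPathDivFilter, the method removeDivs and the role attribute in the usage are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDivFilter {
    /** Parses xhtml, removes every node selected by xpathExpr, and returns the DOM. */
    static Document removeDivs(String xhtml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            // Walk backwards so nested matches are detached before their ancestors.
            for (int i = hits.getLength() - 1; i >= 0; i--) {
                Node n = hits.item(i);
                n.getParentNode().removeChild(n);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<body><div role=\"admin\">secret</div><div role=\"user\">hello</div></body>";
        Document doc = removeDivs(page, "//div[@role='admin']");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This builds a full DOM, so it is the heaviest of the options discussed in the thread; the parser finds each div's end and the XPath expression only states the removal condition.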


[Quoting g**********y:]
: For my specific problem, the input is just a bunch of well-formed XHTML. I
: used an XML parser to manage content according to the user's role, then realized
: that the XML parser is too heavy. All I need is to remove some <div> under some
: condition. That's how the problem came up.
m*t
11

So put your counter in a SAX parser, which is lightweight enough.
I still don't see any point in reinventing the whole wheel.
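
One way the "counter in a SAX parser" suggestion might look, as a sketch only. It assumes well-formed input without a DOCTYPE, and for brevity it collects just the text that survives rather than re-serializing the markup. SaxDivSkipper, visibleText and the role attribute are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDivSkipper extends DefaultHandler {
    private final String hiddenRole;
    private final StringBuilder kept = new StringBuilder();
    private int skipDepth = 0;   // > 0 while inside a div that is being dropped

    SaxDivSkipper(String hiddenRole) {
        this.hiddenRole = hiddenRole;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (skipDepth > 0) {
            skipDepth++;                          // any element nested in the dropped div
        } else if ("div".equals(qName) && hiddenRole.equals(attrs.getValue("role"))) {
            skipDepth = 1;                        // start dropping at this div
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;                          // reaches 0 at the dropped div's own end tag
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            kept.append(ch, start, length);
        }
    }

    /** Returns the text content of xhtml with every div of the given role dropped. */
    static String visibleText(String xhtml, String hiddenRole) {
        SaxDivSkipper handler = new SaxDivSkipper(hiddenRole);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return handler.kept.toString();
    }

    public static void main(String[] args) {
        String page = "<body>a<div role=\"admin\">b<div>c</div></div>d</body>";
        System.out.println(visibleText(page, "admin"));
    }
}
```

The parser guarantees that start and end events are balanced, so the counter cannot get stuck the way a hand-rolled while (counter > 0) loop can on bad input; malformed input raises an exception instead.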

[Quoting g**********y:]
: For my specific problem, the input is just a bunch of well-formed XHTML. I
: used an XML parser to manage content according to the user's role, then realized
: that the XML parser is too heavy. All I need is to remove some <div> under some
: condition. That's how the problem came up.
v*s
12
regexp

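[Quoting g**********y:]
: Given an HTML string, how do I find the end of a <div>?
: The intuitive approach: set counter = 1 and search for "<div" and "</div>"; on "<div", counter++; on "</div>", counter--; when counter reaches zero, that is the end.
: Is there a simpler, more efficient way? Or does Java's own library allow more concise code?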

T*g
13
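Solve it with a stack.
For example: the div with id 1 is pushed first, then the div with id 2. Next, div/ is pushed; it turns out to form a pair with the div on top of the stack, so the pair is popped.
Then the /div for id 2 is pushed; it pairs with the div id 2 that is now on top of the stack, so that pair is popped. In the end one item is still left on the stack,
so the divs do not pair up.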
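
A small sketch of the stack idea in Java. Unlike the walkthrough above, it never pushes closing tags; a closing tag simply pops the most recent opening tag. It assumes lowercase tags and no "<div" text inside comments or attribute values. DivStackMatcher and unmatchedOpens are illustrative names, and the sample input in main is invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivStackMatcher {
    // group(1) is "/" for a closing </div>, empty for an opening <div ...>.
    private static final Pattern DIV_TAG = Pattern.compile("<(/?)div\\b[^>]*>");

    /**
     * Returns the start offsets of opening <div> tags that are never closed,
     * innermost first. An empty result means every div is paired.
     */
    static Deque<Integer> unmatchedOpens(String html) {
        Deque<Integer> stack = new ArrayDeque<>();
        Matcher m = DIV_TAG.matcher(html);
        while (m.find()) {
            if (m.group().endsWith("/>")) {
                continue;                  // self-closing: pairs with itself
            }
            if (m.group(1).isEmpty()) {
                stack.push(m.start());     // opening tag: remember where it began
            } else if (!stack.isEmpty()) {
                stack.pop();               // closing tag: pairs with the latest open
            }
            // A closing tag on an empty stack is ignored in this sketch.
        }
        return stack;
    }

    public static void main(String[] args) {
        String html = "<div id=\"1\"><div id=\"2\"><div/></div>";
        System.out.println("unmatched opening divs at offsets: " + unmatchedOpens(html));
    }
}
```

Compared with a bare counter, the stack also tells you which opening tag was left without a partner, at the cost of a little extra memory.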

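[Quoting g**********y:]
: Given an HTML string, how do I find the end of a <div>?
: The intuitive approach: set counter = 1 and search for "<div" and "</div>"; on "<div", counter++; on "</div>", counter--; when counter reaches zero, that is the end.
: Is there a simpler, more efficient way? Or does Java's own library allow more concise code?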
