ETL process in Java. Please reply to this thread if you have suggestions. # Java - 爪哇娇娃
B*g
1
Thanks to wyr, goodbug, and 唐僧; each of you gets a baozi.
I'm starting a new thread because the technology isn't tied to Spring, but it is still mainly Java; in fact, Java only.
No M$ allowed, and no heavy use of DB source.
The requirements are as follows:
1. The ETL process loads data.
2. The source format is stored in the DB (possibly the target format as well).
3. Clients can revise their source format through an interface. The ETL process
needs no changes to process a source file in the new format.
4. Performance is handled automatically, e.g. using multiple CPUs without complicated code.
5. Not too expensive.
g*g
2
You can write an XML schema (XSD) that validates rules.
Below would be a rule to map characters 1-8 to cl1.

[XML rule example; tags lost in extraction. Surviving values: characters, 1, 8]

You can write a web interface to make editing this XML easier for
the client. Considering you will have only a few rules, this shouldn't
be very difficult. Then all it takes is an XML parser to do the actual
mapping.
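
A minimal sketch of the XSD validation step, using the JDK's built-in `javax.xml.validation` API. The element names (`rule`, `type`, `start`, `end`, `target`) and the class name `RuleXsdCheck` are assumptions made up for illustration; they are not the rule layout from the post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class RuleXsdCheck {
    // Hypothetical schema: one <rule> with type, start, end, target (names are assumptions).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='rule'><xs:complexType><xs:sequence>"
      + "<xs:element name='type' type='xs:string'/>"
      + "<xs:element name='start' type='xs:positiveInteger'/>"
      + "<xs:element name='end' type='xs:positiveInteger'/>"
      + "<xs:element name='target' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if ruleXml conforms to XSD, false on any validation or I/O error. */
    static boolean isValid(String ruleXml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, a schema violation is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(ruleXml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<rule><type>characters</type><start>1</start><end>8</end><target>cl1</target></rule>";
        String bad = "<rule><type>characters</type><start>one</start><end>8</end><target>cl1</target></rule>";
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

Rejecting a bad rule at edit time, before it reaches the ETL run, is the main point of having the schema; the web interface could call a check like this when the client saves.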


[In reply to B*****g]

B*g
3
I am interested in the "actual mapping" part. Thanks.

[In reply to g*****g]

c*t
4
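Your problem is neither especially hard nor especially easy. In its more
complex form, it largely amounts to taking a DDL and auto-generating a
marshalling/serializing/converting program from it. That is fairly difficult.
The other option is an interpretive approach: design your DDL so that, while
parsing it, you can chain together several small converters (e.g. run-length,
delimited, etc.). That is somewhat easier. For example, if the DDL is

[DDL example; markup lost in extraction]

the corresponding calls would be
parseRunLength (file, 10);
parseDelimited (",", true);
This is somewhat easier. If you have no experience writing compilers, consider
treating these runlength / delimited converters as Java objects (and the corresponding ...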

[In reply to B*****g]

g*g
5
I don't think they need many rules or a flexible hierarchy.
It would be more like parsing a Hibernate hbm file, which
shouldn't be very difficult.
On the presentation side, it can be easier than you think too.
Showing a few examples and asking clients to edit the raw XML
may be enough.

[In reply to c*****t]

B*g
6
I only have a problem with the parsing part. I want to keep it simple and
get good performance.

[In reply to g*****g]

a*i
7
I don't quite understand these requirements. Would JAXB work?


[In reply to B*****g]

c*t
8
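The simplest approach is the one I already described: you just need a few
classes corresponding to runlength scanning, delimited scanning, etc. (you
should know how to do these two; if not, there's no helping it). Then use
Spring's XML as the record format to build a container that holds these class
objects and their specific settings (e.g. the length for runlength). That way,
when a user gives you an XML file, you read it in through Spring and get a list
of scan actions. Then you loop over the actions in this list and keep parsing
records.
Designing it through an XML schema works on the same principle. Parsing the XML
yourself is honestly easy too.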
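
A rough sketch of the "list of scan actions" idea in plain Java. The names `ScanAction`, `FixedLengthScan`, `DelimitedScan`, and `RecordParser` are hypothetical, and the list is built by hand here; with Spring, the same list could be declared as beans in XML instead. The boolean argument of `parseDelimited` mentioned earlier in the thread is not modeled, since its meaning is not stated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Runs a configured list of scan actions over one record, left to right. */
final class RecordParser {
    private final List<ScanAction> actions;

    RecordParser(List<ScanAction> actions) {
        this.actions = actions;
    }

    List<String> parse(String record) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (ScanAction action : actions) {
            pos = action.scan(record, pos, fields); // each action consumes one field
        }
        return fields;
    }

    public static void main(String[] args) {
        RecordParser parser = new RecordParser(Arrays.asList(
                new FixedLengthScan(10), new DelimitedScan(","), new DelimitedScan(",")));
        System.out.println(parser.parse("0123456789abc,def"));
    }
}

/** Reads one field starting at pos, appends it to out, and returns the next position. */
interface ScanAction {
    int scan(String record, int pos, List<String> out);
}

/** Fixed-length field (what the thread calls "runlength" scanning). */
final class FixedLengthScan implements ScanAction {
    private final int length;

    FixedLengthScan(int length) {
        this.length = length;
    }

    public int scan(String record, int pos, List<String> out) {
        int end = Math.min(pos + length, record.length());
        out.add(record.substring(pos, end));
        return end;
    }
}

/** Field that runs up to the next delimiter, or to the end of the record. */
final class DelimitedScan implements ScanAction {
    private final String delimiter;

    DelimitedScan(String delimiter) {
        this.delimiter = delimiter;
    }

    public int scan(String record, int pos, List<String> out) {
        int idx = record.indexOf(delimiter, pos);
        if (idx < 0) {
            out.add(record.substring(pos));
            return record.length();
        }
        out.add(record.substring(pos, idx));
        return idx + delimiter.length();
    }
}
```

Changing the client's format then means changing the configured list, not the loop, which is what requirement 3 asks for.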

[In reply to B*****g]

B*g
9
I don't understand. How are the classes used here?

[In reply to c*****t]

g*g
10
Parsing can be pretty simple

[XML rule example; tags lost in extraction. Surviving values: 1, 8, test]

You then construct a Rule1 object from these parameters, giving you
a rule that can be used for transforming data.
It's always easier to build something that works and add pieces
to it than to try to design every detail beforehand.
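
One way this could look, as a sketch only: the element names `start`, `end`, and `name`, the 1-based inclusive positions, and the `fromXml`/`apply` methods are all assumptions, since the post only names a `Rule1` class and the three values. The XML is read with the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class Rule1 {
    final int start;   // 1-based, inclusive (assumed convention)
    final int end;     // 1-based, inclusive
    final String name; // label for the extracted field

    Rule1(int start, int end, String name) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad range " + start + "-" + end);
        }
        this.start = start;
        this.end = end;
        this.name = name;
    }

    /** Builds a rule from XML such as <rule><start>1</start><end>8</end><name>test</name></rule>. */
    static Rule1 fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new Rule1(Integer.parseInt(text(root, "start")),
                             Integer.parseInt(text(root, "end")),
                             text(root, "name"));
        } catch (Exception e) {
            // Malformed XML, missing elements, and bad numbers all end up here.
            throw new IllegalArgumentException("cannot read rule: " + e.getMessage(), e);
        }
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** Extracts this rule's character range from one input line; short lines yield what is there. */
    String apply(String line) {
        if (line.length() < start) {
            return "";
        }
        return line.substring(start - 1, Math.min(end, line.length()));
    }

    public static void main(String[] args) {
        Rule1 rule = Rule1.fromXml("<rule><start>1</start><end>8</end><name>test</name></rule>");
        System.out.println(rule.name + " = " + rule.apply("20240115ACME"));
    }
}
```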

[In reply to B*****g]
n*w
11
I think there's a ready-made open-source one. I forgot the name.
B*g
12
Got it. But won't parsing this way be slow?

[In reply to g*****g]

B*g
13
Think hard!

[In reply to n*w]
g*g
14
How slow could it be for one-time rule loading?
The bottleneck is always applying the rules on the data.
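
To illustrate: a sketch where the rule is built once and then applied to every line with a parallel stream, which is one low-effort way to use multiple CPUs in Java 8+. The `ParallelApply` class and its fixed-position extractor are hypothetical stand-ins for whatever the loaded rules actually do, and the rule must be stateless for this to be safe.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelApply {
    /** Builds the rule once: a hypothetical fixed-position extractor (1-based, inclusive). */
    static Function<String, String> extractor(final int start, final int end) {
        return line -> line.length() < start
                ? ""
                : line.substring(start - 1, Math.min(end, line.length()));
    }

    /** Applies a stateless rule to every line; the common fork-join pool spreads work across cores. */
    static List<String> applyAll(List<String> lines, Function<String, String> rule) {
        // For an ordered source such as a List, toList() keeps the input order.
        return lines.parallelStream().map(rule).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<String, String> rule = extractor(1, 8); // one-time setup
        List<String> lines = Arrays.asList("20240115ACME", "20240116BETA", "short");
        System.out.println(applyAll(lines, rule));
    }
}
```

Whether the parallel version is actually faster depends on the data volume and the per-line cost, so it is worth measuring against the sequential `stream()` before committing to it.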

[In reply to B*****g]
j*z
15
Google "jasper bi" or "jasper etl".
B*g
16
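I submitted the proposal, and management said not to do it for now.
Damn. They've been talking about doing Java for over a year, and there still isn't a single Java developer.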


[In reply to B*****g]