A question about database design (posted to the Database board)
F*e
1
I'm using MySQL to record timestamp-based data for a large number of users. Queries always target a single user. Each record looks like this:
user_i timestamp1 value1
user_j timestamp2 value2
user_i timestamp3 value3
user_i timestamp3 value4
There are three options. Option 1: put all the data in one table, as above. The downside is that every query has to scan many records belonging to unrelated users.
Option 2: a separate table per user, all with the same schema, of course.
Option 3: a separate database per user.
Options 2 and 3 should make queries a bit faster, but they feel odd. Could the DB experts here advise? Thanks in advance.
n*c
2
First: what is this database mainly for? How big do you expect it to get? Is the workload mostly data entry or mostly reporting? What are the server's configurations?
For the situation you describe, the common approach is this: if recording data is the main purpose, i.e. a so-called transactional database, you should first normalize. That means putting the user data in its own table and then adding one transactional table. In any case, none of the three approaches you listed is professional or advisable.
A simple example with two tables, Employee and Attendance:
Employee (Emp_ID, Last_Name, First_Name, Emp_Status, Last_Updated)
Attendance (Emp_ID, Time_Stamp, Value, Date, ...)
Without knowing your exact requirements I can only offer these simple suggestions. Hope they help a bit.
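As a sketch of the two-table design above, assuming plausible MySQL column types (the post only names the columns):

```sql
-- Hypothetical MySQL DDL for the normalized design sketched above.
-- All column types and the foreign key are assumptions.
CREATE TABLE Employee (
    Emp_ID       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Last_Name    VARCHAR(50)  NOT NULL,
    First_Name   VARCHAR(50)  NOT NULL,
    Emp_Status   VARCHAR(20),
    Last_Updated TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (Emp_ID)
);

CREATE TABLE Attendance (
    Emp_ID     INT UNSIGNED NOT NULL,
    Time_Stamp DATETIME     NOT NULL,
    Value      DECIMAL(18,6),
    -- composite key so per-employee queries read one index range
    KEY idx_emp_time (Emp_ID, Time_Stamp),
    FOREIGN KEY (Emp_ID) REFERENCES Employee (Emp_ID)
);
```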
F*e
3
Thanks a lot for your reply.
Unprofessional, that's for sure. Heh.
Here's an example: using the DB to store stock price history. Each record holds stock symbol, timestamp, price:
SYMB1 timestamp1 price1
SYMB2 timestamp2 price2
SYMB3 timestamp3 price3
SYMB1 timestamp3 price4
...
Imagine 1000 symbols, each symbol (stock) with 1 million records.
Reporting happens more often than data entry, but each load tends to be in bulk volume.
So if all records go into one table, studying a single stock (symbol) means scanning 1000 million records; with one table per stock, only 1 million records need to be scanned.
Thanks, thanks!

n*c
4
You still didn't answer my question in the first paragraph. So I assume that
your database is a transactional database and would make suggestions based
on this assumption.
Based on your additional info, you may want to add at least another table,
let's call it STOCK.
STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
With another two tables shown below:
TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST_UPDATED)
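Assuming the STOCK_TRADE table above exists with those columns, a composite index lets a single-stock query read only that stock's rows instead of scanning everything (the index name is illustrative):

```sql
-- Illustrative: a composite index turns a single-stock lookup into an
-- index range scan rather than a full-table scan.
ALTER TABLE STOCK_TRADE ADD INDEX idx_stock_time (STOCK_ID, TRADE_TIME);

SELECT t.TRADE_TIME, t.TRADE_PRICE
FROM STOCK s
JOIN STOCK_TRADE t ON t.STOCK_ID = s.STOCK_ID
WHERE s.STOCK_SYMB = 'SYMB1'
ORDER BY t.TRADE_TIME;
```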
w*e
5
+1! Very clearly explained.

F*e
6
Really appreciate it!
I don't have candy, and you needn't trade toothpaste tubes for it either; here are five baozi (forum points) as a small token of thanks. Haha.
OK, back to the question.
Here, assume I'm doing research on stock price history. I have tick data for
1000 stocks, each with 1 million ticks, in separate text(.csv) files, one
file per stock.
I would like to import the data (CSV files) into the database and run queries on it, and I only query one stock at a time. This is not really a transactional database, if I understand the term correctly: the import part happens very infrequently, while queries run often.

x*e
7
One additional recommendation:
if you have 1 billion records over a short period, you might have to do table partitioning, depending on your DB.

F*e
8
Thanks. I don't have 1 billion records, just a few million or even fewer.

q*1
9

I think you should first check whether these table columns leave any room for normalization before deciding how to split tables. My guess is that for your case a star schema database model is the simplest and most practical, or even just a single table would be OK.
The last two options clearly won't work: if you have many users, how would you solve the scalability problem?
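For reference, a star schema here would mean one fact table of price observations plus small dimension tables. A minimal sketch under that interpretation (every name and type below is illustrative, not from the thread):

```sql
-- Minimal star-schema sketch: one fact table keyed to small dimensions.
CREATE TABLE dim_stock (
    stock_id INT UNSIGNED PRIMARY KEY,
    symbol   VARCHAR(10) NOT NULL
);

CREATE TABLE fact_price (
    stock_id INT UNSIGNED NOT NULL,   -- references dim_stock
    ts       DATETIME     NOT NULL,
    price    DECIMAL(12,4),
    KEY idx_stock_ts (stock_id, ts)
);
```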

f*t
10
How about defining an index on the user ID?
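On the original single-table design from the first post, that suggestion would look roughly like this (table and column names are assumed, since the post only shows raw records):

```sql
-- Hypothetical single-table layout with an index on the user ID, so a
-- per-user query becomes an index range scan, not a full-table scan.
CREATE TABLE user_values (
    user_id INT UNSIGNED NOT NULL,
    ts      DATETIME     NOT NULL,
    value   DOUBLE,
    KEY idx_user_ts (user_id, ts)
);

SELECT ts, value
FROM user_values
WHERE user_id = 42
ORDER BY ts;
```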
n*c
11
I believe table partitioning will be a good option for your data. Also, create an index on stock_id AFTER the table is partitioned. 1000 stocks in a multi-million-row table is still considered low cardinality, and an index will improve the performance.
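A sketch of that advice in MySQL, reusing the STOCK_TRADE columns from earlier in the thread (the partition count is arbitrary; hash-partitioning on STOCK_ID keeps each stock's rows in one partition, so a single-stock query prunes the rest):

```sql
-- Illustrative: hash-partition on STOCK_ID so a one-stock query touches
-- only one partition, with the index created on the partitioned table.
CREATE TABLE STOCK_TRADE (
    STOCK_ID    INT UNSIGNED NOT NULL,
    TRADE_TIME  DATETIME     NOT NULL,
    TRADE_PRICE DECIMAL(12,4),
    KEY idx_stock_time (STOCK_ID, TRADE_TIME)
)
PARTITION BY HASH (STOCK_ID)
PARTITIONS 16;
```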
