M*t
#2
[Forwarded from the EE board]
From: Mrabbit (Zoolander), Board: EE
Subject: Pattern recognition problem
Posted at: BBS 未名空间站 (Wed Jul 30 15:44:16 2008), forwarded
I used Euclidean distance to measure the similarity between two feature
vectors. The feature vector includes continuous variables with values like
100, and discrete variables (1/0). So, I guess when comparing the similarity
of two vectors, I need to scale the values in the feature vector... Am I
correct? If yes, how should I scale them?
y*o
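#3
I'm currently using a LINKSYS WRT54G.
I want a router that can be set to block every website except X, Y, and Z.
Most routers can only be set to:
1. Block URL 1, URL 2, URL 3, and let everything else through. --- Not what I want.
2. Block all websites. --- Not what I want.
What I want is to allow access only to the websites on my list.
You want to visit games.com? Sorry, it's not on my list, so no access. youtube?
Not on the list, so it can't be reached. Wikipedia? It's on my list, so it goes through.
From what I've found, only the DLINK DIR 625 seems to have this feature, but I've
used DLINK before and it was very unstable; I didn't like it.
I only want router-level control, so no editing hosts and no Microsoft family security.
Please recommend one. Thanks.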
s*e
#4
You can scale all features to [0,1], e.g., for each feature,
divide by the maximum value of that feature (this assumes the feature is
non-negative).
Or you can standardize it by subtracting the mean from each feature and
dividing by the standard deviation of that feature.
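The two scalings mentioned above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the data, the function names, and the choice of min-max scaling (subtract the minimum, divide by the range, which reduces to dividing by the maximum when the minimum is 0) are assumptions for the example.

```python
import math
import statistics

def min_max_scale(column):
    """Map a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def z_score_scale(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

def scale_rows(rows, scaler):
    """Apply `scaler` to each feature (column) of a list of feature vectors."""
    columns = [scaler(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: one continuous feature (around 100) and one binary feature.
rows = [[100.0, 1], [120.0, 0], [80.0, 1], [100.0, 0]]
scaled = scale_rows(rows, min_max_scale)
```

On the raw rows the continuous feature dominates the distance; after scaling, both features contribute on a comparable scale.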
[M*****t wrote:]
: [Forwarded from the EE board]
: From: Mrabbit (Zoolander), Board: EE
: Subject: Pattern recognition problem
: Posted at: BBS 未名空间站 (Wed Jul 30 15:44:16 2008), forwarded
: I used Euclidean distance to measure the similarity between two feature
: vectors. The feature vector includes continuous variables with values like
: 100, and discrete variables 1/0. So, I guess when comparing the similarity
: of two vectors, I need to scale the values in the feature vector...Am I
: correct? If yes, how to scale it?
e*i
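#5
http://www.dd-wrt.com/wiki/index.php/Blocking_URLs/IPs
[y********o wrote:]
: I'm currently using a LINKSYS WRT54G.
: I want a router that can be set to block every website except X, Y, and Z.
: Most routers can only be set to:
: 1. Block URL 1, URL 2, URL 3, and let everything else through. --- Not what I want.
: 2. Block all websites. --- Not what I want.
: What I want is to allow access only to the websites on my list.
: You want to visit games.com? Sorry, it's not on my list, so no access. youtube?
: Not on the list, so it can't be reached. Wikipedia? It's on my list, so it goes through.
: From what I've found, only the DLINK DIR 625 seems to have this feature, but I've
: used DLINK before and it was very unstable; I didn't like it.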
y*o
#7
Thanks. This looks a bit complicated; it seems to require updating the router's
firmware, which I have never done.
[e*i wrote:]
: http://www.dd-wrt.com/wiki/index.php/Blocking_URLs/IPs
j*n
#8
I don't think there is really a good way. Normalizing as suggested above
amounts to giving every feature the same weight, but in general features
should not all carry the same weight.
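One way to make this point concrete is a weighted Euclidean distance, sketched below. The weights are purely hypothetical; the post does not say how they should be chosen, and picking them well is a separate problem.

```python
import math

def weighted_euclidean(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Two already-scaled vectors: [continuous feature, binary feature].
a = [0.5, 1.0]
b = [1.0, 0.0]

# Equal weights reproduce the plain Euclidean distance.
equal = weighted_euclidean(a, b, [1.0, 1.0])
# Hypothetical weights that make the first feature count four times as much.
favour_first = weighted_euclidean(a, b, [4.0, 1.0])
```

With equal weights the binary feature dominates this pair; raising the first weight shifts the balance, which is the choice that plain normalization leaves unaddressed.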
d*e
#12
I think if you are doing classification, CART may work better in this case.
Euclidean distance is usually used when the samples are assumed to follow a
certain continuous distribution.
[M*****t wrote:]
: [Forwarded from the EE board]
: From: Mrabbit (Zoolander), Board: EE
: Subject: Pattern recognition problem
: Posted at: BBS 未名空间站 (Wed Jul 30 15:44:16 2008), forwarded
: I used Euclidean distance to measure the similarity between two feature
: vectors. The feature vector includes continuous variables with values like
: 100, and discrete variables 1/0. So, I guess when comparing the similarity
: of two vectors, I need to scale the values in the feature vector...Am I
: correct? If yes, how to scale it?
M*t
#13
Sorry, what is CART? Is it also a metric for measuring the similarity between
two feature vectors? I do use the distance for classification purposes.
[d******e wrote:]
: I think if you do classification, CART may be better in such case.
: Euclidean distance is usually used when the samples are assumed in a certain
: continuous distribution.
p*r
#15
I think what drbunie means is that CART makes its decision on each feature
independently. Because each node selects a single feature and makes its "left"
or "right" decision on that feature alone, we no longer need to worry about
scaling in Euclidean space.
By the same reasoning, Naive Bayes should also do the job. Any comments?
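A small sketch of why a per-feature threshold split does not care about scaling. This is not full CART: it is a single-threshold stump that minimizes the misclassification count (CART itself typically uses an impurity measure such as Gini), and the data and function names are made up for the illustration.

```python
def best_split(values, labels):
    """Return (threshold, errors) of the best single-threshold split on one feature.

    The rule is "predict 1 if value > threshold else 0" or its mirror image,
    whichever misclassifies fewer samples.
    """
    points = sorted(set(values))
    best = None
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        errors = sum((v > threshold) != bool(y) for v, y in zip(values, labels))
        errors = min(errors, len(labels) - errors)  # allow the mirrored rule
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

def partition(values, threshold):
    """Indices of the samples sent to the right branch."""
    return [i for i, v in enumerate(values) if v > threshold]

# Hypothetical continuous feature and binary class labels.
feature = [80.0, 95.0, 100.0, 120.0, 130.0]
labels = [0, 0, 0, 1, 1]

raw_threshold, raw_errors = best_split(feature, labels)
scaled_feature = [v / 100.0 for v in feature]  # rescale to roughly [0, 1.3]
scaled_threshold, scaled_errors = best_split(scaled_feature, labels)
```

The threshold value changes with the scaling, but the samples that end up on each side of the split are the same, so the resulting decision is unchanged.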
[M*****t wrote:]
: Sorry, what is CART? Is it also a metric to measure the similarity between
: two feature vectors? I do use it for classification purpose.
H*S
#16
In my opinion, DT and NBC are both weak learners. If a much stronger learner
is needed for the job, I would suggest SVM or a boosting algorithm.
Further, depending on your class distribution, some imbalanced-data mining
tricks can be applied to improve overall performance. That's my 2 cents.
M*t
#17
For an imbalanced dataset (say, a 1%:99% class distribution), which
classification algorithms can be used to achieve good accuracy?
[H****S wrote:]
: In my opinion, DT and NBC are all weak learners. If a much stronger learner
: is required to accomplish the job, SVM or Boosting Algorithm are suggested.
: Further based on your class example distribution, some imbalanced mining
: tricks can be applied to improve the overall performance. That's my 2 cents
s*e
#18
1% to 99% is way too imbalanced...
Usually, for an imbalanced data set, you can resample,
e.g., over-sample the minority class or
under-sample the majority class.
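Both resampling options can be sketched in a few lines. This is a rough illustration under stated assumptions: the problem is binary, the data set and function names are hypothetical, and sampling is purely random (no synthetic samples are generated).

```python
import random

def random_oversample(samples, labels, minority_label, rng):
    """Duplicate randomly chosen minority samples until the classes are balanced."""
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority samples to draw from")
    majority_count = len(labels) - len(minority)
    needed = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(needed)]
    return samples + extra, labels + [minority_label] * len(extra)

def random_undersample(samples, labels, minority_label, rng):
    """Keep every minority sample and an equally sized random subset of the rest."""
    minority = [(s, y) for s, y in zip(samples, labels) if y == minority_label]
    majority = [(s, y) for s, y in zip(samples, labels) if y != minority_label]
    kept = minority + rng.sample(majority, min(len(minority), len(majority)))
    rng.shuffle(kept)
    return [s for s, _ in kept], [y for _, y in kept]

# Hypothetical 1%:99% data set: 2 positive and 198 negative samples.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [[float(i)] for i in range(200)]
labels = [1 if i < 2 else 0 for i in range(200)]

over_x, over_y = random_oversample(samples, labels, 1, rng)
under_x, under_y = random_undersample(samples, labels, 1, rng)
```

Resampling would normally be applied to the training split only, so that the evaluation data keeps the original class distribution.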
[M*****t wrote:]
: For imbalanced dataset (say, 1%:99% distribution), which classification
: algorithms can be used to achieve good accuracy?