A machine learning question - 未名空间 MITBBS historical archive


Thread: A machine learning question (board: JobHunting - 待字闺中)

w*o 2015-04-27 07:04

#1

Anyone buying 2 TB internal hard drives? I don't know this market well and am not sure how easy they are to sell.

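l*r 2015-04-27 07:04

#2

The interviewer asked how to improve a machine learning model that has overfit. I answered cross validation and then retraining, and the interviewer seemed quite dissatisfied. What should I have said? I couldn't think of any other method.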

c*7 2015-04-27 07:04

#3

no good

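p*r 2015-04-27 07:04

#4

If you really knew this, you would at least have mentioned regularization.

[Quoting l**********r:]

: The interviewer asked how to improve a machine learning model that has overfit. I answered cross
: validation and then retraining, and the interviewer seemed quite dissatisfied. What should I have said? I couldn't think of any other method.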

d*o 2015-04-27 07:04

#5

Price them at 60% of the lowest price you can find on Google and someone will take them.

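r*g 2015-04-27 07:04

#6

Reduce the model complexity.
Isn't there a proof that the required sample volume grows exponentially as complexity increases? I forget.
Cross validation is about all I could have answered too.

[Quoting l**********r:]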

c*7 2015-04-27 07:04

#7

Just keep them for yourself and use them for backups; the margin is too small.

o*4 2015-04-27 07:04

#8

Split data 3 ways: train, eval, validation. Train & tune your model using
the validation set, and only use eval for evaluation, never tune parameters
on the eval set.
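
A minimal sketch of the three-way split described above, using only the Python standard library. The function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions made for illustration; the post does not specify any of them.

```python
import random

def three_way_split(rows, val_frac=0.2, eval_frac=0.2, seed=0):
    """Shuffle rows once, then cut them into train / validation / eval parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    n_eval = int(len(rows) * eval_frac)
    n_val = int(len(rows) * val_frac)
    eval_set = rows[:n_eval]                 # touched only for the final report
    val_set = rows[n_eval:n_eval + n_val]    # used for tuning hyperparameters
    train_set = rows[n_eval + n_val:]        # used for fitting the model
    return train_set, val_set, eval_set

train_set, val_set, eval_set = three_way_split(range(100))
print(len(train_set), len(val_set), len(eval_set))
```

The point of the separate eval part is that its score stays an honest estimate only as long as no tuning decision has been made by looking at it.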

w*o 2015-04-27 07:04

#9

Six are sold and three are left. Anyone want one?

t*3 2015-04-27 07:04

#10

Add more training data, or use regularization and tune lambda.

[Quoting l**********r:]
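
To make "regularization and tune lambda" concrete, here is a rough sketch of ridge (L2-regularized) linear regression where lambda is picked on a held-out validation set. Everything specific here is an assumption for illustration: the synthetic data (10 features, only the first one matters), the lambda grid, the helper names, and the choice of ridge rather than another penalty. It uses plain Python so it runs without extra libraries.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(xs, ys, lam):
    """Closed-form ridge without intercept: solve (X^T X + lam I) w = X^T y."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)

def mse(w, xs, ys):
    """Mean squared error of the linear predictor w on (xs, ys)."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)

def make_data(rng, n, d=10):
    """Synthetic data: only feature 0 carries signal, the rest are noise."""
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [2.0 * x[0] + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(rng, 15)   # few samples, so lam = 0 tends to overfit
val_x, val_y = make_data(rng, 200)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
fits = {lam: ridge_fit(train_x, train_y, lam) for lam in lambdas}
best_lam = min(lambdas, key=lambda lam: mse(fits[lam], val_x, val_y))

for lam in lambdas:
    print(lam, round(mse(fits[lam], train_x, train_y), 3),
          round(mse(fits[lam], val_x, val_y), 3))
print("chosen lambda:", best_lam)
```

Training error can only go up as lambda grows, so lambda has to be chosen on data the fit has not seen; the validation column is what the choice is based on.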

b*g 2015-04-27 07:04

#11

BSO

[Quoting w****o:]

: Six are sold and three are left. Anyone want one?

m*0 2015-04-27 07:04

#12

regularization, bagging, early stopping, add more training data

l*i 2015-04-27 07:04

#13

How much? I'd like one for my own use.

w*n 2015-04-27 07:04

#14

Reduce the complexity of the current model.

[Quoting l**********r:]

c*5 2015-04-27 07:04

#15

How much?

[Quoting w****o:]

: Six are sold and three are left. Anyone want one?

w*n 2015-04-27 07:04

#16

Try a simpler model.

[Quoting l**********r:]

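w*n 2015-04-27 07:04

#17

In practice, you can't do anything about the data once it is given.
You need to reduce the complexity of your model to overcome overfitting.

[Quoting r*g:]

: Reduce the model complexity.
: Isn't there a proof that the required sample volume grows exponentially as complexity increases? I forget.
: Cross validation is about all I could have answered too.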

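r*g 2015-04-27 07:04

#18

I had forgotten all of it too, so I just looked it up:
E(R(fn) - R(f)*) < O(sqrt(V(f)) * sqrt(log(n)/n))
fn is the classifier you build when you have n data points.
V(f) is the complexity of this classifier.
R(fn) is its risk, and R(f)* is the optimal risk over classifiers of class f.
The bound on the expectation grows as sqrt(V(f)) grows.
That is the theoretical reason not to set the classifier's complexity too high:
you can no longer control the upper limit on the error.

[Quoting w********n:]

: In practice, you can't do anything about the data once it is given.
: You need to reduce the complexity of your model to overcome overfitting.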
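
To get a feel for the bound above, the short sketch below evaluates just its rate term, sqrt(V) * sqrt(log(n)/n), for a few values of V and n. The constant hidden in the O(.) is dropped, and the particular V and n values are arbitrary choices for illustration, so only the trend is meaningful, not the absolute numbers.

```python
import math

def bound_term(v, n):
    """Rate term sqrt(V) * sqrt(log(n) / n) with the O(.) constant dropped."""
    return math.sqrt(v) * math.sqrt(math.log(n) / n)

# Rows: complexity V. Columns: sample size n.
for v in (1, 10, 100):
    print(v, [round(bound_term(v, n), 3) for n in (100, 10_000, 1_000_000)])
```

Reading along a row, more data shrinks the term; reading down a column, higher complexity inflates it, which matches the argument in the post.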

w*n 2015-04-27 07:04

#19

My two cents:
Overfitting arises from two sources:
1. The training data is outdated and cannot represent the future.
2. The model is trained too much on the training data.
So either get more new data or reduce the model complexity to reduce
overfitting.

[Quoting l**********r:]

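b*0 2015-04-27 07:04

#20

Even a half-baked amateur like me knows that cross validation is for checking whether there is overfitting.
It has little to do with fixing it.
The other thing this amateur can think of is running PCA or something similar on the input feature vector to denoise,
decorrelate, and reduce the dimensionality.

[Quoting l**********r:]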

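p*r 2015-04-27 07:04

#21

The first half is right, but PCA does not help with overfitting. You need some feature selection to
reduce the dimensionality.

[Quoting b********0:]

: Even a half-baked amateur like me knows that cross validation is for checking whether there is overfitting.
: It has little to do with fixing it.
: The other thing this amateur can think of is running PCA or something similar on the input feature vector to denoise,
: decorrelate, and reduce the dimensionality.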

l*r 2015-04-27 07:04

#22

The interviewer said the model was linear regression. Can it be simplified any further? It already seems very simple.

w*n 2015-04-27 07:04

#23

You can use lasso to choose features.

[Quoting l**********r:]

: The interviewer said the model was linear regression. Can it be simplified any further? It already seems very simple.
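
As an illustration of how lasso can pick features for a linear regression, here is a small coordinate-descent sketch in plain Python. The objective is assumed to be (1/2n)*||y - Xw||^2 + lam*||w||_1; the synthetic data (8 features, only features 0 and 1 matter), the value lam = 0.5, and the number of sweeps are all choices made for this example, not something stated in the thread.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(xs, ys, lam, sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    resid = list(ys)  # residual y - Xw; equals y while w is all zeros
    for _ in range(sweeps):
        for j in range(d):
            z = sum(x[j] ** 2 for x in xs) / n
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(x[j] * (r + w[j] * x[j]) for x, r in zip(xs, resid)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            resid = [r - delta * x[j] for x, r in zip(xs, resid)]
            w[j] = new_wj
    return w

rng = random.Random(0)
xs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
ys = [3.0 * x[0] - 2.0 * x[1] + rng.gauss(0, 0.5) for x in xs]

w = lasso_fit(xs, ys, lam=0.5)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("weights:", [round(wj, 2) for wj in w])
print("selected features:", selected)
```

The L1 penalty sets the weights of uninformative features to exactly zero, so the surviving indices act as the selected feature set. The kept weights are shrunk toward zero as a side effect, which is why some people refit an unpenalized model on the selected features afterwards.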

L*d 2015-04-27 07:04

#24

regularization, feature selection ....

[Quoting l**********r:]

: The interviewer said the model was linear regression. Can it be simplified any further? It already seems very simple.

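h*3 2015-04-27 07:04

#25

Anyone who doesn't mention regularization is unlikely to pass.
Cross-validation estimates the variance of the accuracy; it is not a way to fix overfitting.
Cross-validation also can't tell you whether you are overfitting. For example, suppose P(Y|X) is generated at
random, so the best accuracy any model can reach is 0.5 no matter how much training data you are given. How
would cross-validation tell you whether your model is overfitting?

[Quoting l**********r:]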

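t*3 2015-04-27 07:04

#26

Use cross validation to plot a learning curve. If the gap between the cross validation error and the training error
stays large as the amount of data grows, that indicates overfitting. Conversely, if the two curves cross
at some point, that is underfitting.

[Quoting h********3:]

: Anyone who doesn't mention regularization is unlikely to pass.
: Cross-validation estimates the variance of the accuracy; it is not a way to fix overfitting.
: Cross-validation also can't tell you whether you are overfitting. For example, suppose P(Y|X) is generated at
: random, so the best accuracy any model can reach is 0.5 no matter how much training data you are given. How
: would cross-validation tell you whether your model is overfitting?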
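
A rough sketch of the learning-curve check described in this post: for growing amounts of data, compare the training error with the k-fold cross validation error. The model here is a 1-nearest-neighbour regressor, chosen only because it memorizes its training set and so shows a persistent gap; the model, the synthetic sine data, the noise level, and the sizes are all assumptions for illustration.

```python
import math
import random

def one_nn_predict(train, x):
    """Predict with the label of the nearest training point (1-NN)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(train, points):
    """Mean squared error of the 1-NN model built on `train`, scored on `points`."""
    return sum((one_nn_predict(train, x) - y) ** 2 for x, y in points) / len(points)

def learning_curve(data, sizes, k=5):
    """For each size n, return (n, mean training error, mean k-fold CV error)."""
    rows = []
    for n in sizes:
        subset = data[:n]
        train_err = cv_err = 0.0
        for fold in range(k):
            held_out = subset[fold::k]  # every k-th point forms this fold
            train = [p for i, p in enumerate(subset) if i % k != fold]
            train_err += mse(train, train) / k
            cv_err += mse(train, held_out) / k
        rows.append((n, train_err, cv_err))
    return rows

rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.random()
    data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)))

curve = learning_curve(data, sizes=(50, 100, 200, 400))
for n, train_err, cv_err in curve:
    print(n, round(train_err, 3), round(cv_err, 3))
```

Because 1-NN reproduces every training label, its training error is zero at every size while the cross validation error stays well above it; a gap that does not close with more data is the overfitting pattern the post describes.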

r*g 2015-04-27 07:04

#27

Given that context, the answer is regularization.

[Quoting l**********r:]

: The interviewer said the model was linear regression. Can it be simplified any further? It already seems very simple.

l*m 2015-04-27 07:04

#28

This interviewer clearly doesn't understand ML.

[Quoting l**********r:]

l*r 2015-04-27 07:04

#29

Why do you say that?

[Quoting l*******m:]

: This interviewer clearly doesn't understand ML.

m*s 2015-04-27 07:04

#30

To show off.

[Quoting l**********r:]

: Why do you say that?