c*1
Post #2
I just got a project: 11 features, 200 observations. The response variable
(ordinal, categorical) takes only three possible values. The goal is to learn
whether there are common characteristics that help predict the class of the
response variable.
I applied filter-based feature selection first: I ran pairwise statistical
tests between the response and each predictor, kept the significant features,
ran VIF checks to remove multicollinearity, and fit an ordered logistic
regression model on the remaining features.
Unfortunately, almost all features turned out to be insignificant
(p-value > 0.05), so ordered logistic regression may not be a good choice.
Since the end goal is to find the features that significantly affect the
response, and ideally to report the magnitude of their impact, random forest
and SVM are both unsuitable. I'm considering a decision tree. Does anyone on
the board have a better suggestion?
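For reference, the VIF step mentioned in the post can be sketched as follows. This is a minimal pure-Python illustration, not the poster's actual code: the data is made up, and the helper names (`solve`, `r_squared`, `vif`) are hypothetical. It uses the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(y, columns):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result

# Made-up data: x3 is nearly a copy of x1, while x2 is independent of both.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [rng.gauss(0, 1) for _ in range(200)]
x3 = [a + 0.1 * rng.gauss(0, 1) for a in x1]
vifs = vif([x1, x2, x3])
print([round(v, 1) for v in vifs])
```

In this toy setup the two near-duplicate predictors get large VIFs and the independent one stays close to 1. A common rule of thumb drops or combines predictors whose VIF exceeds roughly 5 to 10, but the cutoff is a judgment call.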
E*e
Post #3
With 3 response levels, you can use multinomial logistic regression, followed
by standard model variable selection.
[c********1 wrote:]
: I just got a project: 11 features, 200 observations. The response variable
: (ordinal and categorical) takes on only three possible values. The goal is to
: learn if there are some common characteristics which help predict the
: classification of the response variable.
: I apply filter-based feature selection first: I run pairwise statistical tests
: for all combinations of response and predictors. I select those significant
: features, run VIF tests to get rid of multicollinearity, and fit the data
: to an ordered logistic regression model (with significant features).
: Unfortunately it turns out almost all features are insignificant
: (p-value > 0.05). Hence the ordered logistic regression might not be a good
: choice.
E*g
Post #4
Why wouldn't random forest work?
random forest -> important features
It outputs each feature's probability of influence, rather than a category.
[c********1 wrote:]
: I just got a project: 11 features, 200 observations. The response variable
: (ordinal and categorical) takes on only three possible values. The goal is to
: learn if there are some common characteristics which help predict the
: classification of the response variable.
: I apply filter-based feature selection first: I run pairwise statistical tests
: for all combinations of response and predictors. I select those significant
: features, run VIF tests to get rid of multicollinearity, and fit the data
: to an ordered logistic regression model (with significant features).
: Unfortunately it turns out almost all features are insignificant
: (p-value > 0.05). Hence the ordered logistic regression might not be a good
: choice.
c*1
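Post #5
I considered feature importance too. But the project's client knows basically
nothing about statistics or ML, and the feature importance output is hard to
explain to them: I can only say in general terms which features matter, and
how much they matter is really hard to explain. It's unlike linear regression,
where you can say that a one-unit change in an independent variable leads to
so much change in the dependent variable, which is intuitive.
Also, I'm using R, and I simply couldn't make sense of the help documentation's
explanation of the importance output, but I'm sure it is not a probability of
influence.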
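On the "one unit change" point, a proportional-odds (ordered) logistic model does support a comparable statement. Assuming the parameterization P(Y <= j | x) = logistic(theta_j - beta * x), a one-unit increase in x multiplies the odds of landing above any cutpoint by exp(beta). The sketch below checks this numerically; the coefficient and cutpoints are made-up values, not estimates from the poster's data, and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_probs(x, beta, cutpoints):
    """P(Y = k | x) under P(Y <= j | x) = logistic(theta_j - beta * x)."""
    cum = [logistic(t - beta * x) for t in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def odds_above(x, beta, theta):
    """Odds that Y falls above the cutpoint theta."""
    p_le = logistic(theta - beta * x)
    return (1.0 - p_le) / p_le

beta = 0.7               # hypothetical coefficient for one predictor
cutpoints = [-1.0, 1.5]  # hypothetical thresholds separating the 3 levels

# Class probabilities before and after a one-unit increase in x.
for x in (0.0, 1.0):
    print(x, [round(p, 3) for p in class_probs(x, beta, cutpoints)])

# The odds ratio for a one-unit increase is the same at every cutpoint.
ratios = [odds_above(1.0, beta, t) / odds_above(0.0, beta, t) for t in cutpoints]
print(ratios, math.exp(beta))
```

So a fitted coefficient can be reported to a non-technical client as "each extra unit of this feature multiplies the odds of a higher response level by exp(beta)", provided the proportional-odds assumption is reasonable for the data.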
[E*********g wrote:]
: Why wouldn't random forest work?
: random forest -> important features
: It outputs each feature's probability of influence, rather than a category.
m*u
Post #6
Lasso?
e*9
Post #9
Lasso can be applied to logistic regression...
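A minimal sketch of what L1-penalized (lasso) logistic regression does, under simplifying assumptions: the response is binary here rather than three-level, the data is simulated, and the fit uses plain proximal gradient descent (ISTA) instead of a production solver. The function names are hypothetical. In R, the glmnet package is a common choice for fitting this kind of penalized model.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding gives the L1 shrinkage.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Simulated data: only the first two of five features affect the response.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 1.5 * x[1]) else 0 for x in X]

w, b = fit_l1_logistic(X, y, lam=0.1)
print([round(v, 3) for v in w])
```

The penalty tends to shrink coefficients of irrelevant features to zero or near zero while keeping the informative ones, so selection and coefficient estimation happen in one fit. The surviving coefficients are biased toward zero, which matters if they are reported as magnitudes of impact.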
E*e
Post #10
Oh, yes. I just learned about regularized logistic regression.
: Lasso can be applied to logistic regression...
h*d
Post #11
[c********1 wrote:]
: I just got a project: 11 features, 200 observations. The response variable
: (ordinal and categorical) takes on only three possible values. The goal is to
: learn if there are some common characteristics which help predict the
: classification of the response variable.
: I apply filter-based feature selection first: I run pairwise statistical tests
: for all combinations of response and predictors. I select those significant
: features, run VIF tests to get rid of multicollinearity, and fit the data
: to an ordered logistic regression model (with significant features).
: Unfortunately it turns out almost all features are insignificant
: (p-value > 0.05). Hence the ordered logistic regression might not be a good
: choice.