DS in my understanding # DataSciences - Data Science
w*2
Post 1
Not sure if this is right (I did something similar for some biomedical projects):
You are facing a chunk of data without any prior knowledge from the
literature or any other source. Then ask yourself:
1. Is there any interesting question in this data set?
2. If so, how many groups/types can be formed?
3. If so, is there any difference among these groups?
4. If so, what is the difference?
5. If so, what are the differentiators?
6. If so, which of them are the major differentiators?
7. If so, build a model or index combining all of the differentiators and
fit it on the training data set. Do not overfit. Validate with another
independent data set. Apply the model to new data sets for prediction.
8. Iterate to optimize the model.
9. Interpret the results for people with different backgrounds/professions.
Is this logic/workflow right for DS? Thanks.
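
To make steps 2 to 7 concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration only: the synthetic data, the helper names (make_data, two_means, effect_sizes, fit_index, predict, agreement), the 2-means grouping, and the effect-size cutoff of 1.0. It is one possible reading of the workflow, not an established recipe, and the hidden group labels exist only because the data is simulated.

```python
import random
import statistics


def make_data(n, seed):
    """Hypothetical data: two hidden groups, four features; only 0 and 2 differ."""
    rng = random.Random(seed)
    rows, hidden = [], []
    for i in range(n):
        g = i % 2
        rows.append([
            rng.gauss(4.0 * g, 1.0),   # feature 0: shifted in group 1
            rng.gauss(0.0, 1.0),       # feature 1: pure noise
            rng.gauss(-3.0 * g, 1.0),  # feature 2: shifted in group 1
            rng.gauss(0.0, 1.0),       # feature 3: pure noise
        ])
        hidden.append(g)
    return rows, hidden


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows, iters=20):
    """Step 2: split the samples into two groups with a basic 2-means loop."""
    c0 = rows[0]
    c1 = max(rows, key=lambda r: dist2(r, c0))  # start from the farthest sample
    assign = []
    for _ in range(iters):
        assign = [0 if dist2(r, c0) <= dist2(r, c1) else 1 for r in rows]
        g0 = [r for r, a in zip(rows, assign) if a == 0]
        g1 = [r for r, a in zip(rows, assign) if a == 1]
        if not g0 or not g1:
            break
        c0 = [statistics.fmean(col) for col in zip(*g0)]
        c1 = [statistics.fmean(col) for col in zip(*g1)]
    return assign


def effect_sizes(rows, assign):
    """Steps 3-5: standardized mean difference per feature between the groups."""
    sizes = []
    for col in zip(*rows):
        a = [v for v, g in zip(col, assign) if g == 0]
        b = [v for v, g in zip(col, assign) if g == 1]
        pooled = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2.0) ** 0.5
        sizes.append((statistics.fmean(b) - statistics.fmean(a)) / pooled)
    return sizes


def score(row, weights):
    return sum(w * x for w, x in zip(weights, row))


def fit_index(rows, assign, min_effect=1.0):
    """Steps 6-7: keep only major differentiators and combine them into an index."""
    weights = [0.0 if abs(d) < min_effect else (1.0 if d > 0 else -1.0)
               for d in effect_sizes(rows, assign)]
    s0 = statistics.fmean([score(r, weights) for r, a in zip(rows, assign) if a == 0])
    s1 = statistics.fmean([score(r, weights) for r, a in zip(rows, assign) if a == 1])
    return weights, (s0 + s1) / 2.0  # threshold halfway between group means


def predict(row, weights, threshold):
    return 1 if score(row, weights) > threshold else 0


def agreement(pred, truth):
    """Fraction of matching labels, ignoring which group is called 0 or 1."""
    acc = statistics.fmean([1.0 if p == t else 0.0 for p, t in zip(pred, truth)])
    return max(acc, 1.0 - acc)


train_rows, _ = make_data(200, seed=1)             # training data, labels unused
valid_rows, valid_hidden = make_data(200, seed=2)  # independent validation data

assign = two_means(train_rows)
weights, threshold = fit_index(train_rows, assign)
train_agreement = agreement([predict(r, weights, threshold) for r in train_rows], assign)
valid_agreement = agreement([predict(r, weights, threshold) for r in valid_rows], valid_hidden)

print("weights:", weights)
print("agreement on training groups:", train_agreement)
print("agreement on independent data:", valid_agreement)
```

Keeping the index to a few signed weights is a deliberate choice against overfitting (step 7). A large gap between the training and validation agreement would be the signal to go back and iterate (step 8).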