generating percentile-percentage charts
# DataSciences - Data Science
p*i
1
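The central bank did not disappoint: it cut the reserve requirement ratio as expected, so the 1.5 to 2 trillion liquidity gap is covered for now and everyone can enjoy the New Year holiday. As for what happens after the holiday, who wants to think about that? Be happy while you can, right? As for how much more easing is still to come, that will take time as well. The way I see it, with an estate this large, there is bound to be some thrashing about for a while. Otherwise, could it still be called China?
The stock market will certainly rise, but nobody should get their hopes up too high. Housing will surely rise for now as well, but after that it is hard to say. I think the truly smart people will seize this window to sell, rather than raising prices along with everyone else and propping up the market for others. As for the RMB, I hope it falls; otherwise, with this much capital flight, China will be hollowed out. In short, everything is in line with expectations. Pity the People's Daily, which spent ages shouting that China would never ease; it is laughable, and Chinese propaganda is still at this low a level.
It must be pointed out that even with easing, this is drinking poison to quench thirst for China. It can only delay the crisis temporarily. In the end, one has to find out where the problem actually lies. As I keep saying, the bubble cannot be handled by dragging it out one day at a time and muddling through. It has to be tackled with resolve. Otherwise, staying this passive, who knows which day it will all be over.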
c*z
2
I spent some time generating this kind of chart from raw data. There may be
better ways to do it, but I'll post my method in the hope that it draws out better ideas.
The raw table has three columns: clinic | age | count. Each row records how many
patients a clinic has in a given age category.
The target table has three columns: clinic | age_percentile | count_percentage.
It records the percentage of a clinic's patients in each age category, with the
categories expressed as percentiles (e.g. if there are only two age categories,
the percentiles are 50 and 100).
Here is the R code (I know Scala code would probably be simpler, but my company
does not use it):
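# order by clinic and age
visits <- visits[order(visits$clinic, visits$age), ]
# percentiles of age, computed within each clinic
percentiles <- tapply(visits$age,
                      list(visits$clinic),
                      function(x) trunc(rank(x) / length(x) * 100),
                      simplify = FALSE)
# percentages of count, computed within each clinic
percentages <- tapply(visits$count,
                      list(visits$clinic),
                      function(x) x / sum(x),
                      simplify = FALSE)
# put them together, unpacking the per-clinic list elements
# (rows line up with visits because it is already sorted by clinic)
patient_percentiles <- data.frame(visits$clinic,
                                  unlist(percentiles, use.names = FALSE),
                                  unlist(percentages, use.names = FALSE))
# clean up
colnames(patient_percentiles) <- c("clinic", "age_percentiles", "percentages")
patient_percentiles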
f*8
3
Could you post some sample data? I'm not quite sure what
Raw table has three columns: clinic | age | count, which records the age of
patients, rather, how many of each age category.
means.
I have a feeling dplyr might make this more concise?
c*z
4
sorry, here is an example
clinic | age | count
A | 12 | 3
A | 18 | 2
B | 22 | 4
B | 40 | 2
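That is, clinic A has 3 patients aged 12 and 2 patients aged 18; clinic B has 4 patients aged 22 and 2 patients aged 40.
Thanks for the reply, I'll take a look at dplyr.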
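For concreteness, here is how the four-row example would map to the target table. This is only a sketch in base R using ave(); the data frame name example_visits and the output column names are assumptions for illustration, and the percentile rule is the trunc(rank / length * 100) formula from the R code earlier in the thread.

```r
# The four-row example table from this post
example_visits <- data.frame(
  clinic = c("A", "A", "B", "B"),
  age    = c(12, 18, 22, 40),
  count  = c(3, 2, 4, 2)
)

# Percentile of each age category within its clinic
example_visits$age_percentile <- ave(
  example_visits$age, example_visits$clinic,
  FUN = function(x) trunc(rank(x) / length(x) * 100)
)

# Share of the clinic's patients that falls in each age category
example_visits$count_percentage <- ave(
  example_visits$count, example_visits$clinic,
  FUN = function(x) x / sum(x)
)

example_visits[, c("clinic", "age_percentile", "count_percentage")]
```

With two age categories per clinic the percentiles come out as 50 and 100, and the percentages are 0.6 and 0.4 for clinic A and 2/3 and 1/3 for clinic B.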
c*z
5
Sorry, I forgot one step:
# add up for each percentile
patient_percentiles_fin <- aggregate(percentages ~ clinic + age_percentiles,
                                     FUN = sum,
                                     data = patient_percentiles)
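A small illustration of why this step can matter: percentiles can collide, for example when a clinic has more than 100 distinct ages and trunc() maps neighbouring ranks to the same integer. The table below is hypothetical (names pp_dup and pp_dup_fin and all values are made up for illustration), with two rows of clinic A deliberately sharing percentile 50.

```r
# Hypothetical long table: two rows of clinic A share percentile 50
pp_dup <- data.frame(
  clinic          = c("A", "A", "A", "B"),
  age_percentiles = c(50, 50, 100, 100),
  percentages     = c(0.2, 0.3, 0.5, 1.0)
)

# Sum the percentages that fall into the same clinic and percentile
pp_dup_fin <- aggregate(percentages ~ clinic + age_percentiles,
                        FUN = sum,
                        data = pp_dup)
pp_dup_fin
```

After aggregation there is one row per clinic and percentile, and the two clinic A rows at percentile 50 are merged into a single row with their percentages added.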
c*z
6
The boss has a new request: this time it's cumulative percentages.
patient_percentiles_cum colnames(patient_percentiles_cum)[2] for (k in 1:100) {
# k
temp
top 1,
FUN = sum)
top
patient_percentiles_cum top)

colnames(patient_percentiles_cum)[2+k] k,
sep = ".")
}
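One loop-free way to get cumulative percentages is cumsum() within each clinic. This is a sketch only and not the code from this post: it assumes a long table with the columns clinic, age_percentiles and percentages, and the name pp_long and its values (derived from the four-row example) are used purely for illustration.

```r
# Hypothetical input: one row per clinic and age percentile
pp_long <- data.frame(
  clinic          = c("A", "A", "B", "B"),
  age_percentiles = c(50, 100, 50, 100),
  percentages     = c(0.6, 0.4, 4 / 6, 2 / 6)
)

# Sort so that cumsum() runs from the lowest percentile upward
pp_long <- pp_long[order(pp_long$clinic, pp_long$age_percentiles), ]

# Running total of the percentages within each clinic
pp_long$cum_percentages <- ave(pp_long$percentages, pp_long$clinic,
                               FUN = cumsum)
pp_long
```

This yields a long table (one row per clinic and percentile) rather than one column per k; reshaping it to a wide layout would be a separate step if that is what the chart needs.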
f*8
9
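Synthetic data:
library(dplyr) # version: ≥0.3
set.seed(123)
# assumed clinic column for illustration: three clinics with 10 rows each
visits <- data.frame(clinic = rep(c("A", "B", "C"), each = 10)) %>%
  group_by(clinic) %>%
  mutate(age = sample(1:50, length(clinic), replace = FALSE),
         count = sample(1:100, length(clinic), replace = TRUE)) %>%
  arrange(clinic, age)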
My approach:
patient_percentiles2 <- visits %>%
  group_by(clinic) %>%
  mutate(age.percentile = as.integer(min_rank(age) / length(age) * 100),
         count.percentage = count / sum(count)) %>%
  select(clinic, age.percentile, count.percentage)
Just a rough first attempt; comments and corrections are welcome!

c*z
10
Thanks a lot! Definitely will try out.

c*z
11
Yes, it works like a charm! Thanks a lot!

c*z
12
And it is beautiful in style, I can feel the flow. :)
f*8
13
The credit goes to Hadley Wickham..
