I don't see any way to avoid ambiguity with Chinese family names. A single 'Zhang' could refer to millions of people. One could also match on the first name, but does anyone have an idea how to match Chinese first names?
【Quoting G***G】 : Is it possible that the same last name is shared by multiple people? : If so, this statistical analysis doesn't make sense.
【Quoting s********9】 : I don't see any way to avoid ambiguity with Chinese family names. : A single 'Zhang' could refer to millions of people. : One could also match on the first name, but does anyone have an idea how : to match Chinese first names?
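So it's still hard for biology people to switch to software engineering. This is name disambiguation. Doing it very precisely is hard, but ignoring the first name entirely is also wrong. There are probably some Python packages you could use. You can also bring in other information, such as the institute or the corresponding author, to judge whether two records refer to the same author.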
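To make the suggestion above concrete, here is a minimal sketch of grouping author records by a combined key of family name, first name, and institute. The record fields (`family`, `given`, `institute`) and the helper names are hypothetical, and exact matching on a normalized key is only a crude stand-in for real name disambiguation. It is not any particular package's method.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and keep only letters and digits, so 'Xiao-Ming' matches 'xiaoming'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def author_key(record):
    """Build a crude identity key (assumed fields: family, given, institute)."""
    return (
        normalize(record["family"]),
        normalize(record["given"]),
        normalize(record.get("institute", "")),
    )


def group_authors(records):
    """Group records sharing a key; each group is treated as one presumed person."""
    groups = defaultdict(list)
    for record in records:
        groups[author_key(record)].append(record)
    return dict(groups)


# Made-up example records, for illustration only.
records = [
    {"family": "Zhang", "given": "Wei", "institute": "Peking University"},
    {"family": "Zhang", "given": "wei", "institute": "Peking University"},
    {"family": "Zhang", "given": "Wei", "institute": "Fudan University"},
    {"family": "Zhang", "given": "Li", "institute": "Peking University"},
]
groups = group_authors(records)
for key, members in groups.items():
    print(key, len(members))
```

Family name alone would lump all four records together. Adding the first name and institute splits them into three presumed people. It still merges two different people with the same name at the same institute, and it splits one person who changed institutes, so extra signals such as the corresponding author would be needed on top.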
a*n
#35
When does it go on sale?
T*g
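#36
The OP doesn't care whether two names belong to the same author; they only want to count the total number of Chinese authors. Whether two records are the same person or not makes no difference for this question.
【Quoting m********a】 : So it's still hard for biology people to switch to software engineering. : This is name disambiguation. Doing it very precisely is hard, but ignoring the first name entirely : is also wrong. There are probably some Python packages you could use. : You can also bring in other information, such as the institute or the corresponding author, to judge whether two records refer to the same : author.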
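As a rough illustration of the counting view, the sketch below tallies author entries whose family name is in a surname list, next to the number of distinct (family, first) name pairs. The surname set and the sample authors are made up for the example, and a romanized surname such as "Li" is not a reliable indicator on its own.

```python
# Hypothetical, deliberately tiny surname list; a real analysis would need a much fuller one.
CHINESE_SURNAMES = {"zhang", "wang", "li", "liu", "chen"}


def count_by_surname(authors):
    """Return (entries, distinct_names) for authors whose family name is in the list."""
    matches = [
        (family.lower(), given.lower())
        for family, given in authors
        if family.lower() in CHINESE_SURNAMES
    ]
    return len(matches), len(set(matches))


# Made-up (family, given) pairs, for illustration only.
authors = [
    ("Zhang", "Wei"),
    ("Zhang", "Wei"),
    ("Wang", "Fang"),
    ("Smith", "John"),
    ("Li", "Na"),
]
entries, distinct = count_by_surname(authors)
print(entries, distinct)
```

If the goal is the number of author entries, the first number is enough and disambiguation never comes up. The second number only counts distinct name strings, which is still not the same as distinct people.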