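My overall impression is that the author is strong at math but only average as a programmer, and happened to catch the wave. pandas got popular mainly because so many people flooded into DS: R is relatively hard to learn, Python is easy to pick up, and so pandas took off.
In my experience, for exploratory data analysis the Python stack is weaker than the R stack across the board. ML and model training are, of course, a different matter.
R's data.table is far stronger than pandas. The pandas author himself says you need 5 to 10 times as much RAM as the size of your data. With data.table I see nothing close to that.
http://wesmckinney.com/blog/apache-arrow-pandas-internals/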
N*a
Reply #6
Only one for both
b*a
Reply #7
sf mm sings beautifully, with real atmosphere. Nice~
l*i
Reply #8
I'm voting her first.
m*c
Reply #9
Don't rush. In two months they'll all be selling for 299.
[Quoting y***m:]
: Last time, that Staples 100 off...
: thx!
d*c
Reply #10
pandas rule of thumb: have 5 to 10 times as much RAM as the size of your dataset

There are additional, hidden memory killers in the project, like the way that we use Python objects (like strings) for many internal details, so it's not unusual to see a dataset that is 5GB on disk take up 20GB or more in memory. It's an overall bad situation for large datasets.

The 10 (really 11) things are (paraphrasing my own words):
Internals too far from "the metal"
No support for memory-mapped datasets
Poor performance in database and file ingest / export
Warty missing data support
Lack of transparency into memory use, RAM management
Weak support for categorical data
Complex groupby operations awkward and slow
Appending data to a DataFrame tedious and very costly
Limited, non-extensible type metadata
Eager evaluation model, no query planning
"Slow", limited multicore algorithms for large datasets
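
To make the point about Python string objects concrete, here is a rough standard-library sketch. It is not how pandas stores data internally. It only compares three estimates for a made-up column of short strings: the size as newline-delimited text, the size if every value is its own Python str object held in a list, and the size with a simple dictionary (categorical-style) encoding. The column contents and all variable names are hypothetical, and the byte counts assume 64-bit CPython.

```python
import sys
from array import array

# Hypothetical column of short, repetitive string values (made-up data).
values = ["red", "green", "blue", "green"] * 25_000

# Approximate size on disk as newline-delimited UTF-8 text.
disk_bytes = sum(len(v.encode("utf-8")) + 1 for v in values)

# Worst-case estimate in memory: every value counted as its own str
# object, plus the list that holds one pointer per value.
object_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Dictionary encoding: keep each distinct string once and store one
# small integer code per row (unsigned byte, so at most 256 categories).
categories = sorted(set(values))
code_of = {c: i for i, c in enumerate(categories)}
codes = array("B", (code_of[v] for v in values))
encoded_bytes = sys.getsizeof(codes) + sum(sys.getsizeof(c) for c in categories)

print(f"text on disk:    {disk_bytes:>10,} bytes")
print(f"str objects:     {object_bytes:>10,} bytes ({object_bytes / disk_bytes:.1f}x)")
print(f"encoded as codes:{encoded_bytes:>10,} bytes ({encoded_bytes / disk_bytes:.1f}x)")
```

The per-object header on each short string is what pushes the in-memory figure well past the on-disk figure. The integer codes avoid that cost, which is why weak categorical support appears in the same list of complaints.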
l*e
Reply #11
bd
z*0
Reply #12
By this time next year it'll be 99.
[Quoting m********c:]
: Don't rush. In two months they'll all be selling for 299.
t*c
Reply #13
5 to 10 times? It sucks.
[Quoting d******c:]
: pandas rule of thumb: have 5 to 10 times as much RAM as the size of your
: dataset
: There are additional, hidden memory killers in the project, like the way
: that we use Python objects (like strings) for many internal details, so it's
: not unusual to see a dataset that is 5GB on disk take up 20GB or more in
: memory. It's an overall bad situation for large datasets.
: The 10 (really 11) things are (paraphrasing my own words):
: Internals too far from "the metal"
: No support for memory-mapped datasets
: Poor performance in database and file ingest / export