Redian News
Urgent, please help (moving avg using spark dataframe window functions) # DataSciences - Data Science
w*2
1
Could the experts here tell me how to use window functions to compute a 3-day moving avg? I can't make sense of the error msg; why does it need a HiveContext?
Many thanks~
Example below:
from pyspark.sql import Window
from pyspark.sql import SQLContext
import pyspark.sql.functions as func
Table T:
Date Num
07/01 2
07/02 3
07/03 2
07/04 2
07/05 5
07/06 6
07/07 7
sqlCtx = SQLContext(sc)
T.registerTempTable("T")
w = Window.partitionBy(T.Date).orderBy(T.Date).rangeBetween(-2, 0)
a = func.avg(T["Num"]).over(w)
T.select(T["Date"], T["Num"], a.alias("moving_avg"))
Error Msg:
Could not resolve window function 'avg'. Note that, using window functions
currently requires a HiveContext;
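For reference, the values a 3-day trailing average should give on table T can be checked in plain Python, independent of Spark. This is only a sketch: the helper name `trailing_moving_avg` is made up for illustration, and it assumes one row per day so that "3 days" means the current row plus the two before it.

```python
from collections import deque

# Sample rows copied from table T in the question.
rows = [("07/01", 2), ("07/02", 3), ("07/03", 2), ("07/04", 2),
        ("07/05", 5), ("07/06", 6), ("07/07", 7)]


def trailing_moving_avg(values, window=3):
    """Average each value with up to (window - 1) values before it."""
    buf = deque(maxlen=window)  # keeps only the most recent `window` values
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))  # early rows average over fewer values
    return out


averages = trailing_moving_avg([num for _, num in rows])
for (date, num), avg in zip(rows, averages):
    print(date, num, round(avg, 2))
```

The first two rows average over fewer than three values, which is also how a window frame from two rows back to the current row behaves at the start of the data.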
S*e
2
SQLContext supports only a limited set of SQL functions. HiveContext supports
many more, including the window functions you need, and anything SQLContext
supports, HiveContext supports as well.
I think changing "from pyspark.sql import SQLContext" to
"from pyspark.sql import HiveContext", and "sqlCtx = SQLContext(sc)" to
"sqlCtx = HiveContext(sc)", should work. (By the way, my Python knowledge is
very limited; I mainly use Java for Spark.)
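A minimal sketch of the suggested change, assuming a Spark 1.x installation where HiveContext is available and an existing SparkContext `sc` as in the question. The function name `moving_avg_df` and the inline sample rows are illustrative, not from the thread. It also makes two changes beyond the reply above: it drops `partitionBy(T.Date)`, since partitioning by the date puts every row in its own partition and the average would just equal Num, and it uses `rowsBetween(-2, 0)` instead of `rangeBetween(-2, 0)`, on the assumption that Date is a string column with one row per day. As far as I understand, T also has to be created from the HiveContext for the window function to resolve.

```python
def moving_avg_df(sc):
    """Return table T with a 3-row trailing average of Num (sketch for Spark 1.x)."""
    # Imported inside the function so it can be defined without Spark installed.
    from pyspark.sql import HiveContext, Window
    import pyspark.sql.functions as func

    # HiveContext instead of SQLContext, as suggested in the reply.
    sqlCtx = HiveContext(sc)

    # Assumption: T is built from the sample rows in the question.
    T = sqlCtx.createDataFrame(
        [("07/01", 2), ("07/02", 3), ("07/03", 2), ("07/04", 2),
         ("07/05", 5), ("07/06", 6), ("07/07", 7)],
        ["Date", "Num"],
    )

    # Frame: the current row plus the two rows before it, ordered by Date.
    w = Window.orderBy(T["Date"]).rowsBetween(-2, 0)
    a = func.avg(T["Num"]).over(w)
    return T.select(T["Date"], T["Num"], a.alias("moving_avg"))
```

Calling `moving_avg_df(sc).show()` from a pyspark shell should print the table with the extra moving_avg column. With no partitionBy, Spark handles the whole table in a single partition, which is fine for seven rows but worth revisiting for large data.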
w*2
3
Thanks so much. Hopefully version 1.5.0 will improve on this.