Redian新闻
What's the best way to convert text/csv file into PARQUET
# DataSciences - Data Science
s*h
I have text/csv files that I want to upload to a Cloudera cluster and use
in Spark.
What's the best way to upload text/csv files and convert them into PARQUET format?
To upload, should I use the file manager in Hue or SFTP?
To convert, I can think of 3 ways:
A.
In Hive, create an external table over the original file,
then create a new external table in PARQUET format and fill it from the first one?
B.
In Spark, use Scala code to convert? Conversion speed might be a concern.
https://developer.ibm.com/hadoop/blog/2015/12/03/parquet-for-sp
C.
Use Apache Drill? Has anyone installed Apache Drill on CDH before?
Conversion speed should be better. https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill
Apache Drill needs to be installed first: https://drill.apache.org/docs/installing-drill-on-the-cluster/
With Sqoop it's much easier, since there is an "--as-parquetfile" option.
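For option B, here is a minimal sketch of what the Spark conversion could look like. It is written in PySpark rather than Scala (the Scala DataFrame API has reader and writer calls with the same names), and it assumes Spark 2.0 or later, where the CSV reader is built in. The function name csv_to_parquet and any paths passed to it are made up for illustration.

```python
def csv_to_parquet(spark, src_path, dst_path):
    """Read a CSV file (or a directory of CSV files) and rewrite it as Parquet.

    `spark` is an existing SparkSession. The function name and both paths
    are hypothetical; they do not come from any library.
    """
    # header=True takes column names from the first line of the file.
    # inferSchema=True makes Spark take an extra pass over the data to guess
    # column types; without it every column is read as a string.
    df = spark.read.csv(src_path, header=True, inferSchema=True)

    # mode("overwrite") replaces dst_path if it already exists.
    df.write.mode("overwrite").parquet(dst_path)
    return df
```

In the pyspark shell, where spark is already defined, this could be called as csv_to_parquet(spark, "/user/me/data.csv", "/user/me/data_parquet"), with both HDFS paths being placeholders. If conversion speed is a concern, passing an explicit schema instead of inferSchema avoids the extra pass over the data.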
Thanks!