What's the best way to convert a text/CSV file into Parquet? # DataSciences (Data Science)
s*h
Post #1
I have text/CSV files that I want to upload into a Cloudera cluster and use
in Spark.
What's the best way to upload and convert a text/CSV file into Parquet format?
To load, should I use the file manager in Hue or SFTP?
To convert, I can think of 3 ways:
A.
In Hive, create an external table over the original file,
then create a new table stored as Parquet?
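For A, I imagine something like this two-step pattern, driven from a Hive-enabled Spark session (table names, columns, and HDFS paths below are just placeholders, not my actual setup):

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquetViaHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet-via-hive")
      .enableHiveSupport() // needed so CREATE TABLE goes through the Hive metastore
      .getOrCreate()

    // 1. External table over the raw CSV files already uploaded to HDFS.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales_csv (
        id INT, item STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/user/me/raw/sales'
    """)

    // 2. Parquet-backed table populated by copying the CSV table's rows.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_parquet
      STORED AS PARQUET
      AS SELECT * FROM sales_csv
    """)

    spark.stop()
  }
}
```

The same two statements could equally be run directly in the Hive shell or Hue's query editor, without Spark in the loop.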
B.
In Spark, use Scala code to convert? Conversion speed might be a concern.
https://developer.ibm.com/hadoop/blog/2015/12/03/parquet-for-sp
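For B, a minimal sketch of what I'd try with the Spark 2.x DataFrame API (on Spark 1.x the spark-csv package would be needed instead); the paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet")
      .getOrCreate()

    // Read the raw CSV; header/inferSchema make Spark derive column
    // names and types by sampling the file.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///user/me/raw/sales.csv")

    // Write out as Parquet (Snappy compression is Spark's default codec).
    df.write.mode("overwrite").parquet("hdfs:///user/me/parquet/sales")

    spark.stop()
  }
}
```

Submitted with spark-submit, or pasted line by line into spark-shell.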
C.
Using Apache Drill? Has anyone installed Apache Drill on CDH before?
Conversion speed should be better: https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill
You need to install Apache Drill first: https://drill.apache.org/docs/installing-drill-on-the-cluster/
With Sqoop it's much easier, since it has the "--as-parquetfile" option.
Thanks!