Yes, it is working properly.
Test case:

Step 1: create a Python script to generate random strings (copied from the internet with a little modification):

#!/usr/bin/env python
import sys
import random
import string

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

line = int(sys.argv[1])
# Note: range(1, line) runs line-1 times, so the script prints one string
# fewer than requested -- this is why count(*) returns 99 below, not 100.
for x in range(1, line):
    print(id_generator())

Step 2: generate test data and put it into the home folder:

./random_string.py 1000 | hdfs dfs -put - /user/xxxxx/thousand_strings.txt
./random_string.py 100 | hdfs dfs -put - /user/xxxxxx/hundred_strings.txt

Step 3: create the Hive table:

CREATE TABLE T1 (S string);

Step 4: log in to Hive using beeline or the Eclipse data tools; the sample here is in beeline. Notice that the system is Kerberos-enabled.

Beeline version 0.13.1-cdh5.3.1 by Apache Hive
beeline> !run beelineconn.txt
>>> !connect jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive
scan complete in 6ms
Connecting to jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive
Enter username for jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive:
Enter password for jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive:
Connected to: Apache Hive (version 0.13.1-cdh5.3.1)
Driver: Hive JDBC (version 0.13.1-cdh5.3.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://namenode1:100> truncate table t1;
No rows affected (0.254 seconds)
0: jdbc:hive2://namenode1:100> LOAD DATA INPATH '/user/xxxxxx/hundred_strings.txt' into table t1;
No rows affected (0.572 seconds)
0: jdbc:hive2://namenode1:100> select count(*) from t1;
+------+--+
| _c0  |
+------+--+
|  99  |
+------+--+
1 row selected (78.502 seconds)
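(One caveat on the script above: the range(1, line) loop is off by one, which is exactly why count(*) returns 99 rather than 100. A minimal fix, keeping the rest of the script unchanged:)

# Corrected loop: range(line) iterates exactly `line` times,
# so the script prints the number of strings actually requested.
line = int(sys.argv[1])
for _ in range(line):
    print(id_generator())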
w*r
Floor 4
If your file is not in HDFS, it is better to copy it to HDFS first. The LOAD DATA LOCAL command only works when the file is visible to the HiveServer2 process, and in many cases you cannot touch that server. Therefore, putting the file into HDFS first is usually the better way.
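For example, when the data is generated on a client machine, you can stream it into HDFS without writing an intermediate local file, then run a plain LOAD DATA INPATH. A minimal sketch, assuming the hdfs client is on your PATH and you hold a valid Kerberos ticket; the function name and paths are illustrative, not from the thread:

#!/usr/bin/env python
# Stream lines straight into HDFS via `hdfs dfs -put -` (reads from stdin).
import subprocess

def put_to_hdfs(lines, hdfs_path):
    # On a kerberized cluster this requires a valid ticket (run kinit first).
    proc = subprocess.Popen(['hdfs', 'dfs', '-put', '-', hdfs_path],
                            stdin=subprocess.PIPE)
    proc.communicate(('\n'.join(lines) + '\n').encode())
    if proc.returncode != 0:
        raise RuntimeError('hdfs put failed for ' + hdfs_path)

put_to_hdfs(['ABC123', 'XYZ789'], '/user/xxxxxx/sample_strings.txt')

After that, LOAD DATA INPATH '/user/xxxxxx/sample_strings.txt' INTO TABLE t1 works no matter where the client sits, since the file is already in the cluster.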
t*u
Floor 5
My data file is on the JDBC client, and it is a Windows system......
【Quoted from w*r's post:】
: Yes, it is working properly.
: Test case:
: Step 1: create a Python script to generate random strings (copied from the internet with a little modification):
: #!/usr/bin/env python
: import sys
: import random
: import string
: def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
:     return ''.join(random.choice(chars) for _ in range(size))
w*r
Floor 6
My test in Eclipse shows it works. All you need to do is hdfs dfs -put the file into the cluster first.
【Quoted from w*r's post:】
: My test in Eclipse shows it works. All you need to do is hdfs dfs -put the file
: into the cluster first.
w*r
Floor 8
No, that won't work. With the LOAD DATA LOCAL INPATH approach, the file must sit on the server where HiveServer2 runs, on a path the HiveServer2 process can see. So if your file is only on your local machine, your options are hdfs put, or generating INSERT INTO TABLE VALUES(...) statements in a loop, as sketched below.
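If you go the INSERT route, one way is to generate the statements from the local file and feed them to beeline with -f. A rough sketch, assuming a Hive version that supports INSERT INTO ... VALUES; the file names local_data.txt and load_t1.sql are hypothetical:

#!/usr/bin/env python
# Turn a local text file into a batch of INSERT statements for table t1.
def make_inserts(values, table='t1'):
    for v in values:
        # Hive string literals use backslash escapes for embedded quotes.
        yield "INSERT INTO TABLE %s VALUES ('%s');" % (table, v.replace("'", "\\'"))

with open('local_data.txt') as src, open('load_t1.sql', 'w') as out:
    for stmt in make_inserts(row.strip() for row in src):
        out.write(stmt + '\n')

Then run: beeline -u "<your jdbc url>" -f load_t1.sql. Keep in mind each INSERT launches its own job, so this only makes sense for small files; hdfs put remains the better route for anything sizable.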