Yes, it is working properly.
Test case:

Step 1: create a Python script to generate random strings (copied from the internet with a little modification):

#!/usr/bin/env python
import sys
import random
import string

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

line = int(sys.argv[1])
# Note: range(1, line) runs line-1 times, so the script prints one string
# fewer than requested -- this is why count(*) returns 99 below, not 100.
for x in range(1, line):
    print(id_generator())

Step 2: generate test data and put it into the home folder:

./random_string.py 1000 | hdfs dfs -put - /user/xxxxx/thousand_strings.txt
./random_string.py 100 | hdfs dfs -put - /user/xxxxxx/hundred_strings.txt

Step 3: create the Hive table:

CREATE TABLE T1 (S string);

Step 4: log in to Hive using beeline or the Eclipse data tools; the sample here is in beeline. Notice that the system is Kerberos-enabled.

Beeline version 0.13.1-cdh5.3.1 by Apache Hive
beeline> !run beelineconn.txt
>>> !connect jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive
scan complete in 6ms
Connecting to jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive
Enter username for jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive:
Enter password for jdbc:hive2://namenode1:10000/default;principal=hive/n*******[email protected];AuthMech=1;KrbHostFQDN=namenode1;KrbServiceName=hive:
Connected to: Apache Hive (version 0.13.1-cdh5.3.1)
Driver: Hive JDBC (version 0.13.1-cdh5.3.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://namenode1:100> truncate table t1;
No rows affected (0.254 seconds)
0: jdbc:hive2://namenode1:100> LOAD DATA INPATH '/user/xxxxxx/hundred_strings.txt' into table t1;
No rows affected (0.572 seconds)
0: jdbc:hive2://namenode1:100> select count(*) from t1;
+------+--+
| _c0  |
+------+--+
|  99  |
+------+--+
1 row selected (78.502 seconds)
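(One caveat on the script above: the range(1, line) loop is off by one, which is exactly why count(*) returns 99 rather than 100. A minimal fix, keeping the rest of the script unchanged:)

# Corrected loop: range(line) iterates exactly `line` times,
# so the script prints the number of strings actually requested.
line = int(sys.argv[1])
for _ in range(line):
    print(id_generator())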
w*r
Floor 4
If your file is not in HDFS, it is better to copy it to HDFS first. The LOAD DATA LOCAL command only works when the file is visible to the HiveServer2 process, and in many cases you cannot touch that server. Therefore, putting the file into HDFS first is usually the better way.
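For example, when the data is generated on a client machine, you can stream it into HDFS without writing an intermediate local file, then run a plain LOAD DATA INPATH. A minimal sketch, assuming the hdfs client is on your PATH and you hold a valid Kerberos ticket; the function name and paths are illustrative, not from the thread:

#!/usr/bin/env python
# Stream lines straight into HDFS via `hdfs dfs -put -` (reads from stdin).
import subprocess

def put_to_hdfs(lines, hdfs_path):
    # On a kerberized cluster this requires a valid ticket (run kinit first).
    proc = subprocess.Popen(['hdfs', 'dfs', '-put', '-', hdfs_path],
                            stdin=subprocess.PIPE)
    proc.communicate(('\n'.join(lines) + '\n').encode())
    if proc.returncode != 0:
        raise RuntimeError('hdfs put failed for ' + hdfs_path)

put_to_hdfs(['ABC123', 'XYZ789'], '/user/xxxxxx/sample_strings.txt')

After that, LOAD DATA INPATH '/user/xxxxxx/sample_strings.txt' INTO TABLE t1 works no matter where the client sits, since the file is already in the cluster.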
t*u
Floor 5
My data file is on the JDBC client, and it is a Windows system......
【Quoted from w*r's post:】
: Yes, it is working properly.
: Test case:
: Step 1: create a Python script to generate random strings (copied from the internet with a little modification):
: #!/usr/bin/env python
: import sys
: import random
: import string
: def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
:     return ''.join(random.choice(chars) for _ in range(size))
w*r
Floor 6
My test in Eclipse shows it works. All you need to do is hdfs dfs -put the file into the cluster first.
【Quoted from w*r's post:】
: My test in Eclipse shows it works. All you need to do is hdfs dfs -put the file
: into the cluster first.
w*r
Floor 8
No, that won't work. With the LOAD DATA LOCAL INPATH approach, the file must sit on the server where HiveServer2 runs, on a path the HiveServer2 process can see. So if your file is only on your local machine, your options are hdfs put, or generating INSERT INTO TABLE VALUES(...) statements in a loop, as sketched below.
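If you go the INSERT route, one way is to generate the statements from the local file and feed them to beeline with -f. A rough sketch, assuming a Hive version that supports INSERT INTO ... VALUES; the file names local_data.txt and load_t1.sql are hypothetical:

#!/usr/bin/env python
# Turn a local text file into a batch of INSERT statements for table t1.
def make_inserts(values, table='t1'):
    for v in values:
        # Hive string literals use backslash escapes for embedded quotes.
        yield "INSERT INTO TABLE %s VALUES ('%s');" % (table, v.replace("'", "\\'"))

with open('local_data.txt') as src, open('load_t1.sql', 'w') as out:
    for stmt in make_inserts(row.strip() for row in src):
        out.write(stmt + '\n')

Then run: beeline -u "<your jdbc url>" -f load_t1.sql. Keep in mind each INSERT launches its own job, so this only makes sense for small files; hdfs put remains the better route for anything sizable.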