A folder contains about 1000 files. After calling the following Python statements (repost) - 未名空间 MITBBS historical archive

Board: Programming - 葵花宝典

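a*p 2015-05-02 07:05

Post 1

I have only been at this job for 9 months, I work overtime all the time, and the pay is poor.
I would like to transfer to another department. How is that usually done? Advice from anyone who has been through it would be appreciated. H1B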

C*y 2015-05-02 07:05

Post 2

I have never used a xoom.
A question for those who have:
is the xoom's screen much sharper than the ipad's?

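m*r 2015-05-02 07:05

Post 3

[The following text is reposted from the DataSciences board]
From: milkrootbeer (milkbeer), Board: DataSciences
Subject: A folder contains about 1000 files. Calling the following Python statements produces the error below. It seems to involve special characters; I tried other approaches, but none of them solved the problem.
Posted at: BBS 未名空间站 (Sat May 2 20:09:17 2015, US Eastern)
A folder contains about 1000 files. Calling the Python statements below produces the error that follows. It looks like a special-character problem; I tried other approaches, but none of them solved it.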
DIR = 'C:\\Users\\Desktop\\data\\rec.sport.hockey'
posts = [open(os.path.join(DIR,f)).read() for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo10.py", line 16, in <module>
    x_train = vectorizer.fit_transform(posts)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 113, in decode
    doc = doc.decode(self.encoding, self.decode_error)
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
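
The traceback ends inside the vectorizer's own decode step (`doc.decode(self.encoding, self.decode_error)`), so `posts` held raw byte strings and at least one file contains a byte, 0x80, that is not valid UTF-8. One possible way around this is to decode each document leniently instead of strictly. The sketch below is only an illustration: the helper name `decode_post` and the sample bytes are made up, and the post does not say which encoding the files actually use.

```python
def decode_post(raw, encoding="utf-8", errors="replace"):
    """Decode raw bytes to text without raising on undecodable bytes.

    errors="replace" substitutes U+FFFD for each undecodable byte;
    errors="ignore" would drop it instead.
    """
    return raw.decode(encoding, errors)


# Hypothetical sample containing the byte named in the traceback.
# 0x80 cannot start a UTF-8 sequence, so strict decoding fails on it.
sample = b"Go Leafs\x80 go"

strict_failed = False
try:
    sample.decode("utf-8")  # strict decoding, as in the traceback
except UnicodeDecodeError:
    strict_failed = True

lenient = decode_post(sample)               # bad byte becomes U+FFFD
as_latin1 = decode_post(sample, "latin-1")  # Latin-1 maps every byte, so it cannot fail
```

The vectorizer's `encoding` and `decode_error` settings, visible in the traceback as `self.encoding` and `self.decode_error`, apply the same policy inside `fit_transform`, for example `decode_error='replace'`. Whether silently replacing or dropping bytes is acceptable depends on the data.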
Second attempt, using codecs.open:
DIR = 'C:\\Users\\Desktop\\data\\rec.sport.hockey'
posts = [codecs.open(os.path.join(DIR,f),'r','utf-8') for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo10.py", line 15, in <module>
    posts = [codecs.open(os.path.join(DIR,f),'r','utf-8') for f in os.listdir(DIR)]
  File "C:\Python27\lib\codecs.py", line 878, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 24] Too many open files: 'C:\\Users\\Desktop\\data\\rec.sport.hockey\\53909'
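
This second failure has a different cause. Each `codecs.open` call in the comprehension returns an open file object that is neither read nor closed, so the handles pile up until the process hits its open-file limit. Even without the error, `posts` would hold file objects rather than text. A minimal sketch of the usual remedy is to read and close one file at a time in a `with` block. The function name `read_posts` and the `latin-1` default are assumptions for illustration, not something stated in the post.

```python
import io
import os


def read_posts(directory, encoding="latin-1"):
    """Return the decoded text of every regular file in `directory`.

    Each file is opened inside a `with` block, so its handle is closed
    before the next file is opened; only one file is open at a time.
    """
    posts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with io.open(path, "r", encoding=encoding) as fh:
            posts.append(fh.read())
    return posts
```

With the directory from the post, this would be called as `posts = read_posts(DIR)` before `vectorizer.fit_transform(posts)`. Latin-1 is chosen here only because it can decode any byte sequence. If the files are known to be UTF-8 with occasional bad bytes, `io.open(path, "r", encoding="utf-8", errors="replace")` is an alternative.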