Creating a GBK file
➜ file iconv -l | grep -i 'utf-8'
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
➜ file iconv -l | grep -i 'gbk'
GBK
➜ file iconv -f UTF8 -t GBK read_test.txt > read_gbk.txt
➜ file file read_gbk.txt
read_gbk.txt: ISO-8859 text
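The same conversion can be sketched in Python, which may help where iconv is not available. This is a minimal sketch written for Python 3 (the session in these notes uses Python 2.7); the function name `convert_encoding` and its parameters are hypothetical, chosen only for illustration. Note that `file` labels the result as ISO-8859 text, presumably because it sees high-bit bytes but does not specifically detect GBK.

```python
import io


def convert_encoding(src_path, dst_path, src_encoding="utf-8", dst_encoding="gbk"):
    """Re-encode a text file, roughly like `iconv -f UTF8 -t GBK src > dst`."""
    # newline="" turns off newline translation so line endings pass through unchanged.
    with io.open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Calling `convert_encoding("read_test.txt", "read_gbk.txt")` would mirror the iconv command above. A character that GBK cannot represent raises `UnicodeEncodeError` instead of being dropped silently.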
Reading the file
Without specifying an encoding
➜ file cat read_gbk.py
# Python 2: open() yields raw byte strings, so no decoding happens here
for line in open("read_gbk.txt"):
    print line.replace("\n", "")
➜ file python read_gbk.py
��1��
��2��
��3��
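Without an encoding the output is garbled: Python 2's open() yields the raw GBK bytes, which the terminal does not interpret as GBK.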
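The garbling comes down to the same text having different byte sequences in different encodings. A small sketch, assuming Python 3 and using the first line of the test file as sample text:

```python
text = "第1行"

utf8_bytes = text.encode("utf-8")  # 7 bytes: b'\xe7\xac\xac1\xe8\xa1\x8c'
gbk_bytes = text.encode("gbk")     # 5 bytes: b'\xb5\xda1\xd0\xd0'

# The byte sequences differ, so GBK bytes shown as-is on a UTF-8 terminal look garbled.
print(utf8_bytes)
print(gbk_bytes)

# Decoding with the codec that produced the bytes recovers the original text.
decoded = gbk_bytes.decode("gbk")
print(decoded == text)  # True
```

Only the ASCII character `1` has the same byte in both encodings, which is why it is the only character that survives in the garbled output.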
Specifying the encoding
➜ file cat read_gbk.py
import codecs
# codecs.open() decodes each line from GBK into a unicode string
for line in codecs.open("read_gbk.txt", encoding="GBK"):
    print line.replace("\n", "")
➜ file python read_gbk.py
第1行
第2行
第3行
With the encoding specified, the output is correct.
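As an alternative to `codecs.open`, `io.open` also accepts an `encoding` argument; it is available in Python 2.6+ and is the built-in `open` in Python 3. A minimal sketch, assuming Python 3; the helper name `read_lines` is hypothetical:

```python
import io


def read_lines(path, encoding="gbk"):
    """Return the decoded lines of a text file without their trailing newlines."""
    with io.open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]
```

For the file created earlier, `read_lines("read_gbk.txt")` should return the three lines as text. The `with` block also closes the file, which the loops in these notes leave to the interpreter.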
Error handling
➜ file cat read_gbk.py
import codecs

print "replace with ? ->"
for line in codecs.open("read_gbk.txt", encoding="utf-8", errors="replace"):
    print line.replace("\n", "")
print

print "ignore the error ->"
for line in codecs.open("read_gbk.txt", encoding="utf-8", errors="ignore"):
    print line.replace("\n", "")
print

print "raise the error ->"
for line in codecs.open("read_gbk.txt", encoding="utf-8", errors="strict"):
    print line.replace("\n", "")
➜ file python read_gbk.py
replace with ? ->
��1��
��2��
��3��
ignore the error ->
1
2
3
raise the error ->
Traceback (most recent call last):
  File "read_gbk.py", line 14, in <module>
    for line in codecs.open("read_gbk.txt", encoding="utf-8", errors="strict"):
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 699, in next
    return self.reader.next()
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 630, in next
    line = self.readline()
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 545, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 492, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte
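The three error handlers can also be tried directly on a byte string, and the strict failure can be caught to fall back to another encoding. The following is a sketch only, assuming Python 3; `decode_with_fallback` and its candidate list are hypothetical and not part of the `codecs` module.

```python
gbk_bytes = "第1行".encode("gbk")  # b'\xb5\xda1\xd0\xd0'

# "replace" substitutes U+FFFD for each undecodable byte; "ignore" drops it.
replaced = gbk_bytes.decode("utf-8", errors="replace")  # '\ufffd\ufffd1\ufffd\ufffd'
ignored = gbk_bytes.decode("utf-8", errors="ignore")    # '1'


def decode_with_fallback(data, encodings=("utf-8", "gbk")):
    """Try each encoding strictly, in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # No candidate decoded cleanly: keep what is readable and mark the rest.
    return data.decode(encodings[0], errors="replace"), None


text, used = decode_with_fallback(gbk_bytes)
print(used)  # gbk
```

Trying UTF-8 first is a deliberate choice in this sketch: strict UTF-8 decoding rejects most non-UTF-8 input, whereas GBK accepts many byte sequences that were never GBK, so the order of candidates matters.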