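Bytes and characters
Computers store and transmit characters as bytes. A character is, in essence, a combination of one or more numeric codes.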
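As a minimal sketch of that idea, the snippet below takes one character, looks up its numeric code point, and lists the byte values it becomes once encoded. The variable names are illustrative, and UTF-8 is an assumed encoding, chosen because it matches the three-byte length shown in the byte string session that follows. The character is written as a `\u` escape so the source stays ASCII, and the code should run unchanged on Python 2.6+ and Python 3.3+.

```python
# One character, written with an escape so the source stays ASCII.
char = u'\u4e2d'  # the character 中

# The character's numeric code (its Unicode code point).
code_point = ord(char)

# The same character as it would be stored or transmitted in UTF-8
# (UTF-8 is an assumption made for this example).
encoded = char.encode('utf-8')

# bytearray yields integers on both Python 2 and Python 3.
byte_values = list(bytearray(encoded))

print(code_point)    # 20013
print(byte_values)   # [228, 184, 173]
```

One character corresponds to a single code point here, but to three separate byte values once encoded.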
Byte strings
A byte string stores its content in units of bytes.
>>> s = 'a'
>>> len(s)
1
>>> type(s)
<type 'str'>
>>> ord(s)
97
>>> s = '中'
>>> len(s)
3
>>> type(s)
<type 'str'>
>>> ord(s[0])
228
Byte string literals need no prefix.
Character strings
A character string (unicode) stores its content in units of characters.
>>> u = u'a'
>>> len(u)
1
>>> type(u)
<type 'unicode'>
>>> ord(u)
97
>>> u = u'中'
>>> len(u)
1
>>> type(u)
<type 'unicode'>
>>> ord(u)
20013
Character string literals take a u prefix.
Checking types
>>> s = 'a'
>>> u = u'a'
>>> type(s) == str
True
>>> type(s) == unicode
False
>>> type(u) == str
False
>>> type(u) == unicode
True
>>> isinstance(s, str)
True
>>> isinstance(s, unicode)
False
>>> isinstance(u, str)
False
>>> isinstance(u, unicode)
True
>>> isinstance(s, basestring)
True
>>> isinstance(u, basestring)
True
type vs isinstance
Checking old-style classes
>>> class A:
...     pass
...
>>> class B:
...     pass
...
>>> a = A()
>>> b = B()
>>> type(a)
<type 'instance'>
>>> type(b)
<type 'instance'>
>>> type(a) == type(b)
True
>>> isinstance(a, B)
False
>>> isinstance(b, A)
False
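For old-style classes, comparing with type() gives the wrong answer: every instance has the type 'instance', so instances of unrelated classes compare as equal.
Checking new-style classes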
>>> class A(object):
...     pass
...
>>> class B(object):
...     pass
...
>>> a = A()
>>> b = B()
>>> type(a)
<class '__main__.A'>
>>> type(b)
<class '__main__.B'>
>>> type(a) == type(b)
False
>>> isinstance(a, B)
False
>>> isinstance(b, A)
False
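For new-style classes, comparing with type() gives the correct answer, because each class is its own type.
Checking subclasses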
>>> class A(object):
...     pass
...
>>> class B(A):
...     pass
...
>>> a = A()
>>> b = B()
>>> type(a)
<class '__main__.A'>
>>> type(b)
<class '__main__.B'>
>>> type(a) == type(b)
False
>>> isinstance(a, B)
False
>>> isinstance(b, A)
True
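With subclasses, comparing with type() gives the wrong answer: b is an instance of A through inheritance, but type(b) is B, so an exact type comparison against A fails. isinstance handles this case correctly.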
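The difference can be sketched in a few lines. The class and function names below (`Base`, `Child`, `is_exact_type`, `is_instance_of`) are hypothetical, chosen only for illustration. The point is that an exact type() comparison ignores inheritance while isinstance follows it.

```python
class Base(object):
    pass


class Child(Base):
    pass


def is_exact_type(obj, cls):
    # True only when obj was created from cls itself, not from a subclass.
    return type(obj) is cls


def is_instance_of(obj, cls):
    # True when obj was created from cls or from any subclass of cls.
    return isinstance(obj, cls)


child = Child()
exact = is_exact_type(child, Base)       # inheritance is ignored
inherited = is_instance_of(child, Base)  # inheritance is followed

print(exact)      # False
print(inherited)  # True
```

In most code the isinstance form is the safer default, since a subclass instance is normally expected to be usable wherever its parent class is.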
Converting between types
>>> s = 'a'
>>> u = unicode(s)
>>> type(u)
<type 'unicode'>
>>> print u
a
>>> u = u'a'
>>> s = str(u)
>>> type(s)
<type 'str'>
>>> print s
a
>>> s = '中'
>>> u = unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u = u'中'
>>> s = str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
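Converting Chinese characters in either direction raises an encoding error. When no encoding is given, unicode() and str() fall back to the default encoding, ascii, which only covers the values 0 to 127 and so cannot represent these characters. See the later section on encoding and decoding for details.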
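A minimal sketch of the explicit alternative, assuming the bytes are UTF-8 (the sessions above are consistent with that, but the encoding is an assumption): naming the codec in decode and encode avoids the implicit ascii fallback. The variable names are illustrative, and the code should run unchanged on Python 2.6+ and Python 3.

```python
raw = b'\xe4\xb8\xad'  # assumed to be the UTF-8 bytes of the character 中

# Bytes -> characters: name the codec instead of relying on the default.
text = raw.decode('utf-8')

# Characters -> bytes: again with an explicit codec.
round_trip = text.encode('utf-8')

# The ascii codec only covers byte values 0 to 127, so it rejects these bytes.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(text))          # 1
print(round_trip == raw)  # True
print(ascii_ok)           # False
```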