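Bytes and Characters

Computers store and transmit characters as bytes. A character is, in essence, a combination of one or more numeric codes.

Byte strings

A byte string (the str type in Python 2) stores its content as a sequence of bytes, so len() counts bytes and indexing returns one byte at a time.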

>>> s = 'a'
>>> len(s)
1
>>> type(s)
<type 'str'>
>>> ord(s)
97

>>> s = '中'
>>> len(s)
3
>>> type(s)
<type 'str'>
>>> ord(s[0])
228

Byte string literals are written without any prefix.
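The byte-level view can also be checked outside the interactive prompt. The following is a minimal sketch, assuming UTF-8 (which matches the length of 3 and the first byte 228 shown above). The names data, byte_count and byte_values are illustrative only. The bytes are written as escapes so the file stays pure ASCII and runs on both Python 2.6+ and Python 3.

```python
# UTF-8 encoding of the character 中, written as escaped bytes.
# In Python 2 the b prefix is optional: b'...' is a plain str.
data = b'\xe4\xb8\xad'

byte_count = len(data)               # counts bytes, not characters
byte_values = list(bytearray(data))  # integer value of each byte

print(byte_count)
print(byte_values)
```

The first value in byte_values is the same 228 that ord(s[0]) returned in the session above.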

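Character strings

A character string (the unicode type in Python 2) stores its content as a sequence of characters, so len() counts characters and ord() returns a Unicode code point.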

>>> u = u'a'
>>> len(u)
1
>>> type(u)
<type 'unicode'>
>>> ord(u)
97

>>> u = u'中'
>>> len(u)
1
>>> type(u)
<type 'unicode'>
>>> ord(u)
20013

Character string literals carry a u prefix.
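As a small sketch of the character-level view, the snippet below repeats the measurements for 中 using a Unicode escape, so the source file needs no encoding declaration. The variable names are illustrative, and UTF-8 is assumed for the byte count. It should run on Python 2 and on Python 3.3+, where the u prefix is accepted again.

```python
# The single character 中, written as a Unicode escape.
text = u'\u4e2d'

char_count = len(text)                 # counts characters
code_point = ord(text)                 # Unicode code point of the character
utf8_len = len(text.encode('utf-8'))   # bytes needed to store it in UTF-8

print(char_count)
print(hex(code_point))
print(utf8_len)
```

One character therefore corresponds to three bytes once it is encoded as UTF-8, which is why the byte string and the character string report different lengths for the same text.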

Checking types

>>> s = 'a'
>>> u = u'a'

>>> type(s) == str
True
>>> type(s) == unicode
False
>>> type(u) == str
False
>>> type(u) == unicode
True

>>> isinstance(s, str)
True
>>> isinstance(s, unicode)
False
>>> isinstance(u, str)
False
>>> isinstance(u, unicode)
True

>>> isinstance(s, basestring)
True
>>> isinstance(u, basestring)
True

type vs isinstance

Old-style classes

>>> class A:
...   pass
...
>>> class B:
...   pass
...

>>> a = A()
>>> b = B()
>>> type(a)
<type 'instance'>
>>> type(b)
<type 'instance'>
>>> type(a) == type(b)
True

>>> isinstance(a, B)
False
>>> isinstance(b, A)
False

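For old-style classes, type() gives the wrong answer: every instance reports the type 'instance', so instances of unrelated classes compare as equal. isinstance() still distinguishes them.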
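If the exact class of an old-style instance is needed, one option is to compare the __class__ attribute instead of type(). This is only a sketch with illustrative names. On Python 3 every class is new-style, so the snippet behaves the same there, but the problem it works around no longer exists.

```python
class A:          # old-style in Python 2: no explicit base class
    pass

class B:
    pass

a = A()
b = B()

# __class__ points at the defining class even for old-style instances.
same_class = a.__class__ is b.__class__
a_is_A = a.__class__ is A

print(same_class)
print(a_is_A)
```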

New-style classes

>>> class A(object):
...   pass
...
>>> class B(object):
...   pass
...

>>> a = A()
>>> b = B()
>>> type(a)
<class '__main__.A'>
>>> type(b)
<class '__main__.B'>
>>> type(a) == type(b)
False

>>> isinstance(a, B)
False
>>> isinstance(b, A)
False

For new-style classes, type() gives the correct answer: each instance reports its own class.

Subclasses

>>> class A(object):
...   pass
...
>>> class B(A):
...   pass
...

>>> a = A()
>>> b = B()
>>> type(a)
<class '__main__.A'>
>>> type(b)
<class '__main__.B'>
>>> type(a) == type(b)
False

>>> isinstance(a, B)
False
>>> isinstance(b, A)
True

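With subclasses, type() gives the wrong answer: it compares exact types only, so an instance of B is not recognized as an A even though B inherits from A. isinstance() takes inheritance into account.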
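The difference can be summarized in a few lines. This is a minimal sketch with illustrative variable names. issubclass() is included as the class-level counterpart of isinstance().

```python
class A(object):
    pass

class B(A):       # B inherits from A
    pass

b = B()

exact_match = type(b) is A        # exact type comparison ignores inheritance
is_instance = isinstance(b, A)    # follows the inheritance chain
is_subclass = issubclass(B, A)    # same check, applied to the classes

print(exact_match)
print(is_instance)
print(is_subclass)
```

In most code isinstance() is the safer default. An exact type() comparison is appropriate only when subclasses must be deliberately excluded.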

Type conversion

>>> s = 'a'
>>> u = unicode(s)
>>> type(u)
<type 'unicode'>
>>> print u
a

>>> u = u'a'
>>> s = str(u)
>>> type(s)
<type 'str'>
>>> print s
a

>>> s = '中'
>>> u = unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u = u'中'
>>> s = str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

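Converting Chinese characters in either direction raises an encoding error. unicode(s) and str(u) convert implicitly through the default encoding, ascii, which cannot represent characters outside the ASCII range. See the later section on encoding and decoding for details.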
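A common way around these errors is to name the encoding explicitly instead of relying on the default. The sketch below assumes the bytes are UTF-8, as in the earlier examples, and uses illustrative variable names. It runs on both Python 2 and Python 3.

```python
# UTF-8 bytes of the character 中.
data = b'\xe4\xb8\xad'

# bytes -> characters: decode with an explicit encoding.
text = data.decode('utf-8')

# characters -> bytes: encode with an explicit encoding.
round_trip = text.encode('utf-8')

# Decoding the same bytes as ASCII fails, as in the traceback above.
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(text == u'\u4e2d')
print(round_trip == data)
print(ascii_failed)
```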

Copyright © zhujipeng 2017 all rights reserved, powered by Gitbook. File last revised: 2017-12-16 15:12:10
