Open a Python interpreter and import NumPy:
>>> import numpy as np
Now create a bytes object and create a numpy array from it:
>>> my_bytes = bytes([0, 1, 127, 255])
>>> my_np_bytes = np.array(my_bytes)
>>> my_np_bytes.size
1
>>> my_np_bytes.shape
()
>>> my_np_bytes.dtype
dtype('S4')
As we can see, the bytes object is treated as a single scalar: the result is a zero-dimensional ndarray of size 1 whose one element is a 4-byte string (dtype S4).
Now let's repeat for bytearray:
>>> my_bytearray = bytearray([0, 1, 127, 255])
>>> my_np_bytearray = np.array(my_bytearray)
>>> my_np_bytearray.size
4
>>> my_np_bytearray.shape
(4,)
>>> my_np_bytearray.dtype
dtype('uint8')
We now get what we expect, an ndarray of size 4 and of uint8 type.
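If the uint8 interpretation is what you want for a bytes object as well, one option is to tell NumPy explicitly how to read the buffer. The following is a minimal sketch, not the only approach; the variable names `from_buffer` and `via_bytearray` are just illustrative.

```python
import numpy as np

my_bytes = bytes([0, 1, 127, 255])

# np.frombuffer interprets the raw buffer as a 1-D array of the given dtype.
# The result is read-only, because the underlying bytes object is immutable.
from_buffer = np.frombuffer(my_bytes, dtype=np.uint8)

# Converting to bytearray first gives a writable copy with the same contents.
via_bytearray = np.array(bytearray(my_bytes))

print(from_buffer.shape, from_buffer.dtype)      # (4,) uint8
print(via_bytearray.shape, via_bytearray.dtype)  # (4,) uint8
```

Either form gives the same array of size 4 that the bytearray example produced.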
Why does this matter? Consider the following example where we round-trip through scipy savemat and loadmat:
>>> import numpy as np
>>> import io
>>> import scipy.io
>>> a_int = 3
>>> a_bytes = bytes([0, 1, 127])
>>> a_bytearray = bytearray([0, 1, 127, 255])
>>> bio = io.BytesIO()
>>> my_dict = {'VAR': {'a_int': a_int, 'a_bytes': a_bytes, 'a_bytearray': a_bytearray}}
>>> scipy.io.savemat(bio, my_dict, long_field_names=True)
>>> data_out = scipy.io.loadmat(bio, struct_as_record=False, squeeze_me=True)
>>> recovered_data = {k: getattr(data_out['VAR'], k) for k in data_out['VAR']._fieldnames}
>>> recovered_data
{'a_bytes': '\x00\x01\x7f', 'a_int': 3, 'a_bytearray': array([ 0, 1, 127, 255], dtype=uint8)}
So far, so good. Notice, however, that I dropped the 255 value from the bytes object, because 255 is not a valid ASCII character. If instead we have:
>>> a_bytes = bytes([0, 1, 127, 255])
We get the following when we call loadmat():
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/scipy/io/matlab/mio.py", line 132, in loadmat
    matfile_dict = MR.get_variables(variable_names)
  File "/usr/local/lib/python3.4/dist-packages/scipy/io/matlab/mio5.py", line 292, in get_variables
    res = self.read_var_array(hdr, process)
  File "/usr/local/lib/python3.4/dist-packages/scipy/io/matlab/mio5.py", line 252, in read_var_array
    return self._matrix_reader.array_from_header(header, process)
  File "mio5_utils.pyx", line 625, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy/io/matlab/mio5_utils.c:5993)
  File "mio5_utils.pyx", line 673, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy/io/matlab/mio5_utils.c:5585)
  File "mio5_utils.pyx", line 931, in scipy.io.matlab.mio5_utils.VarReader5.read_struct (scipy/io/matlab/mio5_utils.c:8694)
  File "mio5_utils.pyx", line 623, in scipy.io.matlab.mio5_utils.VarReader5.read_mi_matrix (scipy/io/matlab/mio5_utils.c:5184)
  File "mio5_utils.pyx", line 667, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy/io/matlab/mio5_utils.c:5503)
  File "mio5_utils.pyx", line 824, in scipy.io.matlab.mio5_utils.VarReader5.read_char (scipy/io/matlab/mio5_utils.c:7306)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte
So, although 255 is a valid bytes value, scipy tries to decode the stored data as a UTF-8 string when loading it, and fails.
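The decoding failure itself can be reproduced without scipy. The sketch below only illustrates the UTF-8 decoding step reported in the traceback; it is not scipy's actual code path, and the names `ascii_only`, `with_255` and `error_message` are illustrative.

```python
ascii_only = bytes([0, 1, 127])
with_255 = bytes([0, 1, 127, 255])

# Every value below 128 is a valid single-byte UTF-8 character, so this decodes.
print(repr(ascii_only.decode('utf-8')))  # '\x00\x01\x7f'

# 0xff can never appear in valid UTF-8, so decoding raises.
error_message = None
try:
    with_255.decode('utf-8')
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

This is why the bytearray field in the round trip above is unaffected: it is stored as a uint8 array, so no string decoding is attempted on it.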