storing and parsing packed binary data in files


Post by robgraves » Tue Dec 24, 2013 9:48 am

OK, I'm still pretty new to Python, but I've been working through O'Reilly's "Learning Python" by Mark Lutz, 4th edition, and I'm on page 239, where it covers "storing and parsing packed binary data in files".

The book's examples up to this point have all worked for me. This section goes as follows:
Code:
>>> F = open('data.bin', 'wb')
>>> import struct
>>> data = struct.pack('>i4sh', 7, 'spam', 8)
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(data)
>>> F.close()

Later he opens the file and reads the data back.

However, when I enter this line into the Python interpreter:
Code:
data = struct.pack('>i4sh', 7, 'spam', 8)

I get this error:
Code:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object


I'm currently using Python 3.3.3 on Arch Linux. I attached a screenshot as well.

Re: storing and parsing packed binary data in files

Post by hansn » Tue Dec 24, 2013 10:09 am

I don't know anything about the struct module or bytes objects, but I think one or both of the strings you are passing to struct.pack must be bytes objects. You could try preceding the string(s) with a b, like so:
Code:
data = struct.pack(b'>i4sh', 7, 'spam', 8)

Re: storing and parsing packed binary data in files

Post by robgraves » Tue Dec 24, 2013 10:15 am

That gives me the same error:

Code:
>>> data = struct.pack(b'>i4sh', 7, 'spam', 8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object
>>>

Re: storing and parsing packed binary data in files

Post by hansn » Tue Dec 24, 2013 10:33 am

Did you try on the other argument?

Re: storing and parsing packed binary data in files

Post by robgraves » Tue Dec 24, 2013 10:42 am

Ahh, you're right:

Code:
>>> data = struct.pack(b'>i4sh', 7, b'spam', 8)
>>> date
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'date' is not defined
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>>


That one went through successfully, thank you.

The book didn't say to do that, but it works. Thanks again.

Re: storing and parsing packed binary data in files

Post by metulburr » Tue Dec 24, 2013 7:16 pm

It looks to be a typo in the 4th edition. In the 5th edition he corrects it to:
Code:
data = struct.pack('>i4sh', 7, b'spam', 8)

The first argument (the format) stays an ordinary string; only the value packed into the 's' field needs to be a bytes object.
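For anyone following along on Python 3, here is a minimal sketch of the full round trip: pack, write, read back, unpack. The temporary directory and the `text`/`values` names are just illustrative assumptions, not from the book. If the text starts out as a `str`, encoding it first is one way to get the bytes object that the 's' field wants.

```python
import os
import struct
import tempfile

# '>i4sh' = big-endian: 4-byte int, 4-byte bytes field, 2-byte short
fmt = '>i4sh'

text = 'spam'                       # an ordinary str in Python 3
data = struct.pack(fmt, 7, text.encode('ascii'), 8)
print(data)                         # b'\x00\x00\x00\x07spam\x00\x08'

# Write the packed record to a binary file (location is arbitrary here).
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
with open(path, 'wb') as f:
    f.write(data)

# Read it back and unpack with the same format string.
with open(path, 'rb') as f:
    raw = f.read()

values = struct.unpack(fmt, raw)
print(values)                       # (7, b'spam', 8)
```

Note that the 's' field comes back as bytes as well, so it needs a `.decode('ascii')` if a `str` is wanted.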

Re: storing and parsing packed binary data in files

Post by Tcll » Wed Jan 01, 2014 7:51 pm

I personally got tired of the struct module as you can't do stuff like 24bit float values...

I've been testing other various methods for quite some time now,
and the best I've run across is using array('B', []) to store a list of 8-bit ints,
and then iterate through that array using a global offset.

I have a function which performs every operation for a given byte size:
Code:
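# Python 2 code (note `long` below). Assumes `from array import array as __arr`
# and module-level lists __f (one array('B') per file) and __o (one offset per file), plus index __c.
def __BIT(big,bit_format,byte_size,value):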
    global __f,__o,__c #file, offset, current (for multiple files)
    if type(value)==str: #credit to Gribouillis (@DaniWeb) for various speedups:
        if (__o[__c]+byte_size)<=len(__f[__c]): #check for EOF (better than receiving an indexing error)
            DATA= __f[__c][__o[__c]:__o[__c]+byte_size]; val = 0
            for v in (DATA if big else list(reversed(DATA))): val=(val<<8)|v #multi-int -> single-int (flipped if little endian)
            if bit_format == 1: val=(val-(1<<(byte_size<<3)) if val>=(1<<((byte_size<<3)-1)) else val) #signed int (sign bit set -> negative, e.g. 0x80 -> -128)
            if bit_format == 2: #float (IEEE754)
                if val==0: val = 0.0 #speedy check (before performing any calculations)
                else: #credit to pyTony (@DaniWeb) for simplifying the formula of 'e' and fixing the return values:
                    e=((byte_size*8)-1)//(byte_size+1)+(byte_size>2)*byte_size//2; m,b=[((byte_size*8)-(1+e)), ~(~0 << e-1)]
                    S,E,M=[(val>>((byte_size*8)-1))&1,(val>>m)&~(~0 << e),val&~(~0 << m)] #<- added brackets (faster processing)
                    if E == int(''.join(['1']*e),2): val=(float('NaN') if M!=0 else (float('-inf') if S else float('+inf'))) #S=1 means negative
                    else: val=((pow(-1,S)*(2**(E-b-m)*((1<<m)+M))) if E else pow(-1,S)*(2**(1-b-m)*M))
                   
            if bit_format == 3: pass #float (IBM)
            if bit_format == 4: pass #float (Borland)
            __o[__c]+=byte_size #modify the file offset
            return val

        else: #EOF (End Of File)
            raise EOFError
           
    elif type(value)==int or type(value)==long: #write int
        if bit_format==1: value=(value+pow(256,byte_size) if value<0 else value) #signed int
        Bytes=[(value>>(i*8))&255 for i in range(byte_size)] # single-int -> multi-int (could be faster)
        __f[__c]+=__arr('B',list(reversed(Bytes)) if big else Bytes)
        __o[__c]+=byte_size

    elif type(value)==float: #write float
        if value==0: Bytes=[0]*byte_size #speedy check (before performing any unneeded calculations)
        else: #credit to jdaster64 (@kc-mm) for this:
            e=((byte_size*8)-1)//(byte_size+1)+(byte_size>2)*byte_size//2+(byte_size==3); m,E=[((byte_size*8)-(1+e)), ~(~0 << e-1)]; S=0 #pyTony's formula
            if value<0: S=1; value*=-1 #set the sign
            while value<1.0 or value>=2.0: value,E=(value*2.0,E-1) if value<1.0 else (value/2.0,E+1)
            v=(S<<(e+m))|(E<<m)|int(round((value-1)*(1<<m)))
            Bytes=[(v>>(i*8))&255 for i in range(byte_size)]
        __f[__c]+=__arr('B',list(reversed(Bytes)) if big else Bytes)
        __o[__c]+=byte_size

    elif type(value)==list: return list(__BIT(big,bit_format,byte_size,Lval) for Lval in value)
    elif type(value)==tuple: return tuple(__BIT(big,bit_format,byte_size,Tval) for Tval in value)
    elif type(value)==bool: return __BIT(big,bit_format,byte_size,int(value))
    return None


it's pretty fast and efficient for what it's capable of, but it could be faster <.<
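As a cross-check on the IEEE 754 read branch, here is a minimal Python 3 sketch of the same decoding with the exponent width passed in explicitly. `decode_float` and its `exp_bits` parameter are hypothetical names, not part of the function above. The 24-bit layout used in the last example (7 exponent bits, 16 mantissa bits) is an assumption taken from what the write branch's formula gives for byte_size 3; there is no standard 24-bit float.

```python
import math
import struct

def decode_float(raw, exp_bits, big=True):
    """Decode an IEEE 754-style float stored in len(raw) bytes.

    exp_bits is the exponent width: 5 for 16-bit, 8 for 32-bit, 11 for 64-bit.
    """
    total_bits = len(raw) * 8
    man_bits = total_bits - 1 - exp_bits          # mantissa width
    bias = (1 << (exp_bits - 1)) - 1              # exponent bias
    val = int.from_bytes(raw, 'big' if big else 'little')

    sign = -1.0 if val >> (total_bits - 1) else 1.0
    exp = (val >> man_bits) & ((1 << exp_bits) - 1)
    man = val & ((1 << man_bits) - 1)

    if exp == (1 << exp_bits) - 1:                # all ones: inf or NaN
        return float('nan') if man else sign * float('inf')
    if exp == 0:                                  # zero or subnormal
        return sign * math.ldexp(man, 1 - bias - man_bits)
    # normal number: implicit leading 1 bit
    return sign * math.ldexp((1 << man_bits) + man, exp - bias - man_bits)

# Compare against struct for a width that struct does support.
print(decode_float(struct.pack('>f', 1.5), 8))    # 1.5
# Assumed 24-bit layout: 1 sign bit, 7 exponent bits, 16 mantissa bits.
print(decode_float(b'\x3f\x00\x00', 7))           # 1.0
```

Comparing the 32-bit and 64-bit results against `struct.unpack` is a cheap way to gain some confidence before trusting the odd widths.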

bit_format:
0: unsigned int
1: signed int
2: float (IEEE754)
3: float (IBM)
4: float (Borland)


I never finished the last 2 -.-*
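For the two integer formats (0 and 1), Python 3.2 and later can do odd byte widths with the built-in `int.from_bytes` and `int.to_bytes`, which may be enough if floats are not needed. A minimal sketch follows; `read_int` and `write_int` are hypothetical helper names, and passing the offset around explicitly instead of using globals is just one possible design.

```python
def read_int(buf, offset, byte_size, big=True, signed=False):
    """Read byte_size bytes at offset; return (value, new_offset)."""
    chunk = buf[offset:offset + byte_size]
    if len(chunk) < byte_size:          # ran past the end of the data
        raise EOFError
    value = int.from_bytes(chunk, 'big' if big else 'little', signed=signed)
    return value, offset + byte_size

def write_int(value, byte_size, big=True, signed=False):
    """Return value encoded as exactly byte_size bytes."""
    return value.to_bytes(byte_size, 'big' if big else 'little', signed=signed)

raw = write_int(-2, 3, signed=True)     # 24-bit signed, big-endian
print(raw)                              # b'\xff\xff\xfe'
val, off = read_int(raw, 0, 3, signed=True)
print(val, off)                         # -2 3
```

`to_bytes` raises OverflowError when the value does not fit in the requested width, so out-of-range values fail loudly instead of being silently truncated.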

value:
"": read offset+byte_size
int/float/bool: write
list/tuple: R/W

you may notice I don't use [].append(val) in the code...
that's because that method is about 3x slower than []+[val]

the fastest array you can use is a dictionary, but that's unstable and can very easily kill your memory,
so the next best thing was to use array('B', []).
(this is even faster than NumPy's array methods)
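Timings like these depend a lot on the Python version, the list size, and how the loop is written, so it is worth measuring on your own machine. Here is a minimal `timeit` sketch for comparing the three approaches; the function names and the sizes are arbitrary assumptions, and no particular result is implied.

```python
import timeit
from array import array

def with_append(n=1000):
    out = []
    for i in range(n):
        out.append(i & 255)        # grow the list in place
    return out

def with_concat(n=1000):
    out = []
    for i in range(n):
        out = out + [i & 255]      # build a new list on every step
    return out

def with_array(n=1000):
    out = array('B')               # compact array of unsigned 8-bit ints
    for i in range(n):
        out.append(i & 255)
    return out

for fn in (with_append, with_concat, with_array):
    seconds = timeit.timeit(fn, number=20)
    print('%-12s %.4f s' % (fn.__name__, seconds))
```

Try a few different values of `n` as well, since the ranking can change as the list grows.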