How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 3:11 am

I need to write numbers into a file, up to 50 MB, and it should be fast.
Can anyone help me with how to do that?
I have written the following code:
-----------------------------------------------------------------------------------------------------------
import time

def create_file_numbers_old(filename, size):
    start = time.clock()

    value = 0
    with open(filename, "w") as f:
        while f.tell() < size:
            f.write("{0}\n".format(value))
            value += 1

    end = time.clock()

    print "time taken to write a file of size", size, " is ", (end - start), "seconds \n"
------------------------------------------------------------------------------------------------------------------
It takes about 34 seconds; I need it to be 5 to 10 times faster than that.
VenugopalaRao
 
Posts: 16
Joined: Wed May 15, 2013 5:05 pm

Re: How to write fast into a file in python?

Postby metulburr » Fri May 17, 2013 4:02 am

use code tags

You should also use timeit instead, although I admit I use the same sort of method to quickly get a general idea of timings.

The more methods you call, the slower it's going to be. Writing to disk is always going to be slow to a certain extent. Your code ran in 24 seconds on my PC. The snippet below produces the same result, a 50 MB file of ints, and on my PC takes between 2.0 and 2.4 seconds to write. That is because I call write() once and do not constantly check the size or the position. It depends on what you need to do.

Code: Select all
import timeit

def writer():
    l = ''
    for i in range(6388889):
        l += '{}\n'.format(i)
    f = open('test.txt','w')
    f.write(l)
    f.close()

val = timeit.Timer(lambda:writer())
print(val.timeit(number=1))


To be honest, there might be a better method of doing this, but this is not a subject where I shine.
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
metulburr
 
Posts: 1469
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 7:14 am

@metulburr
It's not working; it takes more than 34 seconds. Can you show me a good, elegant way to do this, please?
I'm unable to do it.

Re: How to write fast into a file in python?

Postby Mekire » Fri May 17, 2013 8:08 am

It is important to figure out what the bottleneck is here. It isn't, as you seem to think, the actual writing to the file. It is the way you are building the data you write.

For example let's compare Metul's previous:
Code: Select all
def make_string_slow(num):
    l = ''
    for i in xrange(num):
        l += '{}\n'.format(i)
    return l
with:
Code: Select all
def make_string(num):
    return "\n".join(str(i) for i in xrange(num))+"\n"

First, let's show that, at least for an integer argument greater than 0, they are equivalent:
Code: Select all
print(make_string(1000000)==make_string_slow(1000000))
Code: Select all
True
Now let's see what happens if we time both of these.
Code: Select all
NUMBER = 1000000

start = time.time()
DATA1 = make_string(NUMBER)
print("Time to generate data: {}".format(time.time()-start))

start = time.time()
DATA2 = make_string_slow(NUMBER)
print("Time to generate data (slow): {}".format(time.time()-start))

print(DATA1==DATA2)
Code: Select all
Time to generate data: 0.391999959946
Time to generate data (slow): 3.01900005341
True
So we can see that for larger numbers Metul's method becomes sloooooooooow.
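For anyone who wants to repeat this comparison with timeit, as suggested earlier in the thread, here is a minimal sketch. It assumes Python 3 (range instead of xrange), and the names concat_loop, join_once and compare are made up for this illustration. Note that CPython can sometimes extend a string in place when += is used, so the gap between the two approaches may be small on some platforms; treat any numbers you get as machine-specific.

```python
import timeit


def concat_loop(num):
    """Build the text by repeated string concatenation."""
    out = ''
    for i in range(num):
        out += '{}\n'.format(i)
    return out


def join_once(num):
    """Build the same text with a single join over a generator."""
    return ''.join('{}\n'.format(i) for i in range(num))


def compare(num=100000, repeat=3):
    """Return the best-of-`repeat` wall time in seconds for each builder."""
    results = {}
    for func in (concat_loop, join_once):
        timings = timeit.repeat(lambda: func(num), number=1, repeat=repeat)
        results[func.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("{}: {:.4f} s".format(name, seconds))
```

Taking the minimum of several runs is a common way to reduce noise from other processes; it does not make results comparable between machines.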

Now let's put it all together (I'm going to use time because I'm lazy).
Code: Select all
import time

def writer(data):
    with open('test.txt','w') as myfile:
        myfile.write(data)

def make_string(num):
    return "\n".join(str(i) for i in xrange(num))+"\n"

def make_string_slow(num):
    l = ''
    for i in xrange(num):
        l += '{}\n'.format(i)
    return l

if __name__ == "__main__":
    NUMBER = 6388889

    start = time.time()
    DATA = make_string(NUMBER)
    print("Time to generate data: {}".format(time.time()-start))

    start2 = time.time()
    writer(DATA)
    print("Time to write data: {}".format(time.time()-start2))

    print("Total time: {}:".format(time.time()-start))
Code: Select all
>>>
Time to generate data: 2.22300004959
Time to write data: 0.248000144958
Total time: 2.47300004959:
>>>
And the final file size is 55 MB. So as you can see... the writing of the file was by far the fastest thing we did. The thing that took time was creating the data. This is where you need to optimize. As such we can't tell you how to go faster without seeing what your method of generating your data is.

-Mek
Mekire
 
Posts: 988
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 8:43 am

Thanks Mek, but I am getting the following memory error:
File "D:/work/pythoncourse/project_02.py", line 46, in create_file_numbers_new
data = "\n".join(str(i) for i in xrange(size))+"\n"
MemoryError

That is, it does not work for size = 50*1024*1024;
it produces a memory error.
Last edited by VenugopalaRao on Fri May 17, 2013 9:24 am, edited 1 time in total.

Re: How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 8:45 am

This is my code. Can you tell me where the error is?

Code: Select all
import functools
import inspect
import os
import sys
import time

def create_file_numbers_old(filename, size):
    start = time.clock()

    value = 0
    with open(filename, "w") as f:
        while f.tell()< int(size):
            f.write("{0}\n".format(value))
            value += 1

    end = time.clock()
    print "time taken to write a file of size", size, " is ", (end -start), "seconds \n"

def create_file_numbers_new(filename, size):
    start = time.clock()
    size = int(size)
    print type(size)
    data = "\n".join(str(i) for i in xrange(size))+"\n"
    with open(filename,'w') as myfile:
        myfile.write(data)
    end = time.clock()
    print "time taken to write a file of size", size, " is ", (end -start), "seconds \n"

def get_module_dir():
    mod_file = inspect.getfile(inspect.currentframe())
    return os.path.dirname(mod_file)

output_path = functools.partial(os.path.join, get_module_dir())

def main(argv=sys.argv):
    try:
        fpath = output_path("test.txt")
        fpath2 = output_path(argv[1])
        create_file_numbers_old(fpath, argv[2])
        create_file_numbers_new(fpath2, argv[2])
    except IndexError:
        print "argument not given"

if __name__ == "__main__":
    main()

Re: How to write fast into a file in python?

Postby Kebap » Fri May 17, 2013 10:16 am

Try using a smaller size first, or switch to generators instead of lists if your computer does not have enough memory to handle such a huge list at once.
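One way to read the generator suggestion, as a sketch only: hand the file object a generator of lines so the whole string is never held in memory at once. This assumes Python 3 (on Python 2, swap range for xrange), and write_numbers_streaming is a hypothetical name, not something from the posts above.

```python
def write_numbers_streaming(path, count):
    """Write the integers 0..count-1, one per line, without building one big string."""
    with open(path, "w") as f:
        # writelines() accepts any iterable of strings, so the generator
        # is consumed lazily and only one line exists at a time.
        f.writelines("%d\n" % i for i in range(count))


if __name__ == "__main__":
    write_numbers_streaming("stream.txt", 1000)
```

Here count is the number of integers, not the file size in bytes. The file object's own buffering keeps the number of actual disk writes low.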
Learn: How To Ask Questions The Smart Way
Join the #python-forum IRC channel on irc.freenode.net and chat with us directly!
Kebap
 
Posts: 396
Joined: Thu Apr 04, 2013 1:17 pm
Location: Germany, Europe

Re: How to write fast into a file in python?

Postby Mekire » Fri May 17, 2013 10:37 am

VenugopalaRao wrote:thanks mek but i am getting the following memory error
File "D:/work/pythoncourse/project_02.py", line 46, in create_file_numbers_new
data = "\n".join(str(i) for i in xrange(size))+"\n"
MemoryError

that is this not working for size = 50*1024*1024
producing a memory error
Figure out a size that you can fit in memory and do it in chunks.

Seriously, I have no clue why you would possibly need to do this. Computers are great at generating numbers; why save over 52 million consecutive ones in a file?
Code: Select all
import time

start = time.time()

NUMBER = 50*1024*1024
EACH_CHUNK = 4000000
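TIMES = -(-NUMBER//EACH_CHUNK)  # ceiling division, so the last partial chunk is not dropped

with open('huge.txt','a') as myfile:
    for i in xrange(TIMES):
        stop = min((i+1)*EACH_CHUNK, NUMBER)  # do not run past NUMBER in the final chunk
        data = "\n".join(str(num) for num in xrange(i*EACH_CHUNK, stop))+"\n"
        myfile.write(data)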

print(time.time()-start)
Takes 20 seconds on my machine and is nearly half a gig :/

Re: How to write fast into a file in python?

Postby Kebap » Fri May 17, 2013 11:36 am

Mekire wrote:Seriously I have no clue why you would possibly need to do this. :/

Some random software I know uses a process like this to deliberately clutter your file system with nonsense.

Re: How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 11:41 am

@mek
It's my assignment, given by my professor.
However, I tried the following code, but the file exceeds 50 MB. Can you tell me why?
Code: Select all
def create_file_numbers_new(filename, size):
    count = value = 0
    with open(filename, 'w') as f:
        while count <= int(size):
            s = '%s\n' % value
            f.write(s)
            count += len(s)
            value += 1

I am calling the function as create_file_numbers_new("text1.txt", 50*1024*1024).
The output file, text1.txt, is 56.7 MB rather than 50 MB (i.e. 50*1024*1024 bytes). Why is it exceeding the limit?
Please correct the code.

Re: How to write fast into a file in python?

Postby metulburr » Fri May 17, 2013 12:02 pm

Mekire wrote:Time to generate data: 0.391999959946
Time to generate data (slow): 3.01900005341
True


3 seconds?

My result for that code is:
Time to generate data: 0.1988260746
Time to generate data (slow): 0.23685503006
True


Code: Select all
import time

def make_string_slow(num):
    l = ''
    for i in xrange(num):
        l += '{}\n'.format(i)
    return l

def make_string(num):
    return "\n".join(str(i) for i in xrange(num))+"\n"

NUMBER = 1000000

start = time.time()
DATA1 = make_string(NUMBER)
print("Time to generate data: {}".format(time.time()-start))

start = time.time()
DATA2 = make_string_slow(NUMBER)
print("Time to generate data (slow): {}".format(time.time()-start))

print(DATA1==DATA2)

Re: How to write fast into a file in python?

Postby VenugopalaRao » Fri May 17, 2013 12:24 pm

@metulburr
Can you answer the question I asked above, i.e. the problem of the file exceeding 50 MB?
Please.

Re: How to write fast into a file in python?

Postby Mekire » Fri May 17, 2013 12:38 pm

Lol I understand what you are doing... the argument you are passing is NOT the data size of the file to generate. It is the upper value of the integers to generate.

If you pass 50*1024*1024, using my function, you are asking for a file that prints all integers up to 52428800 which is almost a 500 megabyte file.
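The difference between "how many integers" and "how many bytes" can be worked out without writing anything. The sketch below counts the bytes that the integers 0 to n-1 occupy when written one per line; bytes_for_count and its newline_len parameter are names invented for this illustration, and it assumes plain ASCII digits.

```python
def bytes_for_count(n, newline_len=1):
    """Bytes needed to write the integers 0..n-1, one per line.

    newline_len is 1 for "\n" and 2 for "\r\n".
    """
    total = 0
    lo = 0        # first number that has `digits` digits (0 counts as one digit)
    digits = 1
    while lo < n:
        hi = min(10 ** digits, n)   # one past the last number with `digits` digits
        total += (hi - lo) * (digits + newline_len)
        lo = hi
        digits += 1
    return total


if __name__ == "__main__":
    print(bytes_for_count(6388889))           # 50000002
    print(bytes_for_count(6388889, 2))        # 56388891
    print(bytes_for_count(50 * 1024 * 1024))  # 460748090
```

So 6388889 integers come to about 50 million bytes with a one-byte newline, while 50*1024*1024 integers come to roughly 460 million bytes, which matches the "almost 500 megabyte" estimate above.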

@Metul... the time only starts to blow up once I use large numbers, and where that happens depends on my processing speed. Try increasing the number drastically and see if the two stay about even. They shouldn't stay even if you make the number large enough, unless something very strange is happening. You may just be using a much more powerful computer than mine.

Anyway... final silly answer:
Code: Select all
import time

def write_all_the_numbers(filename,lim,n=None):
    """lim is the max number to print up to (non-inclusive).
    n should be chosen to be as large as possible while still fitting in memory.
    If lim can already fit in memory do not pass a third argument."""
    if n:
        gener=(xrange(i,i+n) if i+n<lim else xrange(i,lim) for i in xrange(0,lim,n))
    else:
        gener = (xrange(lim),)
    with open(filename,'a') as myfile:
        for chunk_range in gener:
            data = "\n".join(str(num) for num in chunk_range)+"\n"
            myfile.write(data)

if __name__ == "__main__":
    NUMBER = 50*1024*1024
    PER_CHUNK = 4000000

    start = time.time()
    write_all_the_numbers("huge.txt",NUMBER,PER_CHUNK)
    print(time.time()-start)


Note the above code WILL create a file with 52428800 numbers. It runs in 20 seconds on my machine. If you only want a 50 megabyte file lower the number to what Metul originally suggested of 6388889.

Edit: @Metul: When booting into Linux Mint I got speeds similar to yours. I have no clue what Windows is doing to add 3 seconds to the time.
@OP: if you are running under Windows, it seems that "\n" adds two bytes, not one. Try taking this into account.
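The two-byte newline comes from text-mode translation: on Windows, "\n" is written to disk as "\r\n". A small sketch of how to see and control this, assuming Python 3, where open() takes a newline argument (the Python 2 built-in open() used elsewhere in this thread does not); size_with_newline is a hypothetical helper name.

```python
import os
import tempfile


def size_with_newline(newline):
    """Write two short lines in text mode and return the file size in bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline=None translates "\n" to os.linesep on write;
        # newline="\n" writes it untranslated on every platform.
        with open(path, "w", newline=newline) as f:
            f.write("1\n2\n")
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(size_with_newline("\n"))    # 4 on every platform
    print(size_with_newline("\r\n"))  # 6 on every platform
    print(size_with_newline(None))    # depends on os.linesep
```

With newline="\n" the length of the string and the number of bytes on disk agree for ASCII text, so counting len(s) as in the OP's loop gives the real file size.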

Re: How to write fast into a file in python?

Postby Mekire » Fri May 17, 2013 3:11 pm

Just double checked:

If you are using Windows, the newline character "\n" adds 2 bytes, so you would need something like this:
Code: Select all
import time

def max_size(filename):
    size = 0
    num = 0
    buf = []
    while size+len(str(num)) < MAX_SIZE:
        buf.append(str(num))
        size += len(str(num))+2 #newline is two bytes on windows.
        num += 1
    with open(filename,'w') as myfile:
        myfile.write("\n".join(buf))

if __name__ == "__main__":
    start = time.time()
    MAX_SIZE = 50*(1024**2)
    max_size("fifty.txt")
    print(time.time()-start)

On linux and in other worlds of sanity it adds 1 byte so your previous code might have worked.
You might want to confirm with your instructor which he wants.
The above should give you as close to 50 megs as you can get with the current criteria as I understand it.
-Mek
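An alternative that sidesteps the platform difference, offered as a sketch rather than as the assignment's intended solution: open the file in binary mode, where a newline is always exactly one byte, and stop before the byte budget is exceeded. write_numbers_up_to and chunk_lines are names made up for this example.

```python
import os


def write_numbers_up_to(path, max_bytes, chunk_lines=100000):
    """Write 0, 1, 2, ... one per line without exceeding max_bytes.

    Returns the number of bytes written.
    """
    written = 0
    value = 0
    done = False
    # Binary mode: no newline translation, so len(line) is the size on disk.
    with open(path, "wb") as f:
        while not done:
            buf = []
            for _ in range(chunk_lines):
                line = str(value).encode("ascii") + b"\n"
                if written + len(line) > max_bytes:
                    done = True
                    break
                buf.append(line)
                written += len(line)
                value += 1
            # One write per chunk keeps memory bounded and call overhead low.
            f.write(b"".join(buf))
    return written


if __name__ == "__main__":
    size = write_numbers_up_to("fifty.txt", 50 * 1024 ** 2)
    print(size, os.path.getsize("fifty.txt"))
```

The file ends at the last complete line that fits, so it can be a few bytes short of the target; whether that or a "\r\n" file is what the instructor wants is still worth confirming.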

