How to read a text file from end?

A forum for general discussion of the Python programming language.

How to read a text file from end?

Postby solomon243 » Sun Aug 04, 2013 12:37 pm

How can I read a text file starting from the end? I don't want to use readlines() because the file is too big.


solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm

Re: How to read the text file from end?

Postby stranac » Sun Aug 04, 2013 12:53 pm

You could use file.seek().
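A minimal sketch of the seek() idea, assuming you only need the raw bytes at the end of the file. The function name `read_tail_bytes` and its `nbytes` parameter are hypothetical. It opens the file in binary mode because, in Python 3 text mode, seek() is only supported to offsets previously returned by tell(), to zero, or to the very end.

```python
import os


def read_tail_bytes(path, nbytes=1024):
    """Return up to the last `nbytes` bytes of the file at `path`."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new offset: the file size
        f.seek(max(0, size - nbytes), os.SEEK_SET)  # back up at most nbytes
        return f.read()  # read from there to the end of the file
```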
stranac
 
Posts: 1093
Joined: Thu Feb 07, 2013 3:42 pm

Re: How to read the text file from end?

Postby solomon243 » Sun Aug 04, 2013 4:04 pm

stranac wrote: You could use file.seek().


If I open the file with the '>>' (append) option, new data is added after the existing data... but I still won't be able to read it back to front. Sorry for my English.
solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm


Re: How to read a text file from end?

Postby ochichinyezaboombwa » Mon Aug 05, 2013 6:14 am

Reading one character at a time is one way to do it.
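A sketch of the one-character-at-a-time approach, working on bytes so that each step is a single seek() plus read(1). The name `reversed_bytes` is hypothetical. It is easy to follow, but it makes a seek() and a read() call for every byte, so expect it to be slow on large files.

```python
import os


def reversed_bytes(path):
    """Yield the bytes of a file one at a time, from last to first."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start just past the last byte
        while pos > 0:
            pos -= 1
            f.seek(pos, os.SEEK_SET)  # one seek() ...
            yield f.read(1)           # ... and one read() per byte
```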
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: How to read a text file from end?

Postby solomon243 » Mon Aug 05, 2013 8:49 am

One character at a time? I can't believe it! Python is an excellent language!
solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm

Re: How to read a text file from end?

Postby manojg » Tue Aug 06, 2013 3:18 am

The best way is to use readlines(), no matter how big the file is: read the file line by line into a list, then read the list backward.
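For reference, the readlines() suggestion amounts to something like the following (the helper name is hypothetical). It only works when the whole file fits comfortably in memory, which is exactly what the original question rules out.

```python
def reversed_lines_in_memory(path):
    """Return all lines of a text file, last line first.

    readlines() loads the entire file into a list, so this only works
    when the file fits comfortably in memory."""
    with open(path) as f:
        return f.readlines()[::-1]
```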
manojg
 
Posts: 13
Joined: Tue Jul 09, 2013 6:40 pm

Re: How to read a text file from end?

Postby Mekire » Tue Aug 06, 2013 4:40 am

manojg wrote: The best way is to use readlines(), no matter how big the file is: read the file line by line into a list, then read the list backward.

That is all well and good... IF the file fits in memory. If it doesn't, which is what the OP implies, then this idea is useless.

-Mek
Mekire
 
Posts: 984
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: How to read a text file from end?

Postby solomon243 » Tue Aug 06, 2013 6:45 pm

What about writing the text file and storing the length of each written line in a second file...

or is that nonsense?

(I need a simple database for messages. Important condition: all messages must be stored in plain text files.)
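A sketch of the index-file idea, under a few assumptions: each message is a single line of UTF-8 text, and the index stores each message's starting byte offset (rather than its length) as one decimal number per line, so any message can later be reached with a single seek(). The function and file names are hypothetical.

```python
import os


def append_message(data_path, index_path, message, encoding="utf-8"):
    """Append one single-line message to data_path and record, in
    index_path, the byte offset at which it starts."""
    if "\n" in message:
        raise ValueError("a message must fit on one line")
    with open(data_path, "ab") as data:
        offset = data.seek(0, os.SEEK_END)  # current size = start of the new message
        data.write(message.encode(encoding) + b"\n")
    with open(index_path, "a") as index:
        index.write(f"{offset}\n")  # one decimal offset per line
```

Storing offsets instead of lengths means the reader never has to add up lengths to find where a message begins.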
solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm

Re: How to read a text file from end?

Postby manojg » Wed Aug 07, 2013 1:55 am

Mekire wrote: That is all well and good... IF the file fits in memory. If it doesn't, which is what the OP implies, then this idea is useless.

-Mek


Using a huge file is a bad idea because it will consume a lot of memory and CPU, and it may hang the computer. Instead, split the file into several smaller files, using either a Linux command or a Python script, then read them backward.
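For comparison, a sketch of the split-then-reverse suggestion (all names are hypothetical). Each chunk file is small enough to reverse in memory, but note that this writes a complete second copy of the data to disk first.

```python
def split_file(path, lines_per_chunk=100_000):
    """Split a text file into numbered chunk files of at most
    lines_per_chunk lines; return the chunk paths in file order."""
    chunk_paths = []
    out = None
    try:
        with open(path) as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:  # time to start a new chunk
                    if out is not None:
                        out.close()
                    chunk_path = f"{path}.part{len(chunk_paths):06d}"
                    chunk_paths.append(chunk_path)
                    out = open(chunk_path, "w")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths


def reversed_lines_from_chunks(chunk_paths):
    """Yield all lines in reverse order, holding one chunk in memory at a time."""
    for chunk_path in reversed(chunk_paths):
        with open(chunk_path) as f:
            yield from reversed(f.readlines())
```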
manojg
 
Posts: 13
Joined: Tue Jul 09, 2013 6:40 pm

Re: How to read a text file from end?

Postby solomon243 » Wed Aug 07, 2013 7:10 pm

In Perl I did it like this:

Code:
#!/usr/bin/perl -w
$| = 1;
open IDX, "< base.idx" or die "cannot open index file: $!";
@index_array = <IDX>;
open DB, "< base.db" or die "cannot open data file: $!";
$output_str_num = 5;
print "offset of $output_str_num th string is $index_array[$output_str_num-2]:\n";
print "cannot seek" if (! seek(DB, $index_array[$output_str_num-2], 0));
$current_position = tell DB;
print "current position is: $current_position\n";
$cs = <DB>;
print "$cs";

close IDX;
close DB;
1;


The data is stored in one file and the index in another. Now it seems to work quickly enough.
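A rough Python counterpart of this index lookup, extended to walk through every message from newest to oldest. It assumes a hypothetical layout in which the index holds the starting byte offset of every message, one per line, beginning with 0; the `$output_str_num-2` arithmetic in the Perl code suggests its base.idx may be organized slightly differently. Only the index (one number per message) is loaded into memory.

```python
def read_messages_newest_first(db_path, idx_path, encoding="utf-8"):
    """Yield the messages stored in db_path, newest (last) first.

    Hypothetical layout: idx_path holds one decimal byte offset per line,
    the position where each message starts in db_path, and every message
    is a single line terminated by a newline."""
    with open(idx_path) as idx:
        offsets = [int(line) for line in idx if line.strip()]
    with open(db_path, "rb") as db:
        for offset in reversed(offsets):
            db.seek(offset)  # jump straight to the start of the message
            yield db.readline().decode(encoding)
```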
solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm

Re: How to read a text file from end?

Postby solomon243 » Wed Aug 07, 2013 7:15 pm

I wrote it quickly during a lunch break. Don't judge me for the bad code. :roll:
solomon243
 
Posts: 6
Joined: Sun Aug 04, 2013 12:06 pm

Re: How to read a text file from end?

Postby micseydel » Wed Aug 07, 2013 11:13 pm

I don't understand the question here, but it should be clear from the first post that readlines() isn't an option. Breaking the single large file into smaller files doesn't sound like a good answer to me either.

solomon: your Perl code means less to me than if you'd written it in Chinese; could you explain in English what you're trying to accomplish? Do you want to read the file line by line, backwards? Because I can think of a decent-ish, memory-efficient way to do that.
micseydel
 
Posts: 1132
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: How to read a text file from end?

Postby Mekire » Wed Aug 07, 2013 11:44 pm

manojg wrote: Using a huge file is a bad idea because it will consume a lot of memory and CPU
The whole point of iterating through a file, as opposed to using readlines(), is that it does not do this. When you iterate through a file you get one line at a time, and the only state kept is your current position in the file (plus a small read buffer). Reading a huge file this way takes no more memory than reading a small one; that is the idea.
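To illustrate the point about iteration: a file object yields one line at a time, so combining it with `collections.deque(maxlen=n)` returns the last n lines while holding at most n lines in memory (`last_lines` is a hypothetical name). It still reads the file from the beginning, so it fixes memory use rather than speed, and the lines come back in their original order; wrap the result in reversed() if needed.

```python
from collections import deque


def last_lines(path, n=10):
    """Return the last n lines of a text file, in their original order.

    The file object yields one line at a time and the deque discards the
    oldest line once it holds n, so memory use is bounded by n lines."""
    with open(path) as f:
        return list(deque(f, maxlen=n))
```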

I wrote a couple of these myself. One read character by character backwards; it performed horribly. Another iterated through the file twice: first collecting the line offsets with file.tell(), then using those offsets to read the file backwards (as a generator, of course). The second performed OK, but was still somewhat slow.
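A sketch of the two-pass approach described above (not Mekire's actual code; `reversed_lines_two_pass` is a hypothetical name). It works in binary mode because in Python 3 a text file's tell() raises OSError while the file is being iterated with a for loop. Only the list of line offsets is kept in memory, and decoding as UTF-8 is an assumption.

```python
def reversed_lines_two_pass(path, encoding="utf-8"):
    """Yield the lines of a file in reverse order, using two passes.

    Pass 1 records the byte offset at which every line starts; pass 2
    seeks to those offsets from last to first. Only the offsets are kept
    in memory, not the lines themselves."""
    with open(path, "rb") as f:
        offsets = []
        while True:
            pos = f.tell()  # offset of the line about to be read
            if not f.readline():
                break  # an empty bytes object means end of file
            offsets.append(pos)
        for pos in reversed(offsets):
            f.seek(pos)
            yield f.readline().decode(encoding)
```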

I then found the following, which is very similar to some of my attempts but reads the file in larger chunks (reducing the number of seek and read calls) and then reverses each chunk manually. It seems to work fairly well (still much slower than iterating through the file normally, but that is to be expected). I would be interested to hear about better solutions.
Code:
import os

def reversed_lines(file_object):
    "Generate the lines of file in reverse order."
    part = []
    for block in reversed_blocks(file_object):
        for char in reversed(block):
            if char == '\n' and part:
                yield "".join(part[::-1])
                part = []
            part.append(char)
    if part:
        yield "".join(part[::-1])

def reversed_blocks(file_object, blocksize=4096):
    "Generate blocks of file's contents in reverse order."
    file_object.seek(0, os.SEEK_END)
    here = file_object.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file_object.seek(here, os.SEEK_SET)
        yield file_object.read(delta)

We can iterate through it as normal:
Code:
with open("whatever.txt") as myfile:
    for line in reversed_lines(myfile):
        print(line, end="")  # do stuff with each line here

-Mek

Edit:
Changed the function to use "".join() instead of string concatenation.
Mekire
 
Posts: 984
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: How to read a text file from end?

Postby Mekire » Thu Aug 08, 2013 12:27 am

Apologies for the double post. I just came up with this and it seems decently fast. It still needs more testing to confirm I didn't miss any corner cases. It combines the chunked reading approach with str.split("\n") to speed things up considerably.

Code:
import os


def reversed_lines(file_object):
    "Generate the lines of file in reverse order."
    part = ""
    for block in reversed_blocks(file_object):
        pieces = (block + part).split("\n")
        # Every piece except the last was followed by "\n" in the file.
        lines = [piece + "\n" for piece in pieces[:-1]]
        if pieces[-1]:
            lines.append(pieces[-1])
        # lines[0] may begin in an earlier block, so carry it over.
        part = lines[0]
        for line in lines[:0:-1]:
            yield line
    if part:
        yield part


def reversed_blocks(file_object, block_size=4096):
    "Generate blocks of file's contents in reverse order."
    file_object.seek(0, os.SEEK_END)
    here = file_object.tell()
    while 0 < here:
        delta = min(block_size, here)
        here -= delta
        file_object.seek(here, os.SEEK_SET)
        yield file_object.read(delta)


if __name__ == "__main__":
    with open("whatever.txt") as myfile:
        for line in reversed_lines(myfile):
            print(line, end="")  # do stuff with each line here

-Mek

Edit: There is a problem with this; trying to fix.

Edit2: Okay, I'm thoroughly confused. The chunks are reading in redundant data; I checked the previous version I found in the stack thread and it does it too. I'm wondering if it is related to the two-character newline sequence ("\r\n") on Windows.

Edit3: Confirmed my theory. The above works nicely and quickly on my Linux partition. On Windows the boundaries between data chunks are off, presumably because of the difference in how Windows handles newline characters.
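A likely explanation, offered with some caution: in Python 3 text mode, "\r\n" is translated to "\n" and read(n) counts characters rather than bytes, so on a file with Windows line endings each read(delta) can consume more than delta bytes and overlap the next chunk (multi-byte UTF-8 characters cause a similar mismatch). One way around this is to open the file in binary mode, split on b"\n", and decode each complete line afterwards. This sketch assumes UTF-8 (or another ASCII-compatible encoding in which the byte 0x0A only ever means newline); lines from Windows files keep their "\r\n".

```python
import os


def reversed_blocks(file_object, block_size=4096):
    """Yield the contents of a binary-mode file in blocks, last block first."""
    here = file_object.seek(0, os.SEEK_END)  # file size in bytes
    while here > 0:
        delta = min(block_size, here)
        here -= delta
        file_object.seek(here, os.SEEK_SET)
        yield file_object.read(delta)  # binary mode: delta bytes, not characters


def reversed_lines(file_object, block_size=4096, encoding="utf-8"):
    """Yield the lines of a binary-mode file as str, last line first.

    Each line keeps its original line ending, including the carriage
    return of Windows-style endings."""
    part = b""
    for block in reversed_blocks(file_object, block_size):
        pieces = (block + part).split(b"\n")
        # Every piece except the last was followed by b"\n" in the file.
        lines = [piece + b"\n" for piece in pieces[:-1]]
        if pieces[-1]:
            lines.append(pieces[-1])
        # lines[0] may begin in an earlier block, so carry it over unfinished.
        part = lines[0]
        for line in reversed(lines[1:]):
            yield line.decode(encoding)
    if part:
        yield part.decode(encoding)
```

Usage is the same as before except for the mode: `with open("whatever.txt", "rb") as myfile:` followed by `for line in reversed_lines(myfile):`.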
Mekire
 
Posts: 984
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

