regex - searching for fuzzy strings


Post by Morrolan » Wed Jul 03, 2013 1:43 pm

Hi all,

I have a list of strings from os.listdir containing file/directory names. What I'm looking for is any file that matches a particular pattern, but so far I've been unable to get anything less than a perfect match using the 're' module.

Using Python 2.7.5, I need to search a list of 300+ strings and find any string that matches my filename. However, my filename has some dynamic components.

Typical logfile name:

20130703_d3rp_3088_0.log

The 'd3rp' and '.log' are the only static components: the date at the beginning changes, 3088 is the PID of the process creating the logfile, and the _0 is a version number, as the logfile spawns _1, _2 and so on every 10 MB or so.

However, so far I can only find the file if I specify the precise filename, which is useless when it changes daily (and sometimes hourly). I have the following code which works fine if I precisely specify the whole filename, but so far I've been unable to find the correct regular expression syntax to essentially find '*d3rp*.log'.

Code: Select all
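import os
import re

LOGFILE_PATH = '.'  # placeholder: the directory containing the logfiles

items = os.listdir(LOGFILE_PATH)

# placeholder pattern: this is the part I can't get right
regex = re.compile("can't get this bit right")

for item in items:
    # re.match only matches at the start of the string
    m = regex.match(item)
    if m:
        print 'Match found: ', m.group()
    else:
        print 'No match'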


Please could someone point me in the right direction for this?

Many Thanks in advance,
Morrolan

Re: regex - searching for fuzzy strings

Post by metulburr » Wed Jul 03, 2013 2:13 pm

Depending on possible conflicts with other file names, you don't even need regex to accomplish this. For example:
Code: Select all
items = [
    '20130703_d3rp_3088_0.log',
    '20130703_d3rp_3089_0.log',
    '20130704_d3rp_3090_0.log',
    '20130705_d3rp_3091_0.log',
    'something_else.log',
    '_d3rp_.txt'
]

   
for item in items:
    # check for '.log' (with the dot) so names like 'catalog' don't match
    if item.endswith('.log'):
        if '_d3rp_' in item:
            print(item)

Or you could replace the str.endswith() check with the following (this needs import os):
Code: Select all
if os.path.splitext(item)[1] == '.log':
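To show how the two checks fit together, here is a minimal sketch that wraps them in one helper. The function name is_d3rp_log is made up for illustration and the list is a shortened copy of the sample data above. It is written so it should run on Python 2.7 as well as Python 3.

```python
import os


def is_d3rp_log(name):
    """Hypothetical helper: True if name looks like a d3rp logfile."""
    # splitext splits off the final extension, e.g. '.log'
    extension = os.path.splitext(name)[1]
    return extension == '.log' and '_d3rp_' in name


items = [
    '20130703_d3rp_3088_0.log',
    'something_else.log',
    '_d3rp_.txt',
]

# keep only the names that pass both checks
matches = [item for item in items if is_d3rp_log(item)]
print(matches)
```

Comparing the extension with splitext rather than endswith('log') also avoids accepting a name that merely ends in the letters "log".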

Re: regex - searching for fuzzy strings

Post by stranac » Wed Jul 03, 2013 2:36 pm

Or you could just use the glob module:
Code: Select all
import glob
matching_files = glob.glob('path/*d3rp*.log')
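Note that glob.glob returns paths that include the directory part of the pattern. If you already have bare names from os.listdir, as in the original question, the fnmatch module applies the same wildcard syntax to a plain list of strings. A small sketch, where the names list is a stand-in for the real os.listdir result:

```python
import fnmatch

# stand-in for the result of os.listdir(LOGFILE_PATH)
names = [
    '20130703_d3rp_3088_0.log',
    '20130704_d3rp_3090_0.log',
    'something_else.log',
    '_d3rp_.txt',
]

# same shell-style wildcard the question asked for
pattern = '*d3rp*.log'

# fnmatch.filter keeps only the names that match the pattern
matching = fnmatch.filter(names, pattern)
print(matching)
```

This keeps the existing os.listdir call and only swaps the regex loop for a single filter call.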

Re: regex - searching for fuzzy strings

Post by metulburr » Wed Jul 03, 2013 2:38 pm

stranac wrote: Or you could just use the glob module:

+1, oh yeah, I completely forgot about that.

Re: regex - searching for fuzzy strings

Post by snippsat » Wed Jul 03, 2013 3:08 pm

Yes, glob is good :)
The solution from metulburr is OK too.
It can be a single if block if you throw in an and.
Code: Select all
if item.endswith('.log') and '_d3rp_' in item:

There's no need for regex, but here is one just as an example.
Code: Select all
import re

data = '''\
20130703_d3rp_3088_0.log
20130703_d3rp_3089_0.log
20130704_d3rp_3090_0.log
20130705_d3rp_3091_0.log
something_else.log
_d3rp_.txt'''

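# '.' does not match a newline by default, so each match stays on one line;
# the dot before 'log' is escaped so it only matches a literal '.'
for match in re.finditer(r'.*_d3rp_.*\.log', data):
    print match.group()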
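If the date, PID and version are needed as well, a stricter pattern with named groups can pull them out of each name. This is only a sketch based on the filename layout described in the first post. The names LOG_NAME and parse_log_name are made up for illustration, and the assumption that the date is always 8 digits comes from the single example given. Keep in mind that re.match only matches at the start of the string, so a bare pattern like 'd3rp' would not match a name that begins with the date.

```python
import re

# assumed layout: <8-digit date>_d3rp_<pid>_<version>.log
LOG_NAME = re.compile(
    r'(?P<date>\d{8})_d3rp_(?P<pid>\d+)_(?P<version>\d+)\.log$'
)


def parse_log_name(name):
    """Hypothetical helper: return the parts of a logfile name, or None."""
    m = LOG_NAME.match(name)  # anchored at the start of the name
    if m is None:
        return None
    return m.groupdict()


info = parse_log_name('20130703_d3rp_3088_0.log')
print(info)
```

The values come back as strings, so convert them with int() if you want to sort by PID or version.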