regexp


Postby metulburr » Thu Jun 13, 2013 12:23 pm

I am writing an IRC trivia bot that returns a question and an answer. I am trying to parse the answer with regexp, mainly because I want to figure regexps out, and also because I don't think split, startswith, etc. will do well here. The user's answer is considered valid once the user has entered all of the uppercase words of the answer, in any order. The lowercase words and other text are just extra info about the answer. I am not really sure how to go about doing this with regexp.

Code: Select all
import re

answer = 'CAIRO, Egypt / DAMASCUS, Syria'

user_response = 'cairo damascus'
user_response2 = 'damascus cairo'


for word in user_response.split():
    search = r'(?s)[A-Z]*'   #.format(word)
    res = re.findall(search, answer)
    print(res)


Some examples of the answer strings it has returned:
Code: Select all
CAIRO, Egypt / DAMASCUS, Syria
$750
U.S. COAST GUARD
YES--Viking I in 1976
LILITH FAIR / SARAH McLACHLAN
...any three of ... CELICA / CAMRY / COROLLA / CRESSIDA
USA AND RUSSIA / ALASKA AND SIBERIA
a. 1939   b. 1961   c. 1979
The ARIZONA CARDINALS who began in 1899.  They were formally the Phoenix Cardinals, the St. Louis Cardinals, the Chicago Cardinals, and the Racine Normals.

With such a wide variety of answers I am not sure how to parse them. The / indicates that the answer requires both the left and the right side. Or maybe this is a larger task than I thought?
New Users, Read This
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
steam
metulburr
 
Posts: 1382
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: regexp

Postby Mekire » Thu Jun 13, 2013 1:11 pm

Coincidentally I just came across this quote today:
Jamie Zawinski wrote:Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.


Honestly, I think that unless you are going to make your answer formats very strict, you would be better off choosing a clear delimiter and using split. Then compare the user's answer, split and sorted, with the desired answer, split and sorted.

-Mek
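
A minimal sketch of the split-and-sort comparison suggested above. It assumes a strict stored format with "/" as the only delimiter and no extra info in the answer; the names `normalize` and `is_correct` are made up for illustration.

```python
def normalize(text, delimiter=None):
    """Split on the delimiter (whitespace if None), lowercase, and sort."""
    parts = text.split(delimiter)
    return sorted(part.strip().lower() for part in parts if part.strip())


def is_correct(answer, user_response):
    # Assumption: the stored answer separates required parts with '/',
    # while the user separates words with whitespace.
    return normalize(answer, '/') == normalize(user_response)


print(is_correct('CAIRO / DAMASCUS', 'damascus cairo'))  # True
print(is_correct('CAIRO / DAMASCUS', 'cairo'))           # False
```

This only works for single-word parts; a part such as "COAST GUARD" would need a delimiter in the user's response as well.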
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: regexp

Postby setrofim » Thu Jun 13, 2013 2:09 pm

Code: Select all
>>> import re
>>>
>>> answer = 'CAIRO, Egypt / DAMASCUS, Syria'
>>>
>>> user_response = 'cairo damascus'
>>> user_response2 = 'damascus cairo'
>>>
>>>
>>> for word in user_response.split():
...     search = r'\b[A-Z]+\b'
...     res = re.findall(search, answer)
...     print(res)
...
['CAIRO', 'DAMASCUS']
['CAIRO', 'DAMASCUS']
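
As a quick check of why this pattern works, here is a small sketch comparing `[A-Z]*`, `[A-Z]+`, and `\b[A-Z]+\b` on the same answer string. It is only an illustration; the variable names `star`, `plus`, and `bounded` are made up here.

```python
import re

answer = 'CAIRO, Egypt / DAMASCUS, Syria'

# '*' allows zero characters, so empty matches appear between the real ones.
star = re.findall(r'[A-Z]*', answer)

# '+' needs at least one capital, but still picks up the 'E' of 'Egypt'.
plus = re.findall(r'[A-Z]+', answer)

# '\b' on both sides keeps only runs of capitals that form a whole word.
bounded = re.findall(r'\b[A-Z]+\b', answer)

print('' in star)  # True
print(plus)        # ['CAIRO', 'E', 'DAMASCUS', 'S']
print(bounded)     # ['CAIRO', 'DAMASCUS']
```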

"*" means "zero or more", so "[A-Z]*" can match an empty string (which basically matches anywhere). "+" means "one or more", so it ensures that at least one capital letter is present.

"\b" is a "word boundary": it matches the empty string at the edge of a word, i.e. just before the first character in a word and just after the last one. A "word" is a sequence of alphanumeric characters and underscores, i.e. "\w+". Here, it prevents matching part of a mixed-case word, such as the "E" in "Egypt".
setrofim
 
Posts: 288
Joined: Mon Mar 04, 2013 7:52 pm

Re: regexp

Postby metulburr » Thu Jun 13, 2013 2:32 pm

Hmm, for some reason I thought the word boundary would get thrown off by the comma next to it. What's funny is that it all sounds completely plausible when I read it, but when I actually go to parse a string my mind goes blank, lol.

Assuming you did not want to catch a lone I or A, I thought this was the way to limit it to 2 characters or more, but I guess not:
Code: Select all
r'\b[A-Z]{2}+\b'

Re: regexp

Postby metulburr » Thu Jun 13, 2013 2:57 pm

@Mekire
The formats come from triviacafe.com's random trivia Q and A, and some of them have a weird variety of "added info" along with the answer, which makes it a pain to parse out the actual answer.

Re: regexp

Postby metulburr » Thu Jun 13, 2013 3:08 pm

Oh, it's sort of like a single-element tuple. OK, I think I've got something, thanks.
Code: Select all
import re

answer = 'CAIRO, Egypt / DAMASCUS, Syria, I, D29'

user_response = 'cairo damascus'
user_response2 = 'damascus cairo'


for word in user_response.split():
    search = r'\b[A-Z0-9]{2,}\b'   #.format(word)
    res = re.findall(search, answer)
    print(res)
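
To get a feel for how far this pattern goes, here is a rough sketch that runs it over a few of the sample answers from the first post. The names `PATTERN`, `samples`, and `results` are just for illustration, and the comments show what each string yields.

```python
import re

PATTERN = re.compile(r'\b[A-Z0-9]{2,}\b')

samples = [
    '$750',
    'U.S. COAST GUARD',
    'YES--Viking I in 1976',
    'LILITH FAIR / SARAH McLACHLAN',
]

# Map each sample answer to the tokens the pattern extracts from it.
results = {s: PATTERN.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), '->', found)

# '$750'                          -> ['750']
# 'U.S. COAST GUARD'              -> ['COAST', 'GUARD']
# 'YES--Viking I in 1976'         -> ['YES', '1976']
# 'LILITH FAIR / SARAH McLACHLAN' -> ['LILITH', 'FAIR', 'SARAH']
```

Note that "U.S." and "McLACHLAN" are dropped entirely, so answers like these would still need special handling.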

Re: regexp

Postby DrakeMagi » Thu Jun 13, 2013 3:30 pm

Here is what I came up with.
Code: Select all
import re
import itertools

answer = 'CAIRO, Egypt / DAMASCUS, Syria'

def answer_split(a):
   ans = a.split(',')
   words = [a.split('/') for a in ans]
   words = [[a.strip() for a in w] for w in words]
   a_list = [list(w) for w in itertools.product(*words)]
   return a_list, list(itertools.chain(*words))

def response(answers, alist, user):
   # Sort each accepted combination so it can be compared with the sorted matches.
   for combo in answers:
      combo.sort()
   group = []
   # The trailing space stops 'Syria ' from matching inside 'Syrian'.
   user = user + ' '

   alist = [a.strip() + ' ' for a in alist]

   for a in alist:
      # re.escape keeps characters such as '.' or '$' in an answer literal.
      search = re.search(re.escape(a), user)
      if search:
         group.append(search.group(0))

   group = [g.strip() for g in group]
   group.sort()
   return group in answers

user_1 = 'Syria CAIRO Egypt'
user_2 = 'Syria DAMASCUS CAIRO'
user_3 = 'Syrian DAMASCUS CAIRO'

ans, alist = answer_split(answer)

print(response(ans, alist, user_1))
print(response(ans, alist, user_2))
print(response(ans, alist, user_3))
Linux: won't find windows here.
Linux: the choice of a GNU generation.
https://github.com/DrakeMagi
DrakeMagi
 
Posts: 96
Joined: Sun May 12, 2013 8:36 pm

Re: regexp

Postby micseydel » Thu Jun 13, 2013 10:25 pm

I haven't looked closely here, but sets come to mind.
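
A rough sketch of what a set-based check might look like, reusing the `\b[A-Z0-9]{2,}\b` pattern from earlier in the thread. The function names `required_words` and `is_valid` are hypothetical, and this assumes the required words are exactly the all-uppercase tokens of two or more characters.

```python
import re


def required_words(answer):
    """Lowercased set of the all-uppercase words (2+ characters) in the answer."""
    return {w.lower() for w in re.findall(r'\b[A-Z0-9]{2,}\b', answer)}


def is_valid(answer, user_response):
    # Subset test: every required word must appear in the response;
    # order and any extra words are ignored.
    return required_words(answer) <= set(user_response.lower().split())


answer = 'CAIRO, Egypt / DAMASCUS, Syria'
print(is_valid(answer, 'cairo damascus'))  # True
print(is_valid(answer, 'damascus cairo'))  # True
print(is_valid(answer, 'cairo'))           # False
```

One caveat: if an answer contains no uppercase tokens at all, the required set is empty and any response passes, so that case would need a separate check.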
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1220
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

