trying to avoid using regexp


Postby metulburr » Tue Sep 24, 2013 12:43 pm

I am essentially dabbling in making my own language, and I am still writing the lexer. Basic strings like the following would all have to be matched as an ID, an operator, a number or string, and a semicolon terminating the line:
Code: Select all
a=10.12;

Code: Select all
a = 10;

Code: Select all
a = (20 - 4) *  2 + 4;

Code: Select all
a     =      10    ;

Code: Select all
var = 10;

Code: Select all
var = "string";

I am trying to determine whether regex is the easiest solution or not. I can come up with a way using str methods, but I think this might be a case where regexes make it easier to assign tags to the tokens, especially since this is just an assignment line, let alone the other lines that would have to account for while/for loops and whatnot.
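For comparison, here is a minimal sketch of the regex approach, assuming one token tag per named group. The tag names (FLOAT, INT, STRING, ID, OP, SEMI) and the patterns are illustrative assumptions, not a fixed design, and the string pattern does not handle escaped quotes.

```python
import re

# Hypothetical token tags and patterns. Order matters: FLOAT must be
# tried before INT so that "10.12" is not split into "10", ".", "12".
TOKEN_SPEC = [
    ("FLOAT",  r"\d+\.\d+"),
    ("INT",    r"\d+"),
    ("STRING", r'"[^"]*"'),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/()]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),   # whitespace, discarded
    ("ERROR",  r"."),     # any other single character
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(line):
    """Return a list of (tag, text) pairs for one source line."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        tag = match.lastgroup  # name of the alternative that matched
        if tag == "SKIP":
            continue
        if tag == "ERROR":
            raise SyntaxError("unexpected character %r" % match.group())
        tokens.append((tag, match.group()))
    return tokens


for sample in ['a=10.12;', 'a     =      10    ;', 'var = "string";']:
    print(tokenize(sample))
```

The amount of whitespace stops mattering because SKIP swallows it, and adding keywords such as while/for would mean adding patterns (or checking ID text against a keyword set) rather than writing more string-splitting code.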
New Users, Read This
version Python 3.3.2 and 2.7.5, tkinter 8.5, pyqt 4.8.4, pygame 1.9.2 pre
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
metulburr
 
Posts: 1122
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: trying to avoid using regexp

Postby metulburr » Tue Sep 24, 2013 1:54 pm

Maybe I can do it just as easily without regexp.

Code: Select all
text = '''
a = 1
b=2.2
ccc = "string"
long = (20 - 4) *  2 + 4
spaced     =      10
a = 11
'''

env = {}

for line in text.split('\n'):
    if '=' in line:
        # split on the first '=' only, so the value may itself contain '='
        ID, _, value = line.partition('=')
        ID = ID.strip()
        value = value.strip()

        # membership test, so a name bound to 0 or "" still counts as defined
        if ID not in env:
            # convert to int/float if needed; anything else stays a string
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass

            env[ID] = value
        else:
            print('ERROR: {}: "{}" is already defined'.format(line, ID))

print(env)

Re: trying to avoid using regexp

Postby micseydel » Tue Sep 24, 2013 6:12 pm

Regular expressions handle regular languages, which cannot contain arbitrarily nested parentheses. (I know the re module is more powerful than regular languages, but I'm not sure by how much.) I took a compilers class where we learned about all of this stuff, but I don't remember much of it. You should see if someone has written a free book on the topic, or done a MOOC on it or something.
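To illustrate the split, here is a rough sketch in which a regex only cuts the text into tokens and a small recursive-descent parser deals with the nesting. The grammar, the Parser class, and the evaluate function are all assumptions made up for this example, not part of any library.

```python
import re

# One regex is enough to recognize a single token (a regular language).
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[-+*/()])")


def tokenize(text):
    """Split an arithmetic expression into number and operator strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character near position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Hypothetical recursive-descent parser that evaluates as it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()  # follows the host Python's '/' rules
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()  # recursion handles any nesting depth
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError("unexpected token %r" % token)
        return float(token) if "." in token else int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return value


print(evaluate("(20 - 4) *  2 + 4"))  # 36
```

The call stack does the counting of open parentheses, which is the part a plain regular expression cannot do for unbounded depth.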
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 939
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: trying to avoid using regexp

Postby ochichinyezaboombwa » Tue Sep 24, 2013 9:02 pm

Take a look at Lex & Yacc.
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: trying to avoid using regexp

Postby micseydel » Tue Sep 24, 2013 9:08 pm

ochichinyezaboombwa wrote: Take a look at Lex & Yacc.

+1

