Post by MarcelF6 » Fri Apr 19, 2013 7:39 pm

Hi guys,

I have the following problem: I created a (German) dictionary that maps words to their corresponding lemmata. Example:
"Lagerbestände", "Lager-bestand"; "Wohnhäuser", "Wohn-haus"; "Bahnhof", "Bahn-hof"

I now have a text, and I want to look up the lemma of every word in it. A word may appear that is not in the dict, such as "Restbestände". But we already know the lemma of "bestände". So I want to take the first part of the word, append the lemmatized second part to it, and print the result (or return it).

I coded the following:
for limit in range(1, len(Word)):
      for k, v in dicti.iteritems():
         if re.search('[\w]*'+Word[limit:], k, re.IGNORECASE) != None:
            if '-' in v:
               tmp = v.find('-')
               end = v[tmp:]
               end = re.sub(ur'[-]',"", end)
               Word = Word[:limit] + '-' + end

But I have two problems:
1.) "&#10" is printed at the end of every word. How can I avoid this?
2.) The second part of the word is sometimes incorrect, so there must be a logical error.

How would you solve this?
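
One possible approach, offered as a sketch rather than a definitive fix. It rests on a few assumptions that are not stated in the post: the words are read line by line, so each one carries a trailing line feed (character 10, which is what "&#10" denotes in HTML); the part of a lemma before its last hyphen is spelled exactly as in the dictionary key; and the code runs on Python 3 (on Python 2 the strings would need to be unicode). The names build_tail_index, lemmatize and tail_index are made up for illustration. Two things in the posted loop look like candidates for the logical error: re.search is not anchored, so any key that merely contains Word[limit:] counts as a match, and the loop keeps running and rewriting Word after the first hit.

```python
def build_tail_index(lemma_dict):
    """Map the surface form of each last compound part to its lemma."""
    index = {}
    for key, lemma in lemma_dict.items():
        head, sep, tail = lemma.rpartition('-')
        if not sep:
            continue  # entry is not a hyphenated compound
        # Assumption: the text before the last hyphen is spelled as in the key,
        # so its length tells us where the last part starts in the key.
        head_len = len(head.replace('-', ''))
        index[key[head_len:].lower()] = tail
    return index


def lemmatize(word, lemma_dict, tail_index):
    """Return the known lemma, or first part + '-' + lemmatized last part."""
    word = word.strip()  # removes a trailing newline, if there is one
    if word in lemma_dict:
        return lemma_dict[word]
    # Try the longest possible last part first and stop at the first hit.
    for limit in range(1, len(word)):
        tail = tail_index.get(word[limit:].lower())
        if tail is not None:
            return word[:limit] + '-' + tail
    return word  # nothing known about this word, leave it unchanged


dicti = {
    "Lagerbestände": "Lager-bestand",
    "Wohnhäuser": "Wohn-haus",
    "Bahnhof": "Bahn-hof",
}
tail_index = build_tail_index(dicti)
print(lemmatize("Restbestände\n", dicti, tail_index))
```

Building the index once means each word needs only a handful of dict lookups instead of a regex search over the whole dictionary. A word whose last part is not in the index is returned unchanged.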
