I have the following problem: I created a German dictionary that maps words to their lemmas. Example:
"Lagerbestände", "Lager-bestand"; "Wohnhäuser", "Wohn-haus"; "Bahnhof", "Bahn-hof"
I now have a text and want to look up the lemma of every word in it. A word may appear that is not in the dictionary, such as "Restbestände". But we already know the lemma of "bestände". So I want to take the first part of the word, append the lemmatized second part, and print (or return) the result.
I coded the following:
for limit in range(1, len(Word)):
    for k, v in dicti.iteritems():
        if re.search('[\w]*'+Word[limit:], k, re.IGNORECASE) != None:
            if '-' in v:
                tmp = v.find('-')
                end = v[tmp:]
                end = re.sub(ur'[-]',"", end)
                Word = Word[:limit] + '-' + end
But I have two problems:
1.) A newline ("\n") is printed at the end of every word. How can I avoid this?
2.) The second part of the word is sometimes wrong, so there must be a logical error somewhere.
How would you solve this?
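For reference, here is a minimal sketch of the behaviour described above, written for Python 3. Two things in the loop above may explain the wrong second parts: the pattern is not anchored to the end of the key, so a short suffix can match in the middle of an unrelated entry, and Word is reassigned inside the loop, so later iterations work on an already modified string. The sketch avoids both by building a lookup from inflected last component to lemma once and returning at the first (longest) suffix hit. The names build_tail_index and lemmatize_compound are hypothetical, and the sketch assumes that only the last component of a compound is inflected. The trailing newline probably comes from reading the words line by line, so the sketch simply strips whitespace first.

```python
# Example dictionary from the post: surface form -> hyphenated lemma.
dicti = {
    "Lagerbestände": "Lager-bestand",
    "Wohnhäuser": "Wohn-haus",
    "Bahnhof": "Bahn-hof",
}


def build_tail_index(lexicon):
    """Map each inflected last component to its lemma, e.g. 'bestände' -> 'bestand'."""
    index = {}
    for word, lemma in lexicon.items():
        if "-" not in lemma:
            continue
        head, _, tail = lemma.rpartition("-")
        # Assumption: the leading part(s) appear unchanged in the surface form,
        # so the inflected tail is whatever follows them.
        head_length = len(head.replace("-", ""))
        index[word[head_length:].lower()] = tail
    return index


def lemmatize_compound(word, lexicon, index):
    """Return the hyphenated lemma of word, or word itself if nothing matches."""
    word = word.strip()  # drops a trailing newline left over from file input
    if word in lexicon:
        return lexicon[word]
    # Try the longest suffix first and stop at the first known tail.
    for limit in range(1, len(word)):
        tail = index.get(word[limit:].lower())
        if tail is not None:
            return word[:limit] + "-" + tail
    return word


tail_index = build_tail_index(dicti)
print(lemmatize_compound("Restbestände\n", dicti, tail_index))
```

Because the suffix has to equal a whole last component from the dictionary, a fragment such as "stände" no longer matches "Lagerbestände". Words with no known tail are returned unchanged.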