Suppose you have a string which represents a morphologically complex word. How do you split that word into morphemes, given a list of roots and a list of affixes (and possibly infixes, though that’s harder)?
E.g.:
We start with «hablamos» [ablamos] and we end up with abla-mos.
This would presume a list which looked something like:
Now, here’s the question:
If you greedily split every word string using a list of affixes sorted by length (longest first), is that sufficient to do basic morphological analysis?
So, something like:
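import re
word = 'ablamos'
# Longer affixes go first in the alternation so 'mos' is tried before 'o'.
affixRE = re.compile('(mos|o)')
# The capturing group keeps the matched affix in the result.
parts = affixRE.split(word)  # ['abla', 'mos', '']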
Expand with more data… is that insanely oversimplified?
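For comparison, here is a slightly less naive sketch of the same idea. The pattern (mos|o) is not anchored, so it would also split on an 'o' in the middle of a word. The version below only strips a suffix from the end of the word, tries longer suffixes first, and accepts a split only if what is left is a known root. The ROOTS and SUFFIXES lists and the split_suffix name are made up for illustration, and it handles a single suffix only (no prefixes, infixes, or stacked affixes).

```python
# Hypothetical toy lexicon, in the same phonemic spelling as 'ablamos'.
ROOTS = {"abla", "kome"}
SUFFIXES = ["o", "s", "mos"]


def split_suffix(word, roots=ROOTS, suffixes=SUFFIXES):
    """Return [root, suffix] if a known suffix leaves a known root, else [word]."""
    # Try the longest suffixes first so 'mos' is preferred over 's' or 'o'.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Accept the split only when the remainder is a listed root.
            if stem in roots:
                return [stem, suffix]
    # No suffix produced a known root, so leave the word unsplit.
    return [word]


print(split_suffix("ablamos"))
print(split_suffix("komes"))
print(split_suffix("gato"))
```

With this toy data, 'ablamos' and 'komes' are split, while 'gato' is left whole because 'gat' is not in the root list, even though it ends in 'o'.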