Paradigm structures are analyses, and dataset authors are free to formulate this analysis as they see fit. The main questions are:
- What is the inventory of paradigm cells?
- How should each cell be characterised?
- What counts as a lexeme?
What is the inventory of paradigm cells?
Data creators can provide labels of their choice, but should use a cells and features table to document the meaning of these labels, and map from these labels to existing standards and conventions.
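As a rough illustration of such documentation, the sketch below pairs a small cells table with a features table and checks that every feature value used in a cell label is documented. The column names (`cell_id`, `unimorph`, `ud`), the dot-separated label scheme, and the specific tag mappings are assumptions made for this example, not requirements.

```python
# Hypothetical cells table: one row per dataset-specific cell label,
# mapped to tags from two existing conventions (illustrative mappings).
CELLS = [
    {"cell_id": "prs.3.sg", "unimorph": "V;PRS;3;SG",
     "ud": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"},
    {"cell_id": "pst", "unimorph": "V;PST",
     "ud": "Mood=Ind|Tense=Past|VerbForm=Fin"},
    {"cell_id": "pst.ptcp", "unimorph": "V;V.PTCP;PST",
     "ud": "Tense=Past|VerbForm=Part"},
]

# Hypothetical features table: documents the meaning of each value
# that can appear inside a cell label.
FEATURES = {
    "prs": {"feature": "tense", "label": "present"},
    "pst": {"feature": "tense", "label": "past"},
    "3": {"feature": "person", "label": "third person"},
    "sg": {"feature": "number", "label": "singular"},
    "ptcp": {"feature": "verb form", "label": "participle"},
}


def undocumented_values(cells, features):
    """Return feature values used in cell labels but absent from the features table."""
    used = {value for cell in cells for value in cell["cell_id"].split(".")}
    return sorted(used - set(features))


def describe(cell_id, features):
    """Spell out a cell label using the documented feature labels."""
    return " ".join(features[value]["label"] for value in cell_id.split("."))


print(undocumented_values(CELLS, FEATURES))
print(describe("prs.3.sg", FEATURES))
```

A check of this kind can be run whenever new labels are introduced, so that no cell ends up without a documented meaning.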
How should each cell be characterised?
For long-term usability, it is important to explain paradigm structure choices in the documentation. A particularly tricky case is that of overdifferentiation. For example, in English, one might want to expand the person/number combinations of verbs to match pronouns, and define the paradigm of a verb as follows:
| Person and number | Present | Preterite |
|---|---|---|
| first person singular | I eat | I ate |
| second person singular | you eat | you ate |
| third person singular | he/she/it eats | he/she/it ate |
| first person plural | we eat | we ate |
| second person plural | you eat | you ate |
| third person plural | they eat | they ate |

| Imperative | Present participle | Past participle | Infinitive |
|---|---|---|---|
However, most verbs only distinguish the third person singular in the present, so it would be sufficient to stipulate:
| Cell | Form |
|---|---|
| present 3 singular | eats |
Unfortunately, this choice requires extra cells that are only needed for the verb to be:
| Cell | Form |
|---|---|
| present 1 singular | am |
| present 3 singular | is |
| preterite 1/3 singular | was |
We suggest preferring structures which allow for uniform paradigm shapes, and documenting these choices clearly. It is easier for users to go from such annotations to a more minimal paradigm structure than to do the opposite. For proposals about "morphomic" paradigm structures, see Boyé & Schalchli (2016).
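To illustrate why the reduction is the easy direction, here is a minimal sketch that derives a smaller cell inventory from uniform paradigms by grouping cells whose forms are identical for every lexeme. The cell labels and the two toy paradigms are assumptions for this example, and grouping by surface identity over a sample is only one possible criterion, not a full morphomic analysis.

```python
from collections import defaultdict

# Toy paradigms in a uniform, fully differentiated structure
# (present and preterite only; hypothetical cell labels).
PARADIGMS = {
    "eat": {
        "prs.1sg": "eat", "prs.2sg": "eat", "prs.3sg": "eats",
        "prs.1pl": "eat", "prs.2pl": "eat", "prs.3pl": "eat",
        "pst.1sg": "ate", "pst.2sg": "ate", "pst.3sg": "ate",
        "pst.1pl": "ate", "pst.2pl": "ate", "pst.3pl": "ate",
    },
    "be": {
        "prs.1sg": "am", "prs.2sg": "are", "prs.3sg": "is",
        "prs.1pl": "are", "prs.2pl": "are", "prs.3pl": "are",
        "pst.1sg": "was", "pst.2sg": "were", "pst.3sg": "was",
        "pst.1pl": "were", "pst.2pl": "were", "pst.3pl": "were",
    },
}


def collapse_cells(paradigms):
    """Group cells whose forms are identical for every lexeme in the data."""
    lexemes = sorted(paradigms)
    cells = list(paradigms[lexemes[0]])
    groups = defaultdict(list)
    for cell in cells:
        # A cell's signature is the tuple of its forms across all lexemes.
        signature = tuple(paradigms[lexeme][cell] for lexeme in lexemes)
        groups[signature].append(cell)
    return list(groups.values())


for group in collapse_cells(PARADIGMS):
    print("/".join(group))
```

Run on `eat` alone, the same function yields a smaller inventory than when `be` is included, which mirrors the extra cells discussed above. Going the other way, from collapsed cells to a uniform structure, would require knowing which distinctions were merged, and that information is not present in the minimal data.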
What should count as a lexeme?
The creators of a dataset are free to produce the analysis which they believe best fits their data.
In some cases, a lexeme is entirely overabundant because it can take any of several inflection classes or stems. In other words, a single lexeme could be split into several flexemes (see Fradin & Kerleroux 2003, Thornton 2018).
In this case, there are two main solutions:
- Either split these lexemes so that each lexeme identifier corresponds to a single flexeme;
- Or account for the two levels by maintaining a single lexeme and adding a flexeme table and flexeme identifiers.
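The two solutions can be sketched as follows. The table layouts, column names, and identifiers are hypothetical, and the English verb dive (preterite dived or dove) is only a small stand-in, since a truly entirely overabundant lexeme would show competing forms throughout its paradigm. The sketch starts from the second solution and derives the first from it.

```python
# Second solution: a single lexeme, plus a flexeme table whose rows
# point back to it (hypothetical column names and identifiers).
LEXEMES = [{"lexeme_id": "dive"}]
FLEXEMES = [
    {"flexeme_id": "dive_weak", "lexeme": "dive"},
    {"flexeme_id": "dive_strong", "lexeme": "dive"},
]
FORMS = [
    {"form_id": 1, "flexeme": "dive_weak", "cell": "pst", "form": "dived"},
    {"form_id": 2, "flexeme": "dive_strong", "cell": "pst", "form": "dove"},
]


def split_lexemes(flexemes, forms):
    """Derive the first solution: promote each flexeme to its own lexeme identifier."""
    new_lexemes = [{"lexeme_id": f["flexeme_id"]} for f in flexemes]
    new_forms = [
        {"form_id": row["form_id"], "lexeme": row["flexeme"],
         "cell": row["cell"], "form": row["form"]}
        for row in forms
    ]
    return new_lexemes, new_forms


def forms_by_lexeme(flexemes, forms):
    """Recover the lexeme-level view: all forms per (lexeme, cell), across flexemes."""
    parent = {f["flexeme_id"]: f["lexeme"] for f in flexemes}
    grouped = {}
    for row in forms:
        key = (parent[row["flexeme"]], row["cell"])
        grouped.setdefault(key, []).append(row["form"])
    return grouped


print(split_lexemes(FLEXEMES, FORMS)[0])
print(forms_by_lexeme(FLEXEMES, FORMS))
```

In this sketch, the flexeme table keeps both levels recoverable, whereas the split version loses the link between the two flexemes unless it is recorded elsewhere.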
References
- Boyé, Gilles & Gauvain Schalchli. 2016. The status of paradigms. In Andrew Hippisley & Gregory Stump (eds.), The Cambridge handbook of morphology, 206–234. Cambridge: Cambridge University Press. DOI: 10.1017/9781139814720.009
- Fradin, Bernard & Françoise Kerleroux. 2003. Troubles with lexemes. In Geert Booij, Janet DeCesaris, Angela Ralli & Sergio Scalise (eds.), Selected papers from the third Mediterranean Morphology Meeting, 177–196. Barcelona: IULA – Universitat Pompeu Fabra.
- Thornton, Anna M. 2018. Troubles with flexemes. In Olivier Bonami, Gilles Boyé, Georgette Dal, Hélène Giraudo & Fiammetta Namer (eds.), The lexeme in descriptive and theoretical morphology, 303–321. Berlin: Language Science Press. DOI: 10.5281/zenodo.1407011