Paralex specifications (default package)
This is a human-readable rendition of a JSON file defining a frictionless package. It was generated automatically.
name
: paralex
licenses
: -
keywords
: lexicon, inflection, linguistics, morphology, paradigms
profile
: data-package
contributors
- [1] title: Sacha Beniamine; role: maintainer
- [2] title: Cormac Anderson; role: contributor
- [3] title: Jules Bouton; role: contributor
- [4] title: Mae Carroll; role: contributor
- [5] title: Borja Herce; role: contributor
- [6] title: Matías Guzmán Naranjo; role: contributor
- [7] title: Matteo Pellegrini; role: contributor
- [8] title: Erich Round; role: contributor
- [9] title: Helen Sims-Williams; role: contributor
version
: 2.2.3
languages_iso639
: []
This package describes the following files:
forms
Inflected forms
- This file is located in forms.csv.
- The identifier column (or primaryKey) is ['form_id'].
- Formal relations (foreignKeys) with other tables:
  - Each value in column ['cell'] of forms must refer to ['cell_id'] in table cells.
  - Each value in column ['lexeme'] of forms must refer to ['lexeme_id'] in table lexemes.
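The foreign-key relations of the forms table can be checked with a few lines of standard-library Python. This is only a minimal sketch: the three miniature tables and the helper names `read_rows` and `dangling` are invented for illustration, and a real dataset would be read from forms.csv, cells.csv and lexemes.csv instead.

```python
import csv
import io

# Hypothetical miniature tables; every identifier and value here is invented.
FORMS_CSV = """form_id,lexeme,cell,phon_form
f1,mouse_1,sg,m aʊ s
f2,mouse_1,pl,m aɪ s
f3,mouse_2,du,m aʊ s ɪ z
"""
CELLS_CSV = """cell_id,label
sg,singular
pl,plural
"""
LEXEMES_CSV = """lexeme_id,label
mouse_1,mouse
mouse_2,mouse
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def dangling(rows, column, targets, key, id_column="form_id"):
    """Return the ids of rows whose `column` value is absent from the target key column."""
    known = {target[key] for target in targets}
    return [row[id_column] for row in rows if row[column] not in known]


forms = read_rows(FORMS_CSV)
bad_cells = dangling(forms, "cell", read_rows(CELLS_CSV), "cell_id")
bad_lexemes = dangling(forms, "lexeme", read_rows(LEXEMES_CSV), "lexeme_id")
print(bad_cells, bad_lexemes)
```

In this invented data, row f3 uses a cell (du) that the cells table does not define, so it is the only row reported.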
Columns defined by forms-schema:
- form_id (string): Form table row identifiers. These identifiers are specific to form, lexeme, cell triples.
  - constraints: a form_id is obligatory; it must be unique.
- lexeme (string): Reference to a lexeme identifier. Lexeme identifiers must be unique to paradigms.
  - constraints: a lexeme is obligatory.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#lexeme
- cell (string): Reference to a cell identifier. The set of feature values as would appear in a gloss, separated by dots, e.g. prs.ind.1sg or f.pl
  - constraints: a cell is obligatory.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#cell
- phon_form (string): Inflected form (phonemic or phonetic). The form, given in phonemic or phonetic notation, with sounds separated by spaces.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#phon_form
  - missingValues: #DEF#
- orth_form (string): Inflected form (orthographic). The form, given orthographically.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#orth_form
  - missingValues: #DEF#
- analysed_phon_form (string): Inflected form with analysis, such as segmentation markers (phonemic or phonetic). The form, given in phonemic or phonetic notation, with sounds separated by spaces, and analysis markers.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#analysed_phon_form
  - missingValues: #DEF#
- analysed_orth_form (string): Inflected form with analysis, such as segmentation markers (orthographic). The form, given orthographically, with markers for analysis.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#analysed_orth_form
  - missingValues: #DEF#
- frequency (number): Frequency. Frequency for this row.
- analysis_tag (string): Tags for marking separate analyses. Identifies sets of forms which are related by the same analysis. E.g. forms from two distinct sources, or a more phonetic and a more phonological transcription.
- defectiveness_tag (string): Tags for defectiveness status. Identifies sets of defective forms (e.g. pluralia tantum).
- epistemic_tag (string): Tags for epistemic status. Identifies sets of forms with the same epistemic status.
- variants_tag (string): Tags for form variants. Identifies sets of forms used by specific groups of speakers. E.g. dialectal variants.
- overabundance_tag (string): Tags for overabundant forms. Identifies sets of overabundant forms. For example, overabundant forms across lexemes might belong to a series of regular and irregular forms, or a series of short and long forms, etc.
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- source (string): Source. Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.
- POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.
  - constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS
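The phon_form column holds sounds separated by spaces and uses #DEF# as its missing-value marker. A small sketch of how a reader of forms.csv might turn such a value into a list of sounds follows; the function name `segment` and the sample strings are assumptions made for illustration, not part of the specification.

```python
MISSING = "#DEF#"  # missingValues marker listed for the form columns


def segment(phon_form):
    """Split a space-separated phon_form into sounds; return None when the value is missing."""
    if phon_form == MISSING:
        return None
    return phon_form.split()


print(segment("m aɪ s"))
print(segment(MISSING))
```

Returning None rather than an empty list keeps a missing form distinguishable from an empty string, which is a design choice of this sketch.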
frequencies
Frequencies
- This file is located in frequencies.csv.
- The identifier column (or primaryKey) is ['freq_id'].
- Formal relations (foreignKeys) with other tables:
  - Each value in column ['cell'] of frequencies must refer to ['cell_id'] in table cells.
  - Each value in column ['form'] of frequencies must refer to ['form_id'] in table forms.
  - Each value in column ['lexeme'] of frequencies must refer to ['lexeme_id'] in table lexemes.
Columns defined by frequencies-schema:
- freq_id (string): Frequency record identifier. One frequency value, for a single data point and source.
  - constraints: a freq_id is obligatory; it must be unique.
- form (string): Reference to form table row identifiers. These identifiers are specific to form, lexeme, cell triples.
- lexeme (string): Reference to a lexeme identifier. Lexeme identifiers must be unique to paradigms.
- cell (string): Reference to a cell identifier. The set of feature values as would appear in a gloss, separated by dots, e.g. prs.ind.1sg or f.pl
- frequency (number): Frequency. Frequency for this row.
- analysis_tag (string): Tags for marking separate analyses. Identifies sets of forms which are related by the same analysis. E.g. forms from two distinct sources, or a more phonetic and a more phonological transcription.
- defectiveness_tag (string): Tags for defectiveness status. Identifies sets of defective forms (e.g. pluralia tantum).
- epistemic_tag (string): Tags for epistemic status. Identifies sets of forms with the same epistemic status.
- variants_tag (string): Tags for form variants. Identifies sets of forms used by specific groups of speakers. E.g. dialectal variants.
- overabundance_tag (string): Tags for overabundant forms. Identifies sets of overabundant forms. For example, overabundant forms across lexemes might belong to a series of regular and irregular forms, or a series of short and long forms, etc.
- source (string): Source. Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.
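Because each frequency record carries one value for a single data point and source, totals are usually computed per source. The sketch below sums frequencies by lexeme or by cell for one source; the rows, the source keys corpusA and corpusB, and the helper `totals_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from frequencies.csv; values and source keys are invented.
rows = [
    {"freq_id": "1", "lexeme": "mouse_1", "cell": "sg", "frequency": "120", "source": "corpusA"},
    {"freq_id": "2", "lexeme": "mouse_1", "cell": "pl", "frequency": "45", "source": "corpusA"},
    {"freq_id": "3", "lexeme": "mouse_1", "cell": "sg", "frequency": "7", "source": "corpusB"},
]


def totals_by(rows, column, source):
    """Sum the frequency of rows from one source, grouped by the given reference column."""
    totals = defaultdict(float)
    for row in rows:
        # Skip rows from other sources and rows that leave the reference empty.
        if row["source"] == source and row[column]:
            totals[row[column]] += float(row["frequency"])
    return dict(totals)


print(totals_by(rows, "lexeme", "corpusA"))
print(totals_by(rows, "cell", "corpusB"))
```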
sounds
Sound inventory with distinctive features
- This file is located in sounds.csv.
- The identifier column (or primaryKey) is ['sound_id'].
- missingValues: ``
Columns defined by sounds-schema:
- sound_id (string): sound representation. These identifiers are specific to sounds.
  - constraints: a sound_id is obligatory; it must be unique.
- label (string): label for this row. A human-readable label for the row.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#label
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- CLTS_id (string): Identifier of this sound in CLTS. Reference to this sound in CLTS data.
  - constraints: a CLTS_id must be unique.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#CLTS_id
- PHOIBLE_id (string): Identifier of this sound in PHOIBLE. Reference to this sound in PHOIBLE.
  - constraints: a PHOIBLE_id must be unique.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#PHOIBLE_id
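One possible use of the sound inventory is to check that every space-separated sound in the phon_form values has an entry in sounds.csv. The sketch below assumes that the symbols used in phon_form are exactly the sound_id values; the helper name `unknown_sounds` and the sample data are invented.

```python
def unknown_sounds(phon_forms, sound_ids):
    """Return the sorted sounds used in phon_forms that have no matching sound_id."""
    inventory = set(sound_ids)
    return sorted(
        {sound for form in phon_forms for sound in form.split() if sound not in inventory}
    )


# Hypothetical data: the inventory lacks the diphthong used in the second form.
print(unknown_sounds(["m aʊ s", "m aɪ s"], ["m", "aʊ", "s"]))
```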
graphemes
Graphemes inventory
- This file is located in graphemes.csv.
- The identifier column (or primaryKey) is ['grapheme_id'].
- missingValues: ``
Columns defined by graphemes-schema:
- grapheme_id (string): grapheme representation. These identifiers are specific to graphemes.
  - constraints: a grapheme_id is obligatory; it must be unique.
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- canonical_order (integer): Sorting order for visual presentation. The order in which items are canonically presented. Use integers to represent relative order, order is used per-item.
cells
Paradigm cells
- This file is located in cells.csv.
- The identifier column (or primaryKey) is ['cell_id'].
Columns defined by cells-schema:
- cell_id (string): Cell identifier. The set of feature values as would appear in a gloss, separated by dots, e.g. prs.ind.1sg or f.pl
  - constraints: a cell_id is obligatory; it must be unique.
- label (string): label for this row. A human-readable label for the row.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#label
- unimorph (string): Cell in unimorph format. The cell, written following the unimorph schema.
- ud (string): Cell in the universal dependency format. The cell, written following the universal dependency format.
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.
  - constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS
- frequency (number): Frequency. Frequency for this row.
- canonical_order (integer): Sorting order for visual presentation. The order in which items are canonically presented. Use integers to represent relative order, order is used per-item.
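The canonical_order column can drive the presentation order of cells. The following is a minimal sketch with invented rows; placing rows without a canonical_order last is an assumption of this example, not something the specification states.

```python
def in_canonical_order(rows):
    """Sort rows by integer canonical_order; rows without a value go last."""

    def key(row):
        value = row.get("canonical_order") or ""
        # The first tuple element pushes empty values after all numbered rows.
        return (value == "", int(value) if value else 0)

    return sorted(rows, key=key)


# Hypothetical rows from cells.csv, as read by a CSV reader (all values are strings).
cells = [
    {"cell_id": "pl", "canonical_order": "2"},
    {"cell_id": "du", "canonical_order": ""},
    {"cell_id": "sg", "canonical_order": "1"},
]
print([row["cell_id"] for row in in_canonical_order(cells)])
```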
features-values
Grammatical features values
- This file is located in features-values.csv.
- The identifier column (or primaryKey) is ['value_id'].
Columns defined by features-values-schema:
- value_id (string): Grammatical Feature value identifier. Identifier for the grammatical feature value (as found in the cell).
  - constraints: a value_id is obligatory; it must be unique.
- label (string): label for this row. A human-readable label for the row.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#label
- feature (string): feature. The name of the dimension of this feature, e.g. case, tense, modality, voice, force, gender, evidentiality, person, number, polarity...
  - constraints: a feature is obligatory.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#feature
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.
  - constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS
- unimorph (string): Cell in unimorph format. The cell, written following the unimorph schema.
- ud (string): Cell in the universal dependency format. The cell, written following the universal dependency format.
- canonical_order (integer): Sorting order for visual presentation. The order in which items are canonically presented. Use integers to represent relative order, order is used per-item.
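Since a cell identifier is a dot-separated set of feature values, and value_id identifies a feature value as found in the cell, the two tables can be cross-checked. The sketch below assumes that each dot-separated part of a cell identifier corresponds to exactly one value_id; the helper name `undefined_values` and the sample identifiers are hypothetical.

```python
def undefined_values(cell_ids, value_ids):
    """Map each cell identifier to the dot-separated parts that no value_id defines."""
    known = set(value_ids)
    report = {}
    for cell_id in cell_ids:
        missing = [part for part in cell_id.split(".") if part not in known]
        if missing:
            report[cell_id] = missing
    return report


# Hypothetical identifiers: "pl" is used in a cell but not declared as a value.
value_ids = ["prs", "ind", "1sg", "f"]
print(undefined_values(["prs.ind.1sg", "f.pl"], value_ids))
```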
lexemes
Lexemes
- This file is located in lexemes.csv.
- The identifier column (or primaryKey) is ['lexeme_id'].
Columns defined by lexemes-schema:
- lexeme_id (string): Identifier for the lexeme. Lexeme identifiers. Often, they are identical to the label (lemma). However, they must be unique to paradigms, distinguishing homonyms with different inflection. For example, the animal mouse/mice and the computer peripheral mouse/mouses would both have the label 'mouse' but could be identified by the lexeme identifiers mouse_1 and mouse_2.
  - constraints: a lexeme_id is obligatory; it must be unique.
- inflection_class (string): Inflection class identifier. This identifier groups together lexemes of the same inflection class.
- source (string): Source. Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.
- frequency (number): Frequency. Frequency for this row.
- label (string): label for this row. A human-readable label for the row.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#label
- meaning (string): Definition for this lexeme. This is a description of the lexeme's overall meaning.
- gloss (string): Short meaning, used for glossing. Gloss for this lexeme.
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
- POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.
  - constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.
  - rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS
- language_ID (string): Identifier for the language. Language identifiers should use some standard ID (iso code, glottocode, etc).
- analysis_tag (string): Tags for marking separate analyses. Identifies sets of forms which are related by the same analysis. E.g. forms from two distinct sources, or a more phonetic and a more phonological transcription.
- defectiveness_tag (string): Tags for defectiveness status. Identifies sets of defective forms (e.g. pluralia tantum).
- epistemic_tag (string): Tags for epistemic status. Identifies sets of forms with the same epistemic status.
- variants_tag (string): Tags for form variants. Identifies sets of forms used by specific groups of speakers. E.g. dialectal variants.
- overabundance_tag (string): Tags for overabundant forms. Identifies sets of overabundant forms. For example, overabundant forms across lexemes might belong to a series of regular and irregular forms, or a series of short and long forms, etc.
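The distinction between lexeme_id (unique) and label (possibly shared by homonyms) can be illustrated with the mouse example from the lexeme_id description. This is a sketch only: the rows and the helper name `homonyms` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows from lexemes.csv, following the mouse_1 / mouse_2 example.
lexemes = [
    {"lexeme_id": "mouse_1", "label": "mouse"},
    {"lexeme_id": "mouse_2", "label": "mouse"},
    {"lexeme_id": "cat", "label": "cat"},
]


def homonyms(rows):
    """Group lexeme identifiers by label and keep the labels shared by several lexemes."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row["lexeme_id"])
    return {label: ids for label, ids in by_label.items() if len(ids) > 1}


# The uniqueness constraint applies to lexeme_id, not to label.
ids = [row["lexeme_id"] for row in lexemes]
ids_are_unique = len(ids) == len(set(ids))

print(ids_are_unique, homonyms(lexemes))
```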
tags
Tags mark rows which have commonalities
- This file is located in tags.csv.
- The identifier column (or primaryKey) is ['tag_id'].
Columns defined by tags-schema:
- tag_id (string): Tag id. The label for a set of forms which have something in common.
  - constraints: a tag_id is obligatory; it must be unique.
- tag_column_name (string): Name of the tag column in the forms table. The name of the forms table column in which this tag is used.
  - constraints: a tag_column_name is obligatory; it must match the regular expression [^ ]+_tag.
- comment (string): Comment. Human-readable comment.
  - rdfProperty: http://www.w3.org/2000/01/rdf-schema#comment
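The pattern constraint on tag_column_name can be tried out with Python's `re` module. The sketch below assumes that the expression must match the whole value, which is why it uses `fullmatch`; the helper name `is_tag_column_name` is hypothetical.

```python
import re

# Regular expression quoted from the tag_column_name constraint.
TAG_COLUMN = re.compile(r"[^ ]+_tag")


def is_tag_column_name(name):
    """Return True when the whole name matches the tag column pattern."""
    return TAG_COLUMN.fullmatch(name) is not None


for candidate in ["analysis_tag", "my column_tag", "tag"]:
    print(candidate, is_tag_column_name(candidate))
```

Under that assumption, a name such as analysis_tag passes, while a name containing a space or lacking the _tag suffix does not.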
sources
Sources | Bibliographical references.
- This file is located in sources.bib.
readme
Read me | Basic documentation
- This file is located in readme.md.
data_sheet
Data Sheet | Data Sheet
- This file is located in data_sheet.md.
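For orientation, the file listing above could be mirrored in a package descriptor along the following lines. This is a hedged sketch rather than the actual Paralex JSON file: it uses only the metadata and the resource names and paths given in this document, follows the general layout of a frictionless data package, and leaves out the schemas and everything else a real descriptor would carry.

```python
import json

# Resource names and file locations as listed in this document.
RESOURCES = {
    "forms": "forms.csv",
    "frequencies": "frequencies.csv",
    "sounds": "sounds.csv",
    "graphemes": "graphemes.csv",
    "cells": "cells.csv",
    "features-values": "features-values.csv",
    "lexemes": "lexemes.csv",
    "tags": "tags.csv",
    "sources": "sources.bib",
    "readme": "readme.md",
    "data_sheet": "data_sheet.md",
}

# Illustrative fragment only: schemas, contributors and other fields are omitted.
descriptor = {
    "name": "paralex",
    "profile": "data-package",
    "version": "2.2.3",
    "keywords": ["lexicon", "inflection", "linguistics", "morphology", "paradigms"],
    "resources": [{"name": name, "path": path} for name, path in RESOURCES.items()],
}

text = json.dumps(descriptor, indent=2)
print(text)
```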