Although the standard specifies how to handle many complex cases, at its core it requires very little. This page presents a few small, minimal, valid paralex packages.
Minimal dataset
This example does only the bare minimum needed for a valid paralex dataset.
Example
This is a minimal paralex dataset, which comprises:
- forms.csv: A forms table:
  - The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
  - The only columns present are the mandatory ones: form_id, lexeme, cell, phon_form.
  - Alternatively, a dataset with only orth_form in place of phon_form would also be valid.
- paralex-infos.yml: A config file with minimal information.
- paralex-min-chanter.package.json: A JSON file generated by running: paralex meta paralex-infos.yml
The dataset, although valid, is missing a data_sheet.md: for such a small example, that file would make little sense.
form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
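The requirements described above can be sketched as a small check in Python. This is only an illustration, not the validator shipped with paralex: the helper name check_forms is hypothetical, and the rule it applies (form_id, lexeme and cell always present, plus at least one of phon_form or orth_form) is an assumption drawn from the description of this example.

```python
import csv
import io

# The forms table from this example, inlined so the sketch is self-contained.
FORMS_CSV = """form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
"""

REQUIRED = ("form_id", "lexeme", "cell")   # assumed to be always required
FORM_COLUMNS = ("phon_form", "orth_form")  # assumed: at least one is needed


def check_forms(text):
    """Return a list of problems found in a forms table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if not any(c in header for c in FORM_COLUMNS):
        problems.append("need phon_form or orth_form")
    # form_id is the primary key, so its values must not repeat.
    ids = [row.get("form_id") for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("form_id values are not unique")
    for c in REQUIRED:
        if c in header and any(not row[c] for row in rows):
            problems.append(f"empty value in required column: {c}")
    return problems


problems = check_forms(FORMS_CSV)
print(problems or "forms table looks fine")
```

For an actual dataset, the validation tooling distributed with paralex remains the reference; this sketch only makes the minimal requirements concrete.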
title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-min-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path: forms.csv
  readme:
    path: readme.md
File generated automatically by running paralex meta paralex-infos.yml
{
"name": "paralex-min-chanter",
"title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
"profile": "data-package",
"licenses": [
{
"name": "CC-BY-SA-4.0",
"path": "https://creativecommons.org/licenses/by-sa/4.0/",
"title": "Attribution-ShareAlike 4.0 International"
}
],
"contributors": [
{
"title": "Sacha Beniamine"
}
],
"keywords": [
"example",
"toy",
"French",
"verbs",
"paradigms",
"paralex"
],
"version": "1.0",
"resources": [
{
"name": "readme",
"type": "text",
"title": "Read me",
"description": "Basic documentation",
"path": "readme.md",
"scheme": "file",
"format": "md",
"mediatype": "text/markdown",
"encoding": "utf-8"
},
{
"name": "forms",
"type": "table",
"title": "Inflected forms",
"path": "forms.csv",
"scheme": "file",
"format": "csv",
"mediatype": "text/csv",
"encoding": "utf-8",
"schema": {
"name": "forms-schema",
"fields": [
{
"name": "form_id",
"type": "string",
"title": "Form table row identifiers",
"description": "These identifiers are specific to form, lexeme, cell triples.",
"constraints": {
"required": true,
"unique": true
}
},
{
"name": "lexeme",
"type": "string",
"title": "Reference to a lexeme identifier",
"description": "Lexeme identifiers must be unique to paradigms.",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
},
{
"name": "cell",
"type": "string",
"title": "Reference to a cell identifier",
"description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
},
{
"name": "phon_form",
"type": "string",
"title": "Inflected form (phonemic or phonetic)",
"description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
"missingValues": [
"#DEF#"
],
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
}
],
"primaryKey": [
"form_id"
]
},
"rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
}
],
"languages_iso639": [
"fr"
],
"paralex-version": "2.2.3"
}
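To see what the generated descriptor encodes, the following Python sketch reads a trimmed copy of it and lists the fields that each table marks as required. The excerpt keeps only the keys used here, and the function name required_fields is hypothetical; the sketch relies on nothing beyond the standard json module.

```python
import json

# Trimmed excerpt of the generated descriptor shown above
# (only the parts used in this sketch are kept).
DESCRIPTOR = json.loads("""
{
  "name": "paralex-min-chanter",
  "resources": [
    {"name": "readme", "type": "text", "path": "readme.md"},
    {"name": "forms", "type": "table", "path": "forms.csv",
     "schema": {
       "fields": [
         {"name": "form_id", "constraints": {"required": true, "unique": true}},
         {"name": "lexeme", "constraints": {"required": true}},
         {"name": "cell", "constraints": {"required": true}},
         {"name": "phon_form", "missingValues": ["#DEF#"]}
       ],
       "primaryKey": ["form_id"]
     }}
  ]
}
""")


def required_fields(descriptor):
    """Map each table resource to the fields its schema marks as required."""
    result = {}
    for resource in descriptor["resources"]:
        if resource.get("type") != "table":
            continue  # text resources such as the readme have no schema
        fields = resource["schema"]["fields"]
        result[resource["name"]] = [
            f["name"] for f in fields if f.get("constraints", {}).get("required")
        ]
    return result


print(required_fields(DESCRIPTOR))
```

Note that phon_form carries no "required" constraint in the descriptor, which is consistent with the remark that a dataset using orth_form instead would also be valid.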
Multi paths
This dataset illustrates the usage of multiple data paths.
Example
This is a minimal paralex dataset, which exemplifies splitting a table (here the forms) into two files. In the current case, the data is tiny, but this is usually done with large datasets, to avoid getting enormous files:
- A forms table:
  - The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
  - The only columns present are the mandatory ones: form_id, lexeme, cell, phon_form.
  - forms.csv: the first three forms
  - forms2.csv: the last three forms
- paralex-infos.yml: A config file with minimal information.
- paralex-multipart-chanter.package.json: A JSON file generated by running: paralex meta paralex-infos.yml
The dataset, although valid, is missing a data_sheet.md: for such a small example, that file would make little sense.
title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-multipart-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path:
    - forms.csv
    - forms2.csv
  readme:
    path: readme.md
File generated automatically by running paralex meta paralex-infos.yml
{
"name": "paralex-multipart-chanter",
"title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
"profile": "data-package",
"licenses": [
{
"name": "CC-BY-SA-4.0",
"path": "https://creativecommons.org/licenses/by-sa/4.0/",
"title": "Attribution-ShareAlike 4.0 International"
}
],
"contributors": [
{
"title": "Sacha Beniamine"
}
],
"keywords": [
"example",
"toy",
"French",
"verbs",
"paradigms",
"paralex"
],
"version": "1.0",
"resources": [
{
"name": "readme",
"type": "text",
"title": "Read me",
"description": "Basic documentation",
"path": "readme.md",
"scheme": "file",
"format": "md",
"mediatype": "text/markdown",
"encoding": "utf-8"
},
{
"name": "forms",
"type": "table",
"title": "Inflected forms",
"path": "forms.csv",
"scheme": "multipart",
"format": "csv",
"mediatype": "text/csv",
"extrapaths": [
"forms2.csv"
],
"encoding": "utf-8",
"schema": {
"name": "forms-schema",
"fields": [
{
"name": "form_id",
"type": "string",
"title": "Form table row identifiers",
"description": "These identifiers are specific to form, lexeme, cell triples.",
"constraints": {
"required": true,
"unique": true
}
},
{
"name": "lexeme",
"type": "string",
"title": "Reference to a lexeme identifier",
"description": "Lexeme identifiers must be unique to paradigms.",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
},
{
"name": "cell",
"type": "string",
"title": "Reference to a cell identifier",
"description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
},
{
"name": "phon_form",
"type": "string",
"title": "Inflected form (phonemic or phonetic)",
"description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
"missingValues": [
"#DEF#"
],
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
}
],
"primaryKey": [
"form_id"
]
},
"rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
}
],
"languages_iso639": [
"fr"
],
"paralex-version": "2.2.3"
}
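In the descriptor above, the forms resource points to its first file through "path" and to the remaining ones through "extrapaths". A reader that wants a single table can collect both and concatenate the rows, as in the Python sketch below. Several things here are assumptions rather than facts from this page: the contents of the two files are reconstructed from the minimal example (first three rows, then last three), forms2.csv is assumed to repeat the header line (the code also copes with a part that does not), and the function read_multipart is hypothetical.

```python
import csv
import io

# Relevant keys of the "forms" resource from the descriptor above.
FORMS_RESOURCE = {
    "path": "forms.csv",
    "scheme": "multipart",
    "extrapaths": ["forms2.csv"],
}

# Stand-ins for the two files on disk (assumed contents, see the lead-in).
FILES = {
    "forms.csv": """form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
""",
    "forms2.csv": """form_id,lexeme,cell,phon_form
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
""",
}


def read_multipart(resource, files):
    """Read all parts of a resource as one list of row dicts.

    `files` maps a path to its text content (a stand-in for opening files).
    """
    paths = [resource["path"], *resource.get("extrapaths", [])]
    header = None
    rows = []
    for path in paths:
        for record in csv.reader(io.StringIO(files[path])):
            if not record:
                continue  # skip blank lines
            if header is None:
                header = record  # first line of the first part
            elif record == header:
                continue  # a later part repeating the header line
            else:
                rows.append(dict(zip(header, record)))
    return rows


rows = read_multipart(FORMS_RESOURCE, FILES)
print(len(rows), "forms read from", 1 + len(FORMS_RESOURCE["extrapaths"]), "files")
```

Keeping the list of paths in the descriptor means that consumers never need to know how the table was split: they see one forms table with one schema.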
Sources
This dataset illustrates adding a source file.
Example
This is a minimal paralex dataset, which comprises:
- forms.csv: A forms table:
  - The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
  - The columns are the mandatory ones (form_id, lexeme, cell, phon_form), plus a source column holding the bibtex key of each form's source.
  - Alternatively, a dataset with only orth_form in place of phon_form would also be valid.
- sources.bib: A bib file with sources.
- paralex-infos.yml: A config file with minimal information.
- paralex-sources-chanter.package.json: A JSON file generated by running: paralex meta paralex-infos.yml
The dataset, although valid, is missing a data_sheet.md: for such a small example, that file would make little sense.
form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t,FictionalSource1
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t,FictionalSource2
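Since each value in the source column is meant to be a bibtex key, a simple consistency check is to confirm that every key used in forms.csv is defined in sources.bib. The Python sketch below does this under stated assumptions: sources.bib is not shown on this page, so the two entries are placeholders built around the keys found in the table; each cell is assumed to hold a single key; and the regular expression is a rough key extractor, not a full BibTeX parser. The function names are hypothetical.

```python
import csv
import io
import re

# The forms table from this example, inlined so the sketch is self-contained.
FORMS_CSV = """form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t,FictionalSource1
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t,FictionalSource2
"""

# Placeholder bib content (assumption): only the keys come from forms.csv.
SOURCES_BIB = """@misc{FictionalSource1,
  title = {Placeholder entry}
}
@misc{FictionalSource2,
  title = {Placeholder entry}
}
"""


def bib_keys(bib_text):
    """Collect citation keys with a simple regex (not a full BibTeX parser)."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def missing_sources(forms_text, bib_text):
    """Return the keys used in the source column but absent from the bib file."""
    known = bib_keys(bib_text)
    reader = csv.DictReader(io.StringIO(forms_text))
    used = {row["source"] for row in reader if row.get("source")}
    return sorted(used - known)


print(missing_sources(FORMS_CSV, SOURCES_BIB))
```

An empty list means that every key referenced in the table is defined in the bib file.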
title: Minimal example with sources - Indicative present of The French Verb "Chanter"
name: paralex-sources-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  sources:
    path: sources.bib
  forms:
    path: forms.csv
  readme:
    path: readme.md
File generated automatically by running paralex meta paralex-infos.yml
{
"name": "paralex-sources-chanter",
"title": "Minimal example with sources - Indicative present of The French Verb \"Chanter\"",
"profile": "data-package",
"licenses": [
{
"name": "CC-BY-SA-4.0",
"path": "https://creativecommons.org/licenses/by-sa/4.0/",
"title": "Attribution-ShareAlike 4.0 International"
}
],
"contributors": [
{
"title": "Sacha Beniamine"
}
],
"keywords": [
"example",
"toy",
"French",
"verbs",
"paradigms",
"paralex"
],
"version": "1.0",
"resources": [
{
"name": "readme",
"type": "text",
"title": "Read me",
"description": "Basic documentation",
"path": "readme.md",
"scheme": "file",
"format": "md",
"mediatype": "text/markdown",
"encoding": "utf-8"
},
{
"name": "sources",
"type": "file",
"title": "Sources",
"description": "Bibliographical references.",
"path": "sources.bib",
"scheme": "file",
"format": "bib",
"mediatype": "text/x-bibtex",
"encoding": "utf-8"
},
{
"name": "forms",
"type": "table",
"title": "Inflected forms",
"path": "forms.csv",
"scheme": "file",
"format": "csv",
"mediatype": "text/csv",
"encoding": "utf-8",
"schema": {
"name": "forms-schema",
"fields": [
{
"name": "form_id",
"type": "string",
"title": "Form table row identifiers",
"description": "These identifiers are specific to form, lexeme, cell triples.",
"constraints": {
"required": true,
"unique": true
}
},
{
"name": "lexeme",
"type": "string",
"title": "Reference to a lexeme identifier",
"description": "Lexeme identifiers must be unique to paradigms.",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
},
{
"name": "cell",
"type": "string",
"title": "Reference to a cell identifier",
"description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
"constraints": {
"required": true
},
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
},
{
"name": "phon_form",
"type": "string",
"title": "Inflected form (phonemic or phonetic)",
"description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
"missingValues": [
"#DEF#"
],
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
},
{
"name": "source",
"type": "string",
"title": "Source",
"description": "Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.",
"rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#source"
}
],
"primaryKey": [
"form_id"
]
},
"rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
}
],
"languages_iso639": [
"fr"
],
"paralex-version": "2.2.3"
}