Although the standard specifies how to handle many complex cases, at its core it requires very little. This page presents a few small, minimal, valid paralex packages.

Minimal dataset

This example does the bare minimum needed for a valid paralex dataset.

Example

readme.md

This is a minimal paralex dataset, which comprises:

forms.csv: A forms table:
- The six present-tense forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
- The only columns present are the mandatory ones: form_id, lexeme, cell, phon_form.
- Alternatively, a dataset with only orth_form instead of phon_form would also be valid.
paralex-infos.yml: A config file with minimal information
paralex-min-chanter.package.json: A JSON metadata file generated by running:

paralex meta paralex-infos.yml

The dataset, although valid, lacks a data_sheet.md, because such a file would make little sense for so small an example.

forms.csv

form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
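The column requirements described above can be checked with a few lines of Python. This is only a rough sketch, not the validation performed by the paralex tools: the helper name check_forms and the exact set of checks are assumptions made for illustration, based on what this page states (form_id, lexeme and cell are present, together with phon_form or orth_form, and form_id values are unique).

```python
import csv
import io

# The forms table from this example, inlined so the sketch runs on its own.
FORMS_CSV = """\
form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
"""

# Assumption drawn from this page: these three columns are always needed,
# plus at least one column holding the form itself.
ALWAYS_REQUIRED = {"form_id", "lexeme", "cell"}
FORM_COLUMNS = {"phon_form", "orth_form"}


def check_forms(text):
    """Return a list of problems found in a forms table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    problems = []

    missing = ALWAYS_REQUIRED - columns
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
    if not columns & FORM_COLUMNS:
        problems.append("no phon_form or orth_form column")

    # Identifiers can only be compared if the column exists at all.
    if "form_id" in columns:
        seen = set()
        for row in reader:
            if row["form_id"] in seen:
                problems.append("duplicate form_id: " + row["form_id"])
            seen.add(row["form_id"])
    return problems


print(check_forms(FORMS_CSV))
```

An empty list means the table passes these basic checks; it does not replace a full validation of the package.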

paralex-infos.yml

title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-min-chanter
version: '1.0'
languages_iso639:
- fr
pos:
- verb
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path: forms.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-min-chanter.package.json

{
  "name": "paralex-min-chanter",
  "title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "file",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#",
              "#MISSING#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    },
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "pos": [
    "verb"
  ],
  "paralex-version": "2.3.0"
}
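The generated JSON is an ordinary data package descriptor, so it can be inspected with the standard json module. The sketch below works on a trimmed excerpt of the file above (only the keys it needs are kept) and lists, for one resource, which columns carry a "required" constraint. The function name describe_fields is hypothetical.

```python
import json

# Trimmed excerpt of paralex-min-chanter.package.json, inlined for the sketch.
PACKAGE_JSON = """
{
  "name": "paralex-min-chanter",
  "resources": [
    {
      "name": "forms",
      "path": "forms.csv",
      "schema": {
        "fields": [
          {"name": "form_id", "constraints": {"required": true, "unique": true}},
          {"name": "lexeme", "constraints": {"required": true}},
          {"name": "cell", "constraints": {"required": true}},
          {"name": "phon_form", "missingValues": ["#DEF#", "#MISSING#"]}
        ],
        "primaryKey": ["form_id"]
      }
    },
    {"name": "readme", "path": "readme.md"}
  ]
}
"""


def describe_fields(package, resource_name):
    """Map each column of a resource to whether it is marked as required."""
    for resource in package["resources"]:
        if resource["name"] == resource_name:
            fields = resource.get("schema", {}).get("fields", [])
            return {
                field["name"]: field.get("constraints", {}).get("required", False)
                for field in fields
            }
    raise KeyError(resource_name)


package = json.loads(PACKAGE_JSON)
fields = describe_fields(package, "forms")
for name, required in fields.items():
    print(name, "required" if required else "optional")
```

In the descriptor, phon_form carries no "required" constraint, which is consistent with the remark above that a dataset with only orth_form would also be valid.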

Multi paths

This dataset illustrates the usage of multiple data paths.

Example

readme.md

This is a minimal paralex dataset that illustrates splitting a table (here, the forms) across two files. The data here is tiny, but splitting is usually done with large datasets to avoid enormous files. It comprises:

A forms table:
- The six present-tense forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
- The only columns present are the mandatory ones: form_id, lexeme, cell, phon_form.
- forms.csv: three of the forms (2sg, 3pl, 3sg)
- forms2.csv: the other three forms (1pl, 1sg, 2pl)
paralex-infos.yml: A config file with minimal information
paralex-multipart-chanter.package.json: A JSON metadata file generated by running:

paralex meta paralex-infos.yml

The dataset, although valid, lacks a data_sheet.md, because such a file would make little sense for so small an example.

forms.csv

form_id,lexeme,cell,phon_form
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t

forms2.csv

form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e

paralex-infos.yml

title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-multipart-chanter
version: '1.0'
languages_iso639:
- fr
pos:
- verb
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path:
      - forms.csv
      - forms2.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-multipart-chanter.package.json

{
  "name": "paralex-multipart-chanter",
  "title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "multipart",
      "format": "csv",
      "mediatype": "text/csv",
      "extrapaths": [
        "forms2.csv"
      ],
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#",
              "#MISSING#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    },
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "pos": [
    "verb"
  ],
  "paralex-version": "2.3.0"
}
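In the generated descriptor, the split table appears as a single resource whose "scheme" is "multipart", with the first file under "path" and the remaining ones under "extrapaths". The sketch below collects all the file paths of a resource on that basis. This reading is inferred from the generated file above rather than from a specification, and the function name resource_paths is hypothetical.

```python
import json

# Excerpts of two resources from paralex-multipart-chanter.package.json.
FORMS_RESOURCE = json.loads("""
{
  "name": "forms",
  "path": "forms.csv",
  "scheme": "multipart",
  "extrapaths": ["forms2.csv"]
}
""")

README_RESOURCE = json.loads("""
{
  "name": "readme",
  "path": "readme.md",
  "scheme": "file"
}
""")


def resource_paths(resource):
    """List every file of a resource, including the extra parts if any."""
    paths = [resource["path"]]
    if resource.get("scheme") == "multipart":
        paths.extend(resource.get("extrapaths", []))
    return paths


print(resource_paths(FORMS_RESOURCE))
print(resource_paths(README_RESOURCE))
```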

Sources

This dataset illustrates adding a sources file.

Example

readme.md

This is a minimal paralex dataset, which comprises:

forms.csv: A forms table:
- The six present-tense forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
- The columns are the mandatory ones (form_id, lexeme, cell, phon_form), plus a source column holding the BibTeX key of each form's source.
- Alternatively, a dataset with only orth_form instead of phon_form would also be valid.
sources.bib: A BibTeX file with the sources.
paralex-infos.yml: A config file with minimal information
paralex-sources-chanter.package.json: A JSON metadata file generated by running:

paralex meta paralex-infos.yml

The dataset, although valid, lacks a data_sheet.md, because such a file would make little sense for so small an example.

forms.csv

form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t,FictionalSource1
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t,FictionalSource2

sources.bib

@Article{FictionalSource1,
  author    = {Jane Doe},
  title     = {Fictional source n°1},
  year      = {2024},
}

@Article{FictionalSource2,
  author    = {John Smith},
  title     = {Fictional source n°2},
  year      = {2024},
}
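Each value in the source column is meant to be a key defined in sources.bib. A quick consistency check can be sketched as follows. It extracts entry keys with a simple regular expression, which is enough for entries laid out like the ones above but is not a full BibTeX parser; the function names bib_keys and unresolved_sources are hypothetical.

```python
import csv
import io
import re

# The forms table and the bibliography from this example, inlined.
FORMS_CSV = """\
form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t,FictionalSource1
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t,FictionalSource2
"""

SOURCES_BIB = """\
@Article{FictionalSource1,
  author    = {Jane Doe},
  title     = {Fictional source n°1},
  year      = {2024},
}

@Article{FictionalSource2,
  author    = {John Smith},
  title     = {Fictional source n°2},
  year      = {2024},
}
"""


def bib_keys(bib_text):
    """Collect entry keys such as the one in '@Article{Key,' (simplified)."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s}]+)\s*,", bib_text))


def unresolved_sources(forms_text, bib_text):
    """Return the source values used in the forms but absent from the .bib."""
    keys = bib_keys(bib_text)
    reader = csv.DictReader(io.StringIO(forms_text))
    used = {row["source"] for row in reader if row["source"]}
    return sorted(used - keys)


print(sorted(bib_keys(SOURCES_BIB)))
print(unresolved_sources(FORMS_CSV, SOURCES_BIB))
```

An empty result from unresolved_sources means every key cited in the forms table has a matching entry in the bibliography.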

paralex-infos.yml

title: Minimal example with sources - Indicative present of The French Verb "Chanter"
name: paralex-sources-chanter
version: '1.0'
languages_iso639:
- fr
pos:
- verb
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  sources:
    path: sources.bib
  forms:
    path: forms.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-sources-chanter.package.json

{
  "name": "paralex-sources-chanter",
  "title": "Minimal example with sources - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "sources",
      "type": "file",
      "title": "Sources",
      "description": "Bibliographical references.",
      "path": "sources.bib",
      "scheme": "file",
      "format": "bib",
      "mediatype": "text/x-bibtex",
      "encoding": "utf-8"
    },
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "file",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#",
              "#MISSING#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          },
          {
            "name": "source",
            "type": "string",
            "title": "Source",
            "description": "Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.",
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#source"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    },
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "pos": [
    "verb"
  ],
  "paralex-version": "2.3.0"
}