Although the standard specifies how to handle many complex cases, at its core it requires very little. This page shows a few small, minimal, valid paralex packages.

Minimal dataset

This example does only the bare minimum needed for a valid paralex dataset.

Example

This is a minimal paralex dataset, which comprises:

  • forms.csv: A forms table:
    • The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
    • The only columns included are the mandatory ones: form_id, lexeme, cell, phon_form.
    • Alternatively, a dataset with only orth_form would also be valid.
  • paralex-infos.yml: A config file with minimal information
  • paralex-min-chanter.package.json: A JSON file generated by running:
paralex meta paralex-infos.yml

The dataset, although valid, is missing a data_sheet.md, since such a file would make little sense for so small an example.
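As a rough illustration of how little the forms table requires, the following sketch checks a CSV against the constraints visible in the generated schema: the required columns exist and are non-empty, a form column is present, and every form_id is unique. It is not the paralex validator; the helper name `check_forms` is hypothetical, and only the Python standard library is used.

```python
import csv
import io

# Columns that carry a "required" constraint in the generated schema.
REQUIRED = ("form_id", "lexeme", "cell")
# At least one form column is expected (phon_form here; orth_form is the alternative).
FORM_COLUMNS = ("phon_form", "orth_form")


def check_forms(csv_text):
    """Return a list of problems found in a forms table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if not any(c in header for c in FORM_COLUMNS):
        problems.append("missing a form column (phon_form or orth_form)")
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for c in REQUIRED:
            if c in header and not row[c]:
                problems.append(f"line {line_no}: empty {c}")
        form_id = row.get("form_id")
        if form_id:
            if form_id in seen:
                problems.append(f"line {line_no}: duplicate form_id {form_id}")
            seen.add(form_id)
    return problems


SAMPLE = """form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
"""

print(check_forms(SAMPLE))  # an empty list means no problem was found
```

An empty result only means that these few checks passed; the full standard covers more than this sketch does.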

forms.csv
form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
paralex-infos.yml
title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-min-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path: forms.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-min-chanter.package.json
{
  "name": "paralex-min-chanter",
  "title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    },
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "file",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "paralex-version": "2.2.2"
}
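Since the generated descriptor is plain JSON, it can be inspected with any JSON library. The sketch below parses an abbreviated copy of the descriptor above and lists which fields of the forms table carry a `required` constraint. The function name `required_fields` is hypothetical, and titles, descriptions and other metadata are left out of the embedded copy for brevity.

```python
import json

# Abbreviated copy of the descriptor above (most metadata omitted).
DESCRIPTOR = """
{
  "name": "paralex-min-chanter",
  "resources": [
    {"name": "readme", "path": "readme.md"},
    {
      "name": "forms",
      "path": "forms.csv",
      "schema": {
        "fields": [
          {"name": "form_id", "constraints": {"required": true, "unique": true}},
          {"name": "lexeme", "constraints": {"required": true}},
          {"name": "cell", "constraints": {"required": true}},
          {"name": "phon_form", "missingValues": ["#DEF#"]}
        ],
        "primaryKey": ["form_id"]
      }
    }
  ]
}
"""


def required_fields(package, resource_name):
    """List the fields of one resource that have a 'required' constraint."""
    for resource in package["resources"]:
        if resource["name"] == resource_name:
            fields = resource.get("schema", {}).get("fields", [])
            return [
                f["name"]
                for f in fields
                if f.get("constraints", {}).get("required", False)
            ]
    raise KeyError(f"no resource named {resource_name!r}")


package = json.loads(DESCRIPTOR)
print(required_fields(package, "forms"))  # ['form_id', 'lexeme', 'cell']
```

Note that phon_form has no `required` constraint in the schema, which is consistent with a dataset providing only orth_form also being valid.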

Multi paths

This dataset illustrates the usage of multiple data paths.

Example

This is a minimal paralex dataset, which exemplifies splitting a table (here the forms) into two files. Here the data is tiny, but splitting is usually done with large datasets, to avoid enormous files:

  • A forms table:
    • The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
    • The only columns included are the mandatory ones: form_id, lexeme, cell, phon_form.
    • forms.csv: three of the forms
    • forms2.csv: the other three forms
  • paralex-infos.yml: A config file with minimal information
  • paralex-multipart-chanter.package.json: A JSON file generated by running:
paralex meta paralex-infos.yml

The dataset, although valid, is missing a data_sheet.md, since such a file would make little sense for so small an example.
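Conceptually, a table split over several paths is read as a single table. A minimal sketch of that idea follows, assuming, as in this example, that every part repeats the same header row. `read_parts` is a hypothetical helper, not part of paralex, and in-memory strings stand in for forms.csv and forms2.csv.

```python
import csv
import io


def read_parts(parts):
    """Read several CSV parts that share a header into one list of rows."""
    rows, header = [], None
    for text in parts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("all parts must share the same header")
        rows.extend(reader)
    return rows


PART_1 = """form_id,lexeme,cell,phon_form
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
"""
PART_2 = """form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
"""

forms = read_parts([PART_1, PART_2])
print(len(forms))  # 6
```

Because form_id must be unique across the whole table, identifiers should not repeat from one part to the next.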

forms.csv
form_id,lexeme,cell,phon_form
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t
forms2.csv
form_id,lexeme,cell,phon_form
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e
paralex-infos.yml
title: Minimal example - Indicative present of The French Verb "Chanter"
name: paralex-multipart-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  forms:
    path:
      - forms.csv
      - forms2.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-multipart-chanter.package.json
{
  "name": "paralex-multipart-chanter",
  "title": "Minimal example - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    },
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "multipart",
      "format": "csv",
      "mediatype": "text/csv",
      "extrapaths": [
        "forms2.csv"
      ],
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "paralex-version": "2.2.2"
}
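In the generated descriptor, the split table appears as a single resource whose scheme is multipart: the first file stays in path and the remaining ones are listed under extrapaths. A reader of the descriptor could recover the full list of files along these lines. `resource_paths` is a hypothetical helper, and the dictionary is an abbreviated copy of the forms resource above.

```python
def resource_paths(resource):
    """Return every file path of a resource, in order."""
    path = resource["path"]
    # 'path' is a single string in the generated JSON; a list is accepted too,
    # since the YAML config writes the paths as a list.
    paths = [path] if isinstance(path, str) else list(path)
    return paths + list(resource.get("extrapaths", []))


forms_resource = {
    "name": "forms",
    "path": "forms.csv",
    "scheme": "multipart",
    "extrapaths": ["forms2.csv"],
}

print(resource_paths(forms_resource))  # ['forms.csv', 'forms2.csv']
```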

Sources

This dataset illustrates adding a source file.

Example

This is a minimal paralex dataset, which comprises:

  • forms.csv: A forms table:
    • The 6 present forms, in phonemic notation, of the verb chanter, taken from Vlexique (https://doi.org/10.5281/zenodo.13366341).
    • The columns are the mandatory ones (form_id, lexeme, cell, phon_form), plus a source column holding BibTeX keys.
    • Alternatively, a dataset with only orth_form would also be valid.
  • sources.bib: A BibTeX file with the sources.
  • paralex-infos.yml: A config file with minimal information
  • paralex-sources-chanter.package.json: A JSON file generated by running:
paralex meta paralex-infos.yml

The dataset, although valid, is missing a data_sheet.md, since such a file would make little sense for so small an example.
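The link between the two files is the BibTeX key: each value in the source column should match an entry key in sources.bib. The sketch below checks this with a simple regular expression rather than a full BibTeX parser, so it is only an approximation. `missing_sources` is a hypothetical helper, and the inline strings are shortened stand-ins for the two files.

```python
import csv
import io
import re

# Matches the key in entry openings such as "@Article{FictionalSource1,".
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def missing_sources(forms_text, bib_text):
    """Return source keys used in the forms table but absent from the .bib text."""
    known = set(BIB_KEY.findall(bib_text))
    reader = csv.DictReader(io.StringIO(forms_text))
    used = {row["source"] for row in reader if row.get("source")}
    return sorted(used - known)


BIB = """@Article{FictionalSource1,
  author    = {Jane Doe},
  title     = {Fictional source n°1},
  year      = {2024},
}
"""
FORMS = """form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
"""

# The shortened .bib text deliberately omits the second entry.
print(missing_sources(FORMS, BIB))  # ['FictionalSource2']
```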

forms.csv
form_id,lexeme,cell,phon_form,source
form-chanter-prs-1pl,chanter,ind.prs.1.pl,ʃ ɑ̃ t ɔ̃,FictionalSource1
form-chanter-prs-1sg,chanter,ind.prs.1.sg,ʃ ɑ̃ t,FictionalSource1
form-chanter-prs-2pl,chanter,ind.prs.2.pl,ʃ ɑ̃ t e,FictionalSource1
form-chanter-prs-2sg,chanter,ind.prs.2.sg,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3pl,chanter,ind.prs.3.pl,ʃ ɑ̃ t,FictionalSource2
form-chanter-prs-3sg,chanter,ind.prs.3.sg,ʃ ɑ̃ t,FictionalSource2
sources.bib
@Article{FictionalSource1,
  author    = {Jane Doe},
  title     = {Fictional source n°1},
  year      = {2024},
}

@Article{FictionalSource2,
  author    = {John Smith},
  title     = {Fictional source n°2},
  year      = {2024},
}
paralex-infos.yml
title: Minimal example with sources - Indicative present of The French Verb "Chanter"
name: paralex-sources-chanter
version: '1.0'
languages_iso639:
- fr
contributors:
- title: Sacha Beniamine
keywords:
- example
- toy
- French
- verbs
- paradigms
- paralex
licenses:
- name: CC-BY-SA-4.0
  path: 'https://creativecommons.org/licenses/by-sa/4.0/'
  title: Attribution-ShareAlike 4.0 International
files:
  sources:
    path: sources.bib
  forms:
    path: forms.csv
  readme:
    path: readme.md

File generated automatically by running paralex meta paralex-infos.yml

paralex-sources-chanter.package.json
{
  "name": "paralex-sources-chanter",
  "title": "Minimal example with sources - Indicative present of The French Verb \"Chanter\"",
  "profile": "data-package",
  "licenses": [
    {
      "name": "CC-BY-SA-4.0",
      "path": "https://creativecommons.org/licenses/by-sa/4.0/",
      "title": "Attribution-ShareAlike 4.0 International"
    }
  ],
  "contributors": [
    {
      "title": "Sacha Beniamine"
    }
  ],
  "keywords": [
    "example",
    "toy",
    "French",
    "verbs",
    "paradigms",
    "paralex"
  ],
  "version": "1.0",
  "resources": [
    {
      "name": "sources",
      "type": "file",
      "title": "Sources",
      "description": "Bibliographical references.",
      "path": "sources.bib",
      "scheme": "file",
      "format": "bib",
      "mediatype": "text/x-bibtex",
      "encoding": "utf-8"
    },
    {
      "name": "readme",
      "type": "text",
      "title": "Read me",
      "description": "Basic documentation",
      "path": "readme.md",
      "scheme": "file",
      "format": "md",
      "mediatype": "text/markdown",
      "encoding": "utf-8"
    },
    {
      "name": "forms",
      "type": "table",
      "title": "Inflected forms",
      "path": "forms.csv",
      "scheme": "file",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "name": "forms-schema",
        "fields": [
          {
            "name": "form_id",
            "type": "string",
            "title": "Form table row identifiers",
            "description": "These identifiers are specific to form, lexeme, cell triples.",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "description": "Lexeme identifiers must be unique to paradigms.",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
          },
          {
            "name": "cell",
            "type": "string",
            "title": "Reference to a cell identifier",
            "description": "The set of feature values as would appear in a gloss, separated by dots, eg. prs.ind.1sg or f.pl",
            "constraints": {
              "required": true
            },
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#cell"
          },
          {
            "name": "phon_form",
            "type": "string",
            "title": "Inflected form (phonemic or phonetic)",
            "description": "The form, given in phonemic or phonetic notation, with sounds separated by spaces",
            "missingValues": [
              "#DEF#"
            ],
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#phon_form"
          },
          {
            "name": "source",
            "type": "string",
            "title": "Source",
            "description": "Reference to a specific source (bibtex key). If used, the dataset should comprise a .bib file where the keys are referenced.",
            "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#source"
          }
        ],
        "primaryKey": [
          "form_id"
        ]
      },
      "rdfType": "https://www.paralex-standard.org/paralex_ontology.xml#Form"
    }
  ],
  "languages_iso639": [
    "fr"
  ],
  "paralex-version": "2.2.2"
}