More information and tables

The example above is minimal: it has a title and a name for the package, and at least a paradigm table. However, we recommend adding more information. In particular, provide a full-text citation specifying how you wish your dataset to be cited, a list of contributors following the frictionless specification, a license, and a DOI. All relevant tables should be listed.

Richer metadata example

Here is an example with more metadata and a full list of five tables. In addition to the forms table, it lists vulcan_v_cells.csv, vulcan_v_features.csv, vulcan_v_lexemes.csv and vulcan_v_sounds.csv. The forms table is also split across two files:

paralex-infos.yml
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  features-values:
    path: vulcan_v_features.csv
  forms:
    path:
    - vulcan_v_forms.csv
    - vulcan_v_forms2.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
name: vulcan
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
contributors:
- role: author
  title: Spock
id: http://dx.doi.org/S.179-276.SP
keywords:
- vulcan
- paradigms
licenses:
- name: CC-BY-SA-4.0
  path: https://creativecommons.org/licenses/by-sa/4.0/
  title: Creative Commons Attribution Share-Alike 4.0
version: 1.0.2
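
The recommendation above can be turned into a quick self-check. The following is a minimal sketch, assuming the config has already been loaded into a Python dict; the helper name missing_recommended_keys and the RECOMMENDED_KEYS tuple are illustrative and not part of paralex.

```python
# Top-level keys recommended in the text beyond the minimal title/name/tables.
# This tuple and the helper below are illustrative, not part of paralex.
RECOMMENDED_KEYS = ("citation", "contributors", "licenses", "id")


def missing_recommended_keys(config):
    """Return the recommended top-level keys that are absent or empty."""
    return [key for key in RECOMMENDED_KEYS if not config.get(key)]


# A minimal config, written as the dict a YAML loader would produce.
minimal = {
    "title": "Vulcan Verbal Paradigms",
    "name": "vulcan",
    "tables": {"forms": {"path": "vulcan_v_forms.csv"}},
}

# The richer config adds the recommended metadata.
richer = dict(
    minimal,
    citation="Spock (2258). Vulcan Verbal Paradigms dataset. Online.",
    contributors=[{"role": "author", "title": "Spock"}],
    licenses=[{"name": "CC-BY-SA-4.0"}],
    id="http://dx.doi.org/S.179-276.SP",
)

print(missing_recommended_keys(minimal))  # ['citation', 'contributors', 'licenses', 'id']
print(missing_recommended_keys(richer))   # []
```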

Custom columns

For any columns already defined in the specification, rich metadata is automatically generated, including a column name, title and description, its expected type, and potential constraints. This is written in the <dataset>.package.json file. For example, the metadata for the lexeme column from the forms table looks as follows:

{
  "name": "lexeme",
  "type": "string",
  "title": "Reference to a lexeme identifier",
  "description": "Lexeme identifiers must be unique to paradigms.",
  "constraints": {
    "required": true
  },
  "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
}

The Paralex standard allows users to define their own custom columns, on top of pre-defined ones. For these columns, very little metadata can be inferred automatically. For example, imagine we have a consonantal column in the sounds table, coding whether each sound is a consonant or not. Since it is not pre-defined in the standard, the only inferred metadata would be:

{
  "name": "consonantal",
  "type": "any"
}

It is possible to inject more detailed metadata by adding a "schema" key under a specific table in the config file. The syntax of the schema section follows the frictionless standard.

Injecting frictionless schema info
paralex-infos.yml
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  features-values:
    path: vulcan_v_features.csv
  forms:
    path:
    - vulcan_v_forms.csv
    - vulcan_v_forms2.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
    schema:
      fields:
      - constraints:
          required: true
        description: Binary feature (1/0) indicating whether the segment is a consonant
        falseValues:
        - '0'
        name: consonantal
        title: Whether the segment is a consonant
        trueValues:
        - '1'
        type: boolean
name: vulcan
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
contributors:
- role: author
  title: Spock
id: http://dx.doi.org/S.179-276.SP
keywords:
- vulcan
- paradigms
licenses:
- name: CC-BY-SA-4.0
  path: https://creativecommons.org/licenses/by-sa/4.0/
  title: Creative Commons Attribution Share-Alike 4.0
version: 1.0.2

To find the definitions and format of the column metadata, see the field descriptors in the Frictionless specifications.
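
To make the effect of trueValues and falseValues concrete, here is a minimal sketch of how a reader might interpret the raw cells of the consonantal column under the schema above. The parse_boolean helper is hypothetical and only mimics the intended behaviour; it is not the frictionless implementation.

```python
# Field descriptor for the custom column, copied from the config above.
consonantal_field = {
    "name": "consonantal",
    "type": "boolean",
    "trueValues": ["1"],
    "falseValues": ["0"],
    "constraints": {"required": True},
}


def parse_boolean(cell, field):
    """Map a raw CSV cell to a bool using the field's true/false values.

    Hypothetical helper for illustration only.
    """
    if cell in field["trueValues"]:
        return True
    if cell in field["falseValues"]:
        return False
    raise ValueError(f"{cell!r} is not a valid value for {field['name']}")


# Raw cells as they would appear in the consonantal column of the sounds table.
raw_column = ["1", "0", "1"]
parsed = [parse_boolean(cell, consonantal_field) for cell in raw_column]
print(parsed)  # [True, False, True]
```

Any cell that is neither '1' nor '0' is rejected, which is the practical benefit of declaring the column as boolean rather than leaving its type as "any".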

Custom tables

Similarly, some metadata will be missing when using custom tables. In particular, one often needs to specify which column is an identifier (or primary key), and which columns refer to columns in other tables. This is also done by specifying the schema of these tables in the config file. For example, imagine that in addition to lexemes, we have added a flexemes table, which provides a different partition of forms into paradigms. This is done through a flexeme column in the forms table, which refers to identifiers in the flexemes table. Thus, we need to add three things in the schemas.

In the forms schema, we need to define the column, as shown above, as well as the foreign key relation to the flexemes table:

excerpt of paralex-infos.yml
...
tables:
  ...
  forms:
    path:
    - vulcan_v_forms.csv
    - vulcan_v_forms2.csv
    schema:
      foreignKeys:
      - field: flexeme
        reference:
          resource: flexemes
          field: flexeme_id
      fields:
      - name: flexeme
        title: reference to a flexeme identifier
        description: A flexeme to which a form belongs.
        type: string
        constraints:
          required: true
...

In the flexemes schema, we define the flexeme_id column (a real dataset would probably need more columns) and declare it as the identifier (primary key):

excerpt of paralex-infos.yml
...
tables:
  ...
  flexemes:
    path: vulcan_v_flexemes.csv
    schema:
      primaryKey: flexeme_id
      fields:
      - name: flexeme_id
        title: identifier for a flexeme
        description: the flexeme id identifies a single flexeme
        type: string
        constraints:
          required: true
...

Rich metadata example with custom tables

The entire configuration is starting to get long:

paralex-infos.yml
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  forms:
    path:
    - vulcan_v_forms.csv
    - vulcan_v_forms2.csv
    schema:
      foreignKeys:
      - field: flexeme
        reference:
          resource: flexemes
          field: flexeme_id
      fields:
      - name: flexeme
        title: reference to a flexeme identifier
        description: A flexeme to which a form belongs.
        type: string
        constraints:
          required: true
  features-values:
    path: vulcan_v_features.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
    schema:
      fields:
      - name: consonantal
        type: boolean
        title: Whether the segment is a consonant
        description: Binary feature (1/0) indicating whether the segment is a consonant
        trueValues:
        - '1'
        falseValues:
        - '0'
        constraints:
          required: true
  flexemes:
    path: vulcan_v_flexemes.csv
    schema:
      primaryKey: flexeme_id
      fields:
      - name: flexeme_id
        title: identifier for a flexeme
        description: the flexeme id identifies a single flexeme
        type: string
        constraints:
          required: true
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
version: 1.0.2
keywords:
- vulcan
- paradigms
id: http://dx.doi.org/S.179-276.SP
contributors:
- title: Spock
  role: author
licenses:
- name: CC-BY-SA-4.0
  title: Creative Commons Attribution Share-Alike 4.0
  path: https://creativecommons.org/licenses/by-sa/4.0/
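
The primary key and foreign key declared in this configuration amount to two checks on the data: flexeme_id values must be unique in the flexemes table, and every flexeme value in the forms table must match one of them. The following is a minimal sketch of those checks, assuming the rows have already been read into lists of dicts; the check_keys helper and the sample rows are hypothetical.

```python
def check_keys(forms_rows, flexemes_rows):
    """Return (duplicate primary keys, foreign key values with no match).

    Hypothetical helper illustrating what the primaryKey and foreignKeys
    declarations express; it is not part of paralex or frictionless.
    """
    ids = [row["flexeme_id"] for row in flexemes_rows]
    # Primary key: each flexeme_id may appear only once.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    # Foreign key: each forms.flexeme must exist in flexemes.flexeme_id.
    known = set(ids)
    dangling = sorted(
        {row["flexeme"] for row in forms_rows if row["flexeme"] not in known}
    )
    return duplicates, dangling


# Made-up rows for illustration only.
flexemes_rows = [
    {"flexeme_id": "f1"},
    {"flexeme_id": "f2"},
    {"flexeme_id": "f2"},  # duplicate identifier
]
forms_rows = [
    {"form_id": "1", "flexeme": "f1"},
    {"form_id": "2", "flexeme": "f2"},
    {"form_id": "3", "flexeme": "f3"},  # no such flexeme
]

duplicates, dangling = check_keys(forms_rows, flexemes_rows)
print(duplicates)  # ['f2']
print(dangling)    # ['f3']
```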

More custom manipulations

You can also write your own Python script calling paralex.paralex_factory, whose arguments reflect the structure of the config file: first a title, then a dict of tables, then the optional arguments name, citation, contributors, id, keywords, licenses and version. The factory returns a frictionless Package object, which can then be written to disk. This is more flexible, as you can then modify the Package object as you like:

gen-metadata.py
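from paralex import paralex_factory

# Build the frictionless Package from a title and a dict of tables
package = paralex_factory("Vulcan Verbal Paradigms", {"forms": {"path": "vulcan_v_forms.csv"}})
# Write the generated metadata to disk
package.to_json("vulcan.package.json")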
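
The written JSON file can also be adjusted afterwards with the standard library alone. The following is a minimal sketch, assuming the descriptor follows the usual Data Package layout (a resources list, each entry with a name and a schema containing fields). The descriptor below is a hand-written stand-in, not actual paralex output, and set_field_description is a hypothetical helper.

```python
import json


def set_field_description(descriptor, resource_name, field_name, description):
    """Set the description of one field in place; return True if it was found.

    Hypothetical helper working on a plain descriptor dict.
    """
    for resource in descriptor.get("resources", []):
        if resource.get("name") != resource_name:
            continue
        for field in resource.get("schema", {}).get("fields", []):
            if field.get("name") == field_name:
                field["description"] = description
                return True
    return False


# Hand-written stand-in for the content of vulcan.package.json.
descriptor = {
    "name": "vulcan",
    "resources": [
        {
            "name": "sounds",
            "path": "vulcan_v_sounds.csv",
            "schema": {"fields": [{"name": "consonantal", "type": "any"}]},
        }
    ],
}

changed = set_field_description(
    descriptor, "sounds", "consonantal", "Whether the segment is a consonant"
)
print(changed)  # True
print(json.dumps(descriptor, indent=2))
```

In practice, the dict would come from json.load on the written file and be saved back with json.dump.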