More information and tables
The example above is minimal: a title and a name for the package, and at least a paradigm table. However, we recommend adding more information. In particular, provide a full-text citation specifying how you wish your dataset to be cited, a list of contributors following the frictionless specification, a license, and a DOI. All relevant tables should be listed.
Richer metadata example
Here is an example with more metadata and a full list of five tables, including vulcan_v_cells.csv, vulcan_v_features.csv, vulcan_v_lexemes.csv, and vulcan_v_sounds.csv. The forms table has also been divided into two files:
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  features-values:
    path: vulcan_v_features.csv
  forms:
    path:
      - vulcan_v_forms.csv
      - vulcan_v_forms2.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
name: vulcan
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
contributors:
  - role: author
    title: Spock
id: http://dx.doi.org/S.179-276.SP
keywords:
  - vulcan
  - paradigms
licenses:
  - name: CC-BY-SA-4.0
    path: https://creativecommons.org/licenses/by-sa/4.0/
    title: Creative Commons Attribution Share-Alike 4.0
version: 1.0.2
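The recommendations above can be checked mechanically once the config file has been loaded into a Python dict (for instance with a YAML parser, which is assumed here and not shown). The following is a minimal sketch, not part of Paralex: the helper names `missing_recommended` and `table_paths` are hypothetical, and the list of recommended keys is simply read off the paragraph above.

```python
# Hypothetical helpers, not part of Paralex: they inspect a config that has
# already been loaded into a plain dict.
RECOMMENDED_KEYS = ("citation", "contributors", "licenses", "id")


def missing_recommended(config):
    """Return the recommended top-level keys that are absent or empty."""
    return [key for key in RECOMMENDED_KEYS if not config.get(key)]


def table_paths(config):
    """Map each table name to a list of paths (a table may span several files)."""
    paths = {}
    for name, table in config["tables"].items():
        path = table["path"]
        paths[name] = [path] if isinstance(path, str) else list(path)
    return paths


# The richer example above, written out as a dict.
config = {
    "title": "Vulcan Verbal Paradigms",
    "name": "vulcan",
    "tables": {
        "cells": {"path": "vulcan_v_cells.csv"},
        "features-values": {"path": "vulcan_v_features.csv"},
        "forms": {"path": ["vulcan_v_forms.csv", "vulcan_v_forms2.csv"]},
        "lexemes": {"path": "vulcan_v_lexemes.csv"},
        "sounds": {"path": "vulcan_v_sounds.csv"},
    },
    "citation": "Spock (2258). Vulcan Verbal Paradigms dataset. Online.",
    "contributors": [{"role": "author", "title": "Spock"}],
    "id": "http://dx.doi.org/S.179-276.SP",
    "licenses": [{"name": "CC-BY-SA-4.0"}],
    "version": "1.0.2",
}

print(missing_recommended(config))
print(table_paths(config)["forms"])
```

Normalising every path to a list means later code can treat single-file and multi-file tables in the same way.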
Custom columns
For any columns already defined in the specification, rich metadata is automatically generated, including a column name, title and description, its expected type, and potential constraints. This is written to the <dataset>.package.json file. For example, the metadata for the lexeme column from the forms table looks as follows:
{
  "name": "lexeme",
  "type": "string",
  "title": "Reference to a lexeme identifier",
  "description": "Lexeme identifiers must be unique to paradigms.",
  "constraints": {
    "required": true
  },
  "rdfProperty": "https://www.paralex-standard.org/paralex_ontology.xml#lexeme"
}
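To see which metadata was generated for a given column, the package file can be read with any JSON parser. The sketch below assumes the usual frictionless data package layout (a `resources` list whose entries carry a `name` and a `schema` with `fields`) and that the resource is named after the table. The descriptor is a trimmed, hand-written stand-in for a real <dataset>.package.json, and `find_field` is a hypothetical helper.

```python
import json


def find_field(package, table, column):
    """Return the field descriptor of `column` in resource `table`, or None."""
    for resource in package.get("resources", []):
        if resource.get("name") != table:
            continue
        for field in resource.get("schema", {}).get("fields", []):
            if field.get("name") == column:
                return field
    return None


# Trimmed stand-in for a generated package file (assumed layout).
descriptor = json.loads("""
{
  "name": "vulcan",
  "resources": [
    {
      "name": "forms",
      "schema": {
        "fields": [
          {
            "name": "lexeme",
            "type": "string",
            "title": "Reference to a lexeme identifier",
            "constraints": {"required": true}
          }
        ]
      }
    }
  ]
}
""")

lexeme = find_field(descriptor, "forms", "lexeme")
print(lexeme["title"])
```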
The Paralex standard allows users to define their own custom columns, on top of the pre-defined ones. For these columns, very little metadata can be inferred automatically. For example, imagine we have a consonantal column in the sounds table, coding whether each sound is a consonant or not. Since it is not pre-defined in the standard, the metadata inferred for it would be minimal.
It is possible to inject more detailed metadata by adding a "schema" key under a specific table in the config file. The syntax of the schema section follows the frictionless standard.
Injecting frictionless schema info
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  features-values:
    path: vulcan_v_features.csv
  forms:
    path:
      - vulcan_v_forms.csv
      - vulcan_v_forms2.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
    schema:
      fields:
        - constraints:
            required: true
          description: Binary feature (1/0) indicating whether the segment is a consonant
          falseValues:
            - '0'
          name: consonantal
          title: Whether the segment is a consonant
          trueValues:
            - '1'
          type: boolean
name: vulcan
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
contributors:
  - role: author
    title: Spock
id: http://dx.doi.org/S.179-276.SP
keywords:
  - vulcan
  - paradigms
licenses:
  - name: CC-BY-SA-4.0
    path: https://creativecommons.org/licenses/by-sa/4.0/
    title: Creative Commons Attribution Share-Alike 4.0
version: 1.0.2
To find the definitions and format of the column metadata, see the field descriptors in the Frictionless specifications.
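To make the effect of trueValues and falseValues concrete, here is a simplified sketch of how a cell of the consonantal column could be interpreted under such a field descriptor. It is an illustration only, not the frictionless implementation: `parse_boolean` is a hypothetical function, an empty cell is assumed to count as a missing value, and the fallback lists are the defaults given in the Table Schema specification.

```python
# Defaults from the Table Schema specification for boolean fields.
DEFAULT_TRUE = ["true", "True", "TRUE", "1"]
DEFAULT_FALSE = ["false", "False", "FALSE", "0"]


def parse_boolean(cell, field):
    """Interpret a raw CSV cell according to a boolean field descriptor."""
    if cell == "":
        # Assumption: the empty string stands for a missing value.
        if field.get("constraints", {}).get("required"):
            raise ValueError(f"{field['name']}: value is required")
        return None
    if cell in field.get("trueValues", DEFAULT_TRUE):
        return True
    if cell in field.get("falseValues", DEFAULT_FALSE):
        return False
    raise ValueError(f"{field['name']}: {cell!r} is not a valid boolean")


# The custom column from the config above, as a dict.
consonantal = {
    "name": "consonantal",
    "type": "boolean",
    "trueValues": ["1"],
    "falseValues": ["0"],
    "constraints": {"required": True},
}

print(parse_boolean("1", consonantal))
print(parse_boolean("0", consonantal))
```

Because the descriptor restricts the accepted values to '1' and '0', a cell such as "true" is rejected here even though it would be accepted under the default lists.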
Custom tables
Similarly, some metadata will be missing if using custom tables. In particular, one often needs to specify which column is an identifier (or primary key), and which columns refer to other ones. This is also done by specifying the schema of these tables in the config file. For example, imagine that in addition to lexemes, we have added a flexeme table, which provides a different partition of forms into paradigms. This is done through a flexeme column in the forms table, which refers to identifiers in the flexeme table. Thus, we need to add three things in the schemas.
In the forms schema, we need to define the column, as shown above, as well as the foreign key relation to the flexeme table:
...
tables:
  ...
  forms:
    path:
      - vulcan_v_forms.csv
      - vulcan_v_forms2.csv
    schema:
      foreignKeys:
        - field: flexeme
          reference:
            resource: flexemes
            field: flexeme_id
      fields:
        - name: flexeme
          title: reference to a flexeme identifier
          description: A flexeme to which a form belongs.
          type: string
          constraints:
            required: true
...
In the flexeme schema, we define the flexeme_id column (we would probably need to define more columns), and declare it as the identifier (primary key):
...
tables:
  ...
  flexemes:
    path: vulcan_v_flexemes.csv
    schema:
      primaryKey: flexeme_id
      fields:
        - name: flexeme_id
          title: identifier for a flexeme
          description: the flexeme id identifies a single flexeme
          type: string
          constraints:
            required: true
...
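What the two declarations above promise about the data can be spelled out as two checks: every flexeme_id is unique, and every value in the flexeme column of the forms table exists as a flexeme_id. The sketch below runs these checks with the standard library only. It is an illustration, not how Paralex or frictionless validate data; the helper names are hypothetical and the CSV contents are invented toy data.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def duplicate_keys(rows, key):
    """Return the values of `key` that occur more than once (primary key check)."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def dangling_references(rows, column, target_rows, target_key):
    """Return values of `column` with no match in `target_key` (foreign key check)."""
    known = {row[target_key] for row in target_rows}
    return sorted({row[column] for row in rows} - known)


# Invented toy data: the third form points at a flexeme that does not exist.
flexemes = read_rows("flexeme_id\nf1\nf2\n")
forms = read_rows("form_id,flexeme\n1,f1\n2,f2\n3,f3\n")

print(duplicate_keys(flexemes, "flexeme_id"))
print(dangling_references(forms, "flexeme", flexemes, "flexeme_id"))
```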
Rich metadata example with custom tables
The entire configuration is starting to get long:
title: Vulcan Verbal Paradigms
tables:
  cells:
    path: vulcan_v_cells.csv
  forms:
    path:
      - vulcan_v_forms.csv
      - vulcan_v_forms2.csv
    schema:
      foreignKeys:
        - field: flexeme
          reference:
            resource: flexemes
            field: flexeme_id
      fields:
        - name: flexeme
          title: reference to a flexeme identifier
          description: A flexeme to which a form belongs.
          type: string
          constraints:
            required: true
  features-values:
    path: vulcan_v_features.csv
  lexemes:
    path: vulcan_v_lexemes.csv
  sounds:
    path: vulcan_v_sounds.csv
    schema:
      fields:
        - name: consonantal
          type: boolean
          title: Whether the segment is a consonant
          description: Binary feature (1/0) indicating whether the segment is a consonant
          trueValues:
            - '1'
          falseValues:
            - '0'
          constraints:
            required: true
  flexemes:
    path: vulcan_v_flexemes.csv
    schema:
      primaryKey: flexeme_id
      fields:
        - name: flexeme_id
          title: identifier for a flexeme
          description: the flexeme id identifies a single flexeme
          type: string
          constraints:
            required: true
citation: Spock (2258). Vulcan Verbal Paradigms dataset. Online.
version: 1.0.2
keywords:
  - vulcan
  - paradigms
id: http://dx.doi.org/S.179-276.SP
contributors:
  - title: Spock
    role: author
licenses:
  - name: CC-BY-SA-4.0
    title: Creative Commons Attribution Share-Alike 4.0
    path: https://creativecommons.org/licenses/by-sa/4.0/
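With a configuration of this size, it is easy to declare a foreign key whose target table or column was never added. The following sketch cross-checks the foreign keys against the declared tables, assuming the config has been loaded into a dict and uses the same keys as the example above (foreignKeys, field, reference, resource). The function `unresolved_foreign_keys` is hypothetical and not part of Paralex, and the dict is a trimmed copy of the configuration.

```python
def unresolved_foreign_keys(config):
    """List (table, column, problem) triples for foreign keys without a valid target."""
    tables = config["tables"]
    problems = []
    for name, table in tables.items():
        for key in table.get("schema", {}).get("foreignKeys", []):
            ref = key["reference"]
            target = tables.get(ref["resource"])
            if target is None:
                problems.append((name, key["field"], "unknown table " + ref["resource"]))
                continue
            # Only columns declared in the config can be checked here;
            # pre-defined columns of standard tables are not listed in it.
            declared = {f["name"] for f in target.get("schema", {}).get("fields", [])}
            if declared and ref["field"] not in declared:
                problems.append((name, key["field"], "unknown column " + ref["field"]))
    return problems


# Trimmed copy of the configuration above, as a dict.
config = {
    "tables": {
        "forms": {
            "path": ["vulcan_v_forms.csv", "vulcan_v_forms2.csv"],
            "schema": {
                "foreignKeys": [
                    {
                        "field": "flexeme",
                        "reference": {"resource": "flexemes", "field": "flexeme_id"},
                    }
                ],
                "fields": [{"name": "flexeme", "type": "string"}],
            },
        },
        "flexemes": {
            "path": "vulcan_v_flexemes.csv",
            "schema": {
                "primaryKey": "flexeme_id",
                "fields": [{"name": "flexeme_id", "type": "string"}],
            },
        },
    }
}

print(unresolved_foreign_keys(config))
```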
More custom manipulations
You can also write your own Python script, calling paralex.paralex_factory, the arguments of which reflect the structure of the config file: first a title, then a dict of tables, then optional arguments name, citation, contributors, id, keywords, licenses and version. The factory returns a frictionless Package object, which can then be written to disk. This is more flexible, as you can then modify the Package object as you like: