18 Shacl4Bib

Available since v0.7.0. Note: this is an experimental feature.

The Shapes Constraint Language (SHACL) is a formal language for validating Resource Description Framework (RDF) graphs against a set of conditions, which are themselves expressed in RDF. SHACL validates only RDF graphs. Following the same idea, the Metadata Quality Assessment Framework implements a subset of the language and provides a mechanism to define SHACL-like rules for data sources in non-RDF formats, such as XML, CSV and JSON. Shacl4Bib is the extension that enables the validation of bibliographic records with these rules. The rules can be defined in YAML or JSON configuration files, or in Java code.

SHACL uses RDF notation to specify, or "address", the data element on which the constraints are set. Shacl4Bib instead supports Carsten Klee's MARCspec for MARC records and PICApath for PICA records. More information, including the full definition of the implemented subset of SHACL, is available in the documentation page "Defining schema with a configuration file".
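To make the idea of addressing a data element concrete, here is a minimal sketch that resolves only the simplest MARCspec form, a three-character tag followed by `$` and a subfield code (for example `040$a`), against a MARC record modelled as a plain Python list. The record layout and the `resolve_path` helper are hypothetical illustrations, not Shacl4Bib's own parser; full MARCspec also covers indicators, character positions, repetition indexes and subspecs, none of which are handled here.

```python
from typing import List, Tuple

# Hypothetical record model: a list of (tag, subfields) pairs, where
# subfields is a list of (code, value) pairs; repeats are allowed.
Field = Tuple[str, List[Tuple[str, str]]]


def resolve_path(record: List[Field], path: str) -> List[str]:
    """Return all values addressed by a simplified 'TAG$code' path.

    Only this one MARCspec form is supported; indicators, character
    positions, repetition indexes and subspecs are out of scope.
    """
    tag, sep, code = path.partition("$")
    if not sep or len(tag) != 3 or len(code) != 1:
        raise ValueError(f"unsupported path: {path!r}")
    # Collect the matching subfield values from every matching field.
    return [
        value
        for field_tag, subfields in record
        if field_tag == tag
        for sub_code, value in subfields
        if sub_code == code
    ]


# Illustrative record, not taken from any real catalogue.
record: List[Field] = [
    ("040", [("a", "BE-KBR00"), ("b", "eng")]),
    ("245", [("a", "Example title")]),
]
print(resolve_path(record, "040$a"))  # ['BE-KBR00']
print(resolve_path(record, "041$a"))  # []
```

Returning a list rather than a single value matters because both fields and subfields can repeat, and cardinality rules such as `minCount` need to count every occurrence.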

Parameters:

Here is a simple example of rules set on a MARC subfield:

format: MARC
fields:
- name: 040$a
  path: 040$a
  rules:
  - id: 040$a.minCount
    minCount: 1
  - id: 040$a.pattern
    pattern: ^BE-KBR00
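The two rules in this configuration correspond to SHACL's `sh:minCount` (at least this many values must be present) and `sh:pattern` (each value must match a regular expression). The sketch below shows one way they could be evaluated, assuming the values addressed by `040$a` have already been extracted into a Python list. The configuration is transcribed by hand into a dict, Python's `re` dialect stands in for whatever regex flavour the framework uses, and the `check_rule`/`check_record` helpers and the 1/0 result encoding are assumptions for illustration. The framework's own implementation, including when it reports `NA`, is not reproduced here.

```python
import re
from typing import Dict, List

# The YAML configuration above, transcribed by hand into Python data
# (loading the YAML file itself is left out to keep the sketch stdlib-only).
CONFIG = {
    "format": "MARC",
    "fields": [
        {
            "name": "040$a",
            "path": "040$a",
            "rules": [
                {"id": "040$a.minCount", "minCount": 1},
                {"id": "040$a.pattern", "pattern": "^BE-KBR00"},
            ],
        }
    ],
}


def check_rule(rule: dict, values: List[str]) -> int:
    """Evaluate one rule; assumed encoding: 1 = satisfied, 0 = violated."""
    if "minCount" in rule:
        # Like sh:minCount: at least this many values must be present.
        return int(len(values) >= rule["minCount"])
    if "pattern" in rule:
        # Like sh:pattern: every value must match. re.search is unanchored,
        # so it is the '^' in the pattern that pins the match to the start.
        regex = re.compile(rule["pattern"])
        return int(all(regex.search(value) for value in values))
    raise ValueError(f"rule type not covered by this sketch: {rule['id']}")


def check_record(values_by_path: Dict[str, List[str]]) -> Dict[str, int]:
    """Apply every configured rule to pre-extracted values, keyed by rule id."""
    results: Dict[str, int] = {}
    for field in CONFIG["fields"]:
        values = values_by_path.get(field["path"], [])
        for rule in field["rules"]:
            results[rule["id"]] = check_rule(rule, values)
    return results


print(check_record({"040$a": ["BE-KBR00"]}))
# {'040$a.minCount': 1, '040$a.pattern': 1}
print(check_record({"040$a": ["XX-OTHER"]}))
# {'040$a.minCount': 1, '040$a.pattern': 0}
```

With no values at all, `minCount` fails while `pattern` passes vacuously. This mirrors SHACL, where an empty set of values cannot violate `sh:pattern`. The framework may report such cases differently.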

The output contains an extra column for the record identifier, so it looks something like this:

id,040$a.minCount,040$a.pattern
17529680,1,1
18212975,1,1
18216050,1,1
18184955,1,1
18184431,1,1
9550740,NA,NA
19551181,NA,NA
118592844,1,1
18592704,1,1
18592557,1,1
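When the output covers many records, a per-rule tally of the values is often more useful than the raw rows. The following is a minimal sketch using only the Python standard library, run here on the sample output above. It treats every cell as an opaque string, so it makes no assumption about what `1` or `NA` mean. The `summarise` helper is hypothetical and not part of the framework.

```python
import csv
import io
from collections import Counter
from typing import Dict

# The sample output shown above, embedded for a self-contained demo.
SAMPLE = """id,040$a.minCount,040$a.pattern
17529680,1,1
18212975,1,1
18216050,1,1
18184955,1,1
18184431,1,1
9550740,NA,NA
19551181,NA,NA
118592844,1,1
18592704,1,1
18592557,1,1
"""


def summarise(csv_text: str) -> Dict[str, Counter]:
    """Count how often each cell value occurs in every rule column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the record identifier holds one rule's results.
    rule_ids = [name for name in reader.fieldnames if name != "id"]
    counts = {rule_id: Counter() for rule_id in rule_ids}
    for row in reader:
        for rule_id in rule_ids:
            counts[rule_id][row[rule_id]] += 1
    return counts


for rule_id, counter in summarise(SAMPLE).items():
    print(rule_id, dict(counter))
# 040$a.minCount {'1': 8, 'NA': 2}
# 040$a.pattern {'1': 8, 'NA': 2}
```

Because the rule columns are discovered from the header rather than hard-coded, the same helper works unchanged when more rules are added to the configuration.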