The Metadata Quality Assurance Framework is a research project on how we can decide algorithmically whether a metadata record in a cultural heritage database is "good" or "bad". During the project I will create a general framework that lets metadata repositories and digital libraries (such as Europeana, TextGrid or the Digital Public Library of America) run a range of measurements on their collections and get suggestions on where they should improve the quality of their metadata.
These pages document the research process: my early findings, results, code, and talks.
The first version of the web interface: http://18.104.22.168/europeana-qa
- Harvester client: https://github.com/pkiraly/europeana-oai-pmh-client
- General Metadata QA API:
  - Source repository: https://github.com/pkiraly/metadata-qa-api
  - Maven artifact: http://mvnrepository.com/artifact/de.gwdg.metadataqa/metadata-qa-api
- Europeana-specific QA API (Europeana QA API):
  - Source repository: https://github.com/pkiraly/europeana-qa-api
  - Maven artifact: http://mvnrepository.com/artifact/de.gwdg.metadataqa/europeana-qa-api
- Measurement with Spark: https://github.com/pkiraly/europeana-qa-spark
- Analysis with R: https://github.com/pkiraly/europeana-qa-r
- Web interface: https://github.com/pkiraly/europeana-qa-web
- REST and command line interface: https://github.com/pkiraly/europeana-qa-client
- Solr connector: https://github.com/pkiraly/europeana-qa-solr
- Cassandra connector: https://github.com/pkiraly/europeana-qa-cassandra
- Measurement with Hadoop: https://github.com/pkiraly/europeana-qa
My name is Péter Király, and I am a software developer and analyst. My main interests are publishing and searching large text corpora on the web, and new ways of presenting cultural heritage on the web (mainly library and archival materials, with a special focus on semantic web technologies). You can reach me through the methods listed on the contact page.