The Metadata Quality Assurance Framework is a research project on how we can decide algorithmically whether a metadata record in a cultural heritage database is “good” or “bad”. During the project I will create a general framework that lets metadata repositories and digital libraries (such as Europeana, TextGrid or the Digital Public Library of America) run a range of measurements on their collections and get suggestions about where they should improve the quality of their metadata.
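One possible measurement of this kind is field completeness: the share of a set of expected fields that a record actually fills. The sketch below is only an illustration under stated assumptions. A record is modelled as a plain `Map<String, List<String>>`. The field names (`dc:title`, `dc:creator`, ...) and the class `CompletenessSketch` are hypothetical. This is not the interface of metadata-qa-api.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a completeness measurement. This is NOT the
 * metadata-qa-api interface; records are assumed to be simple maps from
 * field name to a list of values.
 */
public class CompletenessSketch {

    /** Returns the fraction (0.0 to 1.0) of expected fields that have at least one non-blank value. */
    static double completeness(Map<String, List<String>> record, List<String> expectedFields) {
        if (expectedFields.isEmpty()) {
            return 0.0;
        }
        int filled = 0;
        for (String field : expectedFields) {
            List<String> values = record.get(field);
            // A field counts as filled if it exists and holds at least one non-blank value.
            if (values != null && values.stream().anyMatch(v -> v != null && !v.isBlank())) {
                filled++;
            }
        }
        return (double) filled / expectedFields.size();
    }

    public static void main(String[] args) {
        // Assumed list of expected fields, for illustration only.
        List<String> expected = List.of("dc:title", "dc:creator", "dc:date", "dc:subject");
        Map<String, List<String>> record = Map.of(
                "dc:title", List.of("Codex Example"),
                "dc:creator", List.of(""),      // present but blank: not counted
                "dc:date", List.of("1493"));    // dc:subject is missing entirely
        System.out.println(completeness(record, expected)); // prints 0.5
    }
}
```

The unweighted ratio keeps the sketch short. A production framework might instead weight fields or separate mandatory from optional ones.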
These pages document the research process: my early findings, results, code and talks.
The first version of the web interface: http://184.108.40.206/europeana-qa
- Harvester client: https://github.com/pkiraly/europeana-oai-pmh-client
- General Metadata QA API:
  - Source repository: https://github.com/pkiraly/metadata-qa-api
  - Maven artifact: http://mvnrepository.com/artifact/de.gwdg.metadataqa/metadata-qa-api
- Europeana QA API (Europeana-specific):
  - Source repository: https://github.com/pkiraly/europeana-qa-api
  - Maven artifact: http://mvnrepository.com/artifact/de.gwdg.metadataqa/europeana-qa-api
- Measurement with Spark: https://github.com/pkiraly/europeana-qa-spark
- Analysis with R: https://github.com/pkiraly/europeana-qa-r
- Web interface: https://github.com/pkiraly/europeana-qa-web
- REST and command line interface: https://github.com/pkiraly/europeana-qa-client
- Solr connector: https://github.com/pkiraly/europeana-qa-solr
- Cassandra connector: https://github.com/pkiraly/europeana-qa-cassandra
- Measurement with Hadoop: https://github.com/pkiraly/europeana-qa
Juliane Stiller and Péter Király. “Multilinguality of Metadata: Measuring the Multilingual Degree of Europeana’s Metadata.” In M. Gäde, V. Trkulja, V. Petras (Eds.): Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin, 13-15 March 2017. Glückstadt: Verlag Werner Hülsbusch, pp. 164-176. URL (whole book): http://isi2017.ib.hu-berlin.de/ISI_17_ONLINE_FINAL.pdf, URL (this paper): https://www.researchgate.net/publication/314879735_Multilinguality_of_Metadata_Measuring_the_Multilingual_Degree_of_Europeana%27s_Metadata
Péter Király. “Towards an extensible measurement of metadata quality.” In Second International Conference on Digital Access to Textual Cultural Heritage. Conference Proceedings. Göttingen, June 1-2, 2017. Published by ACM. ISBN 978-1-4503-5265-9. pp. 111-115. DOI 10.1145/3078081.3078109 URL: http://dl.acm.org/citation.cfm?doid=3078081.3078109
Péter Király. “Measuring completeness as metadata quality metric in Europeana.” In Digital Humanities 2017. Conference Abstracts. McGill University & Université de Montréal, Montréal, Canada, August 8–11, 2017. Prepared by Rhian Lewis and the DH2017 Local Organizers: Cecily Raynor, Dominic Forest, Michael Sinatra and Stéfan Sinclair. pp. 291-293. URL (whole book): https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf, URL (the abstract): https://dh2017.adho.org/abstracts/458/458.pdf.
My name is Péter Király, and I am a software developer and analyst. My main interests are publishing and searching large text corpora on the web, and new ways of presenting cultural heritage on the web (mainly library and archival materials, with a special focus on semantic web technologies). You can reach me via the methods listed on the contact page.