The project

The Metadata Quality Assurance Framework is a research project on how to decide algorithmically whether a metadata record in a cultural heritage database is “good” or “bad”. During the project I will create a general framework that lets metadata repositories and digital libraries (such as Europeana, TextGrid, or the Digital Public Library of America) run a range of measurements on their collections and get suggestions on where they should improve the quality of their metadata.

These pages document the research process: my early findings, results, code, and talks.

The first version of the web interface: http://144.76.218.178/europeana-qa

Source code

Publications

2017

Juliane Stiller and Péter Király. “Multilinguality of Metadata Measuring the Multilingual Degree of Europeana’s Metadata.” In M. Gäde, V. Trkulja, V. Petras (Eds.): Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin, 13th–15th March 2017. Glückstadt: Verlag Werner Hülsbusch, pp. 164–176. URL (whole book): http://isi2017.ib.hu-berlin.de/ISI_17_ONLINE_FINAL.pdf (this paper): https://www.researchgate.net/publication/314879735_Multilinguality_of_Metadata_Measuring_the_Multilingual_Degree_of_Europeana%27s_Metadata
cited by:

  • Fallert, Sarah. “Multilinguale Herausforderungen in der Sacherschließung.” Master’s thesis, Humboldt-Universität zu Berlin, 2020. link

Péter Király. “Towards an extensible measurement of metadata quality.” In Second International Conference on Digital Access to Textual Cultural Heritage. Conference Proceedings. Göttingen, June 1-2, 2017. Published by ACM 2017. ISBN 978-1-4503-5265-9. pp. 111-115. DOI 10.1145/3078081.3078109 URL: http://dl.acm.org/citation.cfm?doid=3078081.3078109
cited by:

  • Candela, Gustavo, María Pilar Escobar Esteban, María Dolores Sáez Fernández, and Manuel Marco Such. “A Shape Expression approach for assessing the quality of Linked Open Data in Libraries.” Semantic Web (2021) link

Péter Király. “Measuring completeness as metadata quality metric in Europeana.” In Digital Humanities 2017. Conference Abstracts. McGill University & Université de Montréal, Montréal, Canada, August 8–11, 2017. Prepared by Rhian Lewis and the DH2017 Local Organizers: Cecily Raynor, Dominic Forest, Michael Sinatra and Stéfan Sinclair. pp. 291-293. URL (whole book): https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf, URL (the abstract): https://dh2017.adho.org/abstracts/458/458.pdf.

2018

Valentine Charles, Juliane Stiller, Péter Király, Werner Bailer, and Nuno Freire. “Data Quality Assessment in Europeana: Metrics for Multilinguality.” In Joint Proceedings of the 1st Workshop on Temporal Dynamics in Digital Libraries (TDDL 2017), the (Meta)-Data Quality Workshop (MDQual 2017) and the Workshop on Modeling Societal Future (Futurity 2017) (TDDL MDQual Futurity 2017) co-located with 21st International Conference on Theory and Practice of Digital Libraries (TPDL 2017) (Grand Hotel Palace, Thessaloniki, Greece, 21 September 2017), edited by A. Caputo, N. Kanhabua, P. Basile, S. Lawless, D. Gavrilis, Ch. Papatheodorou, D. Trandabat. (CEUR Workshop Proceedings Volume 2038. ISSN 1613-0073.), Published by CEUR, 2018. http://ceur-ws.org/Vol-2038/paper6.pdf.

2019

Péter Király and Marco Büchler. “Measuring completeness as metadata quality metric in Europeana.” In 2018 IEEE International Conference on Big Data (Big Data). Published by IEEE, 2019. pp. 2711–2720. DOI 10.1109/BigData.2018.8622487
cited by:

  • Khan, Nadim Akhtar, S. M. Shafi, and Humma Ahangar. “Digitization of cultural heritage: Global initiatives, opportunities and challenges.” Journal of Cases on Information Technology (JCIT) 20, no. 4 (2018): 1-16. link
  • Lee, Jongwook. “Analysis and Suggestions of Digital Heritage Policy.” Journal of The Korea Society of Computer and Information 24, no. 10 (2019): 71-78. link
  • Freire, Klara Martha Wanderley. “A curadoria digital nas instituições culturais: possibilidades de reuso de dados de Arte.” (2019). link
  • Pajari, Jussi. “Tutkimusaineistojen metatiedot: Metatietojen laatu data-ja metatietoarkistoissa.” Master’s thesis, 2019. link
  • Tavakoli, Mohammadreza, Mirette Elias, Gábor Kismihók, and Sören Auer. “Quality prediction of open educational resources a metadata-based approach.” In 2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT), pp. 29-31. IEEE, 2020. link
  • Schuster, Kristen, and Stuart Dunn, eds. Routledge International Handbook of Research Methods in Digital Humanities. Routledge, 2020. link
  • Tavakoli, Mohammadreza, Ali Faraji, Stefan T. Mol, and Gábor Kismihók. “OER Recommendations to Support Career Development.” In 2020 IEEE Frontiers in Education Conference (FIE), pp. 1-5. IEEE, 2020. link
  • Million, A. J. “Information Communication Technologies, Infrastructure, and Research Methods in the Digital Humanities.” In Routledge International Handbook of Research Methods in Digital Humanities, pp. 190-202. Routledge, 2020. link
  • Phillips, Mark Edward, Oksana L. Zavalina, and Hannah Tarver. “Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation.” International Journal of Metadata, Semantics and Ontologies 14, no. 2 (2020): 112-123. link
  • Abgaz, Yalemisew, Amelie Dorn, José Luis Preza Díaz, and Gerda Koch. “Towards a Comprehensive Assessment of the Quality and Richness of the Europeana Metadata of food-related Images.” In Proceedings of the 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access, pp. 29-33. 2020. link
  • Eder, Johann, and Vladimir A. Shekhovtsov. “Data Quality for Medical Data Lakelands.” In International Conference on Future Data and Security Engineering, pp. 28-43. Springer, Cham, 2020. link
  • Eder, Johann, and Vladimir A. Shekhovtsov. “Data quality for federated medical data lakes.” International Journal of Web Information Systems (2021). link
  • Phillips, Mark Edward, and Hannah Tarver. “Investigating the use of metadata record graphs to analyze subject headings in the digital public library of America.” The Electronic Library (2021). link
  • Barrett, Susan A. “Participatory Description and Metadata Quality in Rapid Response Archives.” Collections (2021): 1550190620981038. link
  • Lorenzini, Matteo, Marco Rospocher, and Sara Tonelli. “Automatically evaluating the quality of textual descriptions in cultural heritage records.” International Journal on Digital Libraries 22, no. 2 (2021): 217-231. link
  • Wenige, Lisa, Claus Stadler, Michael Martin, Richard Figura, Robert Sauter, and Christopher W. Frank. “Open Data and the Status Quo–A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany.” arXiv preprint arXiv:2106.09590 (2021). link
  • Tavakoli, Mohammadreza, Mirette Elias, Gábor Kismihók, and Sören Auer. “Metadata Analysis of Open Educational Resources.” In LAK21: 11th International Learning Analytics and Knowledge Conference, pp. 626-631. 2021. link

Péter Király, Juliane Stiller, Valentine Charles, Werner Bailer, and Nuno Freire. “Evaluating Data Quality in Europeana: Metrics for Multilinguality.” In Metadata and Semantic Research 2019. 12th International Conference, MTSR 2018, Limassol, Cyprus, October 23-26, 2018, Revised Selected Papers (Communications in Computer and Information Science, volume 846) Published by Springer, 2019. pp. 199–211. DOI 10.1007/978-3-030-14401-2_19
cited by:

  • Freire, Nuno, and Antoine Isaac. “Technical usability of Wikidata’s linked data.” In International Conference on Business Information Systems, pp. 556-567. Springer, Cham, 2019. DOI 10.1007/978-3-030-36691-9_47
  • Phillips, Mark E., Oksana L. Zavalina, and Hannah Tarver. “Using metadata record graphs to understand digital library metadata.” In International Conference on Dublin Core and Metadata Applications, pp. 49-58. 2020. link
  • Freire, Nuno, and Antoine Isaac. “Wikidata’s linked data for cultural heritage digital resources: an evaluation based on the Europeana data model.” In International Conference on Dublin Core and Metadata Applications, pp. 59-68. 2020. link
  • Kapidakis, Sarantos. “Consistency and Interoperability on Dublin Core Element Values in Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting.” In KEOD, pp. 181-188. 2020. link
  • Abgaz, Yalemisew, Amelie Dorn, José Luis Preza Díaz, and Gerda Koch. “Towards a Comprehensive Assessment of the Quality and Richness of the Europeana Metadata of food-related Images.” In Proceedings of the 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access, pp. 29-33. 2020. link
  • Редькина, Н. С. “ЕВРОПЕАНА: ЦИФРОВОЕ КУЛЬТУРНОЕ НАСЛЕДИЕ ЕВРОПЫ.” Ученые записки (Алтайская государственная академия культуры и искусств) 2 (24) (2020). link

Péter Király. “Adat a könyvtárban” (Data in the library – paper in Hungarian about the changing status of data in LAM). In Hagyomány és újítás a 21. századi könyvtárban (Erdélyi Évszázadok. A Kolozsvári Magyar Történeti Intézet Évkönyve. III.) eds. Rüsz-Fogarasi Enikő, Monok István. Kolozsvár (Romania), 2018. ISBN 978-606-8886-1. pp. 49-74. http://real.mtak.hu/92256/1/ErdEvsz_tordelt_nyomdaba.pdf
cited by:

  • Virágos, Márta. “Open Science a könyvtárban: könyvtáros kompetenciák újraértelmezése.” Tudományos és Műszaki Tájékoztatás 67, no. 12 (2020): 739-756. link

Péter Király. “Measuring metadata quality”. PhD dissertation. DOI 10.13140/RG.2.2.33177.77920 (ResearchGate), Göttingen eDiss repository, Academia.edu.
cited by:

  • Skluzacek, Tyler J., Ryan Wong, Zhuozhao Li, Ryan Chard, Kyle Chard, and Ian Foster. “A Serverless Framework for Distributed Bulk Metadata Extraction.” In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, pp. 7-18. 2020. link

Péter Király. “Validating 126 million MARC records”. In DATeCH2019: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, Brussels, Belgium, May 08-10, 2019. Published by ACM, 2019. ISBN: 978-1-4503-7194-0. pp. 161-168. DOI 10.1145/3322905.3322929
cited by:

  • Bryer, Evan, Theppatorn Rhujittawiwat, John R. Rose, and Colin F. Wilder. “SPELLING BASED RANKED CLUSTERING ALGORITHM TO CLEAN AND NORMALIZE EARLY MODERN EUROPEAN BOOK TITLES.” link
  • Bryer, Evan, Theppatorn Rhujittawiwat, Samyu Comandur, Vasco Madrid, Stephanie Riley, John Rose, and Colin Wilder. “Analysis of Clustering Algorithms to Clean and Normalize Early Modern European Book Titles.” In 2021 The 4th International Conference on Software Engineering and Information Management, pp. 106-112. 2021. DOI 10.1145/3451471.3451489
  • Ungváry, Rudolf. “MARC21 tartalmi adatmezők használata jelentősebb nagykönyvtárakban. Egy elemzés néhány tanulsága.” Networkshop (2020): 33-53. link

Péter Király and Marco Büchler. “A teljesség minőségjelzőként való mérése az Europeanában”. (Hungarian translation of “Measuring completeness as metadata quality metric in Europeana”) In Digitális Bölcsészet 2, 2019. pp. 57-76. DOI 10.31400/dh-hun.2019.2.388

2020

Péter Király. “Empirical evaluation of library catalogues”. In EuropeanaTech Newsletter 15, 2020. https://pro.europeana.eu/page/issue-15-swib-2019#empirical-evaluation-of-library-catalogues. In Spanish: “Evaluación empírica de los catálogos de las bibliotecas” (translator unknown; send me a message if you know who translated it). Blog de la biblioteca de Traducción y Documentación de la Universidad de Salamanca, 2020. https://universoabierto.org/2020/06/01/evaluacion-empirica-de-los-catalogos-de-las-bibliotecas/

2021

Péter Király and Jan Brase. “Qualitätsmanagement”. In Praxishandbuch Forschungsdatenmanagement. Berlin, Boston: De Gruyter Saur. doi: https://doi.org/10.1515/9783110657807-020, pp. 357–380.

About me

My name is Péter Király, and I am a software developer and analyst. My main interests are publishing and searching large text corpora on the web, and new forms of web presence for cultural heritage (mainly library and archival materials, with a special focus on semantic web technologies). You can reach me via the methods listed on the contact page.

Sponsors

Thanks to GWDG for supporting my research in different ways, to Europeana and the eTRAP research group for letting me use their computers, to JetBrains s.r.o. for the IntelliJ IDEA community licence, to the developers of the Open Source software packages and infrastructure services I used in the research, and to Open Data publishers for their data.