The project

The Metadata Quality Assurance Framework is a research project on how to decide algorithmically whether a metadata record in a cultural heritage database is “good” or “bad”. During the project I will create a general framework that lets metadata repositories and digital libraries (such as Europeana, TextGrid or the Digital Public Library of America) run a range of measurements on their collections and get suggestions on where they should improve the quality of their metadata.

These pages document the research process: my early findings, results, code, and talks.

The first version of the web interface: http://144.76.218.178/europeana-qa

Source code

Publications

2015

Péter Király. “Metadata quality assurance framework”. Unpublished doctoral research plan (2015) pkiraly.github.io
cited by:

  1. Vivien Petras, and Juliane Stiller. “A decade of evaluating europeana-constructs, contexts, methods & criteria.” In International Conference on Theory and Practice of Digital Libraries, pp. 233-245. Springer, Cham, 2017. DOI: 10.1007/978-3-319-67008-9_19
  2. Marcin Roszkowski. “Diagnostyka metadanych w kolekcjach cyfrowych.” Diagnostyka w zarządzaniu informacją: perspektywa informatologiczna (2017): pp. 365-390. researchgate.net
  3. Oksana L. Zavalina, Shadi Shakeri, Priya Kizhakkethil, and Mark E. Phillips. “Uncovering Hidden Insights for Information Management: Examination and Modeling of Change in Digital Collection Metadata.” In International Conference on Information, pp. 645-651. Springer, Cham, 2018. DOI: 10.1007/978-3-319-78105-1_74
  4. Branka Badovinac. “Merjenje kakovosti podatkov v bibliografskih in normativnih zapisih: študija primera izbranih podatkovnih elementov za fasetno omejevanje in izpis seznama zadetkov v COBISS+.” Organizacija Znanja 24, no. 1/2 (2019): pp. 1-20. cobiss.si
  5. Mark Edward Phillips, Oksana L. Zavalina, and Hannah Tarver. “Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation.” International Journal of Metadata, Semantics and Ontologies 14, no. 2 (2020): 112-123. DOI: 10.1504/IJMSO.2020.108326
  6. Rachel Jaffe. “Rethinking Metadata’s Value and How It Is Evaluated.” Technical Services Quarterly 37, no. 4 (2020): 432-443. DOI: 10.1080/07317131.2020.1810443
  7. August Wierling, Valeria Jana Schwanitz, Sebnem Altinci, Maria Bałazinska, Michael J. Barber, Mehmet Efe Biresselioglu, Christopher Burger-Scheidlin, Massimo Celino, Muhittin Hakan Demir, Richard Dennis, Nicolas Dintzner, Adel el Gammal, Carlos M. Fernández-Peruchena, Winston Gilcrease, Paweł Gładysz, Carsten Hoyer-Klick, Kevin Joshi, Mariusz Kruczek, David Lacroix, Małgorzata Markowska, Rafael Mayo-García, Robbie Morrison, Manfred Paier, Giuseppe Peronato, Mahendranath Ramakrishnan, Janeita Reid, Alessandro Sciullo, Berfu Solak, Demet Suna, Wolfgang Süß, Astrid Unger, Maria Luisa Fernandez Vanoni and Nikola Vasiljevic. “FAIR Metadata Standards for Low Carbon Energy Research—A Review of Practices and How to Advance.” Energies (2021) 14, no. 20, 6692. DOI: 10.3390/en14206692
  8. Matteo Lorenzini, Marco Rospocher, Sara Tonelli. “On assessing metadata completeness in digital cultural heritage repositories.” Digital Scholarship in the Humanities 36, Supplement_2, (2021) pp. ii182–ii188. DOI: 10.1093/llc/fqab036

2017

Juliane Stiller, and Péter Király. “Multilinguality of Metadata Measuring the Multilingual Degree of Europeana’s Metadata.” In M. Gäde, V. Trkulja, V. Petras (Eds.): Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin, 13th—15th March 2017. Glückstadt: Verlag Werner Hülsbusch, pp. 164—176. URL (whole book): http://isi2017.ib.hu-berlin.de/ISI_17_ONLINE_FINAL.pdf (this paper): https://www.researchgate.net/publication/314879735_Multilinguality_of_Metadata_Measuring_the_Multilingual_Degree_of_Europeana%27s_Metadata
cited by:

  1. Fallert, Sarah. “Multilinguale Herausforderungen in der Sacherschließung.” Master’s thesis, Humboldt-Universität zu Berlin, 2020. edoc.hu-berlin.de
  2. August Wierling, Valeria Jana Schwanitz, Sebnem Altinci, Maria Bałazinska, Michael J. Barber, Mehmet Efe Biresselioglu, Christopher Burger-Scheidlin, Massimo Celino, Muhittin Hakan Demir, Richard Dennis, Nicolas Dintzner, Adel el Gammal, Carlos M. Fernández-Peruchena, Winston Gilcrease, Paweł Gładysz, Carsten Hoyer-Klick, Kevin Joshi, Mariusz Kruczek, David Lacroix, Małgorzata Markowska, Rafael Mayo-García, Robbie Morrison, Manfred Paier, Giuseppe Peronato, Mahendranath Ramakrishnan, Janeita Reid, Alessandro Sciullo, Berfu Solak, Demet Suna, Wolfgang Süß, Astrid Unger, Maria Luisa Fernandez Vanoni and Nikola Vasiljevic. “FAIR Metadata Standards for Low Carbon Energy Research—A Review of Practices and How to Advance.” Energies 2021, 14, 6692. DOI: 10.3390/en14206692

Péter Király. “Towards an extensible measurement of metadata quality.” In Second International Conference on Digital Access to Textual Cultural Heritage. Conference Proceedings. Göttingen, June 1-2, 2017. Published by ACM 2017. ISBN 978-1-4503-5265-9. pp. 111-115. DOI: 10.1145/3078081.3078109 URL: http://dl.acm.org/citation.cfm?doid=3078081.3078109
cited by:

  1. Gustavo Candela, Pilar Escobar, María Dolores Sáez and Manuel Marco-Such. “A Shape Expression approach for assessing the quality of Linked Open Data in Libraries.” Semantic Web pp. 1–21. DOI: 10.3233/SW-210441

Péter Király. “Measuring completeness as metadata quality metric in Europeana.” In Digital Humanities 2017. Conference Abstracts. McGill University & Université de Montréal, Montréal, Canada, August 8–11, 2017. Prepared by Rhian Lewis and the DH2017 Local Organizers: Cecily Raynor, Dominic Forest, Michael Sinatra and Stéfan Sinclair. pp. 291-293. URL (whole book): https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf, URL (the abstract): https://dh2017.adho.org/abstracts/458/458.pdf.

2018

Valentine Charles, Juliane Stiller, Péter Király, Werner Bailer, and Nuno Freire. “Data Quality Assessment in Europeana: Metrics for Multilinguality.” In Joint Proceedings of the 1st Workshop on Temporal Dynamics in Digital Libraries (TDDL 2017), the (Meta)-Data Quality Workshop (MDQual 2017) and the Workshop on Modeling Societal Future (Futurity 2017) (TDDL MDQual Futurity 2017) co-located with 21st International Conference on Theory and Practice of Digital Libraries (TPDL 2017) (Grand Hotel Palace, Thessaloniki, Greece, 21 September 2017), edited by A. Caputo, N. Kanhabua, P. Basile, S. Lawless, D. Gavrilis, Ch. Papatheodorou, D. Trandabat. (CEUR Workshop Proceedings Volume 2038. ISSN 1613-0073.), Published by CEUR, 2018. http://ceur-ws.org/Vol-2038/paper6.pdf.
cited by:

  1. Matteo Lorenzini, Marco Rospocher, and Sara Tonelli. “Proposta per una valutazione automatica della completeness dei metadati nel contesto delle biblioteche digitali.” DigItalia 2. (2020). pp. 159-167. DOI: 10.36181/digitalia-00023
  2. Subhi Issa, Onaopepo Adekunle, Fayçal Hamdi, Samira Si-Said Cherfi, Michel Dumontier, and Amrapali Zaveri. “Knowledge Graph Completeness: A Systematic Literature Review.” IEEE Access 9. (2021). pp. 31322-31339. DOI: 10.1109/ACCESS.2021.3056622

2019

Péter Király and Marco Büchler. “Measuring completeness as metadata quality metric in Europeana.” In 2018 IEEE International Conference on Big Data (Big Data). Published by IEEE, 2019. pp. 2711–2720. DOI: 10.1109/BigData.2018.8622487
cited by:

  1. Nadim Akhtar Khan, S. M. Shafi, and Humma Ahangar. “Digitization of cultural heritage: Global initiatives, opportunities and challenges.” Journal of Cases on Information Technology (JCIT) 20, no. 4 (2018): 1-16. igi-global.com
  2. Jongwook Lee. “Analysis and Suggestions of Digital Heritage Policy.” Journal of The Korea Society of Computer and Information 24, no. 10 (2019): 71-78. koreascience.or.kr
  3. Klara Martha Wanderley Freire. “A curadoria digital nas instituições culturais: possibilidades de reuso de dados de Arte.” (2019). repositorio.ibict.br
  4. Jussi Pajari. “Tutkimusaineistojen metatiedot: Metatietojen laatu data-ja metatietoarkistoissa.” Master’s thesis, 2019. trepo.tuni.fi
  5. Mohammadreza Tavakoli, Mirette Elias, Gábor Kismihók, and Sören Auer. “Quality prediction of open educational resources a metadata-based approach.” In 2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT), pp. 29-31. IEEE, 2020. DOI: 10.1109/ICALT49669.2020.00007
  6. Kristen Schuster, and Stuart Dunn, eds. Routledge International Handbook of Research Methods in Digital Humanities. Routledge, 2020. books.google.com
  7. Mohammadreza Tavakoli, Ali Faraji, Stefan T. Mol, and Gábor Kismihók. “OER Recommendations to Support Career Development.” In 2020 IEEE Frontiers in Education Conference (FIE), pp. 1-5. IEEE, 2020. DOI: 10.1109/FIE44824.2020.9274175
  8. A. J. Million. “Information Communication Technologies, Infrastructure, and Research Methods in the Digital Humanities.” In Routledge International Handbook of Research Methods in Digital Humanities, pp. 190-202. Routledge, 2020. DOI: 10.4324/9780429777028-15
  9. Mark Edward Phillips, Oksana L. Zavalina, and Hannah Tarver. “Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation.” International Journal of Metadata, Semantics and Ontologies 14, no. 2 (2020): 112-123. DOI: 10.1504/IJMSO.2020.108326
  10. Yalemisew Abgaz, Amelie Dorn, José Luis Preza Díaz, and Gerda Koch. “Towards a Comprehensive Assessment of the Quality and Richness of the Europeana Metadata of food-related Images.” In Proceedings of the 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access, pp. 29-33. 2020. aclweb.org
  11. Johann Eder and Vladimir A. Shekhovtsov. “Data Quality for Medical Data Lakelands.” In International Conference on Future Data and Security Engineering, pp. 28-43. Springer, Cham, 2020. DOI: 10.1007/978-3-030-63924-2_2
  12. Matteo Lorenzini, Marco Rospocher, and Sara Tonelli. “Proposta per una valutazione automatica della completeness dei metadati nel contesto delle biblioteche digitali.” DigItalia 2. (2020). pp. 159-167. DOI: 10.36181/digitalia-00023
  13. Johann Eder and Vladimir A. Shekhovtsov. “Data quality for federated medical data lakes.” International Journal of Web Information Systems (2021). DOI: 10.1108/IJWIS-03-2021-0026
  14. Mark Edward Phillips and Hannah Tarver. “Investigating the use of metadata record graphs to analyze subject headings in the digital public library of America.” The Electronic Library (2021). DOI: 10.1108/EL-11-2020-0317
  15. Susan A. Barrett. “Participatory Description and Metadata Quality in Rapid Response Archives.” Collections (2021): 1550190620981038. DOI: 10.1177/1550190620981038
  16. Matteo Lorenzini, Marco Rospocher, and Sara Tonelli. “Automatically evaluating the quality of textual descriptions in cultural heritage records.” International Journal on Digital Libraries 22, no. 2 (2021): 217-231. DOI: 10.1007/s00799-021-00302-1
  17. Lisa Wenige, Claus Stadler, Michael Martin, Richard Figura, Robert Sauter, and Christopher W. Frank. “Open Data and the Status Quo–A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany.” arXiv preprint arXiv:2106.09590 (2021). arxiv.org
  18. Mohammadreza Tavakoli, Mirette Elias, Gábor Kismihók, and Sören Auer. “Metadata Analysis of Open Educational Resources.” In LAK21: 11th International Learning Analytics and Knowledge Conference, pp. 626-631. 2021. DOI: 10.1145/3448139.3448208
  19. Lisandra Díaz de la Paz, Francisco N. Riestra Collado, Juan L. García Mendoza, Luisa M. González González, Amed A. Leiva Mederos, and Alberto Taboada Crispi. “Weights Estimation in the Completeness Measurement of Bibliographic Metadata.” Computación y Sistemas 25, no. 1. 2021. pp. 47–65. DOI: 10.13053/CyS-25-1-3355
  20. Петр Сергеевич Ершов, and Юрий Евгеньевич Хохлов. “Цифровая инфраструктура для работы с большими данными.” Информационное общество 4-5 (2021): 110-131. infosoc.iis.ru

Péter Király, Juliane Stiller, Valentine Charles, Werner Bailer, and Nuno Freire. “Evaluating Data Quality in Europeana: Metrics for Multilinguality.” In Metadata and Semantic Research 2019. 12th International Conference, MTSR 2018, Limassol, Cyprus, October 23-26, 2018, Revised Selected Papers (Communications in Computer and Information Science, volume 846) Published by Springer, 2019. pp. 199–211. DOI: 10.1007/978-3-030-14401-2_19
cited by:

  1. Nuno Freire and Antoine Isaac. “Technical usability of Wikidata’s linked data.” In International Conference on Business Information Systems, pp. 556-567. Springer, Cham, 2019. DOI: 10.1007/978-3-030-36691-9_47
  2. Mark E. Phillips, Oksana L. Zavalina, and Hannah Tarver. “Using metadata record graphs to understand digital library metadata.” In International Conference on Dublin Core and Metadata Applications, pp. 49-58. 2020. dcpapers.dublincore.org
  3. Nuno Freire, and Antoine Isaac. “Wikidata’s linked data for cultural heritage digital resources: an evaluation based on the Europeana data model.” In International Conference on Dublin Core and Metadata Applications, pp. 59-68. 2020. dcpapers.dublincore.org
  4. Sarantos Kapidakis. “Consistency and Interoperability on Dublin Core Element Values in Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting.” In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) Volume 2: KEOD, pp. 181-188. 2020. DOI: 10.5220/0010112001810188
  5. Yalemisew Abgaz, Amelie Dorn, José Luis Preza Díaz, and Gerda Koch. “Towards a Comprehensive Assessment of the Quality and Richness of the Europeana Metadata of food-related Images.” In Proceedings of the 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access, pp. 29-33. 2020. aclanthology.org
  6. Н. С. Редькина “ЕВРОПЕАНА: ЦИФРОВОЕ КУЛЬТУРНОЕ НАСЛЕДИЕ ЕВРОПЫ.” Ученые записки (Алтайская государственная академия культуры и искусств) 2 (24) (2020). cyberleninka.ru

Péter Király. “Adat a könyvtárban” (Data in the library – paper in Hungarian about the changing status of data in LAM). In Hagyomány és újítás a 21. századi könyvtárban (Erdélyi Évszázadok. A Kolozsvári Magyar Történeti Intézet Évkönyve. III.) eds. Rüsz-Fogarasi Enikő, Monok István. Kolozsvár (Romania), 2018. ISBN 978-606-8886-1. pp. 49-74. http://real.mtak.hu/92256/1/ErdEvsz_tordelt_nyomdaba.pdf
cited by:

  1. Virágos Márta. “Open Science a könyvtárban: könyvtáros kompetenciák újraértelmezése.” Tudományos és Műszaki Tájékoztatás 67, no. 12 (2020): 739-756. tmt.omikk.bme.hu

Péter Király. “Measuring metadata quality”. PhD dissertation. DOI: 10.13140/RG.2.2.33177.77920 (ResearchGate), Göttingen eDiss repository, Academia.edu.
cited by:

  1. Tyler J. Skluzacek, Ryan Wong, Zhuozhao Li, Ryan Chard, Kyle Chard, and Ian Foster. “A Serverless Framework for Distributed Bulk Metadata Extraction.” In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, pp. 7-18. 2020. DOI: 10.1145/3431379.3460636

Péter Király. “Validating 126 million MARC records”. In DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage Brussels, Belgium — May 08-10, 2019. Published by ACM, 2019. ISBN: 978-1-4503-7194-0. pp. 161-168. DOI: 10.1145/3322905.3322929
cited by:

  1. Ungváry, Rudolf. “MARC21 tartalmi adatmezők használata jelentősebb nagykönyvtárakban. Egy elemzés néhány tanulsága.” Networkshop (2020): 33-53. real.mtak.hu
  2. Ungváry, Rudolf. “Ismeretszervező-könyvtári rendszerek tartalmi feltárásának összehasonlító vizsgálata MARC21 környezetben.” Tudományos és Műszaki Tájékoztatás, 2020. (67. évf.) 11. sz. pp. 655-680. tmt.omikk.bme.hu
  3. Evan Bryer, Theppatorn Rhujittawiwat, John R. Rose, and Colin F. Wilder. “Spelling Based Ranked Clustering Algorithm To Clean And Normalize Early Modern European Book Titles.” ihci-conf.org
  4. Evan Bryer, Theppatorn Rhujittawiwat, Samyu Comandur, Vasco Madrid, Stephanie Riley, John Rose, and Colin Wilder. “Analysis of Clustering Algorithms to Clean and Normalize Early Modern European Book Titles.” In 2021 The 4th International Conference on Software Engineering and Information Management, pp. 106-112. 2021. DOI: 10.1145/3451471.3451489
  5. Gustavo Candela, Pilar Escobar, María Dolores Sáez and Manuel Marco-Such. “A Shape Expression approach for assessing the quality of Linked Open Data in Libraries.” Semantic Web pp. 1–21. DOI: 10.3233/SW-210441
  6. Jakob Voß. “Datenqualität als Grundlage qualitativer Inhaltserschließung.” In Qualität in der Inhaltserschließung. Edited by: Michael Franke-Maier, Anna Kasprzik, Andreas Ledl and Hans Schürmann. Berlin, Boston: De Gruyter Saur. ISBN: 9783110691597, DOI: 10.1515/9783110691597 (Bibliotheks- und Informationspraxis, Volume 70) pp. 167-176. DOI: 10.1515/9783110691597-011
  7. Vyacheslav Zavalin, Oksana L. Zavalina and Rachel Safa. “Patterns of Subject Metadata Change in MARC 21 Bibliographic Records for Video Recordings.” Proceedings of the Association for Information Science and Technology 58, no. 1 (2021): 543-547. DOI: 10.1002/pra2.494
  8. Evan Bryer, Theppatorn Rhujittawiwat, John R. Rose, and Colin F. Wilder. “Improvement of Clustering Algorithms by Implementation of Spelling Based Ranking.” IADIS International Journal on Computer Science and Information Systems 2021. Vol. 16, No. 2, pp. 45-60 ISSN: 1646-3692. iadisportal.org

Péter Király and Marco Büchler. “A teljesség minőségjelzőként való mérése az Europeanában”. (Hungarian translation of “Measuring completeness as metadata quality metric in Europeana”) In Digitális Bölcsészet 2, 2019. pp. 57-76. DOI: 10.31400/dh-hun.2019.2.388

2020

Péter Király. “Empirical evaluation of library catalogues”. In EuropeanaTech Newsletter 15, 2020. https://pro.europeana.eu/page/issue-15-swib-2019#empirical-evaluation-of-library-catalogues. In Spanish: “Evaluación empírica de los catálogos de las bibliotecas” (translator unknown - send me a message if you know the translator). Blog de la biblioteca de Traducción y Documentación de la Universidad de Salamanca, 2020. https://universoabierto.org/2020/06/01/evaluacion-empirica-de-los-catalogos-de-las-bibliotecas/

Péter Király. “A magyar népzenei adatok története és a (digitális) archiválás lehetőségei. Bolya Mátyás. Információelmélet és népzenekutatás: Rendszeralkotás, nyilvántartás, digitális archívum. Budapest: MTA BTK Zenetudományi Intézet–L’Harmattan Kiadó, 2019.” Book review. In Digitális Bölcsészet 3, 2020. pp. 7-15. DOI: 10.31400/dh-hun.2020.3.1405

2021

Péter Király, and Jan Brase. “Qualitätsmanagement”. In Praxishandbuch Forschungsdatenmanagement. Edited by: Markus Putnings, Heike Neuroth and Janna Neumann. Berlin, Boston: De Gruyter Saur. ISBN: 9783110653656, DOI: 10.1515/9783110657807 (De Gruyter Praxishandbuch) pp. 357–380. DOI: 10.1515/9783110657807-020

Rudolf Ungváry, and Péter Király. “Bemerkungen zu der Qualitätsbewertung von MARC-21-Datensätzen”. In Qualität in der Inhaltserschließung. Edited by: Michael Franke-Maier, Anna Kasprzik, Andreas Ledl and Hans Schürmann. Berlin, Boston: De Gruyter Saur. ISBN: 9783110691597, DOI: 10.1515/9783110691597 (Bibliotheks- und Informationspraxis, Volume 70) pp. 177-227. DOI: 10.1515/9783110691597-011

About me

My name is Péter Király, and I am a software developer and analyst. My main interests are publishing and searching large text corpora on the web, and new forms of web presence for cultural heritage (mainly library and archival materials, with a special focus on semantic web technologies). You can reach me via the methods listed on the contact page.

Sponsors

Thanks to GWDG for supporting my research in different ways, to Europeana and the eTRAP research group for letting me use their computers, to JetBrains s.r.o. for the IntelliJ IDEA community licence, to the developers of the Open Source software packages and infrastructure services I used in the research, and to Open Data publishers for their data.