1 Introduction
Bibliographic data science is a relatively new interdisciplinary field of research that lies at the intersection of library science (or, more broadly, cultural heritage science), history and social sciences, and certain components of computer science. The objective of bibliographic data science is to establish previously hidden or possibly only suspected historical or collection trends based on data sources containing a (typically but not exclusively) large number of bibliographic records, ideally all those related to a given topic (e.g., national bibliographies), and on data science methods. Some of the field’s research questions:
- What was the spatial distribution and prosopography of 17th-century German legal dissertations? (Heßbrüggen-Walter 2025)
- What degree of interdisciplinarity can be observed based on the metadata of philosophical dissertations? (Heßbrüggen-Walter 2024)
- How did the format and language of books change over time in different regions? (Lahti et al. 2019)
- What are the patterns of translations from a given language, how have they changed, and which languages were super-central, central, and peripheral in a given era? (Heilbron 1999)
- What impact do publishers have on fiction? (Bourdieu 2008)
- What were the profiles of the various book collections?
- Is there a correlation between the genre and format of the book? (Lahti et al. 2019)
- How have genre proportions changed? (Király and Kiséry 2025)
- How many early modern publications could have been destroyed without a trace? (Farkas, Káldos, and Király 2025)
- How can the reception of works be examined using bibliographic data? (Szemes and Dobás 2025)
- What is the quality of cultural heritage data, and what improvement strategies can be developed? (Király 2019)
- How do cultural heritage data, data structures, and standards help (or hinder) answering the above questions? What development opportunities does the research suggest for cultural heritage data standards? (Király et al. 2025)
Although digital humanities education has developed dynamically in recent years, the computer-based analysis of bibliographic sources rarely features in it, and it is similarly absent from library science and IT education. In my opinion, this gap could be filled by a new informal vocational training program aimed at those who are interested in some of the above questions and who already have some knowledge of one of the relevant fields (e.g., library science, cultural history, literary sociology, information technology). The analysis of records based on library bibliographic standards would probably also be of interest in library training. The training could take the form of a summer university or a seminar or course jointly organized by several university departments. Participants could be university students or practicing professionals.
1.1 Preparation
In this book we use the Python programming language. You should have a basic knowledge of the language and should know how to install it on your machine. To keep our environment separate from Python modules that are already installed, we use a virtual environment. To create it, run the following:
python -m venv venv
Before you run the code in the book, you should first activate this virtual environment:
source venv/bin/activate
When you finish the session, you should deactivate it:
deactivate
When we talk about installing a module, you should do it within this environment, using the standard Python module installation method:
venv/bin/pip install pandas
We provide a list of the modules used in this book; you can install them in a single step:
venv/bin/pip install -r requirements.txt
Some of the code examples run on the command line and are written in bash, which is available by default on Linux and macOS machines. On Windows you can install it via WSL (Windows Subsystem for Linux).
- Open the command line or PowerShell and enter:
wsl --install -d Ubuntu
wsl --set-default-version 2
wsl --set-default ubuntu
- In Windows search, type Ubuntu and click on the Ubuntu icon, or enter the following in the command line or PowerShell:
ubuntu
When you enter this virtual Ubuntu for the first time, you should choose a user name (which may be the same as or different from your Windows user name) and a password.
You can find more details and troubleshooting advice on the following documentation page: WSL Installation
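Returning to the virtual environment created earlier in this section: once it has been activated, it can be useful to confirm that Python is really running inside it. The following is a minimal sketch and not part of the book's own code; the function name `in_virtualenv` is introduced here purely for illustration. It assumes the environment was created with the standard `venv` module, in which case `sys.prefix` points to the environment directory while `sys.base_prefix` points to the underlying Python installation.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still refers to the base Python installation.
    # Outside a venv the two values are identical.
    return sys.prefix != sys.base_prefix


# Report which interpreter is in use and whether a venv is active.
print("Python version:", sys.version.split()[0])
print("Interpreter path:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

If the last line reports `False` after you have run the activation command, the script was probably started with a different Python interpreter than the one inside `venv`.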
1.2 A note on code
The code (Python, HTML, XML, etc.) in this book has been lightly reformatted by adding spaces and line breaks in order to make it easier to read. These changes affect neither the original intention of the code nor the processing workflow. For the original format, please check the source code of the examples.
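For XML, this claim can be checked with a short sketch. The record below is hypothetical and invented for illustration only, as is the helper name `fields`; the sketch assumes that we only care about element names and their trimmed text content, which is the usual case when reading bibliographic records.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration: compact, single-line form.
compact = "<record><title>Example title</title><year>1650</year></record>"

# The same hypothetical record with added line breaks and indentation.
formatted = """
<record>
  <title>Example title</title>
  <year>1650</year>
</record>
"""


def fields(xml_text):
    """Return (tag, trimmed text) pairs for the child elements of the root."""
    root = ET.fromstring(xml_text.strip())
    return [(child.tag, (child.text or "").strip()) for child in root]


# Both variants yield the same tags and values once whitespace is trimmed.
print(fields(compact) == fields(formatted))
```

Note that the comparison trims whitespace explicitly: the parser itself keeps the added line breaks as text between elements, so they are harmless only for workflows that ignore such whitespace.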