Machine learning for the study of ancient epigraphic cultures

Arch of Constantine. Rome, 315 CE. Photo by Thea Sommerschield


The PythiaPlus research project proposes to explore and interpret the nature of the written (epigraphic) cultures of the ancient Mediterranean using Machine Learning. 

Its aim is to transform our understanding of the use of epigraphic communication and the nature of cultural interference within the written and, indirectly, spoken languages of the ancient world. This will be achieved by revolutionising our ability to access and analyse the epigraphic data: state-of-the-art Machine Learning models will be trained to trace distinctiveness and change in the Greek and Roman epigraphic habits on an unprecedentedly large scale and in unparalleled detail, revealing new insights into linguistic and cultural interactions.

Machine Learning is a field of Artificial Intelligence. License: Attribution 2.0 Generic (CC BY 2.0)
The Code of Gortyn. Crete, 5th century BCE. License: Attribution-ShareAlike 2.5 Generic (CC BY-SA 2.5)


Computational approaches have come to feature prominently in the research methodologies of Classical Studies and the Humanities, creating a unique moment and a remarkable opportunity to write an interdisciplinary history of the Graeco-Roman world in the Digital Age. Indeed, the rise of Machine Learning as a field has transformed the way data-driven research is done today, and it can meaningfully impact the way historical data is collected, analysed and interpreted.

Inscribed texts (inscriptions) in Greek and Latin are primary evidence for reconstructing the history and thought of the Ancient World owing to their large number and variety of content, forming a crucial repository of textual material culture. Machine Learning models could now reveal patterns in this data which historians were previously unable to identify in such detail and on such a scale: PythiaPlus will enable the first "big data" study of epigraphic cultures over circa 1,500 years of ancient Mediterranean history.

Bronze fragment of a Greek inscription recording a citizenship grant. Akrai (Sicily), ca. 490–480 BCE. License: © Metropolitan Museum (CC-0)

The project is organised into three research work packages:

  1. Dataset building: gather, sample and prepare the epigraphic data to be used by Machine Learning models, making the largest digital corpora of Greek and Latin inscriptions machine-actionable.
  2. Model training: train models on the data, evaluating their performance and tuning their parameters to improve statistical performance and explainability. The tasks tackled will range from textual restoration and topic modelling to the study of epigraphic metadata concerning the time and place of writing of Greek and Roman inscriptions.
  3. Result interpretation: interpret the new patterns discovered by the models in light of scholarly approaches to distinctive and distinct epigraphic habits.
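
As a purely illustrative sketch of how the first two work packages connect, the Python snippet below prepares a transcription for a textual restoration task: it hides a span of characters and keeps the hidden characters as the target a model would be asked to predict. The function name `mask_span`, the `?` placeholder and the sample phrase are hypothetical and are not taken from the PythiaPlus project itself.

```python
def mask_span(text: str, start: int, length: int, mask_char: str = "?") -> tuple[str, str]:
    """Hide `length` characters of `text` starting at `start`.

    Returns (masked_text, target), where `target` is the hidden span.
    Hypothetical helper for illustration only.
    """
    if start < 0 or length < 0 or start + length > len(text):
        raise ValueError("span must lie inside the text")
    target = text[start:start + length]
    masked = text[:start] + mask_char * length + text[start + length:]
    return masked, target


# Hypothetical sample transcription (lower-cased, no editorial marks).
sample = "senatus populusque romanus"
masked, target = mask_span(sample, start=8, length=10)
print(masked)  # senatus ?????????? romanus
print(target)  # populusque
```

Pairs of this kind (a damaged-looking input and its known original) are one simple way to pose restoration as a supervised learning problem; a real pipeline would also need to handle editorial conventions and metadata, which this sketch leaves out.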

Section of the Fasti Praenestini, calendar of Verrius Flaccus. Praeneste (Italy), early 1st century CE. License: © Marie-Lan Nguyen / Wikimedia Commons / CC-BY 2.5


PythiaPlus’ expected results are: 

  1. To deliver educational and research tools, the open-source machine learning models and the two standardised epigraphic datasets, in order to track textual connections and make the machine-actionable data more accessible for future research.
  2. To advance the state of the art in the contextualisation of written (epigraphic) evidence and the reconstruction of ancient epigraphic habits, also projecting the patterns discovered by the machine learning models onto evidence whose attribution is uncertain, as in the case of newly discovered, looted or irreparably damaged documents. 
  3. To pioneer a new approach to the analysis of textual material cultures using machine learning, applying the latest technological advances to the study of ancient artefacts and documents and drawing from both historical and digital methodologies. 

Inscribed fragment of a marble grave stele of a woman. Attica, ca. 400–390 BCE. License: © Metropolitan Museum (CC-0)
Biologically-inspired neural network models are studied by the field of Deep Learning. License: Free to use under the Unsplash License


  1. DeepMind is one of the world's leading organisations in artificial intelligence research. Its Research Teams use Google’s computing infrastructure to establish an unparalleled track record of AI breakthroughs resulting in hundreds of peer-reviewed papers. 
  2. The Department of Informatics of the Athens University of Economics and Business (AUEB) is one of the largest Computer Science departments in Greece. I will be hosted by the Information Processing Lab and will be joining the Natural Language Processing Group.
  3. The Digital Curation Unit (DCU) is part of the Information Management Systems Institute (IMSI) under the “Athena” Research Centre. It conducts research, develops technologies and apps, and acts as a national focal point in the field of digital curation (such as the APOLLONIS infrastructure for Digital Humanities).