PythiaPlus
Machine learning for the study of ancient epigraphic cultures

Arch of Constantine. Rome, 315 CE. Photo by Thea Sommerschield

About

The PythiaPlus research project proposes to explore and interpret the nature of the written (epigraphic) cultures of the ancient Mediterranean using Machine Learning.

Its aim is to transform our understanding of the use of epigraphic communication and the nature of cultural interference within the written and indirectly spoken languages of the ancient world. This will be achieved by revolutionising our ability to access and analyse the epigraphic data: state-of-the-art Machine Learning models will be trained to trace distinctiveness and change in the Greek and Roman epigraphic habits on an unprecedentedly large scale and in unparalleled detail, revealing new insights into linguistic and cultural interactions.

Machine Learning is a field of Artificial Intelligence. License: Attribution 2.0 Generic (CC BY 2.0)

The Code of Gortyn. Crete, 5th century BCE. License: Attribution-ShareAlike 2.5 Generic (CC BY-SA 2.5)

Research

Computational approaches have come to feature prominently in the research methodologies of Classical Studies and the Humanities, defining a unique moment and a remarkable opportunity to write an interdisciplinary history of the Graeco-Roman world in the Digital Age. Indeed, the rise of Machine Learning has transformed the way data-driven research is done today, and it can meaningfully change the way historical data is collected, analysed and interpreted.

Inscribed texts (inscriptions) in Greek and Latin are primary evidence for reconstructing the history and thought of the Ancient World: their large number and varied content make them a crucial repository of textual material culture. Machine Learning models could now reveal patterns in this data which historians were previously unable to identify in such detail and on such a scale: PythiaPlus will enable the first "big data" study of epigraphic cultures over circa 1,500 years of ancient Mediterranean history using Machine Learning models.

Bronze fragment of a Greek inscription recording a citizenship grant. Akrai (Sicily), ca. 490–480 BCE. License: © Metropolitan Museum (CC-0)

The project is organised into three research work packages:

Dataset building: gather, sample and prepare the epigraphic data to be used by Machine Learning models, making the largest digital corpora of Greek and Latin inscriptions machine-actionable.
Model training: train models on the data, evaluating their performance and tuning the models’ parameters to improve statistical performance and explainability. The tasks tackled will range from textual restoration and topic modelling to the study of epigraphic metadata concerning the time and place of writing of Greek and Roman inscriptions.
Result interpretation: interpret the new patterns discovered by the models in light of scholarly approaches to distinctive and distinct epigraphic habits.
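
As an illustration of the dataset-building and model-training steps above, the following minimal sketch shows one way a training example for textual restoration could be prepared: a contiguous span of characters is hidden and kept aside as the target the model must predict. This is an assumption made for illustration, not the project's actual pipeline; the "-" placeholder and the function names mask_span and restore are hypothetical.

```python
import random

# Assumed placeholder for a lost character (hypothetical convention).
MISSING = "-"


def mask_span(text: str, span_len: int, rng: random.Random) -> tuple[str, str, int]:
    """Hide a contiguous span of characters to create a restoration example.

    Returns the masked text, the hidden target string and the span's start index.
    """
    if not 0 < span_len <= len(text):
        raise ValueError("span_len must be between 1 and len(text)")
    start = rng.randrange(len(text) - span_len + 1)
    target = text[start:start + span_len]
    masked = text[:start] + MISSING * span_len + text[start + span_len:]
    return masked, target, start


def restore(masked: str, prediction: str, start: int) -> str:
    """Write a predicted string back into the masked text at the given position."""
    return masked[:start] + prediction + masked[start + len(prediction):]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is reproducible
    masked, target, start = mask_span("εδοξεν τηι βουληι", 3, rng)
    print(masked, "| hidden:", target)
```

A model trained on many such pairs learns to propose the hidden characters from the surrounding context; the real lacunae in damaged inscriptions play the role of the masked span at prediction time.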

Section of the Fasti Praenestini, calendar of Verrius Flaccus. Praeneste (Italy), early 1st century CE. License: © Marie-Lan Nguyen / Wikimedia Commons / CC-BY 2.5

PythiaPlus’ expected results are:

To deliver educational and research tools, the open-sourced machine learning models and the two standardised epigraphic datasets, in order to track textual connections and make the machine-actionable data more accessible for future research.
To advance the state of the art in the contextualisation of written (epigraphic) evidence and the reconstruction of ancient epigraphic habits, also projecting the patterns discovered by the machine learning models onto evidence whose attribution is uncertain, as in the case of newly discovered, looted or irreparably damaged documents.
To pioneer a new approach to the analysis of textual material cultures using machine learning, applying the latest technological advances to the study of ancient artefacts and documents and drawing from both historical and digital methodologies.

Inscribed fragment of a marble grave stele of a woman. Attica, ca. 400–390 BCE. License: © Metropolitan Museum (CC-0)

Biologically-inspired neural network models are studied by the field of Deep Learning. License: Free to use under the Unsplash License

Outputs

The first output of the PythiaPlus project is Ithaca, the first deep neural network for restoring, dating and placing ancient Greek inscriptions. This research is the result of a collaborative effort between Ca' Foscari University of Venice, DeepMind, the University of Oxford and the Athens University of Economics and Business (AUEB). The model was designed with collaboration in mind and is best used alongside researchers, whose historical knowledge combines with Ithaca’s assistive input. While Ithaca alone achieves 62% accuracy when restoring damaged texts, historians' accuracy rises from 25% to 72% when they use it. Ithaca can also attribute inscriptions to their original location with 71% accuracy and date them to within 30 years of their ground-truth date ranges. Machine Learning models like Ithaca can unlock the cooperative potential between AI and the humanities, transforming the way we study and write about ancient Greek history.
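
To make the two kinds of figures quoted above concrete, here is a minimal sketch of how an accuracy score and a dating error could be computed. It assumes one plausible reading of "years from ground-truth ranges": the error is zero when the predicted year falls inside the attested date range, and otherwise the distance to the nearest end of that range. The function names and the convention of writing BCE years as negative numbers are illustrative assumptions, not Ithaca's published evaluation code.

```python
from typing import Sequence


def interval_distance(pred_year: float, lo: int, hi: int) -> float:
    """Years between a predicted date and a ground-truth range [lo, hi].

    Returns 0 when the prediction lies inside the range. BCE years are
    written as negative numbers (assumed convention).
    """
    if lo > hi:
        raise ValueError("lo must not be later than hi")
    if pred_year < lo:
        return lo - pred_year
    if pred_year > hi:
        return pred_year - hi
    return 0.0


def top1_accuracy(preds: Sequence[str], truths: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the ground truth."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(preds, truths)) / len(preds)


if __name__ == "__main__":
    # An inscription attested to 490-480 BCE, predicted as 450 BCE.
    print(interval_distance(-450, -490, -480))
    # Two region predictions, one of them correct.
    print(top1_accuracy(["Attica", "Crete"], ["Attica", "Sicily"]))
```

Measuring distance to a range rather than to a single year reflects the fact that inscriptions are usually dated by editors to an interval, not to an exact date.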

The article was published in Nature (vol. 603, no. 7900, 10 March 2022), where it featured on the cover, and received media coverage worldwide (The Times, The Guardian, New Scientist, La Repubblica, Kathimerini, DeepMind, Ca’ Foscari News, the Greek Ministry of Culture).

To make the research widely available to researchers, educators, museum staff and others, the team partnered with Google Cloud and Google Arts & Culture to launch a free interactive version of Ithaca, and also open-sourced the code, the pretrained model, and an interactive Colaboratory notebook.

Read the article (open access)
Watch the promotional video (curated by Nature)
Use Ithaca’s interface for your own research
Check out Ithaca’s code
Cite the article: Assael, Y.,* Sommerschield, T.,* Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., de Freitas, N. “Restoring and attributing ancient texts with deep neural networks.” In Nature, 603(7900). Cover issue of 10 March 2022. https://doi.org/10.1038/s41586-022-04448-z

Cover issue of Nature (vol. 603, no. 7900, 10 March 2022)

Secondments

DeepMind is one of the world's leading organisations in artificial intelligence research. Its Research Teams use Google’s computing infrastructure to establish an unparalleled track record of AI breakthroughs resulting in hundreds of peer-reviewed papers.
The Department of Informatics of the Athens University of Economics and Business (AUEB) is one of the largest Computer Science departments in Greece. I will be hosted by the Information Processing Lab and I will be joining the Natural Language Processing Group.
The Digital Curation Unit (DCU) is part of the Information Management Systems Institute (IMSI) under the “Athena” Research Centre. It conducts research, develops technologies and apps, and acts as a national focal point in the field of digital curation (such as the APOLLONIS infrastructure for Digital Humanities).

Partners

Institutions

Thea Sommerschield

Lorenzo Calvelli
