An Introduction to Digital Humanities

Expert insights into our digital landscape

Academic Advisor: Professor David De Roure, University of Oxford

Coordinator: Judy Dendy (Department of Engineering Science, University of Oxford)

Hashtags: #introDH and #DHOxSS20

Computers: Participants are not required to bring their own laptops for this workshop but may find it useful to do so.


Broaden your understanding of the range of work the Digital Humanities encompasses and learn about the tools and techniques available for scholarly purposes.

This lecture-based survey course gives you a thorough overview of the theory and practice of Digital Humanities. Drawing on expertise from across the University of Oxford and our national and international collaborators, and on the University's library collections, it will appeal to anyone new to the field, or curious to broaden their understanding of the range of work the Digital Humanities encompasses.


Sessions include talks, presentations, demonstrations, and practical workshops. On completing this course, you will be conversant with the range and potential of the technologies used to collate, interrogate, and facilitate digital work in the Humanities, and will have gained insight into, and practice in, methods relevant to your own research.


Intended outcomes

Attendees of this strand will gain:

- a broad overview of the Digital Humanities field
- insights into the state of the art in the practice of digital methods in the humanities
- an awareness of future directions in the field


Experience necessary

No prior technical knowledge is necessary for this course.


Computer and software requirements


Participants are not required to bring their own laptops but may find it useful to do so.

Academic advisor

David De Roure is Professor of e-Research at the University of Oxford's e-Research Centre. Focused on advancing digital scholarship, David works closely with multiple disciplines including social sciences (studying social machines), humanities (computational musicology and experimental humanities), engineering (Internet of Things), and computer science (large scale distributed systems and social computing). He has extensive experience in hypertext, Web Science, Linked Data, and Internet of Things. Drawing on this broad interdisciplinary background he is a frequent speaker and writer on the future of digital scholarship and scholarly communications. Professor De Roure is also a Visiting Researcher at the Alan Turing Institute, working at the intersection of data science with libraries and GLAM (Gardens, Libraries and Museums at the University of Oxford), and a Visiting Professor at Goldsmiths, University of London.


Judy Dendy has been one of the key members of the DHOxSS Events Team since the 2017 Summer School. She assembles the nine-strand programme and coordinates the Introduction to Digital Humanities strand. She handles speakers' travel and accommodation, the contents and production of the conference bags, and the design and production of the conference booklets. She also works 'behind the scenes' in general support roles as part of the Events Team for DHOxSS.

"The strand was a perfect way to get introduced to the different existing technologies and to stimulate a reflection on how to enhance one's research through the use of digital tools".

DHOxSS 2019 participant


Monday, 13th July
Registration (Sloane Robinson building)
Tea and coffee (ARCO building)

Opening Keynote (O'Reilly lecture theatre)

Refreshment break (ARCO building)

Introductions and strand keynote
Digital scholarship: Intersection, Scale, and Social Machines 

Today we are witnessing many shifts in scholarly practice, in and across multiple disciplines, as researchers embrace digital techniques to tackle established questions in new ways, and to address new questions afforded by our increasingly digital society and digitized collections. These methods include computational techniques but also citizen science, the notion of Social Machines, "experimental humanities", and artificial intelligence. We take a broad look at Digital Humanities and set the scene for the week's discussions.

Speaker: David De Roure.


LUNCH (Dining Hall)


A Humanities perspective - Talk TBC
Speakers: David De Roure and Megan Gooch

Text Mining

I will talk you through an approach to "forensic stylometry", that is, identifying the author of a text based on a corpus of documents. The field made headlines in 2013 when two academics used it to identify JK Rowling as the author of a detective novel she had published under a pseudonym. Traditionally this would have been done with a hand-engineered sequence of components for removing stopwords, lemmatising words, and constructing a bag-of-words model. However, recent advances in deep learning software have made it simple to build text classifiers with almost no feature engineering. In a few hours we will build an authorship classifier that can be trained in a few minutes and will run on a regular laptop.

Speaker: Tom Wood
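The session itself builds a deep-learning classifier. As a point of comparison, the sketch below shows the older bag-of-words route the description mentions, using a multinomial naive Bayes model. This is a swapped-in classic technique, not the session's method. It assumes a tiny, hypothetical training corpus keyed by author name. A real study would need far more text per author, plus held-out documents for testing.

```python
import math
import re
from collections import Counter

def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(corpus):
    """corpus maps each author to a list of training texts.

    Returns word counts, token totals and log priors per author, plus the vocabulary."""
    counts, totals, priors = {}, {}, {}
    vocab = set()
    n_docs = sum(len(texts) for texts in corpus.values())
    for author, texts in corpus.items():
        bag = Counter()
        for text in texts:
            bag.update(tokenise(text))  # bag-of-words: word order is discarded
        counts[author] = bag
        totals[author] = sum(bag.values())
        priors[author] = math.log(len(texts) / n_docs)
        vocab.update(bag)
    return counts, totals, priors, vocab

def attribute(text, model):
    """Return the author whose word distribution gives the text the highest log-probability."""
    counts, totals, priors, vocab = model
    scores = {}
    for author, bag in counts.items():
        score = priors[author]
        for token in tokenise(text):
            # Laplace (add-one) smoothing stops unseen words from zeroing the score
            score += math.log((bag[token] + 1) / (totals[author] + len(vocab)))
        scores[author] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus, for illustration only
corpus = {
    "author_a": ["the whale and the sea and the ship",
                 "upon the sea the ship sailed on"],
    "author_b": ["whilst she pondered, her heart was heavy",
                 "she was certain her sister was right"],
}
model = train(corpus)
print(attribute("the ship and the sea", model))  # author_a
```

Unlike the traditional pipeline, this sketch keeps every word, including stopwords. A fuller version could add the stopword removal and lemmatisation steps described above.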

Refreshment break

Negotiating the digital archive

In 1995, Jacques Derrida wrote that 'nothing is less reliable, nothing is less clear today than the word “archive.”' Twenty-five years on, the archive, complicated still further by digital processes, remains unclear. Yet we all use archives, and we all contribute to them. This session will explore some of the processes and complexities of digital archives, providing a point of reflection for everyone engaged in digital research.


Speaker: Andrew Cusworth

Tuesday, 14th July



Bodleian Student Editions Workshops (Weston Lecture Theatre, Weston Library, Bodleian)

Bodleian Student Editions workshops bring students from across our University together in the Bodleian's Weston Library with items from Special Collections and with curatorial, editorial, digital, and research expertise. Through working hands-on with early modern letters, participants are introduced to special collections handling, palaeography, transcription and editorial practices, metadata, and digital text at scale. The letters' transcriptions and metadata are added to Early Modern Letters Online as citable publications.


This session presents the development of the collaboration behind the workshops, shows some of the early modern letters that have been transcribed, discusses the pedagogical practice of teaching with collections, and reflects on participants' and workshop leaders' responses to the workshops.

Speakers: Helen Brown, Chris Fletcher, Miranda Lewis, Olivia Thompson, Mike Webb


Refreshment break (Blackwell Hall, Bodleian)


Foundations of Digital Preservation (Weston Lecture Theatre, Weston Library, Bodleian)


Have you ever lost important research data, or found that digital files have become corrupted and unusable? Have you considered what you would do if you did? How would you stop that loss from occurring in the first place? Preventing loss and mitigating risks to your digital materials is a foundational aspect of digital preservation. To protect your files for long-term access and use, you need to apply digital preservation practices early. This introductory session will provide background on the risks to digital materials and the techniques that can help protect you from them. Digital preservation is not just the responsibility of libraries and archives: researchers also have an important role.

Speaker: John Southall
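One basic preservation technique is fixity checking. You record a checksum for every file, then recompute it later to detect silent corruption or loss. Below is a minimal Python sketch. It assumes the files live under a single folder and that SHA-256 is an acceptable checksum. The function names are illustrative and do not belong to any particular preservation system.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_manifest(root):
    """Map every file under root (as a relative path string) to its checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root, manifest):
    """Compare current checksums with a stored manifest; return (changed, missing)."""
    current = make_manifest(root)
    changed = sorted(name for name, digest in manifest.items()
                     if name in current and current[name] != digest)
    missing = sorted(name for name in manifest if name not in current)
    return changed, missing

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "notes.txt").write_text("interview transcript")
        manifest = make_manifest(folder)  # record checksums on deposit
        Path(folder, "notes.txt").write_text("interview transcripT")  # simulated corruption
        print(verify(folder, manifest))  # (['notes.txt'], [])
```

In practice the manifest would be stored separately from the files it describes, and verification would be rerun on a schedule.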

An Introduction to the International Image Interoperability Framework (Weston Lecture Theatre, Weston Library, Bodleian)

Description TBC

Speaker: Neil Jefferies
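The session description is still to be confirmed, but one concrete part of IIIF is the Image API. In it, a single URL specifies which region of an image to fetch, and at what size, rotation, quality and format. Below is a minimal sketch of building such URLs. The server base `https://example.org/iiif` and the identifier are hypothetical. The default size `max` follows version 3.0 of the Image API; version 2 uses `full` instead.

```python
from urllib.parse import quote

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    ident = quote(identifier, safe="")  # identifiers must be URI-encoded, including any '/'
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

def iiif_info_url(base, identifier):
    """URL of the info.json document describing the image's sizes, tiles and features."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"

BASE = "https://example.org/iiif"  # hypothetical image server

# Whole image at full resolution
print(iiif_image_url(BASE, "ms-123/f1r"))
# A 1000x800 pixel detail from the top-left corner, scaled to 500 px wide,
# rotated 90 degrees, in greyscale
print(iiif_image_url(BASE, "ms-123/f1r", region="0,0,1000,800", size="500,",
                     rotation=90, quality="gray"))
```

Because the request is just a URL, any IIIF-compliant viewer or script can ask any compliant server for exactly the detail it needs.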

LUNCH (Dining Hall in Keble College)

Text Encoding: TEI in a research context (O'Reilly lecture theatre, Keble College)

In this talk we will give an overview of the many uses of the Text Encoding Initiative by looking at a range of projects and the different ways in which they create and publish TEI. We will touch on some technical aspects of TEI, but our main focus will be on TEI in a research context and how it can be used to address a variety of research questions.

Speaker: Huw Jones
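As a small illustration of TEI in a research context, the sketch below reads a made-up TEI fragment with Python's standard XML parser and lists the people and places it tags. This kind of extraction might feed a question such as "who is mentioned alongside whom?". The sample document and its `ref` values are invented. Real projects often use richer markup and dedicated tools such as XSLT or XPath processors.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document, for illustration only
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter (invented)</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Created for demonstration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Yesterday <persName ref="#jsmith">John Smith</persName> rode to
        <placeName ref="#oxford">Oxford</placeName> with
        <persName ref="#mjones">Mary Jones</persName>.</p>
    </body>
  </text>
</TEI>"""

def tagged_entities(tei_xml, tag):
    """Return (text, ref) pairs for every TEI element with the given name, e.g. persName."""
    root = ET.fromstring(tei_xml)
    return [("".join(el.itertext()).strip(), el.get("ref"))
            for el in root.iterfind(f".//tei:{tag}", TEI_NS)]

print(tagged_entities(SAMPLE, "persName"))   # [('John Smith', '#jsmith'), ('Mary Jones', '#mjones')]
print(tagged_entities(SAMPLE, "placeName"))  # [('Oxford', '#oxford')]
```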


Refreshment break (ARCO Building)

16:00-17:00

Additional sessions (various venues)

Wednesday, 15th July

Reproducible Research in the Humanities


Reproducibility, documenting the process as well as the products of study, is an important part of digital research. Many researchers lack the confidence or training to use the tools available to support reproducible research, or to write their own analysis code. Writing code to automate a process can be one stage of this; that code then needs to be made available and shareable. Using the publicly available Early English Books Online Text Creation Partnership (EEBO-TCP) corpus, this session teaches participants to write code that extracts data from the catalogue and creates a figure based on that data. Participants will learn tools and techniques that support reproducibility, including version control, licensing practices, and some basic Python coding using a pre-existing script. We will import code libraries, discover data using the index, and export the data. The session includes a practical exercise as well as discussion.

Speaker: Iain Emsley
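The EEBO-TCP workflow used in the session is not reproduced here. The hedged sketch below illustrates the same idea. It counts catalogue records per decade from a CSV file and writes the counts out, so that the figure can be regenerated from a script and data kept under version control rather than rebuilt by hand. The column name `date` and the sample rows are hypothetical and do not reflect the real EEBO-TCP catalogue format.

```python
import csv
import io
import re
from collections import Counter

YEAR = re.compile(r"\b(1[0-9]{3})\b")  # first four-digit year from 1000 to 1999

def decade_counts(csv_file, date_column="date"):
    """Count records per decade from a CSV catalogue export.

    Returns (counts sorted by decade, number of rows with no recognisable year)."""
    counts = Counter()
    skipped = 0
    for row in csv.DictReader(csv_file):
        match = YEAR.search(row.get(date_column) or "")
        if match is None:
            skipped += 1
            continue
        counts[int(match.group(1)) // 10 * 10] += 1
    return dict(sorted(counts.items())), skipped

def write_counts(counts, out_file):
    """Write decade counts as CSV, ready for a plotting library to draw as a bar chart."""
    writer = csv.writer(out_file)
    writer.writerow(["decade", "records"])
    for decade, n in counts.items():
        writer.writerow([decade, n])

# Hypothetical sample rows; with a real file, use open(path, newline="")
SAMPLE_CSV = """id,date
rec-1,1623
rec-2,1628
rec-3,[1641?]
rec-4,
rec-5,1650
"""

counts, skipped = decade_counts(io.StringIO(SAMPLE_CSV))
print(counts, "skipped:", skipped)  # {1620: 2, 1640: 1, 1650: 1} skipped: 1
```

Reporting the number of skipped rows keeps the figure honest about records whose dates could not be parsed.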


Refreshment break (ARCO Building)

Linked Data for Digital Humanities: Introducing the Semantic Web

The Semantic Web can be thought of as an extension of the World Wide Web in which enough meaning is captured and encoded that computers can help match, retrieve, and link related resources across the internet. In a scholarly context this offers significant opportunities for publishing, referencing, and re-using digital research output. In this session we introduce the principles and technologies behind this ‘Linked Data’, illustrated through examples from Digital Musicology.


Speaker: Kevin Page
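To make the idea concrete, the sketch below models Linked Data as subject-predicate-object triples identified by URIs. It merges two small datasets by simple set union. Because both datasets use the same URI for a person, a query can follow the link from one to the other. Only the FOAF `name` property is a real vocabulary term; the `example.org` resources are hypothetical. A real project would use an RDF library and SPARQL rather than this toy matcher.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # real FOAF vocabulary term
EX = "http://example.org/"                    # hypothetical namespace

# Dataset 1: a (hypothetical) music catalogue
catalogue = {
    (EX + "score/42", EX + "vocab/composedBy", EX + "person/7"),
    (EX + "score/42", EX + "vocab/heldBy", EX + "library/1"),
}
# Dataset 2: a (hypothetical) biographical source reusing the same person URI
biography = {
    (EX + "person/7", FOAF_NAME, "Example Composer"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def composer_names(graph, work):
    """Follow composedBy links from a work, then look up each person's name."""
    names = []
    for _, _, person in match(graph, s=work, p=EX + "vocab/composedBy"):
        names += [name for _, _, name in match(graph, s=person, p=FOAF_NAME)]
    return names

merged = catalogue | biography  # merging graphs is (roughly) a union of their triples
print(composer_names(merged, EX + "score/42"))     # ['Example Composer']
print(composer_names(catalogue, EX + "score/42"))  # [] : the name lives in the other dataset
```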

LUNCH (Dining Hall)

Digital Musicology


Digital musicology is a sub-field of digital humanities that applies computational tools to music source studies. Because music is expressed in different forms, whether as symbolic notation or as audio, dedicated techniques must be developed to give scholars the tools to understand and interact with these representations. This talk will give an overview of the techniques used by digital musicologists, illustrated by an introduction to the tools and projects these researchers use, including the Digital Image Archive of Medieval Music (DIAMM), the Music Encoding Initiative (MEI), and the Single Interface for Music Score Searching and Analysis (SIMSSA) project.

Speaker: Andrew Hankinson
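One dedicated technique for symbolic music is to convert notes into interval sequences, which lets a melody be found regardless of the key it is written in. The sketch below reads notes from a simplified MEI fragment and does exactly that. The fragment is invented and is not schema-valid MEI. The code deliberately ignores key signatures and accidentals carried through a measure. It is an illustration only, not how DIAMM, MEI tools or SIMSSA implement search.

```python
import xml.etree.ElementTree as ET

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}
STEP_TO_SEMITONE = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}
ACCIDENTAL_SHIFT = {"s": 1, "f": -1, "n": 0, "ss": 2, "x": 2, "ff": -2}

# Simplified, invented fragment (not a complete or schema-valid MEI file)
SAMPLE_MEI = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1"><staff n="1"><layer n="1">
      <note pname="c" oct="4" dur="4"/>
      <note pname="e" oct="4" dur="4"/>
      <note pname="g" oct="4" dur="4"/>
      <note pname="f" oct="4" dur="4" accid="s"/>
    </layer></staff></measure>
  </section></score></mdiv></body></music>
</mei>"""

def midi_pitches(mei_xml):
    """Convert each <note> to a MIDI number (C4 = 60).

    Only accidentals on the note itself (accid or accid.ges) are applied;
    key signatures and accidentals carried through a measure are ignored."""
    root = ET.fromstring(mei_xml)
    pitches = []
    for note in root.iterfind(".//mei:note", MEI_NS):
        accid = note.get("accid") or note.get("accid.ges") or "n"
        octave = int(note.get("oct"))
        pitches.append((octave + 1) * 12 + STEP_TO_SEMITONE[note.get("pname")]
                       + ACCIDENTAL_SHIFT.get(accid, 0))
    return pitches

def intervals(pitches):
    """Successive differences in semitones; the same tune in any key gives the same list."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def find_pattern(melody_intervals, query_intervals):
    """Start positions where the query interval sequence occurs in the melody."""
    n = len(query_intervals)
    return [i for i in range(len(melody_intervals) - n + 1)
            if melody_intervals[i:i + n] == query_intervals]

pitches = midi_pitches(SAMPLE_MEI)
print(pitches, intervals(pitches))  # [60, 64, 67, 66] [4, 3, -1]
```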


Refreshment break (ARCO Building)


Additional sessions (various venues)
Thursday, 16th July

The Zooniverse

The Zooniverse is the world's largest online platform for 'people-powered' research. Over the last decade it has grown from a single astronomy project to a platform hosting hundreds of projects in fields as diverse as ecology, biomedical science, and the humanities, with more than 1.9 million registered volunteers. In this session, you will hear about this transformation from project to platform, the growth of Zooniverse humanities projects, and how the Zooniverse continues to evolve, incorporating machine learning and using internal research to ensure that projects continue to support research teams and volunteers alike. You will also find out how easy it is to create your own crowdsourcing project using the Zooniverse Project Builder.

Speaker: Samantha Blickhan
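Crowdsourcing platforms typically show the same item to several volunteers and then combine their answers. The Zooniverse has its own aggregation tools. The sketch below is only a simplified, hypothetical majority-vote version, with illustrative thresholds for the minimum number of votes and the required level of agreement.

```python
from collections import Counter

def aggregate(classifications, min_votes=3, min_agreement=0.6):
    """Combine volunteer answers per subject by simple majority vote.

    classifications: iterable of (subject_id, answer) pairs.
    Returns {subject_id: (consensus or None, agreement fraction, number of votes)}.
    A subject gets a consensus only with enough votes and enough agreement;
    otherwise it might be shown to more volunteers or flagged for expert review."""
    votes_by_subject = {}
    for subject, answer in classifications:
        votes_by_subject.setdefault(subject, Counter())[answer] += 1
    results = {}
    for subject, votes in votes_by_subject.items():
        total = sum(votes.values())
        answer, top = votes.most_common(1)[0]
        agreement = top / total
        consensus = answer if total >= min_votes and agreement >= min_agreement else None
        results[subject] = (consensus, agreement, total)
    return results

# Hypothetical volunteer answers to "What kind of document is this page?"
votes = [("page-1", "letter"), ("page-1", "letter"), ("page-1", "diary"),
         ("page-2", "map"), ("page-2", "letter")]
for subject, result in aggregate(votes).items():
    print(subject, result)
```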


Social Machines

Talk description TBC

Speaker: David De Roure


Refreshment break (ARCO Building)

An Introduction to Relational Databases (to be joined by the Humanities Data group in the O'Reilly lecture theatre)

This session looks at what a relational database is, and when and why it might be helpful to use one. It introduces some basic database concepts, and works through the process of designing one. We also look at some challenges posed by the sort of data often used in humanities projects, and how these might be addressed. Hands-on exercises give participants a chance to put what they’ve learnt into practice.

Speakers: Meriel Patrick and Pamela Stanworth
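Below is a minimal sketch of the kind of design the session works through, using Python's built-in `sqlite3` module. The schema is hypothetical. People and letters live in separate tables linked by foreign keys. An uncertain date is stored as an earliest/latest range, which is one common way of handling the imprecise dates often found in humanities data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE letter (
    id            INTEGER PRIMARY KEY,
    sender_id     INTEGER NOT NULL REFERENCES person(id),
    recipient_id  INTEGER NOT NULL REFERENCES person(id),
    -- uncertain dates are stored as a range of ISO dates rather than one value
    date_earliest TEXT,
    date_latest   TEXT
);
"""

def build(conn):
    """Create the tables and load a few hypothetical records."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                     [(1, "Correspondent A"), (2, "Correspondent B")])
    conn.executemany(
        "INSERT INTO letter (sender_id, recipient_id, date_earliest, date_latest) "
        "VALUES (?, ?, ?, ?)",
        [(1, 2, "1652-03-01", "1652-03-01"),  # exact date known
         (2, 1, "1652-01-01", "1652-12-31"),  # only the year is known
         (1, 2, None, None)])                 # undated
    conn.commit()

def letters_possibly_in(conn, start, end):
    """Letters whose date range overlaps [start, end]; undated letters are excluded."""
    return conn.execute(
        """SELECT s.name, r.name, l.date_earliest, l.date_latest
           FROM letter AS l
           JOIN person AS s ON s.id = l.sender_id
           JOIN person AS r ON r.id = l.recipient_id
           WHERE l.date_earliest <= ? AND l.date_latest >= ?
           ORDER BY l.date_earliest""", (end, start)).fetchall()

conn = sqlite3.connect(":memory:")
build(conn)
for row in letters_possibly_in(conn, "1652-03-01", "1652-03-31"):
    print(row)
conn.close()
```

Storing each person once and referring to them by id means a correction to a name only needs to be made in one place.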


LUNCH (Dining Hall)



Machine Learning and Music TBC

Talk description TBC

Speaker: TBC


Refreshment break (ARCO Building)


Additional sessions (various venues)

Friday, 17th July


TBC Talk from the Sculpting Digital Cultural Heritage strand

Description TBC

Speaker: TBC


Refreshment break (ARCO Building)

An introduction to computer vision tools for the digital humanities: How to Search, Compare, Classify and Annotate your Images

Computer vision has made rapid progress in recent years: images are now as readily searchable as text is in web search engines. In this presentation, we will introduce software tools that enable researchers to organise and search large collections of images almost instantly, using search queries based on images (such as a building or a book illustration) or on categories (such as “gothic-architecture” or “birds”). We will demonstrate how these tools are being used in many projects within humanities disciplines such as art and book history, film studies, archaeology, and literature. Attendees will leave the session knowing how to match, differentiate, classify, and annotate many kinds of images. Since these tools are open-source, researchers can freely use them for any purpose. Attendees will have the opportunity to book an appointment to get these tools installed on their own laptops, or will be provided with instructions for doing so themselves.

Speakers: Giles Bergel, Ernesto Coto
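The tools demonstrated in the session rely on far more sophisticated methods. As a much simpler, swapped-in illustration of image matching, the sketch below uses average hashing. You shrink an image to a tiny grayscale grid, record which pixels are brighter than the mean, and compare images by the number of differing bits (the Hamming distance). The pixel grids here are toy values. A library such as Pillow could be used to load and shrink real images.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values (0-255), e.g. an image already shrunk to 8x8.

    Each bit of the result records whether one pixel is brighter than the image's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

def nearest(query_hash, collection):
    """Return image names ordered from most to least similar to the query hash."""
    return sorted(collection, key=lambda name: (hamming(query_hash, collection[name]), name))

# Toy 4x4 "images" (hypothetical pixel values)
img_a = [[0, 50, 100, 150] for _ in range(4)]     # left-to-right gradient
img_b = [[v + 10 for v in row] for row in img_a]  # same picture, slightly brighter
img_c = [row[::-1] for row in img_a]              # mirrored left-to-right

collection = {"brighter copy": average_hash(img_b), "mirrored": average_hash(img_c)}
print(nearest(average_hash(img_a), collection))  # ['brighter copy', 'mirrored']
```

Because the hash compares each pixel with the image's own mean, a uniform change in brightness leaves it unchanged. Mirroring, however, flips every bit, which is one reason practical tools use richer features.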


LUNCH (Dining Hall)

Round-up discussion

Questions and thoughts reflecting on the week and the ways ahead for Digital Humanities.

Speaker: David De Roure

Speakers: Edith Halvarsson, Sarah Mason

Refreshment break (ARCO Building)

Closing Keynote (O'Reilly lecture theatre)

Speaker Biographies


Dr Megan Gooch studied Archaeology at the University of Cambridge and History at Durham University, specialising in medieval coins. She has worked at the British Museum and Historic Royal Palaces in curatorial and public engagement roles. She recently led a major research project, Lest We Forget, which examined the role of the Tower of London as a commemorative site. Megan is the Head of the Centre for Digital Scholarship and Digital Humanities Support at the University of Oxford.

Tom Wood studied physics for his first degree and then became interested in natural language processing. He completed a Master's at Cambridge University in Computer Speech, Text and Internet Technology, and has since worked in machine learning and AI at various companies in the UK, Spain and Germany, on work including computer vision and the design of dialogue systems (think of Siri). He has worked as a data scientist for CV-Library, one of the UK's largest job boards, developing machine learning algorithms to parse jobseekers' CVs and make smart job recommendations. He works as a freelance data science consultant through his own company, Fast Data Science Ltd.

Andrew Cusworth is an 1851 Research Fellow at the Bodleian Libraries attached to the Prince Albert Digitisation Project. He has held positions at the National Library of Wales, Ceredigion Archives, and the University of Exeter Special Collections. His research interests centre on the intersections between digital research, the archive, cultural history, and collective memory. He is also active as a musician and composer.

Chris Fletcher is Keeper of Special Collections at the Bodleian Libraries, a member of Oxford’s English faculty and a Fellow of Exeter College. Before coming to Oxford he was a curator of literary manuscripts at the British Library.

Mike Webb is the Curator of Early Modern Archives & Manuscripts, Bodleian Libraries. He has a degree in history and a diploma in Archive Studies, and has a particular interest in the Library’s 17th-century State Paper collections, and letters and diaries 1600-1900. He has curated three exhibitions ranging in subject from the Tudor and Stuart nobility to the First World War. He teaches early modern palaeography to History postgraduates.

Miranda Lewis is the Editor of Early Modern Letters Online [EMLO] and an Associate Member of the Faculty of History at the University of Oxford. With a background in early modern history, art history, and digital scholarship — including ten years on the research project Cultures of Knowledge [CofK] — her own research focusses at present on early modern collections and collecting.

Olivia Thompson is a DPhil candidate in Ancient History at Balliol College, Oxford. Her thesis focuses on changing notions of physical and intellectual property during and after the civil wars of the late Roman Republic. She is more broadly interested in the history of classical scholarship and ways in which digital research tools can be used to reconceptualize ancient sources (in particular, the correspondence of Cicero) and their editorial tradition.

Helen Brown is a DPhil candidate at the University of Oxford, based in the Faculty of English. Her research concerns the application of digital editorial and analytical methods to Alexander Pope’s correspondence. Alongside her studies, Helen is a Digital Editorial Assistant at Oxford University Press, working on projects such as Oxford Scholarly Editions Online and the Very Short Introductions series.

John Southall is Bodleian Data Librarian and Subject Consultant for Economics, Sociology and Social Policy. His role includes work on developing research data management infrastructure and training for researchers, librarians, and support staff.


Neil Jefferies is Head of Innovation for Bodleian Digital Library Systems and Services at Oxford. He is a scientist by training but has been working with internet technologies for nearly 20 years, mostly commercially; his first website was Snickers/Euro'96. He is PI and Community Lead for SWORDV3, a protocol for machine-to-machine transfer of digital objects, a co-author of the Oxford Common File Layout for preservation-oriented object storage, and Technical Strategist for "Cultures of Knowledge", an international collaborative project to “reconstruct the correspondence and social networks of the early modern period”. Previously, he was also a co-creator of the International Image Interoperability Framework.

Huw Jones is Head of the Digital Library Unit and Digital Humanities Coordinator at Cambridge University Library, working with researchers, curators, and technical staff to make the Library's special collections accessible online. He has supported and collaborated with a wide range of TEI projects, from descriptive catalogues such as the Cambridge Digital Library and Fihrist to digital editions such as the Newton Project and the Darwin Correspondence Project. He has taught TEI at Cambridge University as part of the Cambridge Digital Humanities learning programme.

Yasmin Faghihi is Head of the Near and Middle Eastern Department at Cambridge University Library and Chair of the FIHRIST Board of Directors (the online union catalogue for manuscripts from the Islamicate world). She has been involved in collaborative work to create standardised practices for manuscript description in TEI. Yasmin teaches targeted use of TEI to new contributors to FIHRIST and coordinates workflow activities for the catalogue. She has been a major contributor to Cambridge Digital Library and has taught TEI at workshops in Manchester and Oxford, and as part of the Digital Humanities Programme in Cambridge.

Iain Emsley is a PhD student in Digital Media at the University of Sussex. He worked for the Oxford e-Research Centre on various projects, including the Digital Humanities projects Fusing Audio and Semantic Technologies (FAST) and Workset Creation for Scholarly Analysis (WCSA), and on the Square Kilometre Array. His research interests include sustainability and sonification.

Kevin Page is a senior researcher and associate member of faculty at the University of Oxford e-Research Centre, where he applies Linked Data to the Digital Humanities. He is an investigator on the AHRC ‘Unlocking Musicology’ project, a co-investigator on ‘Digital Delius’, ‘Mapping Manuscript Migrations’ and ‘Workset Creation for Scholarly Analysis’, and runs the AHRC Linked Art research network. As Technical Director of Oxford Linked Open Data (OXLOD) he works with collections across the Gardens, Libraries, and Museums of the University, and has participated in W3C activities including the Linked Data Platform (LDP) working group. From 2012 to 2015 he convened the Linked Data workshop at DHOxSS, where he now runs the Digital Musicology course.

Andrew Hankinson specialises in large-scale high-resolution digital image delivery systems, computer vision, symbolic music encoding, and non-textual search. He currently sits on the board of the Music Encoding Initiative (MEI) and is technical co-ordinator for the Digital Image Archive of Medieval Music (DIAMM) project. Andrew earned his Masters in Library and Information Studies from McGill University in 2007 and his PhD in Music Information Retrieval, also from McGill, in 2014. He is currently senior software engineer in the Digital Research team at the Bodleian Libraries.

Samantha Blickhan is the Humanities Research Lead for the Zooniverse and is based at the Adler Planetarium in Chicago. She oversees Humanities projects on the Zooniverse platform, consults with project teams, and manages the development of new tools that support Humanities efforts, such as tools for collaborative transcription and the newly launched ALICE (Aggregate Line Inspector and Collaborative Editor) tool for viewing and editing transcribed text.

Giles Bergel is Digital Humanities Research Ambassador in the Visual Geometry Group at the University of Oxford, and Teaching Fellow in Digital Humanities at University College London. As well as computer vision, his interests include text encoding, Linked Data and the study of early printed books.

Ernesto Coto is a Research Software Engineer in the Visual Geometry Group (VGG) at the University of Oxford. He has several years of experience developing software in academic and industry environments. His current research interests are Computer Vision, Machine Learning and Scientific Visualization.

Meriel Patrick is an Academic Research Technology Specialist in the Research Support team at IT Services. Much of her work focuses on helping researchers to work more effectively with data. She is also Lecturer in Theology and Philosophy for Wycliffe Hall's visiting student programme, SCIO.

Pamela Stanworth has over a decade’s experience working on databases with researchers and departments across the University. She brings a pragmatic approach to building projects that are effective, reliable and sustainable. Pamela’s roots are in engineering, with blue-chip industrial companies, technical consultancy and small businesses. Her commitment in teaching and consulting is to enable people to use appropriate technology in their work, efficiently and to a high standard.




© 2020 University of Oxford