Humanities Data: Case Studies and Approaches

A guided tour of the humanities research data landscape

Convenors: John Southall, Rowan Wilson, Meriel Patrick, David Tomkins

Hashtags: #HumDat and #DHOxSS20

Computers: Please bring your own laptop (no tablets, please) on which you have admin privileges

Abstract

This strand introduces a variety of approaches to dealing with humanities data. It covers modelling, structuring, and working with data, plus longer-term curation and preservation. Data types discussed include textual, tabular, image-based and time-based media. The goal is to equip researchers to select solutions that will work for them.

 

Intended outcomes

 

Participants will be given the chance to learn about a range of methods for working with humanities data, with a view to enabling them to select the most fruitful approach for their own research project. They will hear from presenters experienced in working with these methods, and be given the opportunity to try some of them for themselves via practical exercises. They will be provided with an overview of key issues that need to be considered during data-driven humanities research, and pointed towards relevant resources. Participants will also be encouraged to think about the life of their data after the end of their project, and to explore some of the options for preservation and sharing.

Experience necessary

No prior technical knowledge necessary.

Computer and software requirements

Please bring your own laptop (no tablets, please) on which you have admin privileges.

 

Convenor biographies

 

John Southall is Bodleian Data Librarian and Subject Consultant for Economics, Sociology and Social Policy. His role includes work on developing research data management infrastructure and training for researchers, librarians, and support staff.

 

Rowan Wilson is Team Leader for the Research Support team at IT Services. He has been involved in supporting humanities researchers for many years, having previously worked for the Oxford Text Archive. His specialisms include IPR, licensing, free and open source software, and data protection and security.

 

David Tomkins is Curator of Digital Research Data at the Bodleian Libraries in Oxford and manages ORA-Data, the University’s institutional repository for research data. He has led a number of high-profile digitization, content creation and crowd-sourcing projects for the Bodleian, including Queen Victoria’s Journals, What’s the Score?, Mapping Crime and Electronic Ephemera, having previously undertaken similar roles at the Victoria & Albert Museum and the Institute of Historical Research.

 

Meriel Patrick is an Academic Research Technology Specialist in the Research Support team at IT Services. Much of her work focuses on helping researchers to work more effectively with data. She is also Lecturer in Theology and Philosophy for Wycliffe Hall's visiting student programme, SCIO.

 

"The content during the workshops, lectures and keynote presentations was quite inspirational for my future research. The organisation of the summer school as a whole was fantastic. So, thank you very much for a really fun and rewarding week!"

DHOxSS 2018 participant

TIMETABLE

Monday, 13th July
08:00-09:00

Registration (Sloane Robinson building)
Tea and coffee (ARCO building)
09:00-10:00

Opening Keynote (O'Reilly lecture theatre)

10:00-10:30

Refreshment break (ARCO building)

10:30-12:00

Introduction: A critical review of humanities data approaches

After an introduction to what’s coming up during the rest of the week, Neil Jefferies will provide a critical review of humanities data approaches. Choosing the correct approach for your data can have a significant impact on the success, or otherwise, of your research, and this talk will encourage you to critically evaluate all standards and practices much as you would evaluate your scholarly sources.

 

Speakers: Introductions by John Southall, Rowan Wilson, Meriel Patrick, David Tomkins, with Neil Jefferies presenting a critical review 

12:00-13:30

Lunch (Dining Hall)
13:30-14:30

Working with texts 
 

Description TBC

Speaker: Ylva Berglund Prytz

14:30-15:30
Working with images: Introduction to the International Image Interoperability Framework (IIIF) 
 
Description TBC
 

Speaker: Andrew Hankinson, Bodleian Libraries, University of Oxford

 
15:30-16:00

 

Refreshment break (ARCO building)
16:00-17:00

Working with time-based media

Description TBC

Speaker: TBC

Tuesday, 14th July
09:00-10:30
 
Taylor Digital Editions

This session will give an overview of the Digital Editions course taught to students at the Taylor Institution Library. It demonstrates how to create, store, preserve and publish your digital objects for free!

Speakers: Emma Huber and Frank Egerton (TBC)

10:30-11:00
 
Refreshment break (ARCO building)

 

11:00-13:00
Preparing your data for the future

This workshop will delve into three interlocking topics: preservation, reproducibility, and metadata. A lot of hard work and skill goes into creating or collating a research dataset, so it's important to plan from an early stage how it can be preserved for the long term, and how its continued usefulness can be ensured. In many cases, it will be desirable to make the data available for other researchers to use (which also allows the dataset creator to get credit for their hard work via citations). It's also important that it's clear how research conclusions were reached: making the process transparent and reproducible is a key part of research integrity. Good metadata and documentation - the contextual information needed for proper interpretation of the data - is foundational to both preservation and reproducibility. We'll look at some of the things you need to consider while working on your data, and introduce some techniques and resources that can help.

 

Speakers: Members of the Research Data Oxford team

13:00-14:30
Lunch (Dining Hall)
14:30-15:30
Networks in data
Description TBC
Speaker: Laurence Brown

15:30-16:00
Refreshment break (ARCO building)

 

16:00-17:00
 
Lectures (various venues)
Wednesday, 15th July

09:00-10:30  
 
Framing digital objects with context and provenance (1)

Cultural and historical objects derive a lot of their meaning and interpretation from the contexts in which they are created and subsequently experienced. When digital surrogates or born-digital artefacts are created, it is important that this contextual information is also represented in the digital domain. This talk will explore the nature of context and provenance (which can be seen as a historical series of contexts), and consider how they might be modelled digitally.

Speaker: Neil Jefferies 

10:30-11:00
 
Refreshment break (ARCO building)
11:00-13:00

Framing digital objects with context and provenance (2)

Continuation of earlier session

Speaker: Neil Jefferies

13:00-14:30
Lunch (Dining Hall)
14:30-15:30

 

More to explore

The programme so far has introduced a range of methods for working with humanities data, but this is only a small fraction of what's out there. This session provides a showcase of some other tools and techniques, with an overview of what they might be used for, and some pointers for exploring them further after the summer school.

Speakers: Members of the Research Data Oxford team

15:30-16:00

Refreshment break (ARCO building)
16:00-17:00

 

Lectures (various venues)

Thursday, 16th July

 

09:00-10:30
Introduction to OpenRefine
 

OpenRefine is a powerful tool for working with messy data. It can be used for cleaning data, for transforming it from one format to another, and more. It can help make the process of editing data swifter and more straightforward, aiding with tasks that would take many weeks of work to accomplish manually. Moreover, it's free and open source! This workshop will provide a hands-on introduction to the tool, and demonstrate some of the ways it can be used to work with humanities data.

Speakers: Members of the Research Data Oxford team

 

10:30-11:00
 
Refreshment break (ARCO building)

11:00-13:00

Introduction to relational databases (participants join the Introduction to Digital Humanities group in the O'Reilly lecture theatre)

This session looks at what a relational database is, and when and why it might be helpful to use one. It introduces some basic database concepts, and works through the process of designing one. We also look at some challenges posed by the sort of data often used in humanities projects, and how these might be addressed. Hands-on exercises give participants a chance to put what they’ve learnt into practice.

 

Speakers: Meriel Patrick and Pamela Stanworth

13:00-14:30

Lunch (Dining Hall)
14:30-15:30
Computer vision
Computer vision has made significant progress in recent years, thanks in part to developments in machine learning (or ‘AI’), and is now an eminently practical tool for the humanist. The University of Oxford’s Visual Geometry Group (VGG) has been a pioneer in computer vision, and maintains a suite of free and open-source software tools, many developed in collaboration with humanists. This session will introduce VGG's tools and the wider techniques behind them; show how digital humanities projects are making use of computer vision; and outline some of the critical and ethical perspectives that humanists bring to the field. It will include time for hands-on exploration of some of the methods and tools, allowing participants to leave with a knowledge of how to make their own images searchable.
 
Speaker: Giles Bergel
15:30-16:00
Refreshment break (ARCO building)

 

16:00-17:00
 
Lectures (various venues)

Friday, 17th July

09:00-10:30
Data visualisation
 

Drawing on the work of Oxford's Interactive Data Network and elsewhere, this session will present examples of good practice in the visualisation of humanities data, and show how visualisation can both increase publication impact and facilitate the creation of new knowledge. There will also be an opportunity to try out some of what's covered through hands-on practical exercises.

Speaker: Charlie Hadley

10:30-11:00
 
Refreshment break (ARCO building)
11:00-13:00
Data visualisation

Continuation of previous session (TBC)

 

Speaker: Charlie Hadley

 
13:00-14:30

Lunch (Dining Hall)
14:30-15:30

Wrap-up session

An opportunity for an informal discussion about key issues in the world of humanities data and talking points arising from the week’s presentations. What questions do those working with humanities data need to consider, and what special challenges (and opportunities) do humanities researchers face? How can data-driven humanities research best be harnessed to produce good scholarship? There will also be an opportunity to share details of the work you're looking to undertake with humanities data, to think about your next steps, and to tap into the knowledge and experience of your fellow delegates.

 

Speakers: Members of the Research Data Oxford team

15:30-16:00
 
Refreshment break (ARCO building)

16:00-17:00
 
Closing keynote (O'Reilly lecture theatre)
Speaker biographies

Emma Huber, Subject Librarian for German, has worked for several large digitisation projects. Her latest role, before switching careers to academic librarianship, was to lead two work packages for the European IMPACT (Improving Access to Text) Project, disseminating best practice in digitisation with partner institutions including the Bayerische Staatsbibliothek, the Koninklijke Bibliotheek, the British Library, the Bibliothèque Nationale de France and Biblioteca Nacional de España.

 

Frank Egerton, Sackler-Taylor Operations Manager, is a member of the TORCH Digital Humanities Steering Group and the Bodleian Research Data Management Group. He is a core course tutor on the MSt in Creative Writing and assessor for Creative Writing on the Certificate of Higher Education programme. In 2016 he was a co-investigator on an Oxford e-Research Centre visual analytics project on textual shape. He is a member of common room at Kellogg College.

Neil Jefferies is Head of Innovation for Bodleian Digital Library Systems and Services at Oxford. He is a scientist by training but has been working with internet technologies for nearly 20 years, mostly commercially – his first website was Snickers/Euro'96. He is PI and Community Lead for SWORDv3, a protocol for machine-to-machine transfer of digital objects, a co-author of the Oxford Common File Layout for preservation-oriented object storage and Technical Strategist for "Cultures of Knowledge", an international collaborative project to “reconstruct the correspondence and social networks of the early modern period”. Previously, he was also a co-creator of the International Image Interoperability Framework.

Ylva Berglund Prytz

Andrew Hankinson specialises in large-scale high-resolution digital image delivery systems, computer vision, symbolic music encoding, and non-textual search. He currently sits on the board of the Music Encoding Initiative (MEI) and is technical co-ordinator for the Digital Image Archive of Medieval Music (DIAMM) project. Andrew earned his Masters in Library and Information Studies from McGill University in 2007 and his PhD in Music Information Retrieval, also from McGill, in 2014. He is currently senior software engineer in the Digital Research team at the Bodleian Libraries.

Pamela Stanworth has over a decade’s experience working on databases with researchers and departments across the University. She brings a pragmatic approach to building projects that are effective, reliable and sustainable. Pamela’s roots are in engineering, with blue-chip industrial companies, technical consultancy and small businesses. Her commitment in teaching and consulting is to enable people to use appropriate technology in their work, efficiently and to a high standard.

Giles Bergel is Digital Humanities Research Ambassador in the Visual Geometry Group at the University of Oxford, and Teaching Fellow in Digital Humanities at University College London. As well as computer vision, his interests include text encoding, Linked Data and the study of early printed books.

Charlie Hadley


© 2020 University of Oxford