Humanities Data: Case Studies and Approaches

"A guided tour of the humanities research data landscape"

Convenors: John Southall, Rowan Wilson, Meriel Patrick, David Tomkins

Hashtags: #HumDat and #DHOxSS2019

Computers: Please bring your own laptop (no tablets, please) on which you have admin privileges

Abstract

This strand introduces a variety of approaches to dealing with humanities data. It covers modelling, structuring, and working with data, plus longer-term curation and preservation. Data types discussed include textual, tabular, image-based and time-based media. The goal is to equip researchers to select solutions that will work for them.

 

Experience necessary

No prior technical knowledge necessary.

Intended outcomes

 

Participants will be given the chance to learn about a range of methods for working with humanities data, with a view to enabling them to select the most fruitful approach for their own research project. They will hear from presenters experienced in working with these methods, and be given the opportunity to try some of them for themselves via practical exercises. They will be provided with an overview of key issues that need to be considered during data-driven humanities research, and pointed towards relevant resources. Participants will also be encouraged to think about the life of their data after the end of their project, and to explore some of the options for preservation and sharing.

 

Convenor biographies

 

John Southall is Bodleian Data Librarian and Subject Consultant for Economics, Sociology and Social Policy. His role includes work on developing research data management infrastructure and training for researchers, librarians, and support staff.

 

Rowan Wilson is Team Leader for the Research Support team at IT Services. He has been involved in supporting humanities researchers for many years, having previously worked for the Oxford Text Archive. His specialisms include IPR, licensing, free and open source software, and data protection and security.

 

David Tomkins is Curator of Digital Research Data at the Bodleian Libraries in Oxford and manages ORA-Data, the University’s institutional repository for research data. He has led a number of high-profile digitization, content creation and crowd-sourcing projects for the Bodleian, including Queen Victoria’s Journals, What’s the Score?, Mapping Crime and Electronic Ephemera, having previously undertaken similar roles at the Victoria & Albert Museum and the Institute of Historical Research.

 

Meriel Patrick is an Academic Research Technology Specialist in the Research Support team at IT Services. Much of her work focuses on helping researchers to work more effectively with data. She is also Lecturer in Theology and Philosophy for Wycliffe Hall's visiting student programme, SCIO.

 

"The content during the workshops, lectures and keynote presentations was quite inspirational for my future research. The organisation of the summer school as a whole was fantastic. So, thank you very much for a really fun and rewarding week!"

DHOxSS 2018 participant

TIMETABLE

 
 
 
 
 
 
Monday 22nd July
08:00 - 09:00

Registration (Sloane Robinson building)
Tea and coffee (ARCO building)
09:00-10:00

Opening Keynote (Sloane Robinson O'Reilly lecture theatre)

10:00-10:30

Refreshment break (ARCO building)

10:30-12:00

A critical review of humanities data approaches

After an outline of what’s coming up during the week, and a brief look at some key issues in the world of humanities data, Neil Jefferies will provide a critical review of humanities data approaches. Choosing the correct approach for your data can have a significant impact on the success, or otherwise, of your research, and this talk will encourage you to critically evaluate all standards and practices much as you would evaluate your scholarly sources.

 

Speaker: Neil Jefferies, with an introduction from Rowan Wilson

12:00-13:30

Lunch (Dining Hall)
13:30-15:30

Art making, digital curation and real-world value
Musicking and musicological data

 

This talk provides an overview of a current qualitative interdisciplinary research project that identifies the seeking, creation, management and use of digital objects as a critical element of contemporary art practice, and explores the relationship between digital curation - the active management of digital files over time - and the sustainability of contemporary visual art careers. The second part then looks at the challenges of working with musicological data. Current and future directions in digital musicology, including representations of musical documents and applications of Linked Data to music research, are outlined.

Speakers: Laura Molloy and Daniel Bangert

 

15:30-16:00

 

Refreshment break (ARCO building)
16:00-17:00

Data IPR and regulation

Personal data protection, database rights, confidentiality, PREVENT/anti-radicalisation responsibilities and copyright: there is a range of considerations around data gathering and compliance. This talk will discuss how to stay compliant without it becoming a chore.

 

Speaker: Rowan Wilson

Tuesday 23rd July
09:00-10:30
 
Framing digital objects with context and provenance (1)

Cultural and historical objects derive a lot of their meaning and interpretation from the contexts in which they are created and subsequently experienced. When digital surrogates or born-digital artefacts are created, it is important that this contextual information is also represented in the digital domain. This talk will explore the nature of context and provenance (which can be seen as a historical series of contexts), and consider how they might be modelled digitally.

Speaker: Neil Jefferies

10:30-11:00
 
Refreshment break (ARCO building)

 

11:00-13:00
Framing digital objects with context and provenance (2)
Continuation of earlier session
 
Speaker: Neil Jefferies

13:00-14:30
Lunch (Dining Hall)
14:30-15:30
From project to preservation: institutional data repositories
What happens to your data when your project is complete? This session provides an overview of archiving and data management from the perspective of institutional repositories.
Speaker: David Tomkins

15:30-16:00
Refreshment break (ARCO building)

 

16:00 - 17:00
 
Lectures (various venues)
Wednesday 24th July

09:00-10:30  
 
Keeping your digital data safe from harm, forever(?)

The original sense of the verb “to preserve” was “to keep safe from harm” – to anticipate the dangers and potential threats ahead, and put in place strategies and counter-measures to reduce any risks. Successfully addressing the many threats to digital data requires taking active steps at each stage in the data management lifecycle, doing much more than simply ensuring you back up regularly or archive the end-product of your research. But such steps need not be onerous or costly, and in this session we will outline some tried-and-tested workflows and tools which can significantly improve the chances that your data will remain accessible, uncorrupted, and usable for future generations.

Speakers: Michael Popham & Polonsky Digital Preservation Fellows

10:30-11:00
 
Refreshment break (ARCO building)
11:00-13:00

Introduction to Relational Databases

 

This session looks at what a relational database is, and when and why it might be helpful to use one. It introduces some basic database concepts, and works through the process of designing one. We also look at some challenges posed by the sort of data often used in humanities projects, and how these might be addressed. Hands-on exercises give participants a chance to put what they’ve learnt into practice.

Speakers: Meriel Patrick and Duncan Young

13:00-14:30
Lunch (Dining Hall)
14:30 - 15:30  

 

Introducing the International Image Interoperability Framework (IIIF)

This session will provide an overview of the community-driven standards and software of the International Image Interoperability Framework (IIIF). After an introduction to the underlying technology, we will look at implementations of several IIIF-based tools for comparing, annotating and remixing digitized images, in order to understand the potential of and challenges facing IIIF as a humanities research tool.

Speaker: Emma Stanford

15:30-16:00

Refreshment break (ARCO building)
16:00-17:00

 

Lectures (various venues)

Thursday 25th July

 

09:00-10:30
Taylor Digital Editions
 

This session will give an overview of the Digital Editions course taught to students at the Taylor Institution Library. It demonstrates how to create, store, preserve and publish your digital objects for free!

Speakers: Emma Huber and Frank Egerton

 

10:30-11:00
 
Refreshment break (ARCO building)

11:00-13:00

Reproducibility and Humanities data

What do we mean when we talk about 'reproducibility' in the context of the Digital Humanities? It can seem like an alien concept, but taking reference points from the more familiar spheres of provenance, collaborative resource development and software iteration we can begin to answer the question.

 

Speaker: Maja Zaloznik

13:00-14:30

Lunch (Dining Hall)
14:30 - 15:30
Corpus linguistics
Language is the principal means by which we communicate, and it permeates all disciplines in the humanities and social sciences. An ability to explore and analyse language is a vital part of understanding our cultural heritage and our current discourses. Language is the object of study in linguistics and related disciplines, and the sub-field of Corpus Linguistics is all about the data-driven exploration of language. This session will examine how the methods, tools and datasets of Corpus Linguistics can be used to search, explore and interpret books, letters, speech, and other forms of discourse in digital form.
 
Speaker: Martin Wynne
15:30-16:00
Refreshment break (ARCO building)

 

16:00-17:00
 
Lectures (various venues)

Friday 26th July

09:00-10:30
The Time is Now – Challenges and Solutions to Making Time-Based Media Digitally Accessible
 

This session looks at the challenges of providing digital access to time-based media (video, audio, film). The audience will learn why media preservation can’t wait, be introduced to collection processing and digital preservation principles, become familiar with available open source tools, and review case studies of notable media digitization and access initiatives.

Speaker: Carla Arton

10:30-11:00
 
Refreshment break (ARCO building)
11:00-13:00
Data Visualisation

Drawing on the work of Oxford's Interactive Data Network and elsewhere, this talk will present examples of good practice in the visualisation of humanities data and show how it can both increase publication impact and facilitate the creation of new knowledge.

 

Speaker: Martin Hadley

 
13:00-14:30

Lunch (Dining Hall)
14:30-15:30

Wrap-up session

An opportunity for an informal discussion about key issues in the world of humanities data and talking points arising from the week’s presentations.  What questions do those working with humanities data need to consider, and what special challenges (and opportunities) do humanities researchers face? How can data-driven humanities research best be harnessed to produce good scholarship?  There will also be an opportunity for attendees to share details of work they are looking to undertake with humanities data, and possibly to harness the knowledge and experience of their fellow delegates.

 

Convenors: Meriel Patrick, John Southall and David Tomkins

15:30-16:00
 
Refreshment break (ARCO building)

16:00-17:00
 
Closing plenary (O'Reilly lecture theatre)
Speaker biographies
Laura Molloy is an artist and researcher. She is currently completing an interdisciplinary doctoral project with the Oxford Internet Institute and the Ruskin School of Art, both University of Oxford, and artists’ advocacy organisation DACS. Her research interests focus on visual art making practices, research data creation and management, ethical practices in research, qualitative enquiry, digital curation, digital preservation and advocacy for sustainable digital object handling skills beyond the lab sciences.
Daniel Bangert is a librarian and musicologist. He is currently a Scientific Manager at the Göttingen State and University Library, working on European projects related to open science, including the Research Data Alliance Europe. His research interests include research data management, scholarly communication and digital musicology.

Michael Popham is Head of Digital Collections and Preservation at the Bodleian Libraries, University of Oxford. He has a long-standing interest in the creation and preservation of digital data, having previously worked on some of the Bodleian’s mass digitization projects, and as the Head of the Oxford Text Archive. Michael recently managed the team of Polonsky Fellows (www.dpoc.ac.uk), dedicated experts who were funded for 2.5 years to review and enhance the digital preservation capabilities of the Bodleian Libraries.

Emma Stanford is the Bodleian Libraries’ Digital Curator. She manages the digitization of new and legacy image content via Digital Bodleian, conducts training and outreach, and writes occasionally about digitization policy and public engagement.

 

Emma Huber, Subject Librarian for German, has worked for several large digitisation projects. Her latest role, before switching careers to academic librarianship, was to lead two work packages for the European IMPACT (Improving Access to Text) Project, disseminating best practice in digitisation with partner institutions including the Bayerische Staatsbibliothek, the Koninklijke Bibliotheek, the British Library, the Bibliothèque Nationale de France and Biblioteca Nacional de España.

 

Frank Egerton, Sackler-Taylor Operations Manager, is a member of the TORCH Digital Humanities Steering Group and the Bodleian Research Data Management Group. He is a core course tutor on the MSt in Creative Writing and assessor for Creative Writing on the Certificate of Higher Education programme. In 2016 he was a co-investigator on an Oxford e-Research Centre visual analytics project on textual shape. He is a member of common room at Kellogg College.

Carla Arton is a Digital Project Manager at the Bodleian Libraries. She has over a decade of experience in audiovisual preservation, digitization and access, holding positions at the Library of Congress' Motion Picture, Broadcasting, and Recorded Sound Division, the Wende Museum of the Cold War, and Chace Audio by Deluxe. Before coming to Oxford, Carla was Director of Technical Operations for the second phase of Indiana University’s Media Digitization and Preservation Initiative, where she managed all aspects of technical operations, quality control, and post-production to make 30,000 films (24PB) accessible for teaching and research.

Neil Jefferies is Head of Innovation for Bodleian Digital Library Systems and Services at Oxford. He is a scientist by training but has been working with internet technologies for nearly 20 years, mostly commercially – his first website was Snickers/Euro'96. He is PI and Community Lead for SWORDV3, a protocol for machine-to-machine transfer of digital objects, a co-author of the Oxford Common File Layout for preservation-oriented object storage and Technical Strategist for "Cultures of Knowledge", an international collaborative project to “reconstruct the correspondence and social networks of the early modern period”. Previously, he was also a co-creator of the International Image Interoperability Framework.

 

Rowan Wilson is Team Leader for the Research Support team at IT Services. He has been involved in supporting humanities researchers for many years, having previously worked for the Oxford Text Archive. His specialisms include IPR, licensing, free and open source software, and data protection and security.

Maja Zaloznik is a demographer and affiliate at the Oxford Institute of Population Ageing. She recently completed a postdoc at the institute, where she worked on a project exploring the implications for food production of the ageing agricultural sector in Vietnam, as part of the Oxford Martin School Future of Food Programme. Over the years she has analysed datasets ranging from census data to specialised surveys in the fields of demography and health at both local and global scales. She has a particular penchant for visualising data, both for exploratory analysis and for communicating results, where interactive visualisations can be particularly useful, all the while ensuring a reproducible workflow and open access to the data and analysis scripts.

Martin Wynne has been involved in the digital humanities for thirty years, since starting to study on a Masters course in 'Linguistics and Information Processing' in 1989. He has worked in a number of universities and research institutions carrying out research and teaching in corpus linguistics. For more than twenty years he has been involved in the curation of language corpora and other literary and linguistic datasets, and runs the Oxford Text Archive, based in the Bodleian Libraries. Martin is one of the founding members of CLARIN, a European research infrastructure which makes digital language resources and tools available to researchers across the social sciences and humanities.

Martin Hadley is a technology and data science consultant who has worked in both industry and academia to promote the use of interactive technologies for education and sharing of knowledge. Currently, Martin is the tech lead for the University of Oxford’s Interactive Data Network (idn.it.ox.ac.uk), which supports researchers in creating interactive data visualisations of open access datasets. Outside of the University, Martin runs a data science consultancy called Visible Data, which specialises in reproducible research practices using R.

 

David Tomkins is Curator of Digital Research Data at the Bodleian Libraries in Oxford and manages ORA-Data, the University’s institutional repository for research data. He has led a number of high-profile digitization, content creation and crowd-sourcing projects for the Bodleian, including Queen Victoria’s Journals, What’s the Score?, Mapping Crime and Electronic Ephemera, having previously undertaken similar roles at the Victoria & Albert Museum and the Institute of Historical Research. David is co-author of Illustrating Empire: a visual history of British imperialism, and has also written book chapters, articles, and an online course for the Oxford University Department for Continuing Education.

 

Meriel Patrick is an Academic Research Technology Specialist in the Research Support team at IT Services. Much of her work focuses on helping researchers to work more effectively with data. She is also Lecturer in Theology and Philosophy for Wycliffe Hall's visiting student programme, SCIO.

Duncan Young is a teacher in the IT Learning Centre, providing courses and consultations in software skills aimed at both staff and researchers across all University disciplines. Since his debut with Excel 4 Introduction in October 1993 he has worked for a number of organisations including Sophos and Microsoft, where he was a co-author of their internal Office XP training programme that he then toured around their European offices. Since 2006 he has worked at the University of Oxford and specialises in spreadsheets, databases and programming.


© 2019 University of Oxford