Bursary Reports 2019
Ellen Roberts, University of Birmingham
Workshop: Applied Data Analysis
As a Masters student hoping to pursue a PhD in the near future, I found my week at the Oxford Digital Humanities Summer School 2019 an invaluable experience, and one I would recommend to anyone looking to further their knowledge of the Digital Humanities (DH). I am so grateful to have been a bursary holder and to have had the opportunity to attend the Applied Data Analysis workshop. The summer school was well organised: an enjoyable week of interesting and inspiring content delivered by some of the top researchers in the field.
The Applied Data Analysis workshop was an advanced course on both Data Science and Python, run by Giovanni Colavizza and Matteo Romanello. I applied to this course to improve my understanding of Python and how it can be used efficiently in data science tasks. Before attending, I had really struggled with the concepts of tidy data and data modelling. In particular, I had trouble with the Python package pandas, which has comprehensive yet overwhelming documentation. As a result, I had resorted to using another programming language to wrangle and tidy my data, which was both time-consuming and inconvenient. However, after attending the sessions dedicated to pandas on the Applied Data Analysis course, I now have a much better understanding of this powerful Python package and others.
Thanks to DHOxSS, I now feel confident enough to use them in my future research. These 'lightbulb' moments were a common occurrence for me throughout the week: the combination of practical, follow-along tutorials and presentations given by both Matteo and Giovanni made many data analysis concepts so much clearer. They are both amazing teachers, and I was struck by their patience, their clarity, and the extraordinary wealth of knowledge they shared with us over the course of the week. I'd like to thank them again for their thought-provoking workshops, which made these (seemingly) abstract concepts accessible and understandable in an impressively short space of time.
It is amazing how much we covered in one (exceptionally hot!) week. The workshops were well structured and considered both the practical and theoretical aspects of data analysis. The hands-on sessions were an enjoyable part of the week, with access to different types of datasets, such as Elon Musk's tweets, Venetian apprenticeship contracts, and an African-American movie dataset. We were encouraged to ask our own questions of the data and to explore the datasets in the afternoon sessions using Python and Unix shell commands. Alternatively, many members of the group used this time to apply the code and concepts we had learnt in the morning workshops to their own data, with the help of Giovanni and Matteo.
For me, the most interesting part of the Applied Data Analysis workshop was the conceptual side of data science, especially the theoretical considerations relating to the production and use of data frames. Learning to think of a data frame as a set of variables (columns) and observations (rows) changed my perspective on using data and showed me how arranging data in this way can aid interpretation. These concepts also encourage researchers to ask what their data actually consists of, and how it relates to other data fields and variables. I had not considered these aspects of data science before attending DHOxSS, but they have revolutionised how I perceive my data, and I will continue to use these ideas going forward.
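To make the variables-and-observations idea concrete, here is a minimal pandas sketch; it is not taken from the workshop materials, and the words, counts, and column names are hypothetical toy data. In the "wide" table the variable "text" is hidden in the column headers; `DataFrame.melt` reshapes it so that each column holds one variable and each row one observation.

```python
import pandas as pd

# Hypothetical toy data in "wide" form: one column per text, so the
# variable "text" is hidden in the column headers rather than stored as values.
wide = pd.DataFrame({
    "word": ["heaven", "hell"],
    "text_a": [12, 7],
    "text_b": [3, 9],
})

# Reshape to tidy ("long") form: every column is one variable
# (word, text, count) and every row is one observation.
tidy = wide.melt(id_vars="word", var_name="text", value_name="count")
print(tidy)

# With one observation per row, aggregating by any variable is a single groupby.
totals = tidy.groupby("text")["count"].sum()
print(totals)
```

Once the data are tidy, a question such as "how many words per text?" becomes one `groupby` call, rather than custom code for each column.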
Overall, the Applied Data Analysis workshop had a steep, fast-paced learning curve, but a very enjoyable and incredibly useful one! I would recommend it to anyone who is already familiar with Python but wants to gain a broader understanding of data analysis and of the life-cycle of data in research (collecting, tidying, modelling, visualising, and communicating data).
Alongside the Python-focused presentations and practical sessions in the Applied Data Analysis strand, we were able to attend lectures from the Text to Tech workshop during the afternoons. This was a great part of the structure of the Applied Data Analysis workshop, as it enabled us to hear from guest speakers and see how the Python skills we were learning could be (or have been) used in 'real world' DH projects. I particularly enjoyed Gard Jenset's presentation on how word embeddings and machine learning techniques can be used to tackle grammatical prediction tasks: his team trained a model to predict the dative alternation in English (the choice between, for example, "give her the book" and "give the book to her"). It was inspiring to see how these algorithms can be applied to linguistic problems and used to predict the presence or absence of a phenomenon based on what the model has learned from its training data. The success of the model has made me wonder whether a similar approach could be used in my own future research into Early Modern genre, something that I look forward to exploring further.
Another highlight of the Text to Tech lectures was James McCracken's talk on the Oxford English Dictionary and its Application Programming Interface (API). James demonstrated how the additional 'metadata' and lexicographic information for words in the OED can be used to enrich lexical explorations. Although I am familiar with the OED API (it is how I collected data on Milton for my Masters project), James demonstrated many features that were new to me. One particularly impressive example showed how the spelling variant data in the OED could be used to 'translate' an archaic poem into present-day English. This could become another useful tool for spelling standardisation tasks in future research, as an alternative or complement to VARD.
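The OED API is not reproduced here, but the basic idea of dictionary-driven spelling normalisation can be sketched in plain Python. Everything in this sketch is hypothetical: the `VARIANTS` table is a small, hand-written mapping from a few Early Modern spellings to modern forms, standing in for the variant data a dictionary such as the OED might supply, and `normalise` is an illustrative helper, not part of any real tool.

```python
import re

# Hypothetical variant table (historical spelling -> modern form).
# These few entries are hand-picked for illustration; a real workflow
# would build the table from a dictionary's spelling-variant data.
VARIANTS = {
    "loue": "love",
    "onely": "only",
    "vertue": "virtue",
    "doe": "do",
}


def normalise(line: str) -> str:
    """Replace known historical spellings with modern ones, word by word."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        modern = VARIANTS.get(word.lower())
        if modern is None:
            return word  # words missing from the table are left unchanged
        # Keep an initial capital if the original word had one.
        return modern.capitalize() if word[0].isupper() else modern

    # Only runs of letters are looked up; spaces and punctuation pass through.
    return re.sub(r"[A-Za-z]+", swap, line)


print(normalise("I loue thee onely"))  # → I love thee only
```

Exact lookup like this cannot normalise spellings that are missing from the table, and it ignores context, so it is only the simplest possible version of the idea.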
Thank you to Barbara McGillivray, Gard Jenset, Giovanni Colavizza and Matteo Romanello for organising this seamless crossover between the two strands, which broadened the content of the course and meant that we got the benefit of two workshops in one!
Outside the Applied Data Analysis course, I greatly enjoyed the two keynote lectures that bookended the week. Barbara McGillivray gave an inspiring keynote on blending the digital and the computational with the humanities. It was a perfect start to the week, outlining some of the key current issues within DH and setting a (possible?!) record for mentions of the issue of genre in under an hour! Similarly, Marieke van Erp's lecture proposing that Digital Humanities can be the 'sweet connection' between university departments was a positive way to end the week, looking ahead to the exciting future of DH.