Schedule

General pattern

Each meeting, we will begin at 9 to allow for childcare, commuting, and other arrangements. You are welcome to arrive at 8.30 as per the official schedule, but I will not begin until 9. Consequently, I will also not be stopping for the remainder of the scheduled time. Feel free to bring croissants, coffee, toast, etc. Scrambled eggs are to be avoided as they are messy.

In certain weeks, a designated person will take us through their engagement with a particular Programming Historian tutorial, as indicated below. There will be readings to support this engagement; I will discuss these individually with you once we divy up the work. Some will highlight uses of the approach, or perhaps issues with the approach, or could’ve usefully been improved by the approach… or… or… or. I will expect you to also clearly articulate connections in other research you’ve done, read, or courses you’re taking/have taken (this alone is an important habit to cultivate.) You can also bring digital history projects in the wild into the discussion.

(nb Even if it’s not your week to present, it will be a richer experience if you’ve given the tutorial a shot as well.)

The remaining time will run along the lines of a mini unconference. That is, I expect you to have a sense before class of things you want to work on/discuss/collaborate on. As thatcamp.org says:

at an unconference, the program isn’t set beforehand: it’s created on the first day with the help of all the participants rather than beforehand by a program committee. Second, at an unconference, there are no presentations — all participants in an unconference are expected to talk and work with fellow participants in every session. An unconference is to a conference what a seminar is to a lecture; going to an unconference is like being a member of an improv troupe whereas going to a conference is (mostly) like being a member of an audience.

I’ve had far too many seminars that felt like dreadful dreadful conferences. So, let’s give this a try. One thing that I think I would like you to discuss every session: how does this particular tutorial move us closer to the final project goal? What could we do with this? How can we open this thing up even more?

These sessions will be opportunities for the more techy to help the less, for the more theoretically inclined to help the more methodologically inclined. I will say this though:

doing embodies theories of knowing and how you do things reveals what you know. So know what you’re doing.

This schedule will be updated with who-will-lead-or-do-what-when after our first meeting. The schedule/load can/might be adjusted depending on enrollment.

Background Context on Digital History in Canada

At some point before the Fall Break, please read

Gaffield, Chad. ‘Clio and Computers in Canada and Beyond: Contested Past, Promising Present, Uncertain Future’. The Canadian Historical Review, vol. 101, no. 4, 2020, pp. 559–84. link

Kim Martin. ‘Clio, Rewired: Propositions for the Future of Digital History Pedagogy in Canada’ The Canadian Historical Review, vol. 101, no. 4, 2020, pp. 622-639. link

Meeting 1: Getting Started

To install:

Obsidian OR Tangent; if in doubt, the learning curve for Tangent is much gentler. If this is your first exposure to these kinds of note-taking applications, I’d suggest Tangent

Zotero for research management (bibliographies, citations, pdf annotations, and note making)

Tropy for research management of photographic materials (whether your own photos or other kinds of imagery)

We should probably set up github accounts too. Do not pay for anything. Nothing I ask you to do here should involve paying for an account or access. If you find yourself at any point this term being asked for a credit card, stop and talk to me.

We will spend a bit of time setting up your own personal research management environment and talking about this in general; this isn’t so much a part of ‘digital history’ as ‘strategies to keep you sane.’

To do:

Once we get set up, let’s do -

Some command line shennanigans: Terminus; Command Line Murders.

Simpkin, Sarah. ‘Getting Started with Markdown’. Programming Historian, Nov. 2015. programminghistorian.org, link.
Tenen, Dennis, and Grant Wythoff. ‘Sustainable Authorship in Plain Text Using Pandoc and Markdown’. Programming Historian, Mar. 2014. programminghistorian.org, link.

To mull:

Baker, James. ‘Preserving Your Research Data’. Programming Historian, Apr. 2014. programminghistorian.org, link
Heppler, Jason A. ‘How I Use Obsidian Jason A. Heppler’. Jason Heppler Weblog, July 2024. jasonheppler.org, link.

Meeting 2: Digital History in the Wild

We’ll consider some examples of what digital history looks like, and the kinds of questions it might ask (or permit the asking). We’ll look ahead on the syllabus and divvy up the various tutorials.

Before coming to class read:

Catherine D’Ignazio and Lauren F. Klein, “What Gets Counted Counts” and “The Numbers Don’t Speak for Themselves,” Data Feminism (2020), https://data-feminism.mitpress.mit.edu/.
Ryan Cordell, ‘Q i-jtb the Raven’: Taking Dirty OCR Seriously’ Book History 20 (2017), 188-225 (see also link)

To do

Then, select two pieces from Current Research in Digital History. Read the piece. See if you can find the underlying data. Does the project make the data available? Is there a public facing website for the larger project? Is there an intersection with Public History? Be prepared to talk about how the underlying data for these projects are presented, discussed, curated, and explored. List any tools/techniques that are new to you. What can you find out about how to use such a tool? Part of the challenge of doing digital history lies in what might be called ‘Dependency Hell’ - to use this, you need that; to do that you need this other thing… and so on. But we can start to map this out…

What comes next

Take a look at your assigned Programming Historian tutorial. Pay attention to the requirements, and try to work out what assumptions the tutorial’s author has made about your previous experience and what you need to know to be successful with the tutorial. What are the known unknowns (as it were) for the method?

Meeting 3. Quick Static Websites

Being able to control your own space online enables a certain kind of freedom. Take a look at some academics’ scholarly websites- Kathleen Fitzpatrick; Jason Heppler; Chantal Brousseau; Tim Sherratt. What unifies them? How are they different? What audience(s) do they serve? What constitutes effective presence?

To read

Please read the following tutorials about building static websites, especially the why of it all. I’m not a fan of Jekyll - I find it frustrating to use - but I want you to know these things. Don’t worry about trying to put together a Jekyll powered site using these tutorials (unless you really want to).

Visconti, Amanda. ‘Building a Static Website with Jekyll and GitHub Pages’. Programming Historian, Apr. 2016. programminghistorian.org, link.
Visconti, Amanda, et al. ‘Running a Collaborative Research Website and Blog with Jekyll and GitHub’. Programming Historian, Nov. 2020. programminghistorian.org, link.

To do

We’ll build a website using Pelican, which is a python package that will read a folder of text files (in the markdown format), pass them through a template, and spit out the necessary html files that make a website. We’ll then put these files online using Github Pages. (See my notes on using Obsidian - or other noteaking app using wikilinks eg Pelican - to write the markdown here). (update Sept 19: Given the dependency, path, and security issues we had in class, here for future reference are a set of notebooks that show Pelican working for both a blog or a collections website.)

Meeting 4. Networks

I was a relatively early proponents of network analysis in archaeology. We’ll talk about what a network perspective might offer (hey, it made my entire PhD!), as well as perils and pitfalls. (Here’s a past mre using a network approach on Ontario history)

To Read

Ahnert, Ruth, Sebastian E. Ahnert, Catherine Nicole Coleman, and Scott B. Weingart. 2020. The Network Turn: Changing Perspectives in the Humanities. Cambridge: Cambridge University Press. link. (Our library, direct link). This is a short work, read the intro and part 1, dip into anything else that strikes your fancy.
For examples of network analysis in the wild, this issue of the Journal of Historical Network Research is great - see in particular Ruffini’s conclusion to the issue, which addresses the ‘so what’ and the ‘we knew this already’ and ‘what if we’re wrong’. This is important. On a similar note, see Lincoln 2015 on ‘confabulation in the humanities’ here.

To do

A handy tool for quick network visualizations: https://networknavigator.jrladd.com/. Here is a dataset of the index of correspondence for the Republic of Texas; knowing nothing else about the Republic of Texas, how might visualizing this correspondence network provoke new insights or questions?

Düring, Marten. ‘From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources’. Programming Historian, Feb. 2015. programminghistorian.org, link.
Ladd, John R., et al. ‘Exploring and Analyzing Network Data with Python’. Programming Historian, Aug. 2017. programminghistorian.org, link.
Brey, Alex. ‘Temporal Network Analysis with R’. Programming Historian, Nov. 2018. programminghistorian.org, link. (You can install RStudio on your machine to run R, or you can change the runtime for Google Colab from Python to R like so.)

Meeting 5. Topic Models and Text Analysis

This week, we’ll do interesting things with large volumes of text. But we’ll spend a bit of time on getting the text too, which involves OCR. Handy bit of code: Here’s a Google Notebook I made that uses something called ‘paddleOCR’ to identify text in an image and then OCR it link. There are many other options for OCR’ing text.

Now that we’ve got a whole bunch of text, what might we do? I love the Data Sitters Club - they’re a group of scholars using a wide variety of DH approaches to understand an important book series from the ’80s & ’90s. Read about their misadventures with topic modeling here. Let’s also play with Voyant.

To do

Mähr, Moritz. ‘Working with Batches of PDF Files’. Programming Historian, Jan. 2020. programminghistorian.org, link. After OCR’ing pdfs, it does some topic modeling.
If your documents are in a folder of text files, give this a try instead. The topic modeling tool uses MALLET under the hood (and you can learn more about how that works and why, here.) Here’s a zip file with the text of historical plaques from Toronto that you can try fitting a topic model to. What patterns in ‘public memory’ do you find?

Meeting 6. examining images at scale

What can we see if look at vast amounts of historical imagery at once? I’ve just completed a project looking at social media and the trade in human remains (people buy and sell human remains online). A major tool we used were various neural network models trained to discriminate different classes of materials (including retraining such models for our own purposes). This week, I’ll talk about that for a bit, and we’ll think about under what conditions such approaches would be useful in your own research, and what dangers may lurk.

To Read

Chapter 1 in Arnold & Tilton’s book - OA version
Wevers, M. J. H. F., Vriend, N., & De Bruin, A. (2022). What to do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection. TMG – Journal for Media History, 25(1) link.
Melvin Wevers, Thomas Smits, The visual digital turn: Using neural networks to study historical images, Digital Scholarship in the Humanities, Volume 35, Issue 1, April 2020, Pages 194–207, link.

To do

Strien, Daniel van, et al. ‘Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1)’. Programming Historian, Aug. 2022. programminghistorian.org, link.
Strien, Daniel van, et al. ‘Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 2)’. Programming Historian, Aug. 2022. programminghistorian.org, link.

Meeting 7. Images & Handwriting

Speaking of images, sometimes the most useful thing we can do is systematically keep track of both the images AND our annotations. That’s what Tropy’s for. And once we’ve gotten a collection of annotated images, sharing those notes, images, and annotations might be the most generous thing we could do.

Side Quest: Put your research photos online using a static website via Tropy (instructions here)

One thing that we often might want though from our images is an actual transcription of the handwriting. And many people don’t know how to read cursive anymore. And historians deal with vast amounts of materials. What’s a person to do?

To read

Nockels, J., et al. 2024. The implications of handwritten text recognition for accessing the past at scale, OCR & Handwritten text. Journal of Documentation 80.7, 148-167 link.

To do

Blackadar, Jeff. ‘Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision’. Programming Historian, Dec. 2023. programminghistorian.org, link.

(also: some multi-modal AI models might be quite good at transcribing images… or they might just make shit up. We ought to investigate a bit.)

Meeting 8. Open Research

This week’s theme is a little less-cohesive than usual, but stay with me here.

Ask yourself: has your research to date been sustainable or reproducible or replicable? What do these terms mean, for us? Why should historians care about this sort of thing?

There are two kinds of ‘research data’ that we could make available. There are our primary materials (and all of their related annotations and mark-up). There are our thoughts on the secondary materials we read to contextualize those primary materials.

To Read

It’s over 10 years old now, but Caleb McDaniel’s Open Notebook History still is posing questions that historians largely avoid. Have a read of what several archaeologists argue about ‘open science’ in archaeology. Where are the points of intersection between McDaniel and the archaeologists? What parts of this spark joy? What bits fire up anxiety? What does this mean for you and your research? (On a somewhat related note, see this collection of previous CU History graduate degree work; what opportunities are lost here, gained here?)
Marwick, Ben, et al. ‘Packaging Data Analytical Work Reproducibly Using R (and Friends)’. The American Statistician, vol. 72, no. 1, Jan. 2018, pp. 80–88. link. What bits are applicable to us?

To do

There are two rather different things we could do this week; nevertheless, there is a connection between them.

Let’s try setting up Alexandra Phelan’s workflow for connecting Zotero with Obsidian. It’s entirely possible this is too complex given the reward: in which case, find and then tell us about other ways of using Zotero for literature reviews. If you’re using Tangent, this workflow obviously won’t work - but how could you adapt it? (This article re a new piece of software called ‘Lattics’ is of interest). Now: knowing what we know about static websites… what if you made these materials available on the web?
See what doing open notebook history through creating a ‘datapage’ could look like here and here’s the template for making such a page ourselves. Other options exist (including things like datasette.io). Find a historical dataset. (You could use the network file I provided you at our 4th meeting, which was derived from this volume). Create a datapage for the dataset (you’ll need to check out The Historian’s Macroscope to find out how the data was transformed into this format.)

Meeting 9. Mallory’s papers: scraping & apis

Sometimes, historical information that we might want to study is provided via an ‘application programming interface’ or API. Sometimes, it’s in a webpage that you need to parse in order to get it into a useful format.

To read

Everyone should read and, perhaps, do:

Sugimoto, Go. ‘Introduction to Populating a Website with API Data’. Programming Historian, May 2019. programminghistorian.org, link.

To do

First, try this Colab notebook that works with the Chronicling America newspaper API: link

Then give this a try:

Williamson, Evan Peter. ‘Fetching and Parsing Data from the Web with OpenRefine’. Programming Historian, Aug. 2017. programminghistorian.org, link.

Another approach is to parse the structure of the website, and use the Wget command to give it a go. See

Milligan, Ian. ‘Automated Downloading with Wget’. Programming Historian, June 2012. programminghistorian.org, link.

and

Kurschinski, Kellen. ‘Applied Archival Downloading with Wget’. Programming Historian, Sept. 2013. programminghistorian.org, link.

!warning! Badly-formed wget commands (or commands not correctly shut down) can lead to downloading an awful lot of data and can make you look like a bad-actor, which we do not want.

Meeting 10: LLMS & Associated Technologies

There’s a lot to unpack about using the latest fad, the LLM (marketed as ‘AI’ because when you rig one up to a chatbot interface you get the illusion of intelligence). There’s a lot of angst about using these things for cheating in academic work. There are serious concerns about how these things undermine creative human endeavour. There are serious concerns about their environmental impacts during their training. But I still think you ought to have a deeper engagement with these things than what marketers and grifters want you to do with them (as the underlying language model for a chatbot that bullshits, in the philosophical sense of the word).

To read

I have a list of useful bookmarks in my undergrad class on this topic that are worth exploring. Select at least two items off that list. See also:

Bender, Emily M., et al. ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜’. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, 2021, pp. 610–23. ACM Digital Library, link.
Graham, Shawn. ‘Smash the Looms’ Electric Archaeology link
Morley, Neville. ‘We Can Read It For You Wholesale’ Sphinx link

Given that this is all ultimately sophisticated autocomplete, should we even bother?

To do

Brousseau, Chantal. ‘Interrogating a National Narrative with GPT-2’. Programming Historian, Oct. 2022. programminghistorian.org, link
Graham, Shawn. ‘LLM as a discovery bridge for an API’ Electric Archaeology link

Meeting 11. Mallory’s papers: LLMs for NLP

I’m not going to bother with ‘generative ai’ in the sense of using LLMs to write text. That’s just foolish. I want to use their statistical understandings of (usually, most often) English to transform historical documents into forms that enable me to ask/explore questions of interest.

In this, I’m regarding LLMs as a powerful form of ‘natural language processing’, which is a whole swathe of tools and techniques meant to use the structural properties of language in order to impose (or extract) order from ‘unstructured’ materials like manuscripts, articles, diary entries… basically any text that isn’t in a database of some sort.

To read

Graham, Shawn et al. Investigating antiquities trafficking with generative pre-trained transformer (GPT)-3 enabled knowledge graphs: A case study. Open Research Europe. 2023 3:100. link

To do

Graham, Shawn. ‘Using a Large Language Model and Pydantic to Extract Structured Data for Cultural Heritage Crime’ XLab link

…obviously, my interests in antiquities crime are driving that piece of code. But try to adapt it. Ask yourself, ‘what would be a useful schema for extracting structured knowledge about some historical information I’m looking at?’… and then modify accordingly.

Meeting 12. Wrapping it all up

We’ll just leave this one blank, so that we have some lee-way in the schedule in case we need to change plans, etc.