I get this question often from folks who are organizing DH and Data Science syllabi, weighing how important it is for students to be able to train or fine-tune a model, even if they don’t have a robust intuition for the underlying math.

It’s a difficult question, first and foremost, because the topic is outside the comfort zone of most students in English. Do I start from concepts, like underfitting and overfitting? Do I just have them run a notebook? Do I start them from a data gathering exercise? It’s a dizzying position.

I might have more to offer on this later. What I can provide here, however, is some industry experience that highlights why this work is necessary.

Throughout my career, and a little to my surprise, more and more visual artists and creative writers have reached out with questions about models they have trained, usually on their own portfolios or on a target style they want to capture. I’ll get questions like:

“I’m trying to tune this image generator to mimic my art style, but why do the edges look weird?”

“I’m trying to tune a language model to act as a companion character for a game I’m building in my free time, but my model just sounds kind of creepy.”

Getting asked these questions really gets me thinking.

Folks in cultural analytics and info/social science rightfully tend to frame machine learning as a tool for exploring data or for building influential systems in our world. When we talk about including machine learning in humanities departments, we tend to emphasize how much ML can help scholars explore our corpora, or how to navigate the power structures embedded in ML systems.

That is all still true! On top of that, I think we also need to really consider the kind of relationship English Departments want with students in the various creative arts who want to experiment with generative machine learning models. Is it enough to send off students with cautionary tales of machine learning and admonitions of corporate culture? Or do we want to be able to guide them in other ways?

Because, increasingly, I see a real need for people who can speak to both machine learning and aesthetic, cultural, historical perspectives. Creative professionals are looking for people who can say ML things like:

“Well, a known issue with this loss function is that visual borders get hazy, as the model is trying to minimize an error value and hedging its bets. Let’s see if we can scale this loss function in a different way.”

Or humanities things like:

“Okay, companion characters are very hard to write. You’re dealing with a trope that has a wide range of expectations. Let’s double check and make sure this is actually an issue with your training as opposed to genre tensions.”
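To make the loss-function point a bit more concrete, here is a minimal sketch of the kind of reweighting I mean, in a toy NumPy setup. The names `edge_weighted_loss` and `edge_weight` are hypothetical, not any library’s API; the intuition itself is standard: a plain mean-squared-error loss rewards averaged, “hedged” predictions, which blurs borders, and adding a penalty on image gradients pushes the model to commit to sharper edges.

```python
import numpy as np

def edge_weighted_loss(pred, target, edge_weight=2.0):
    """Blend plain MSE with an extra penalty on image gradients.

    Plain MSE lets the model hedge toward averaged, blurry borders;
    up-weighting the gradient term penalizes mismatched edges more
    heavily. `edge_weight` is a hypothetical tuning knob.
    """
    # Standard pixel-wise mean-squared error.
    mse = np.mean((pred - target) ** 2)

    # Finite-difference gradients along both image axes
    # (a crude stand-in for an edge map).
    def grads(img):
        return np.diff(img, axis=0), np.diff(img, axis=1)

    gp, gt = grads(pred), grads(target)
    edge_term = (np.mean((gp[0] - gt[0]) ** 2)
                 + np.mean((gp[1] - gt[1]) ** 2))

    return mse + edge_weight * edge_term
```

In a real training loop the same idea shows up as weighting a perceptual or gradient loss term against the pixel loss, and the “let’s scale it differently” conversation is mostly about tuning that weight.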

So, when English folks ask me whether students in English need to learn this, I think about my work with creative professionals a lot. Because, if English students don’t learn how to do this, then creative professionals will go to someone else who can answer these questions.

And two observations have me anxious that there are already other parties they will turn to. One, I’m interviewing more and more Computer Science PhDs who specialize in training and tuning LLMs to generate fictional prose under different constraints. Two, I see a lot of decent ML advice in Discord servers coupled with meh perspectives on culture, fiction, history, etc.

This, of course, isn’t to say that CS folks and rando Discord lobbies can’t have robust perspectives on these topics. I just feel that students in English who take a few math and programming courses are going to better serve the kinds of questions creative writers and artists have.

Currently, however, my anecdotal sense is that the humanities are actively conceding this conversational space at a pivotal moment, when amateur and professional creators are figuring out their relationship to machine learning. And, perhaps more importantly, figuring out who knows how to answer their questions.