Alt-titles: “How to Have Fun Writing AI Research Papers”, “How to Survive Writing Alone”, “How to be Expressive in your Writing”
Vared Schwartz wrote some excellent tips about writing an NLP paper. Many other NLP faculty have similar advice on their homepages. However, this is good advice only for writing a traditional paper, *after* you have the experiments, results, etc. This sequence of work typically happens in assignment papers and research papers at large grad-school or industry-lab settings, where you have a topic that is often dictated by your advisor’s grant goals or your lab’s product goals. Further, you have an advisor (or a research manager) with whom you can meet regularly for feedback on your explorations.
Many folks interested in AI research may not identify with this. Perhaps you are in a grad school with an advisor who is not meeting with you often (famous professor problem), or you are a researcher in a small school or startup without much “community” for meaningful feedback, or you are a lone warrior with no pedigree to show other than good hacking skills, and a couple of GPUs in your rig. If you identify with this class, have a strong interest in participating in science discourse, and wonder how to write a paper for fun, I am writing this essay for you. I am calling this crowd the Free Radicals.
Before we start, do read Vared’s post, which describes the structure and style of papers acceptable to the NLP community. It is essential to mimic that in your writing, as reviewers consciously or subconsciously match those templates and can quickly smell a submission from the “outgroup”, which can bias acceptance. This happens even when founders write pitch decks/memos to VCs, but I digress. If you work outside of NLP, in Vision, Robotics, Speech, or any other area of AI, fret not. Most faculty in your area have some didactic advice or other on paper writing, so go snoop around on their homepages!
Okay, you are a Free Radical, and you want to publish. How do you proceed?
Start with One Intuition or an Experiment
If you read a lot of papers (an essential prerequisite for publishing), you might already have a hunch about something. It could be a gotcha when reading a paper, a “what if?”, a “this is interesting!”, or a “wtf! That makes no sense”. This is your One Intuition. You might have several One Intuitions, but pick one that will keep you going. Keep the rest handy somewhere. Write a short paragraph about your One Intuition and why it matters.
If you are the hacker type and write code before anything else, you might have experimented with a dataset and have some results. Don’t worry about whether the experiment is novel or not. If you ask enough questions about any experiment and examine them deeply, you are guaranteed to bring out novelty. For example, say you ran BERT to classify sentences and have F1 scores, today in 2023. This “result” is so 2018, but no problem. Write it up and tabulate the results. This is your First Experiment.
The write-up you have about your One Intuition or First Experiment is the kernel of your paper. It is there to get you started instead of a blank page. You can then ask “why”, “what if”, and “how” questions around the kernel. Answering these might require you to do more experiments. Write those up too. You have one layer of material accruing around the tiny kernel you started with. After a few more rounds, more layers will deposit, and you might eventually replace this kernel with something else.
So in our BERT example, you might ask:
- Why does my setup not do as well on this dataset as on this other dataset?
- What if we replace the tokens with OOV? How does that change the result? What kinds of tokens can be more OOV than others?
- What if we were to mix different tokens from different languages (code-switching)?
- Why is it taking x seconds/token for classification? Can I do better?
You can ask endless questions, but this should give you an idea of proceeding from the kernel to something deeper.
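To make one of these questions concrete, here is a minimal sketch of the OOV question: mask a growing share of tokens and watch what happens to F1. Everything in it is an assumption for illustration. `toy_classifier` is a keyword stand-in for whatever model you actually ran (BERT or otherwise), `mask_tokens` and `oov_sweep` are hypothetical helper names, and the four sentences are made-up data, not results.

```python
import random

UNK = "[UNK]"  # assumed OOV placeholder; BERT's WordPiece vocab uses this string


def mask_tokens(sentence, rate, rng):
    """Replace roughly `rate` of the whitespace tokens with the OOV placeholder."""
    return " ".join(UNK if rng.random() < rate else tok for tok in sentence.split())


def f1_score(gold, pred, positive=1):
    """Binary F1 for the `positive` label; returns 0.0 when there are no true positives."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def toy_classifier(sentence):
    """Hypothetical stand-in for the real model: 1 if a 'positive' keyword survives."""
    return int(any(tok in {"great", "good", "love"} for tok in sentence.split()))


def oov_sweep(sentences, gold, classify, rates=(0.0, 0.25, 0.5, 1.0), seed=0):
    """Return {mask rate: F1} after masking tokens at each rate."""
    results = {}
    for rate in rates:
        rng = random.Random(seed)  # fresh seeded RNG per rate keeps runs reproducible
        pred = [classify(mask_tokens(s, rate, rng)) for s in sentences]
        results[rate] = f1_score(gold, pred)
    return results


if __name__ == "__main__":
    # Made-up toy data, only to show the shape of the table you would write up.
    data = [("a great movie", 1), ("I love it", 1), ("dull and slow", 0), ("good fun", 1)]
    sentences, gold = zip(*data)
    for rate, f1 in oov_sweep(sentences, gold, toy_classifier).items():
        print(f"mask rate {rate:.2f}  F1 {f1:.3f}")
```

Swapping `toy_classifier` for a function that calls your real model gives you a small table of mask rate versus F1, which is exactly the kind of layer that accrues around the kernel. Masking only certain kinds of tokens (rare words, named entities, function words) is the natural next question.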
Creating new science is often an improv act, and writing is a tool for clarifying thinking. The paper is merely a byproduct rather than the end product. Once you have a “paper” (of sorts), you might still want to go back to the drawing board and rewrite it so it looks like you had a clear theme all along.
Start with the Abstract
Contrasting with Vared’s advice, you, my Free Radicals, will start by writing down the article’s abstract. Initially, it should read like a PR notice of what you want to come true, if the experiments go well, for that First Experiment or One Intuition. It will accumulate more details as you progress, and it should stop reading like a PR notice. Let your abstract become your guiding star while you navigate writing the rest of the article. As the article progresses and updates, in true EM fashion, the abstract gets updated too. Your current exploration path may end in a dead end, and you might have to make significant changes to the abstract (again based on your intuition) for it to become a different north star. This is like a “random restart” in EM experiments. Paper writing is a heavily iterative operation. Get comfortable with it.
Don’t be Afraid of Using Big Words
I find the blanket ban on using vocabulary above a high-school reading level in academic papers silly. I know we are not making students study Latin/Greek and the reading levels of the general population are coming down, but that doesn’t mean we should dilute our writing over time.
In 2009, I opened an abstract with this sentence:
“In the menagerie of tasks for information extraction, entity linking is a new beast that has recently drawn much attention from NLP practitioners and researchers.”
A casual reader might question the point of using a word like “menagerie” here. This was 2009, and I was building complex entity-linking systems before Deep Learning, before LLMs. I had spent more than a year wrestling with large datasets on the painful Sun Grid Engine, dealing with all sorts of hairy curve balls that languages like to throw at us hapless practitioners. I channeled that suffering into this sentence by comparing the task to a “beast”. We ranked 1st among all the Entity Linkers evaluated by NIST that year, which included submissions from the best NLP labs. We had overcome the beast and lined it up in the menagerie along with other “solved” IE tasks. A coauthor wanted to remove this sentence, but I pressed to retain it. If the words mean something to you, keep them.
To give another example, the “Principle of Parsimony” would sound less memorable/effective if the creator had not used the word “parsimony”. Trust me; there is room for using the word “utilize”, too. That said, using words for pretentiousness’ sake will become obvious to everyone, and you will feel awkward even when writing them. Know your words and use them wisely.
Problem with “Write How You Speak”
This well-meaning advice is particularly bad. First, it assumes the writer is a native speaker of the language they are writing in. For L2 speakers (aka non-natives) of English, it will lead to some high-perplexity English constructs. Second, it allows native speakers to impose their idioms and -isms. For example, when I first started interacting with white speakers in the US, I was thrown off by Americanisms like “heads up” and “touch base”. Like all immigrants, I struggled, and now I can speak Americano, mostly indistinguishable from those who were born here, but this experience has made me quite sensitive about how I write. Often when people say, “Write How You Speak”, they mean something more like “Write how the culturally dominant class speaks”.
Don’t Confuse Good Writing/Science with Getting a Paper Accepted
Getting papers accepted at peer-reviewed venues is wonderful; you should aspire to that. In some ways, by not publishing, you are putting yourself out of the official scientific discourse, even if you did the work and tweeted about it. However, good and clear writing alone is not sufficient to get a paper accepted. Submitting a paper to a conference often involves writing defensively against the reasons a reviewer might reject it. There is nothing wrong with this, but you should know the game you are playing.