Teach, Don't Document


Yesterday, HuggingFace launched the "HuggingFace course": a self-paced introduction to their NLP library, along with short explainers. My first reaction was "oh great, another tutorial in the glut of deep learning educational resources," but since it came from HuggingFace, I set my opinion aside and gave it a closer look. The more I looked, the more one thing became apparent: this is how "documentation" for developer tools and libraries should be done.

Back in undergrad, when I was into systems programming, I would pour hours into reading UNIX man pages, like this one:

It was esoteric and required a certain trained eye to find the right man page, make sense of it, and skim over the parts that didn’t matter. You typically gained that eye after at least a couple of years of doing it, and if you hung out with other systems programmers or on mailing lists, you could sense a certain masochistic pride in reading terse, dry documentation. Many of them had spent a considerable part of their lives learning the difference between F_SETLK and F_SETLKW. The expression RTFM ("Read The Fucking Manual") reeks of a superiority fed by such masochism.

If you look at machine learning and statistics libraries from the past, like scikit-learn, NumPy, or pandas, their developer outreach started and ended with documentation that resembled UNIX-style man pages: pages upon pages of terse stuff (NumFOCUS is trying to change that now).

With machine learning increasingly looking like a software engineering discipline rather than a research area producing and consuming within its own community, with MOOCs lowering how much one has to invest to "do machine learning," and with a new library coming out every other week, developers have little patience for spending months mastering terse documentation before doing something useful. How, then, do you capture the mind share of the talent who will go on to evangelize your precious framework or library in their workplaces?

You teach! You teach with short, engaging videos. You teach with ready-to-use code snippets that can be copy-pasted into any work context and give the developer a quick win. Old-school, man-page-style developer documentation still exists (docstrings, in Python land), but it now serves as a secondary reference. You go from the HuggingFace course to asking questions in their forums or on Stack Overflow until someone gives you what you want as a copy-pastable snippet or a link to a documentation page describing exactly what you need. RTFM can suck it. I am not here to debate whether this is better or worse, only to say this is how things are. HuggingFace is not alone in doing this. Weights & Biases, for example, seems to have been doing this since the start.

If you are even remotely involved in the developer tools or developer productivity business, aim to teach, not just document. Don’t just throw your code on a GitHub repo with a README and docstrings. The terse documentation is for your 1% expert audience. For the rest: Inspire, Empower, Entertain!


Copyright © 2021. Delip Rao