“Coding is easy”
There’s an attitude that I believe is reasonably common among applied economists, that coding is the easy part of what we do and where we add the least value. If we were building a house, coding would be like laying bricks - it’s the “menial” and “boring” part of the job (no slight meant against actual bricklayers) - and we add the most value as the architect by coming up with an interesting research question, thinking about endogeneity and so on.
This is true in a narrow sense: it’s not hard to learn how to replace missing values, or to merge two datasets. But you need more than that to execute a research project. You have to perform each of these little tasks over and over again in various permutations, and you have to figure out how to do so in a way that doesn’t drive you nuts. That is, you actually need to know how to organize the project and have a reproducible workflow. Going back to the house metaphor (and here we reach the limits of my knowledge of the construction sector), you need a site manager or foreman - someone to make sure that things are done efficiently and come together in a coherent manner.
I don’t disagree that our value-added comes primarily from research ideas, but I do think that project management and workflow need to feature more prominently in the conversation. This is an activity we perform for days on end and it is all too easily swept under the carpet.
What’s this got to do with R?
A sentiment I often hear being expressed by R skeptics is that you can do whatever task you need to in Stata.
I agree with this.
OK, don’t get too excited. Yes, most things an applied economist needs to do, they can do in R, Stata, Matlab, and sometimes (dare I say it) Excel. But in my opinion, comparing languages based on whether they can perform specific tasks is a fairly unhelpful exercise. It’s focusing on the brick-laying at the expense of the “foremanning”. Where R (+ RStudio + tidyverse) excel, and where I believe Stata is lacking, is in project management and workflow.
I won’t try too hard to justify that claim here, but just as an illustration, this blogpost on NIH principal investigators was written completely within R. From scraping to cleaning to joining to visualization to writing to compiling the post itself. To be able to do all these things within a single programming environment is really, really nice.
Don’t NOT use R because you are counter-signaling
My bigger concern about the “R and Stata are basically equivalent” argument is that it is often accompanied by the claim that using R is more about signaling than practical value. This same argument is often made in the context of comparing LaTeX and Microsoft Word. Occasionally this is followed up by some statement about how you don’t need fancy new tools to write a good paper. This feels like a counter-signal to me: I’m not wasting my time with new tools because I’m too busy thinking deep thoughts.
Look, I get it. Learning new languages takes a lot of time, there are network effects to consider, and we’d all rather start playing with the data as soon as we can. These are all perfectly legitimate reasons to stay with the language you’re most familiar with. But it disappoints me when experienced researchers skirt around the issue of tooling or take pride in not being one of the sheeple chasing the latest gizmos, because it sends a message to budding researchers that these aren’t important issues. Sticking with Stata could be optimal for someone who’s used Stata for 10 years; that doesn’t make it optimal for a grad student starting on their first research project.
Use R because it makes your life easy
Which brings us back to the title of our post: use R because it makes your life easy, not because you’re trying to signal some kind of technical wizardry. The irony in the idea that R is a way to signal technical competence is that many R developers today are trying to make it as easy to use as possible. For instance, a theme that Hadley Wickham often talks about is designing tools so that users fall into a “pit of success”.
As a more concrete example, RStudio has “point-and-click” options via its Addins because, well, sometimes point-and-click can be better.
I honestly find working in R a lot more enjoyable than working in Stata, which is why I’ll extol its virtues to anyone willing to listen. But of course it’s fine (sort of) if you decide it’s not the right tool for you. The problem isn’t that people aren’t reaching the right conclusion about tooling. It’s that far too often we’re having the wrong conversation, or not even having it at all.