Thursday, April 24, 2014

R you srs?

Who has heard of the open source statistical software package R? Anyone interested in statistics, or who uses statistical software on a regular basis, should know about R. Obviously, all six people who read this blog know about R, but I'm going to keep going with this for the people who accidentally stumble across this page looking for information on R. Or maybe for people who happen to be familiar with statistical software packages but don't yet know R (an obviously miles-wide swath of the internet population).

As I mentioned in my previous post, I'm enrolled in an econ/development program at Evergreen right now, and along with that I have an individual learning contract with my professor studying econometrics using R. Econometrics is the application of statistical methods to economic data. Mostly we're talking about regression models used to analyze things as mundane as marketing strategies for fast-food companies and as far-reaching (and perhaps as abstract) as the value of a statistical life. Econometrics has enormous application potential, and trying to learn it or read studies or articles based on econometric research without a pretty solid foundation in statistics is, um, challenging.

R is a programming language, a topic I know essentially nothing about. What I do know is that R is based on S, an earlier statistical programming language. So there you go. Any programmer folks who want to chime in on a richer history of R should definitely do so in the comments. It's actually pretty cool stuff, which I have read a bit about, but I don't remember enough off the top of my head to say much more than I already have.

What's so great about R? Well, has anyone used Excel for any mathematical or statistical applications? It's pretty cool, right? First of all, Excel has a pretty easy-to-use interface. It's mostly a point-and-click tool. Also, for someone who is just learning stats and/or stats software, Excel seems to be moderately powerful. You can obviously store, arrange, and analyze all of your data in spreadsheets, and, using the Analysis ToolPak add-in, you can run ANOVA, basic regression models, t-tests, F-tests, and the like. Excel is good for these things. It's also good for keeping track of your budget, or doing a school presentation, or designing tables and graphs, etc. Excel is a broad tool with numerous applications, but its mathematical and statistical capabilities are limited to what its programmers built into it.
R, by contrast, is limited only by the user. As I mentioned before, R is open source and was designed specifically for statistical analysis. What does open source mean? The source code is freely available for anyone to read, modify, and share, so any R user can write their own functions or packages to cater to their own particular needs. Theoretically, if there is a test you can't perform in R, you can design and program the test yourself and tell R to run it for you. With R, you are your own programmer. Not only that, but R has a huge community of users, many of whom are statisticians and programmers, who are often available to offer assistance in the use of R. The strengths of R can also be a barrier to using it. R will run whatever you tell it to, so it's garbage in, garbage out. There is a steep learning curve, and it is very intimidating to those of us who do not come from a world of programming.
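
To make the "you are your own programmer" idea a bit more concrete, here is a minimal sketch of what rolling your own test might look like: a simple permutation test for a difference in means between two groups. The function name perm_test_mean_diff, its arguments, and the numbers in group_a and group_b are all made up for illustration. None of it comes from an existing package, so treat it as a sketch under those assumptions rather than the right way to do it.

```r
# Hypothetical helper (not from any package): a two-sample permutation
# test for a difference in means.
perm_test_mean_diff <- function(x, y, n_perm = 5000, seed = 1) {
  set.seed(seed)                    # make the random shuffles reproducible
  observed <- mean(x) - mean(y)     # test statistic on the real groups
  pooled <- c(x, y)
  n_x <- length(x)

  # Reassign observations to groups at random n_perm times and
  # recompute the difference in means each time.
  shuffled <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  # Two-sided p-value: share of shuffles at least as extreme as observed.
  # The +1 terms count the observed arrangement itself.
  p_value <- (sum(abs(shuffled) >= abs(observed)) + 1) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

# Made-up example data, for illustration only
group_a <- c(5.1, 4.9, 6.2, 5.8, 6.0)
group_b <- c(4.2, 4.0, 4.8, 4.5, 4.1)

result <- perm_test_mean_diff(group_a, group_b)
print(result)
```

Once a function like this is defined, R treats it like any built-in: you call it on your data and get a result back. The flip side is the garbage-in, garbage-out problem, since R will not tell you whether the test you wrote is the appropriate one.
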
A vital point which makes R perhaps the best bang for the buck: it is totally free to use. You can download it right now from the Comprehensive R Archive Network (CRAN). Excel isn't too expensive relative to programs like SPSS, SAS, and Stata, which can cost hundreds of dollars (but which are all more powerful than Excel).

So anyhow, I'm learning both R and econometrics right now, but what I'd really love is to hear from anyone who uses R in epidemiology applications. I've begun to delve into the academic world of epidemiology by reading Epidemiology: An Introduction, but I'd love to hear from epi students about the kinds of statistical methods used in epidemiology. It would also be brilliant to hear from some epi students who know or use R. Also, what other statistical software are folks using in the world of epidemiology? Does anyone have experience with multiple software packages and have a preference?
I'd love to hear thoughts and opinions on this.
