Thursday, April 24, 2014

R you srs?

Who has heard of the open source statistical software package R? Anyone interested in statistics, or who uses statistical software regularly, should know about R. Obviously, all six people who read this blog already know about R. I'm going to keep going anyway, for anyone who stumbles across this page looking for information on R, or who happens to be familiar with statistical software but doesn't yet know R (an obviously miles-wide swath of the internet population).

As I mentioned in my previous post, I'm enrolled in an econ/development program at Evergreen right now. Alongside it, I have an individual learning contract with my professor to study econometrics using R. Econometrics is the application of statistical and mathematical methods to economic data. Mostly we're talking about regression models. These are used to analyze things as mundane as marketing strategies for fast-food companies and as far-reaching (and perhaps as abstract) as the value of a statistical life. Econometrics has enormous potential applications. Trying to learn it, or to read studies and articles based on econometric research, without a pretty solid foundation in statistics is, um, challenging.

R is a programming language, and programming is a topic I know essentially nothing about. What I do know is that R is based on S, a statistical programming language developed at Bell Labs. So there you go. Any programmer folks who want to chime in with a richer history of R should definitely do so in the comments. It's actually pretty cool stuff. I have read a bit about it, but I don't know enough off the top of my head to say much more than I already have.

What's so great about R? Well, has anyone used Excel for any mathematical or statistical work? It's pretty cool, right? First of all, Excel has a pretty easy-to-use interface; it's mostly a point-and-click tool. Also, for someone who is just learning stats and/or stats software, Excel seems moderately powerful. You can store, arrange, and analyze all of your data in spreadsheets. Using the Data Analysis tools, you can run ANOVA, basic regression models, t-tests, F-tests, and the like. Excel is good for these things. It's also good for keeping track of your budget, doing a school presentation, designing tables and graphs, and so on. But Excel is a broad, general-purpose tool, and its mathematical and statistical capabilities are limited to what its programmers chose to build in.
R, by contrast, is limited only by the user. As I mentioned before, R is open source and was designed specifically for statistical analysis. What does open source mean? The source code is freely available, and any R user can write their own package to cater to their particular needs. In theory, if there is a test you can't perform in R, you can design and program the test yourself and have R run it for you. With R, you are your own programmer. Not only that, but R has a huge community of users, many of them statisticians and programmers, who are often available to help with using R. The strengths of R can also be a barrier to using it. With R, it's garbage in, garbage out. There is a steep learning curve, and it is very intimidating to those of us who do not come from a programming background.
A vital point that makes R perhaps the best bang for the buck: it is totally free to use. You can download it right now from CRAN (the Comprehensive R Archive Network). Excel isn't too expensive relative to programs like SPSS, SAS, and Stata. Those can cost hundreds of dollars, but they are all more powerful than Excel.

So anyhow, I'm learning both R and econometrics right now, but what I'd really love is to hear from anyone who uses R in epidemiology. I've begun to delve into the academic world of epidemiology by reading Epidemiology: An Introduction. I'd love to hear from epi students about the kinds of statistical methods used in the field. It would also be brilliant to hear from epi students who know or use R. Also, what other statistical software are folks using in epidemiology? Does anyone have experience with multiple software packages, and do you have a preference?
I'd love to hear thoughts and opinions on this.

Shaking off the Cobwebs and an Econ Hangover

My last post started with a statement along the lines of "It's been a while since my last post." Well, it's been even longer since that one. Here's my excuse-o-rama:

1.) I was unable to get into any public health-related programs at Evergreen. Well, more accurately, there weren't any for me (I'll explain why after a few hundred words about econ/development). Instead, I've been in an economic development program. This was NOT my ideal choice, but there are some obvious parallels and overlaps between the goals of public health and economic development.
My takeaway from the program thus far?

   a) Economics is a phenomenally frustrating subject. Too many assumptions are accepted as fact, and there are too few opportunities to test them before they become accepted theories in the field. Far-reaching policies are then designed based on those assumptions. This isn't an econ blog, so I'll provide a quick example with no evidence to support my claim: the Washington Consensus and the Bretton Woods institutions that championed free trade starting in the late '70s and early '80s. What a horrifying failure. I encourage anyone interested in why the world has been thrust into such atrocious poverty to do a little research into those two things.

   b) Development economics is a phenomenally frustrating subject. There are a lot of people working to solve the world's poverty issues who are doing great work, trying to make the world a better place for everyone. Unfortunately, development is another area rife with unchecked and untested assumptions. The book Poor Economics, by Abhijit Banerjee and Esther Duflo, discusses using randomized controlled trials (RCTs) to test ideas in small settings before making systemic policy changes. Coming from a public health mindset, I generally approve of this idea. In the world of development, though, there are massive ethical snags, to which the authors seem to have given no thought. Consider running RCTs on a vulnerable population to see which approach gets the most nets out into an area where malaria is prevalent: giving mosquito nets away for free, selling them at market price, or selling them at a subsidized price. That seems... sketchy to me. There is no informed consent from the subjects of the trial. There is little interaction with the people themselves to ask what they think would work best for their personal situation. Treating people as little more than test subjects rubs me the wrong way.
On the other hand, I think the RCT approach is better than making assumptions and implementing them with no data to suggest possible outcomes ahead of time. RCTs could be done better, and they could very well lead to small-scale interventions for people suffering some of the worst effects of gut-wrenching poverty. I do recommend Poor Economics.

   c) Reading textbooks by economists is phenomenally frustrating. To illustrate, I'll cherry-pick a suggested policy prescription from one of the development textbooks the class is reading this quarter:

"[I]ndiscriminate educational expansion will lead to further migration and unemployment. [There are] important policy implications for curtailing public investment in higher education."

... Right. So one solution to the problem of mass migration from rural to urban areas in, say, India is to curtail education. The thinking here is that educated people are more likely to migrate into the cities. But there are too few jobs, so people wind up packed into filthy, overcrowded shantytowns. Because (in part) there is too much indiscriminate investment in education. I'll just let you think about that one, fair reader. (This came from the book Economic Development by Michael P. Todaro and Stephen C. Smith, FYI. The quoted section comes from the textbook's description of the "Todaro migration model." I have so many complaints and critiques of this book that I could write an entire counter-book. But I'm not a PhD, so what do I know?)


   d) There are some great people doing brilliant work in development. Paul Farmer wrote one of the best books I've ever read on the subject, Pathologies of Power. Any look into Paul Farmer's history will reveal my bias toward him. He's a physician and medical anthropologist whose work in development is based on what he calls "O for the P," the option for the poor. He insists that any policy that doesn't benefit the worst-off people is missing the point at best, and more likely useless. I'll write more on Farmer in the future. For now, along with Pathologies of Power, I recommend Mountains Beyond Mountains by Tracy Kidder. It is a biography of sorts of Farmer and his organization, Partners in Health.
Another man who has changed the way development is viewed is Amartya Sen. He is a Nobel Prize winner, a philosopher, and an economist. He also reminded economists and policymakers that Adam Smith's original intent was to solve the problems of poverty rather than to increase revenue (in not so many words). Sen helped create the Human Development Index. It measures development not just by income per capita but also by health and education, reflecting his broader emphasis on people's agency and access to opportunity. Go figure.


(Now continuing with my excuses...)

2.) I haven't actually taken any epidemiology courses. As much as I love Evergreen, I wish the college's pedagogical design allowed for more advanced study in certain subject areas. For anyone unfamiliar with Evergreen, the college offers no majors or degree tracks. The exceptions are the sciences, which are intensive and incredibly competitive, and its three graduate programs. Unfortunately, this means I essentially exhausted my public health study options last year, hence my enrollment in this development/econ course.
I've been reading an epidemiology textbook (y'know, for fun, in all my spare time) and hope to move further into this world after I graduate. I'd like to start writing posts more germane to epidemiology here soon.


3.) My family is growing. Baby number three is on the way, and should be here in fewer than two weeks (fewer than two days, my wife hopes). Here's to baby girl!


4.) I'm lazy.


5.) I've been busy. Lay off me.

Alright, that's what's going on right now. Thanks for reading the novel. I get carried away sometimes. But guess what? I have more to say. Check back for (hopefully) more frequent updates that actually pertain to epidemiology.