Sunday, May 30, 2010

Easy is Complicated

For my Day Job (more signs that I've been officially hired: my entry in the campus people directory has been updated, although the office phone number is wrong), I'm supposed to be learning some scientific workflow software. This is open source software, and using the open source lingo I would say that it is less "free as in beer" and more "free as in puppies." And maybe a bit "free as in 'you get what you pay for.'"

Maybe I find it sort of silly because the current project involves running a formidable data set through R (which would be controlled by a script in batch mode) and then taking the results and passing them to another piece of software (my boss's pet software) that has a Python scripting interface. So here I am learning a very complicated graphical user interface with some sort of thespian metaphor to basically say, "Run a script. When that's done, take the results and feed them into another script." And the purpose of this? To make things easier for non-programmers. I'm pretty sure that anyone whose work involves tasks like, "Run this script, then run that script" would most likely do that by... writing a script as a wrapper around the two other scripts. But what do I know?
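
For the curious, here is roughly what I mean by a wrapper script. This is only a minimal sketch, not what the project actually uses: the function name run_steps and the file names in the note below are made up for illustration. The one thing it assumes is that each step is a command that exits non-zero when it fails.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first one that fails."""
    for cmd in steps:
        # check=True raises subprocess.CalledProcessError on a non-zero
        # exit status, so a later step never runs after a failed one.
        subprocess.run(cmd, check=True)
```

Calling it with something like run_steps([["Rscript", "analysis.R"], [sys.executable, "feed_results.py"]]) would run the R script in batch mode and then the Python script, where both file names are hypothetical stand-ins.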

Since I have not yet figured out this workflow software, I have simplified my working task down to: Run an R script and then do something. I am not yet worrying about getting the software to extract the outputs from R or feeding anything into the next piece of software. Baby steps.

Not really wanting to use one of the formidable data sets that I've been working with (you know, the ones that fill all 4GB of my laptop's memory), I grabbed some math department data that I still have (and theoretically should delete, as I no longer work there, but it is such pretty data). My universe of students is first-time freshmen who entered the university in Fall 2008. From there I subsetted down to the ones who took College Algebra in their first semester, were unsuccessful, and then retook the course in the second semester. Here's a barplot of their spring grades. Note: "NC" (No Credit) is a special grade that is like a C-, D, and F all rolled into one -- but that doesn't count in one's GPA. This is the "non-success" grade in College Algebra, Freshman Comp, and Intro to Engineering, among other courses.
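
The subsetting itself is nothing fancy. Here is a sketch of the same filtering logic in Python instead of R, run on a handful of invented records. The field names, the toy values, and the choice to treat a fall grade of "NC" as the only kind of "unsuccessful" are all my assumptions for illustration; none of this is the real data.

```python
from collections import Counter

# Invented toy records; field names and values are illustrative only.
students = [
    {"id": 1, "cohort": "Fall 2008", "fall": "NC", "spring": "B"},
    {"id": 2, "cohort": "Fall 2008", "fall": "A", "spring": None},
    {"id": 3, "cohort": "Fall 2008", "fall": "NC", "spring": "NC"},
    {"id": 4, "cohort": "Fall 2007", "fall": "NC", "spring": "C"},
    {"id": 5, "cohort": "Fall 2008", "fall": "NC", "spring": None},
]


def spring_retake_grades(records, cohort="Fall 2008"):
    """Count spring grades for students in the cohort who retook the course."""
    retakers = [
        r for r in records
        if r["cohort"] == cohort        # entered in the cohort of interest
        and r["fall"] == "NC"           # unsuccessful in the first semester
        and r["spring"] is not None     # took the course again in spring
    ]
    return Counter(r["spring"] for r in retakers)


counts = spring_retake_grades(students)

# A crude text barplot: one '#' per student with that grade.
for grade, n in sorted(counts.items()):
    print(f"{grade:>2} {'#' * n}")
```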