Baby Got Stats

Lyrics by Dorry Segev

To the music of "Baby Got Back" by Sir Mix-A-Lot


(spoken by 2 epi grad students)

Oh my god, Becky, look at his log file... it is so big,

he looks like one of those biostats grad students.

But, you know, who understands biostats?

They only hired him because he looks like a total geek, ok?

I mean, his log file is like 200 pages.

I can't believe he even calculated, like, what's a Schoenfeld residual?

I mean gross. Look! He's just so... smart!


I like good stats and I cannot lie.

You other brothers can't deny

that when you get some data, and you put it in STATA,

and it spits out a beta of 10 you get sprung...

and you're thinkin' "No way gonna send that to JAMA today!"


Couldn't be any greater, an unbiased estimator.

Oh data, I wanna get with ya, regress and fit ya.

Scott Zeger tried to warn me, but those odds ratios get me so horny!

Ooh add a spline term, you say you wanna perfect fit?

Just add a quadratic... with STATA it's automatic.


I seen her 2-tail, kickin' it on the log scale,

no leaf, no stem, got it going with GLM.

I'm tired of magazines saying exact tests are the things.

Take the average grad student, she will say: "I like your logit way"


So fellas, fellas, did you download the DTA?

Reshape it, reshape it, reshape it with LDA.

Baby got stats


I like my R-squared's big... the AUC I dig. I just can't help myself,

analyzin' like an animal, now here's my scandal:

I wanna sit at home and sum, double-up, sum-of-squares!

I ain't talkin' exact test, large sample assumption is the best.


I want a high coefficient, so find a cohort study.

If the data's muddy, I'll clean it up with my buddy.

Put a paper in Nature or Cell, takes 7 years to do it well...

You can tell everyone I'm a geek, but I write my grants in a week.


A word to you epi sistas... I wanna get with ya, I won't overfit ya.

But it's gonna be great when we're playin' with Cox models all night long,

STATA got it goin' on!

A lot of people gonna be served...

They don't check for proportional hazards, but I got me a log log curve.

And I shout, without a doubt, don't make me take my DO file out!


So ladies, ladies, using methods from the 80's?

Well don't resign, get STATA 9, and your thesis will be fine.

Baby got stats


Yeah baby, when it comes to models,

Epi ain't got nothing to do with my selection.

2x2 tables?

Yeah, only if it's AJE (*American Journal of Epidemiology)


So your girlfriend rolls a Honda, runs a study in Rwanda,

but all I want is the data in the back of her Honda.

My anaconda don't want none unless your p is 0.01!

You can do stepwise or subsets, or even AIC.


Some brothers analyze survival, tell you censoring just don't count.

But I find them, and remind them: competing risks are behind them.

So they teach STATA in class, and the real world uses SAS,

but remember that STATA is nearly free,

and for SAS you pay a yearly fee.


Scatterplots I do adore, but I can do much more.

You want prediction, I'll create it, bootstrap and validate it.

A 650 geek went too far, tried to do it all in R,

he had game but he didn't perfect it, and Annals had to reject it.


So ladies if your budget's tight, ask me to do your stats tonight,

You'll have a valid sample size, and get your Nobel Prize.

Baby got stats


(you have no bias but you have no life)

(word to your data)