Hi Reddit! My name is Rebecca Goldin, and I am a Professor of Mathematics at George Mason University in Virginia. I direct STATS.org, which helps journalists understand statistical results from the sciences and social sciences, as well as other mathematical topics that come up in their work. You can find some of my posts for STATS.org here.

We get asked a lot of questions by journalists, including: what are the different kinds of errors in polls? What is the difference between "statistical significance" and "clinical significance"? Why is causality so hard to determine?

Ask me anything about how statistics and mathematics are used (or abused) in the media you consume, or what kind of questions journalists deal with that statistics can help clarify! I'll start answering at noon EST, and I'll be around for 2 hours to answer questions.

Thanks to the Mathematical Sciences Research Institute (MSRI) for setting up this AMA.

Proof: https://twitter.com/rebegol/status/798287722553155586

Photo: https://twitter.com/NatMathFestival/status/800738337019396096

Edit, 2 p.m.: Thanks, everyone, for all the great questions! My time is up, but you can find my articles at STATS.org (linked above) or my Twitter, and I'll be at the National Math Festival on April 22, 2017 in Washington, DC with dozens of other mathematicians and math organizations from around the US. If you want to know more about who'll be there and what kind of free events are happening for all ages, you can check out the website. I hope to see some of you in April!

Here are some articles (a few mentioned below, some just interesting), for those that want to dig deeper:

Gentle intro to reading scientific study for nonexperts

More about p-values from Vox and 538

Causation vs. correlation

New today from STATS.org: What the 2016 Presidential Election taught us about polling, predictions

I recently commented in the Wall Street Journal for an article on online hotel reviews and the bottom line (paywall)

For fun, some spurious correlations to explore.

Misguided science with fMRI (measuring brain activity):

Comments: 103 • Responses: 20

gerpsohappy19 karma

Hello Dr. Goldin, former student of yours here. Cool to see you doing an AMA on Reddit!

My question is: how can we improve the polling accuracy for elections? Days before the most recent presidential election, NYT had Hillary winning with 90% likelihood. Only one or two major news networks come to mind that predicted Trump's victory. What are the major shortcomings that led to this huge difference in expectations and reality?

RebeccaGoldin13 karma

Fantastic question! Lots of people are opining on this one. One issue is the question of who pollsters were speaking with. Trump referred to "dirty polls" and made other disparaging remarks about polling, and polls were often linked to media (also disparaged). Perhaps some supporters didn't want to speak to pollsters. If we know how big this effect is, it can be corrected for... but it might have been a new wrench in the process.

RebeccaGoldin7 karma

The organization I direct, STATS.org, just posted a few comments on exactly this question. They are written by members of our Statistical Advisory Board!

natewOw10 karma

Hi Dr. Goldin, thanks for doing this. Let me preface this by saying I am not a conspiracy theorist in any way, and I am a professional statistician. My question is - given how widely all of the polls from the weeks leading up to the election were projecting Hillary to win, and how completely wrong all of them were, is there something to Trump's accusations that the polls were in fact "rigged"? It seems improbable to me that so many pollsters could be so completely wrong, in such a way that systematically favored one candidate over another, by pure chance. Thanks!

RebeccaGoldin15 karma

Systematic bias could definitely create the outcome we saw. Systematic bias would include anything that unexpectedly favored recording the opinions of Clinton supporters over Trump supporters.

Another factor may have been the estimates about who would actually vote. Pollsters had to predict who would actually show up to vote. If they (as a whole) over-estimated the likelihood of Clinton supporters voting, and under-estimated the likelihood of Trump supporters, we would also see this effect!

Finally, another contributing factor pertains to when people made their decisions. Earlier on in the election cycle, people who didn't like either candidate stated that they would not vote. Perhaps more of these people decided to vote, and vote for Trump, than expected.

normee8 karma

Hi Dr. Goldin, thank you for participating in this AMA.

As a statistician I am worried about the confluence of a "post-truth" political environment, increasing distrust of media and academic research on both the left and right, and inevitable blowback to the "big data" hype of the past few years. I have a couple of questions for you about this shift:

  • What strategies can people communicating quantitative information take to defuse a lack of receptiveness due to cultural polarization or distrust of intellectual authorities?

  • I am concerned about the ability of federal research services like BLS and the Census Bureau to do their jobs under an incoming administration that believes unemployment data are "phony" and has expressed interest in dismantling several longstanding government agencies. Do you think this fear is founded, and if so, what do you think the social science communities relying on these data can do to push back?

RebeccaGoldin10 karma

Thanks for your questions. I will aim to be as apolitical as possible in responding.... Generally speaking, people do not respond well to being bombarded with information, or even to being told why their information is incorrect. So, to your first point, one way to communicate effectively is to tell a story. I mean a real story, not about the numbers, but about you or the real impact. It may seem statistically invalid or even offensive to someone who hates n=1, but a story about one person's experience with their home flooding is more convincing that climate change is real than is a discussion of the modeling choices and confidence intervals of sea level rise. This tactic has long been used by people who doubt statistics -- "My aunt smoked her whole life and she never got cancer" or "My son has autism and he was vaccinated just before he got it." Telling stories can effectively communicate your view, too, when you're speaking about topics of importance -- not when teaching statistics. For example, "I have a son with autism; he and his cousin were both vaccinated, but his cousin didn't get autism. I wanted to know what people who do research on autism thought might be behind it." Notice that I didn't say: "if you think your kid got autism from vaccines, you're WRONG! The statistics prove it!"

For unemployment data, it's easy to find stories of people not finding jobs, or people finding them if that's your outlook, and then talking about numbers.

Another effective tool: use frequencies rather than percentages, and avoid jargon. It's more digestible to hear, "For every 100 people looking for a job in the United States, eight people will be unable to find one in the next six months," compared to "The unemployment rate has reached 8 percent." (I am just making up those numbers, and I'm not even sure how the unemployment rate is calculated).

As far as what happens with the data collection agencies, it's hard to know. I hope there are many people who will resist efforts to dismantle these agencies, but of course it's hard to predict what the new administration will bring. Historically there have been times that the president didn't view science and progress the way many people in the science community did, and the country has weathered it. For example, NIH funding rates were sometimes frozen or modestly reduced, but hardly ever entirely removed. I don't think I have great insight here: write to Congress/visit your congressional representatives, make sure your professional organizations are "on it" and advocating for the value of the data, put together explanations of why the data are so important, and keep voicing your opinions.

And when you talk to a congressional aide, make sure to tell him or her a story to back up your view.

johhnytexas8 karma

I am taking statistics online right now, how much would you charge to do my homework for me?

RebeccaGoldin24 karma

I'll give you a binomial distribution with p=0.0.

SecretAgentZeroNine6 karma

  1. What is your go-to statistics novel, textbook, or both?

  2. R or Python for analysis?

  3. What is your favorite visualization tool?

  4. Do you have a presence on YouTube or Twitter?

RebeccaGoldin9 karma

Twitter handle is @rebegol

I like R for analysis, but Python is easier to learn. Python is not as clearly "set up" for statistics, though.

Don't have a favorite vis tool.... but often impressed by things that others come up with.

As far as favorite statistics novel, that's something to think about. Would love to see a statistics novel for kids, so we could consider it for the Mathical Book Prize!

itssomeone5 karma

What would you say are the most common abuses/misuses of statistics in the media?

RebeccaGoldin19 karma

The biggest abuse is, in its own way, the easiest to disentangle. It's the implication of causality when a correlation is found.

RebeccaGoldin21 karma

Perhaps our brains are hardwired to believe there's a reason behind everything -- and we jump to the most obvious conclusions. You may enjoy this site with a lot of spurious relationships.

Frajer4 karma

When statistics are abused as you say or misconstrued do you think that's generally intentional or accidental?

RebeccaGoldin10 karma

Well, it depends! Over the years, my sensitivity to this issue has introduced some additional nuance. Sometimes people believe something, so they cite statistics to support this view. At times, they abuse statistics, but it's hard to ascribe it as intentional or accidental. It may be accidental because it fits with an intentional belief system. This happens particularly with heated topics that really get people riled up.

But I believe most journalists are honest, and that their misconceptions (and abuse) are truly accidental. Learning math and statistics is a huge investment; if you have limited time, you try to wrap your head around the piece of it you need, or try to express a concept in your own words, and it doesn't come out right. A lot of times, journalists don't have the time or expertise to wrap their heads around the big picture, so they just leave out important aspects or caveats that come from the statistics that they are using.

But I will say that many media sources are motivated not to dig deeper, because the story is better without doing so. This happens especially when media suggest that one thing causes another, "as explained in a recent study".

russresearcher3 karma

Hi there- My son is a 6th grader and has expressed interest in studying statistics- he seems to see numbers in everything:-) Can you recommend any specific books, fiction or nonfiction, for him to read? Any other tips/suggestions for him besides doing well in math?:-) Thanks, in advance, and looking forward to the National Math Festival- it's on the calendar!

RebeccaGoldin2 karma

The first suggestion that comes to mind is that you should get him involved in science experiments and/or science fairs. Kids who can use statistics are lauded and appreciated over those who can't, and he will gain a real sense of why it's so important to know about elementary statistical ideas: sample sizes, confounding factors, effect size... they will all come into play in fairly basic science experiments. Of course, he should learn all about describing his data using statistics like mean/median and variance. He can easily graduate to more advanced topics as he goes along.

russresearcher2 karma

Thanks- I'll share your response with him. He loves doing science experiments at home and has participated in science fairs. He actually did one a couple of years ago where he flipped a penny 100 times to see the resulting heads/tails:-). Any interesting books for middle schoolers that you can recommend? TIA

RebeccaGoldin3 karma

In addition to the books suggested by MSRI and my emphatic agreement that Martin Gardner wrote fantastic puzzles, here are a few others I have enjoyed. Some of these authors have published multiple books, so look them up. And I should add that several of these books go way beyond the 6th grade level, but you should encourage your son to enjoy any book as he wants/can without getting too worried about understanding the whole of it.

The Phantom Tollbooth by Norton Juster, a Mathical Book Award winner

The Pea and the Sun: A Mathematical Paradox, by Leonard M. Wapner

The Adventures of Penrose – the Mathematical Cat, by Theoni Pappas

Anno’s Mysterious Multiplying Jar, by Masaichiro and Mitsumasa Anno

Flatland by Edwin Abbott

Really Big Numbers by Richard Evan Schwartz (A Mathical Book Award Winner)

I may come back with more as they come to mind. Please excuse any typos!

golflady163 karma

What advice do you have for an aspiring college Statistics major? (undergrad)

RebeccaGoldin9 karma

Understand WHY you apply the statistical tests you apply, and what assumptions need to hold. You will be a much more powerful thinker if you ask that question in a regular way. Good luck!

davideisenbud2 karma

Why do medical results seem to "decay" over time? -- in the sense that repeats of big studies seem to show less effect, or less significance, than the original?-- (Hello from MSRI--great that you're doing this.)

RebeccaGoldin9 karma

Some medical results come from large data sets that were not specifically designed to test a hypothesis. For example, you might observe in a large data set that women who use hormone therapy have fewer heart attacks than women who do not. Since we are implicitly (or explicitly) combing through the data for possible relationships, there's a much higher chance that the relationship we found (lower risk of heart attacks is associated with hormone therapy use) is spurious than if we had specified what we planned to look for before we started combing through the data.

You might ask why this is: If there's absolutely no relationship between reduced heart attack risk and hormone therapy, there's always a chance that we could get data suggesting that there is. We might call this "extreme data". The chance of seeing extreme data is small... but the more relationships you look for (assuming there are none at all), the larger the chance that some extreme data will exist, entirely due to the randomness of the process, not a true relationship.

Once we find a connection in a large data set, we might set out to see if another data set has a similar relationship. In our new data set, we test again to see if hormone therapy is associated with reduced heart attack risk. Since we're now looking for one relationship only, if there is in fact no relationship between lower heart risk and hormone therapy, the chance that our data suggests a relationship is small.
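
To put rough numbers on the argument above, here is a minimal sketch. It assumes every relationship we check is tested independently at the same threshold (the conventional 0.05 is used here as an assumption, not something from the answer) and that none of the relationships is real. The function name is made up for illustration.

```python
def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent tests looks
    'significant' at level alpha when no real relationship exists."""
    return 1.0 - (1.0 - alpha) ** k


# k = 1 corresponds to the follow-up study that tests a single,
# pre-specified relationship; larger k corresponds to combing a big data set.
for k in (1, 10, 50, 100):
    print(f"{k:>3} relationships checked: {chance_of_false_positive(k):.2f}")
```

Under these assumptions, checking ten relationships already gives about a 40% chance of at least one spurious "finding", while the single pre-specified test stays at 5%. Real data sets rarely have fully independent tests, so treat the exact figures as illustrative.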

davidmanheim2 karma

I've taught graduate classes where people have trouble with p-values, and so I'm wondering two things.

1) Can you discuss how you think p-values and NHST are confusing to journalists, and what can be done to make that easier?

2) Do you think that bayesian methods could be easier to understand, once the initial lack of familiarity is overcome?

RebeccaGoldin7 karma

Let me try to answer the first question as a start (and my apologies to the poster, as I'll explain more than someone teaching graduate statistics would need to hear, for the benefit of readers who are unsure about p-values):

The p-value is hard for new statistics learners to wrap their heads around. I start by admitting the real problem: we want to know the probability that a medicine works. But the p-value cannot give you this. Already, most journalists are baffled because they (as with many people who have heard of the term) believe it has something to do with the question of whether the experiment shows us something real or not.

After explaining what we wish we could know, I point out that we can easily find a different probability: if the medicine DOESN'T work, then how likely are we to get this data, or more extreme? This is the p-value. Sometimes I discuss coin flipping and finding p-values -- no calculations, but graphical displays with histograms for probabilities make it easy for people to figure out a p-value. I'm setting up the discussion cognitively by insisting that the null hypothesis really is true. They "get" the p-value in that context.

Then I might ask people what the probability is that I have a fair coin if, for example, I flip a coin 10 times and 9 times it's heads. They are rightfully unsure how to answer that. But it's very clear to them that the question about how likely it is that the coin is fair or not has no direct relationship with the probability of seeing extreme data. That allows us to talk about suspicion and other means that could inform us that the coin might not be fair. (This is a nice place to bring in Bayesian methods..... but in workshops I have done, there isn't time to go there). Then we return to a medical example or a physics example, and I go over the logic again....

Usually I don't bring up the term "null hypothesis" at all, or I only discuss it after I have explained what a p-value is actually the probability of.
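
For readers who do want the calculation behind the coin example (10 flips, 9 heads), here is a small sketch. It assumes the coin really is fair, which is exactly the "insist the null hypothesis is true" step described above. The function name is made up for illustration.

```python
from math import comb


def p_value_at_least(heads: int, flips: int) -> float:
    """P(at least `heads` heads in `flips` flips), assuming a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


one_sided = p_value_at_least(9, 10)  # 9 or 10 heads
two_sided = 2 * one_sided            # also counts 9 or 10 tails (symmetric for a fair coin)

print(f"one-sided p-value: {one_sided:.4f}")
print(f"two-sided p-value: {two_sided:.4f}")
```

Both numbers are probabilities of the data given a fair coin. Neither one is the probability that the coin is fair, which is the question people usually want answered.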

Dank_psyche2 karma

What's your opinion on graphs in media that don't start at 0?

RebeccaGoldin10 karma

Context is everything! You may have seen this video put out by Vox. It has some good material in it, though the title is a little lacking :)

They point out that if you're graphing your temperature every day throughout the month, it would be hard to see any variation if you included a temperature-axis that starts at 0. The tiny changes from 98.6 to 101.2 would barely be visible! And of course it would be even worse if you graphed it on the Kelvin scale.

If context suggests not starting at 0, provided that the vertical axis exists and has clear labels, I am ok with that! The real issue is not to choose the units to make a point that is somehow misleading, such as suggesting huge changes when they are actually really small.

Babe_Vigoda2 karma

Are there some news organizations that do a better job of reporting on statistical data than others?

RebeccaGoldin6 karma

I think Vox Media does a great job in describing some of the issues that come up with statistics. And 538 does a great job in talking about, and using, statistics as well. Most news organizations aren't really reporting on data -- they are reporting on a result that uses data. Some media outlets are doing what is called "Data Journalism", in which they provide their own analysis or description of data. I wish they would involve statisticians more with that!

pussgurka2 karma

How do you explain statistics to the general public?

RebeccaGoldin15 karma

With jokes is the best way! A great example would be to explain the effect of confounding factors on polls (or data collection generally). Here's a poll pointing to Democrat vs. Republican experiences with sex

I bet they didn't control for the confounder of gender: women are more likely to be Democrats.
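
Here is a toy sketch of how a confounder can manufacture a gap like that. Every count and rate below is invented for illustration, and the "outcome" is a generic yes/no survey answer. The key assumption is that the answer depends only on gender and never on party.

```python
# Invented respondent counts, purely for illustration.
people = {
    # (gender, party): number of respondents
    ("women", "Democrat"): 60,
    ("women", "Republican"): 40,
    ("men", "Democrat"): 40,
    ("men", "Republican"): 60,
}

# Assumption: the chance of answering "yes" depends only on gender.
rate_by_gender = {"women": 0.30, "men": 0.50}


def overall_rate(party: str) -> float:
    """Share of a party's respondents answering 'yes', ignoring gender."""
    total = sum(n for (g, p), n in people.items() if p == party)
    yes = sum(n * rate_by_gender[g] for (g, p), n in people.items() if p == party)
    return yes / total


print(f"Democrats:   {overall_rate('Democrat'):.2f}")
print(f"Republicans: {overall_rate('Republican'):.2f}")
```

The two parties end up with different overall rates even though party has no effect within either gender. Comparing Democrats and Republicans separately among women and among men would make the gap disappear in this toy setup.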

Ndemco1 karma

How do we solve the problem of scientists falsifying their research results to make the data support what they're trying to prove in order to get funded for more research?

RebeccaGoldin3 karma

I believe only a small proportion of scientists are actually falsifying research results -- and there are some cheaters in every profession that exists.

More generally, it would be nice if funding agencies appreciated the value of high quality work enough that it would trump large volumes of research. I'd like to see repeated, independent studies funded for high impact research. Replicability is grossly underfunded in my opinion.

Tino91271 karma

I recall seeing something about how there is some discussion amongst experts regarding the validity of the p value.

I don't believe it went anywhere, but if it had, and it was determined to be an invalid tool for statistical analysis, what would come of that?

RebeccaGoldin5 karma

Many people agree that the tool is valid, but that the conclusions we draw are typically too strong. Another concern is that a lot of information that can't be summarized in a p-value is not mentioned -- effect size for one.
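
To illustrate why a p-value alone leaves out effect size, here is a rough sketch. It assumes two equal-sized groups compared with a simple z-test with a known standard deviation, and a hypothetical effect of 2% of a standard deviation. The function name and all numbers are made up for illustration.

```python
from math import erfc, sqrt


def two_sided_p(effect_in_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized
    groups, via a z-test with known standard deviation (a simplification)."""
    z = effect_in_sd * sqrt(n_per_group / 2)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


tiny_effect = 0.02  # hypothetical: 2% of a standard deviation

for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(tiny_effect, n):.3g}")
```

The effect is the same size in every row; only the sample size changes. With enough data the p-value becomes extremely small, yet that says nothing about whether an effect this small matters in practice.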

n8dawgindahouse1 karma

Math major here! Do you have any advice for getting accepted into a graduate school? I am learning both SAS and R and have high grades in my undergraduate statistics classes. Any suggestions for resumé builders outside of class?

RebeccaGoldin2 karma

Research projects! Talk to your professors and ask how you can be involved.