In 2014 The Economist set up a data team to make our coverage more quantitatively driven and rigorous. Some of our work relies on new data sources, as in our article on the ethnic makeup of Lebanon. Sometimes we produce statistical prediction models, like our recent one that gave Marine Le Pen just a 1% chance of victory before the first round. We’ve used data to illuminate social and economic trends, such as our dive into dark-web drug markets. And we produce visually gripping charts, including one depicting 1,200 years of cherry-blossom dates in Japan. Ask us anything!

Here's proof:

Here's some of our stuff:

Our French election model

Lebanon census

Shedding light on the dark web

Japan’s cherry blossoms are emerging increasingly early

Crime, ink - What can be learned from a prisoner’s tattoos

That's all we've got time for today but thank you for all your questions and we'll try to answer the rest of them shortly. Here's today's Daily chart on post-apocalyptic literature and the Doomsday Clock, and here's our Twitter feed if you want a daily dose of charty goodness.

Comments: 150 • Responses: 47

bengraham1627 karma

What's the most interesting/unexpected data you've come across or have researched?

AlexS-B47 karma

The MS Access database of every Floridian prisoner's tattoos was pretty interesting (I think we stumbled over it on Reddit, actually), and the 1.5TB of dark web scrapings yielded a nice story too.

danwin4 karma

For anyone reading this, I'm not sure if this was the post in question that caught the Economist's eye, but here's an r/datasets post about the Florida prisons database from a year ago:

For non-Windows users, here's an example Gist script which shows how to convert the Access file into more portable CSV files:
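The Gist itself is not reproduced in this transcript. As a rough idea of what such a conversion can look like, here is a minimal sketch that assumes the open-source mdbtools command-line utilities (`mdb-tables` and `mdb-export`) are installed. The function names and the file names in the usage note are hypothetical, and this is not necessarily how the linked script works.

```python
import subprocess
from pathlib import Path


def list_tables_command(db_path):
    """Build the mdb-tables call; -1 prints one table name per line."""
    return ["mdb-tables", "-1", str(db_path)]


def export_command(db_path, table):
    """Build the mdb-export call, which writes the table to stdout as CSV."""
    return ["mdb-export", str(db_path), table]


def access_to_csv(db_path, out_dir):
    """Write one CSV per table in db_path into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Ask mdbtools for the table names in the Access file.
    listing = subprocess.run(
        list_tables_command(db_path), capture_output=True, text=True, check=True
    )
    tables = [line.strip() for line in listing.stdout.splitlines() if line.strip()]

    written = []
    for table in tables:
        target = out_dir / f"{table}.csv"
        # Stream each table's CSV output straight into its own file.
        with open(target, "w", encoding="utf-8") as handle:
            subprocess.run(export_command(db_path, table), stdout=handle, check=True)
        written.append(target)
    return written
```

Calling `access_to_csv("prisoners.mdb", "csv_out")` (both names made up for illustration) would then leave one CSV per table in `csv_out`.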

AlexS-B2 karma

I so wish we'd had that script when we were working on this. We had to dust off an old XP machine and manually copy everything out of the Access database. The biggest challenge though was the typos. Incorrectly spelt tattoos are a well-known phenomenon ("No regerts" is a fine example), but we also had to contend with typos from the prison administrators as they were entering the tattoo text into the system. As always, there was a lot of data cleaning to be done before analysis could start.
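To give a flavour of the kind of cleaning described above, here is a minimal sketch of fuzzy-matching misspelt words against a reference vocabulary with Python's standard `difflib` module. The vocabulary, the cutoff and the function names are assumptions chosen for illustration; this is not the team's actual cleaning code.

```python
import difflib

# Hypothetical reference vocabulary; a real project would need a far larger one.
KNOWN_WORDS = ["regrets", "mother", "forever", "respect", "loyalty"]


def normalise_word(word, vocabulary=KNOWN_WORDS, cutoff=0.8):
    """Return the closest known word, or the lower-cased input if none is close."""
    lowered = word.lower()
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else lowered


def normalise_text(text):
    """Normalise every whitespace-separated word in a tattoo description."""
    return " ".join(normalise_word(word) for word in text.split())
```

With this vocabulary, `normalise_text("No regerts")` should come back as `"no regrets"`, while words with no close match are only lower-cased. A cutoff that is too low would start "correcting" words that were never typos, so the threshold needs checking against real data.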

Laminar_flo21 karma

Question: What is your view on core data integrity and core data validation? How do you approach this issue philosophically? Or do you see a problem with this at all?

I'm asking from the perspective of someone on Wall St who currently works in quant fin on the credit side, and who has done a lot of similar conditional/covariant 'deep data' work over the years.

Just to offer my opinion:

I think this idea of data integrity/validation in practice is a massive issue that's largely ignored. To frame my position, I was on a struc fin desk before/during the 08 crisis. I built a lot of structured products. We built these fantastically complicated models that took hundreds of inputs to spit out a result (price/optimal tranche position/default prob/etc). However, we knew that there were issue(s) with the core data we were using and because the data could not be validated, the model could not be validated - garbage in, garbage out. But 1) only a handful of people understood this and 2) nobody cared until too late. A slightly more recent example would be the polling data most (if not all) outlets used, which predicted a crushing HRC victory, when reality turned out to be...different (and no, Nate Silver has not addressed invalid data integrity - he hand-waved it, and that's being generous. This is a very real issue).

People in the field (including several coworkers) deeply avoid this issue because it tends to call into question the results generated by the entire field. It's just too deep and unsettling an issue to even contemplate; and frankly there aren't any good answers. I'm guilty too - nobody can reasonably comb 3M datapoints to determine validity/integrity.

Poor data integrity/validation were some of the root causes of the financial crisis, but it has almost been completely ignored since 08. It also seems like little has changed except that bigger, faster computers are digging through more and more data sets and spitting out just as many dubious predictions.

At my fund, we are massively de-levered compared to 08, which helps mitigate the practical impacts of unbound risks (data integrity/validation included), but it does nothing to actually address the underlying root problems.

Sorry for the mini-rant, but I'm curious to view your thoughts on it all. Thanks!

AlexS-B21 karma

We actually have an article in the works on the weaknesses of a lot of the survey data that many macroeconomic statistics are based on. Our own projects usually don't require such large datasets (though there are exceptions, like the Lebanese census piece). But even when they do, it's important to keep the alternative in mind: if your data aren't perfect but you don't expect the flaws to reflect systematic biases, they're still a valuable addition to non-quantitative information like anecdotes and expert opinion.
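The distinction drawn above between imperfect data and systematically biased data can be illustrated with a small simulation. This is a sketch under stated assumptions: the true value, the noise level and the bias are made-up numbers, and the errors are assumed to be independent and normally distributed.

```python
import random
import statistics


def simulate(true_value=50.0, n=10_000, noise_sd=10.0, bias=0.0, seed=1):
    """Average n noisy readings of true_value, optionally shifted by a fixed bias."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)


# Random error only: individual readings are poor, but the average lands near 50.
unbiased = simulate(bias=0.0)

# The same noise plus a systematic 5-point error: more data does not remove it.
biased = simulate(bias=5.0)

print(f"unbiased mean: {unbiased:.2f}, biased mean: {biased:.2f}")
```

Random error shrinks as the sample grows, whereas the systematic shift survives however many readings are averaged, which is why the expected direction of the flaws matters more than their size.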

BobIsrael14 karma

What are the educational backgrounds of your staff?

AlexS-B17 karma

Here in the data team, we're a bit of a mixture. Some studied expected things like economics and journalism, others did history, chemistry or modern languages.

NikolaosM14 karma

How is company morale?

AlexS-B22 karma

Pretty good at the moment, but getting better all the time. I'd give it a solid 8/10.

greencracker14 karma

What kind of tools do you use? Python, Ruby, SQL, D3?

AlexS-B26 karma

Hello greencracker, we do a lot of our data exploration and analysis in R and Python (and even Excel), then finish off our charts in Illustrator. For interactives, we use D3.

juanjon10 karma

Men's underwear purchases. Good indicator of how the market is doing or nah?

AlexS-B9 karma

Good question! We are looking into it.

danwin9 karma

Day to day, how is collaboration between you and the "non-data" journalists facilitated?

AlexS-B10 karma

Often a journalist will just wander in and say that they've found some interesting data, or we'll bump into someone in the lift and ask what they're working on and that might kick-start a collaboration. There's no formal structure to it, which allows for some serendipity.

aardwolf218 karma

I'm a first-year undergrad student interested in data science as it applies to politics and economics - pretty much all the stuff y'all work on. Is there specific career advice you'd recommend as far as courses to take/things to do to get good at modeling? I've heard that you have to be a math major/get a PhD in math to do data science as a career, but I don't think this is feasible for me given my current degree plan. I am reasonably good at math/compsci but I'm not really sure what courses to take/what skills to develop. Love your work, thank you for doing this!

AlexS-B23 karma

You definitely don't need to major in math or do graduate work--as long as you have the skills, no one will care how you acquired them. There are lots of programs like Coursera and General Assembly where you can learn outside of an academic setting. Within a university, coursework in statistics, econometrics, and/or computer science is probably more relevant than pure math. You definitely need to know how to code, typically in R and/or Python. In terms of modeling, you'll learn a lot about regressions from a typical economics path. Machine learning might be more in the computer-science realm.

psmgx8 karma

So do you actually go out and buy Big Macs all over the world? And, if so, how can I get a job doing that?

AlexS-B15 karma

Sadly not. Ronald gives us data for most countries then we phone restaurants in the countries for which we're still missing data. At no point do we get any Big Macs.

krdaito8 karma

Do more people on your team learn data skills and come from a journalism background, or vice versa?

AlexS-B11 karma

Most of the DJs studied economics and maths then picked up programming skills on the job.

sunsqshd6 karma

As a data scientist and ex-web programmer, I love data journalism and cool visual things in news media -- but I'm also a big web accessibility advocate. Do you have any best practices on making charts and interactives accessible to all visitors?

AlexS-B2 karma

We're looking at ways of making our charts more accessible by using the SVG format so the text can be picked up by screen-readers. We also try to avoid colour combinations that present difficulties to our colour-blind readers.
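As an illustration of why SVG helps here, the sketch below builds a tiny bar chart whose title, description and labels are real text elements that assistive technology can read, rather than pixels or outlined shapes. The function name, element ids and layout numbers are assumptions for the example, not The Economist's chart pipeline.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def accessible_bar_chart(title, description, values):
    """Return an SVG string for (label, value) pairs with screen-reader text."""
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {
            "viewBox": "0 0 200 120",
            "role": "img",
            # Point assistive technology at the title and description below.
            "aria-labelledby": "chart-title chart-desc",
        },
    )
    title_el = ET.SubElement(svg, f"{{{SVG_NS}}}title", {"id": "chart-title"})
    title_el.text = title
    desc_el = ET.SubElement(svg, f"{{{SVG_NS}}}desc", {"id": "chart-desc"})
    desc_el.text = description

    # Guard against an all-zero series so the scaling never divides by zero.
    peak = max(value for _, value in values) or 1
    for index, (label, value) in enumerate(values):
        x = 10 + index * 40
        height = 80 * value / peak
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}rect",
            {"x": str(x), "y": str(100 - height), "width": "30", "height": str(height)},
        )
        # Labels stay as <text>, so they remain selectable and readable.
        text_el = ET.SubElement(svg, f"{{{SVG_NS}}}text", {"x": str(x), "y": "115"})
        text_el.text = f"{label}: {value}"
    return ET.tostring(svg, encoding="unicode")
```

For instance, `accessible_bar_chart("Tool use", "Share of projects by tool", [("R", 5), ("Python", 7)])` returns markup that can be embedded directly in a page. How well a given screen reader handles it still needs testing.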

eoinmurray926 karma

What tools do you use, and how do you collaborate with each other? git, Jupyter notebooks?

AlexS-B10 karma

Hello eoinmurray92, we use Slack for quick collaboration and git for code-sharing.

DrAnother5 karma

What are your influences in data journalism from other outlets?

AlexS-B6 karma

We read pretty much everything published at the New York Times's Upshot vertical and at FiveThirtyEight, and pay attention when Vox does their own number-crunching. We also digest a lot of academic papers in political science--often summarized at the Washington Post's Monkey Cage blog--and economics.

champybh5 karma

Hello! How does The Economist go about validating data sources for accuracy before performing analysis that is used in reporting and do you ever make the raw data available so that others that are interested can do a deeper dive on the topics you investigate? thanks!

AlexS-B13 karma

Hello champybh, we have an excellent research department that finds and checks most of our data, but when one of our data journalists (DJs) comes across something, we ask another DJ to sense-check the data. Most of what we publish is very proprietary so we can't share too much, but we're exploring ways of making some non-proprietary data available on git too.

econguy9775 karma

I have two quick questions:

-Could you recommend any good books on interpreting data/data science for the lay reader? I'm thinking of books like Daniel J. Levitin's A Field Guide to Lies. Any others?

-I'm a first-year in college studying Economics. What classes/skills would you recommend that I take in order to get into the kind of work that The Economist's data department does?

AlexS-B5 karma

For lay readers, "The Lady Tasting Tea" gives a nice non-technical overview of the development of the field of statistics.

Econometrics and statistics classes are probably the most useful. I think more economics students would benefit from learning how economists and social scientists actually work with data. Try to read as many academic papers as possible. Most will be too difficult for first-year students, but some, especially in applied fields like labour economics, might be clear enough for you to gain a high-level understanding of how the authors approached a particular problem.

Much of what we do is just very applied social science. "Mastering 'Metrics" provides a nice overview of some of the techniques used by social-science researchers, and is a good complement to traditional econometrics/statistics textbooks.

shiruken4 karma

After the 2016 US Presidential Election, I suspect that a lot of people became much more skeptical about blindly trusting "data journalism" from news organizations. What editorial practices does The Economist have in place to ensure that its readers are not being misled by faulty statistics or malintent?

AlexS-B11 karma

Sorry, the original answer here was to the wrong question (have left it below). Here's the actual answer (no malintent!):

The 2016 campaign was not a particularly bad one for data journalism. Trump did trail in the polls, but only by small margins; most rigorous models put his odds in the 15%-30% range, suggesting there was a real chance he could win. (We wouldn't consider the ludicrously overconfident models that said Clinton was a 99-1 or 98-2 favorite with a 3-point polling lead to be serious "data journalism".) Whatever consensus existed that Clinton was a shoo-in to win came together in spite of data journalism, not because of it. We maintain the same editorial practices on our data team that we do across the whole company. Every story goes through a painstaking fact-checking process, and is edited for clarity and accuracy up to seven times before it is published. When we think the numbers tell a definitive story (for example, that Marine Le Pen's polling deficit was too big to overcome), we are not afraid to convey it strongly; when they're either less reliable or less certain, we're straightforward in saying that we don't have enough information to reach a sturdy conclusion.

[Original answer, wrong place] Whenever possible, we try to get numbers from official sources--governments, multilaterals, databases like the Bloomberg terminal, or the companies we're writing about. If our data are less reliable--as with, say, our scraping-based story on dark-web drug markets--we're just up front with readers about the limitations of the numbers, and what they might misrepresent or omit. We're generally pro-transparency and often post our code online.
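The remark above about 99-1 odds on a 3-point polling lead can be made concrete with a back-of-the-envelope calculation. This is a sketch, not any outlet's actual model: it assumes the true margin is normally distributed around the polling lead, and the 4-point error used in the example is an illustrative guess.

```python
import math


def win_probability(lead, error_sd):
    """P(true margin > 0) if the true margin ~ Normal(lead, error_sd)."""
    return 0.5 * (1.0 + math.erf(lead / (error_sd * math.sqrt(2.0))))


def implied_error_sd(lead, probability, lo=0.01, hi=50.0):
    """Bisect for the error_sd at which a positive lead gives this win probability."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if win_probability(lead, mid) > probability:
            lo = mid  # still too confident, so the error must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


# A 3-point lead with an assumed 4-point standard error on the margin.
print(f"win probability: {win_probability(3.0, 4.0):.2f}")

# How small the polling error would have to be to justify 99% confidence.
print(f"error needed for 99%: {implied_error_sd(3.0, 0.99):.2f} points")
```

Under these assumptions a 3-point lead is a clear but far from certain advantage, and a 99% forecast only follows if the polls are assumed to be accurate to within little more than a point.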

Lostinspace443 karma

As a freshman in college working towards my Econ degree, I'm interested in this kind of career field. What is the best path to get to this sort of job?

AlexS-B1 karma

The degree choice puts you on a good path, but the two additional things that would really give you an edge in this field are getting some writing experience (for a college paper, say) and learning some coding skills (Python is pretty versatile).

forava73 karma

What projects are you guys looking to do in the future?

AlexS-B3 karma

We're currently deciding whether to try modelling the German election (the French one went pretty well), but generally we don't know what's coming very far in advance. Some of our best data stories have come from stumbling across an inauspicious-looking dataset and digging around in it.

ArkeryStarkery3 karma

Do you ever wish you had a byline?

AlexS-B2 karma

Not really. This explainer goes through the reasons behind the no-byline stance, and the advent of Twitter means our journalists do now get some recognition.

ArkeryStarkery1 karma

I didn't ask why the Economist doesn't give you one.

AlexS-B2 karma

Sorry. Reflex action when asked about bylines.

cmc06263 karma

Are you hiring?

AlexS-B5 karma

We are indeed looking for an interactive data journalist. Here's the job listing.

Lostinspace443 karma

Does this include internships?

AlexS-B3 karma

We don't have any within the data team but there are several internships elsewhere within The Economist.

coryrenton2 karma

the economist is known for its sassy captions -- is there an editor responsible for that -- how does that process work? Have you ever opposed the sassiness of a caption accompanying your work?

AlexS-B1 karma

Here in the Data team, we don't have much input on the sassy captions, and there isn't one particular editor responsible for them (although what a job that would be!). Usually the author of the piece will casually suggest something brilliant, or maybe the section editor. We get to come up with some chart titles though, which has its own rewards. Recent favourites include "Mistakes on a plane" (United Airlines), "Alternative fats" (about butter's resurgence over margarine) and "Bureau de change" (Comey).

Kdeaarnr2 karma

How long is a typical piece in the works? Do you mainly do analysis week to week for each issue, or is it on a longer timeline?

Do you ever want to write about a topic but simply find it impossible to find the required data?

Also, this offline interactive graphic was wonderfully creative. Well done.

AlexS-B7 karma

There's no standard project length. Sometimes we can turn around a study within a day of news breaking; at other times, as with our dark-web article, it can take months to acquire, clean, and analyse the data and complement it with traditional interview-based reporting. And yes, probably the majority of ideas we come up with wind up withering on the vine because the necessary data are not available.

And thanks. Print-based interactive graphics are always a challenge but that one worked well.

lugzann2 karma

How big is your team? What is a typical work day like for the majority of the people in this department?

AlexS-B2 karma

There are half a dozen DJs who are each working on one or two stories at a time, and the same number of visual data journalists (VDJs) who create all of our chart output (for print, the web and our apps). And one interactive data journalist at the moment (see job listing above).

Our working week has a very strict rhythm: we all attend section meetings on Fridays and Mondays to discuss what's going in the paper, and either write or create charts up until (late) Wednesday night. The paper goes to press on Thursday morning, then we breathe for a minute and start thinking about what will go in the next week's issue. We also run a blog called Graphic detail where we publish chart-led articles every day.

mhsabbagh2 karma

As an independent journalist, if I created a data analysis and I would like to get an expert tip about it (for free) to see if my analysis is sound, what/where do you think I can do/go?

AlexS-B2 karma

I heartily recommend the good people here on Reddit, specifically the r/dataisbeautiful subreddit.

NikolaosM2 karma

How is company morale?

AlexS-B3 karma


Ugghe2 karma

You've said both 8/10 and 9/10, which is the alternative fact?! Is this the type of inconsistency we should expect from the economist?

Sorry, I don't know how to make jokes online. I love the economist. Thank you for making news I can stand reading.

Edit. Can, not can't. Big difference.

AlexS-B2 karma

Ha, you're doing fine. Thanks for the kind words, and to actually answer the original question, morale is great. The Economist is an incredible company to work for and my colleagues are as friendly as they are smart.

Sir_Bantersaurus2 karma

Are there any plans to update the Economist applications to better reflect the types of interactive charts you do? At the moment they're pretty good but still follow the paper based design without taking advantage of what is possible on these devices.

AlexS-B1 karma

We are indeed looking into this, although I can see it being a bit of a challenge.

thimkerbell1 karma

Who spends time considering what other barometers (like the Big Mac index) would indicate how we're really doing, and how to collect that data?

AlexS-B1 karma

We all do, really. We have a weekly data team meeting in which we discuss ideas for projects and the kind of data-driven barometers that give a snapshot of a particular event or issue.

UneAmi1 karma

What did you study to become a data person?

AlexS-B3 karma

The path is varied. I personally studied the films of the Nouvelle Vague and Italian neorealism but this is not a guaranteed way in. A lot of our DJs studied economics and maths, but the main thing is to pick up coding skills in languages like R and Python to help when working with data.

kraahn1 karma

which tools do you generally use? which do you recommend?

AlexS-B2 karma

Python and R for working with data, Adobe Illustrator for creating static charts and D3 for interactives. There are many other tools out there but these are what we use primarily.

rtripathi1 karma

What are the various sources of data you rely on and how do you make sure they're authentic numbers?

AlexS-B2 karma

Whenever possible, we try to get numbers from official sources--governments, multilaterals, databases like the Bloomberg terminal, or the companies we're writing about. If our data are less reliable--as with, say, our scraping-based story on dark-web drug markets--we're just up front with readers about the limitations of the numbers, and what they might misrepresent or omit. We're generally pro-transparency and often post our code online.

_k0k0r0_1 karma

What do you think of redefining Big Data as a natural and/or renewable resource?

What do you think of Big Data as a resource protected as a Public Trust?

AlexS-B0 karma

Funnily enough we wrote a big package on this last week.

urlwolf1 karma

Any plans to do a special report on fintech, particularly in Asia? There's one graph in your current banking special report that is eye popping. Asia is growing like crazy in Fintech

AlexS-B1 karma

There's nothing in the pipeline (you can see a schedule of upcoming special reports here) but it's an interesting topic.

Twisted-Biscuit1 karma

Hey guys, appreciate the AMA. Broadcast Engineer here in a news and current affairs department.

Just wondering what sort of challenges you face when it comes to reaching new audiences? Do you see much value in pushing content through new distribution channels like Snapchat or Instagram? If so, do you think active engagement with your audience through these mediums is important?

AlexS-B3 karma

Hello Twisted-Biscuit. Social media is massively important to us because we have an awareness problem, particularly across the pond where half of Americans have never heard of us, according to our research. And many who have heard of us think we write only about economics and finance. Today, we have a team of 10 social media editors and writers to help us get the word out. We have over 40 million followers across Facebook, Instagram, Twitter, LinkedIn and LINE. Social media is crucial in helping us reach more people and ultimately get more subscribers. You can learn more about what we do on Medium.

fusionsc21 karma

As a reader, and subscriber I absolutely love the Economist.

I have a few quick questions about the business side of the paper: how much money does the Economist make each year? Would you have any idea how that money is broken down?

AlexS-B1 karma

Hello fusionsc2, very glad you like us. All of our annual reports are in the public domain here.

BaakensBlog1 karma

Who typically leads on a data project: one of the data team or a specialist writer or editor? And does your team get challenged by other folk in editorial to track down useful data sets?

AlexS-B3 karma

Typically a data journalist, in consultation and collaboration with the relevant beat writers and editors. And yes, we're frequently asked by colleagues to dig up numbers on a topic and try to answer questions that they've had trouble getting a firm answer on from their sources.

Young_Economist1 karma

Hi! I am a subscriber to the Economist, that's how much I love your work. I would love to know what you have done (where you worked, what did you do?) before you started working on the team you are now on.

What stories did you investigate but it came out that there was nothing to write about? Please write about some failures, if you would.

Are you situated in London? Where do you go for lunch?

Edit: I just wanted to tell you how awesome the "data journalism" thing is.

AlexS-B1 karma

Hello Young_Economist, thanks for saying how much you like what we do. My path to working in the Data team was probably quite irregular (a week at the Ritz, a year at Calvin Klein then 17 years at the EIU), but that's kind of the point. If you can tell an interesting story with data, it doesn't matter where you studied or worked before.

Regarding stories that come to nothing, Kenny Rogers put it best.

And lunch-wise, we mostly grab something and bring it back to the office. Except Wednesdays. We live for Wednesdays.

mrmedia20161 karma

are there plans to provide significant raises and/or bonuses for your hardworking data journalists?

AlexS-B3 karma

All the hardworking data journalists have already received their bonuses and significant raises. Ha ha. But seriously, get back to work 'Mr Media', if indeed that is your name.

fuckp21 karma

What's the expected/typical level of education in your department?

AlexS-B2 karma

Everyone has a degree, and there's a couple of people with a Masters, and one PhD candidate.

__andrei__1 karma

Frequentist or Bayesian?

AlexS-B0 karma

I am reliably informed that we are most definitely Bayesian.

Masterbrew1 karma

Can you make the site zoomable on mobile? I can't zoom in on the cherry blossom chart.

AlexS-B1 karma

Sorry, we know this is annoying and we are working on making our charts zoomable on mobile. In the meantime, here's a bigger version of the chart.

iwas99x1 karma

u/AlexS-B, will you answer more questions later or no?

AlexS-B1 karma

Yes, I'll do my best but Wednesday is our busiest day so I might not be able to get to all of them.

greencracker1 karma

What's an instructive mistake you've made -- hopefully one caught in editing/review?

AlexS-B3 karma

We've all made a Reinhart/Rogoff-style Excel mistake in our time, but we have many layers of fact-checkers and proof-readers so things get caught before publishing.

Fyooree1 karma

Do you think it's true that, very broadly speaking, there is decreasing trust in data among the public and that experts' opinions are less convincing than before? Very often I hear friends and colleagues talk about the "post-truth" world we live in and this era of populism when public relations and echo chambers reign supreme... Do you believe that data or evidence or logic have the power to influence the world on a popular level? Or is this approach flawed and the audience for data should be small, targeted, and contextual? Perhaps the idea of "putting a human face on data" has some merit?

AlexS-B3 karma

You can torture statistics to support any claim you want if you try hard enough. Sadly, many politically and ideologically motivated actors do so frequently. Certainly in the United States, belief in the value of expertise and trust in supposedly impartial sources like the non-partisan media has now itself become a partisan trait. Data can only influence voters if they're willing to change their minds when confronted with evidence that contradicts their prior beliefs. Figuring out how to promote such open-mindedness is one of the most pressing, and difficult, challenges of our time.