Hi! 10 years ago, as grad students at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), we were fed up with all of the bogus journals and conferences that spam researchers and charge crazy fees for articles they don’t even read before accepting.

So we created “SCIgen”, a program (available on the SCIgen website) that randomly generates grammatical nonsense computer-science papers. SCIgen made major headlines when one of its papers, “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy”, was accepted to the World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI).

WMSCI un-invited us from attending the event in Orlando, but we managed to crowd-fund $2,500 to fly from MIT, rent a room inside the conference space, and hold our own series of randomly-generated talks, wielding fake names, fake business cards and fake moustaches.

SCIgen continues to be used hundreds of thousands of times a year. Last year IEEE and Springer Publishing got rid of more than 120 papers from their websites after a French researcher determined they were “written” by SCIgen.

More info: MIT News story, CSAIL Tweet

A little bit about us…

Dan Aguayo (@aguayoooooo)
* worked on the Roofnet project at MIT until 2006 when he left to join Meraki, which had spun out of the project
* he has remained a member of the technical staff there ever since (Meraki has been part of Cisco since 2012)

Max Krohn (@maxtaco)
* co-founded SparkNotes and OkCupid as an undergrad at Harvard
* now leads Keybase, which is aimed at making cryptography more accessible

Jeremy (no Twitter - follow @MIT_CSAIL instead! Or find me on GitHub)
* worked at IBM, Google and Nicira
* just joined Keybase this month

For the 10-year anniversary of SCIgen, we made a new program called “SCIpher”, which hides messages inside the “call for papers” (CFP) emails that bogus conferences like WMSCI are always barraging grad students with.
With SCIpher, you put in a secret message you want to tell a friend, and SCIpher creates text that looks like a CFP; send it to your friend, and if they put it back into SCIpher, the message will reveal itself.
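The AMA doesn't say how SCIpher actually encodes a message, so what follows is only a hypothetical sketch of one way CFP-style cover text could carry a secret: treat the message as one big number and spell it out as a sequence of choices among interchangeable sentences (mixed-radix digits). The `SLOTS` sentences and the `encode`/`decode` names are assumptions made up for illustration, not SCIpher's code.

```python
# Hypothetical CFP sentence slots: each slot offers interchangeable sentences,
# and WHICH sentence is chosen carries part of the hidden message.
# Sentences must be unique within a slot and contain no interior periods.
SLOTS = [
    [
        "We invite submissions on all aspects of distributed systems.",
        "Authors are invited to submit original research papers.",
        "Original contributions are solicited in all areas of informatics.",
        "The program committee welcomes work on theory and practice.",
    ],
    [
        "Accepted papers will appear in the conference proceedings.",
        "All accepted papers will be published online.",
        "Selected papers will be invited to a special journal issue.",
        "Proceedings will be indexed by major databases.",
    ],
    [
        "Submissions must not exceed ten pages.",
        "Papers should follow the standard two-column format.",
        "Each submission will receive at least three reviews.",
        "At least one author must register to attend.",
    ],
]


def encode(secret: str) -> str:
    """Hide `secret` as a sequence of sentence choices (mixed-radix digits)."""
    # Prefix a 0x01 sentinel byte so leading zero bytes survive the int round trip.
    n = int.from_bytes(b"\x01" + secret.encode("utf-8"), "big")
    sentences = []
    i = 0
    while n > 0:
        slot = SLOTS[i % len(SLOTS)]
        n, digit = divmod(n, len(slot))  # least significant digit first
        sentences.append(slot[digit])
        i += 1
    return " ".join(sentences)


def decode(text: str) -> str:
    """Recover the secret by finding which alternative was used in each slot."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    digits = []
    for i, sentence in enumerate(sentences):
        slot = SLOTS[i % len(SLOTS)]
        # list.index raises ValueError if the text wasn't produced by encode()
        digits.append((slot.index(sentence), len(slot)))
    n = 0
    for digit, radix in reversed(digits):  # most significant digit was emitted last
        n = n * radix + digit
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if raw[:1] != b"\x01":
        raise ValueError("not a message produced by encode()")
    return raw[1:].decode("utf-8")


if __name__ == "__main__":
    cover = encode("meet at noon")
    print(cover)
    print(decode(cover))
```

With four alternatives per slot, each sentence carries only two bits, so even a short secret turns into a long, repetitive "CFP"; a convincing tool would need a far richer grammar (more slots, more alternatives, nested choices) to hide the same number of bits in more natural-looking text.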

We'll be here starting at 2 p.m. EST. Feel free to ask us questions about anything, including:
- how and why we created SCIgen and SCIpher
- what it’s like to perpetrate a hoax
- our favorite programming language
- what it was like to be at MIT (we were all in CSAIL’s Parallel and Distributed Operating Systems group)

Disclaimer: we are by no means speaking for MIT, Cisco Meraki or Keybase in any official capacity!

Proof: https://twitter.com/maxtaco/status/587438503836827648

https://gist.github.com/strib/c0aa6b1a0f1a39e168d8

UPDATE 4:05 EST: thanks for all of your questions! Hope to be on again soon.

Comments: 57 • Responses: 17

amc22004 • 20 karma

What do you make of the fact that last month Springer Publishing released a tool that can detect SCIgen papers? Is this a good thing for publishers, or, as Slate suggests, "a tacit admission that even at the most reputable publishing houses, some peer-reviewed journals are incapable of providing even the most minimally competent peer review"?

SCIgenAMA • 27 karma

Jeremy: Yeah, this is pretty standard arms-race stuff. I think it would be trivial to beat that detector, and they could then beat THAT generator, and so on. At some point it's easier just to do "minimally competent peer review", right?

Though as I said in another response, one reasonable use for such a detector is to find people that have already used SCIgen to pad their CVs in the past. It's hard to believe, but such people actually exist! I swear I am not one of them, though some conference rejections I've received might imply otherwise.

wonkypedia • 14 karma

I got inspired by this and created my own. I used very basic Markov chains trained on a bunch of paper abstracts. The results seem pretty good if you have good training data.

What is under the hood of SCIgen?
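The word-level Markov chain approach described in this question can be sketched roughly like this. It is a minimal illustration, not wonkypedia's code: `build_chain`, `generate`, and the tiny `CORPUS` string are hypothetical stand-ins, and a real version would train on a large collection of paper abstracts.

```python
import random
from collections import defaultdict

# Toy placeholder corpus; punctuation is split off so "." acts as a word.
CORPUS = (
    "we propose a scalable framework for access points . "
    "we evaluate a scalable heuristic for redundancy . "
    "we propose a novel heuristic for the unification of access points and redundancy ."
)


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])  # duplicates keep frequency weighting
    return chain


def generate(chain, length=30, seed=None):
    """Random walk over the chain, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: this state only appeared at the end of the corpus
            break
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    print(generate(build_chain(CORPUS), length=20, seed=7))
```

Every run of `order + 1` consecutive words in the output also occurs in the training text, which is why the results stay locally fluent but, as Jeremy notes in his answer, tend to be bland: the chain can only recombine what it has seen.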

SCIgenAMA • 29 karma

Jeremy: we explicitly avoided Markov chains or anything else that was technically challenging, in the service of trying to make the papers as funny as possible. With Markov chains, you might get something syntactically correct, but it is likely to be boring.

With SCIgen, we literally sat around for two weeks and just brainstormed buzzwords, clauses, paragraph structures and other paper elements just based on what we thought would be funny. That's the grammar. Then SCIgen itself just goes through the grammar and makes random choices to fill stuff in. That's why you see things like "a testbed of Gameboys" in the evaluation sections sometimes -- we just thought it would be hilarious.
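What Jeremy describes (a hand-written grammar, then random choices to fill it in) is essentially random expansion of a context-free grammar. Below is a minimal Python sketch under that assumption; the rules, the `{SYMBOL}` placeholder syntax, and the `expand` function are invented for illustration and are not SCIgen's actual Perl code or rule-file format.

```python
import random
import re

# A toy grammar in the spirit described above (invented rules, not SCIgen's).
# Each nonterminal maps to a list of alternatives; {NAME} marks a nonterminal.
GRAMMAR = {
    "SENTENCE": [
        "We {VERB} {SYSTEM} on {TESTBED}.",
        "Our {ADJ} evaluation shows that {SYSTEM} {VERB2} {THING}.",
    ],
    "VERB": ["deployed", "benchmarked", "emulated"],
    "VERB2": ["outperforms", "is orthogonal to", "refutes"],
    "SYSTEM": ["Rooter", "our heuristic", "{ADJ} {THING}"],
    "ADJ": ["scalable", "Bayesian", "lossless"],
    "THING": ["access points", "redundancy", "the Turing machine"],
    "TESTBED": ["a testbed of Gameboys", "{NUM} {ADJ} nodes"],
    "NUM": ["two", "1000", "seventeen"],
}

PLACEHOLDER = re.compile(r"\{([A-Z0-9_]+)\}")


def expand(symbol, rng):
    """Pick a random alternative for `symbol`, then recursively expand its placeholders."""
    template = rng.choice(GRAMMAR[symbol])
    return PLACEHOLDER.sub(lambda m: expand(m.group(1), rng), template)


if __name__ == "__main__":
    rng = random.Random(2005)
    for _ in range(3):
        print(expand("SENTENCE", rng))
```

All of the humor lives in the `GRAMMAR` table; the expansion loop itself is a few lines, which matches the point that the effort went into brainstorming rules rather than into clever algorithms.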

maltamal • 8 karma

What is the most prestigious conference or journal (I mean one that is not ordinarily thought of as a fraudster conference) that has actually accepted one of these auto-generated papers? (Especially in CS, but other fields are okay too.)

SCIgenAMA • 10 karma

Jeremy: The highest-profile ones I know of are the Springer and IEEE journals: http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763. Those ones are pretty interesting actually, because I don't think it was the intention of the submitters to expose the journals as fraudulent -- they were just trying to pad their own resumes!

That said, those particular journals are not considered prestigious. They were just using a well-known brand name. Any actual prestigious conferences use peer review, as they should.

shinypidgey • 8 karma

Who had the idea to put the author's name in some of the fake citations in order to make it look like they were citing some of their own work? That's my favorite little touch.

SCIgenAMA • 3 karma

Jeremy: I don't remember, but I think that was pretty much a requirement if you're taking on academic papers. It was basically pre-ordained from the start of the project.

malgorithms • 4 karma

For each of you, what was the most rewarding part of the project: reading the papers? presenting them? writing the grammar? seeing it get accepted in the conference?

SCIgenAMA • 11 karma

Jeremy: putting the fake conference together was probably my favorite part. Setting up shell corporations, getting disguises, tricking the hotel into thinking we had a real purpose there -- it was like we were getting a real taste of what it was like running WMSCI!

That, and the fame and fortune.

SCIgenAMA • 5 karma

Max here: for a while I loved the claim that ROOTER was the most widely read CS systems paper. I wonder if that is still true, or was ever true.

autocorrector • 4 karma

What's your favorite thing about CSAIL? I UROPed there recently and I miss it.

SCIgenAMA • 5 karma

Max here. I'm not sure how much of our experience was specific to CSAIL or to being a grad student in general, but our adult supervisors were pretty hands-off when it came to letting us roam free on this project. We got a few eye rolls every now and then but they gave us the leeway to burn many hours on SCIgen. Cheers to them. And of course there was a huge multiplicative factor on our work, so we as a research group have wasted man-centuries of time that could have been spent reading coherent papers or otherwise contributing to society.

oconnor663 • 4 karma

What was the best post-SCIgen paper generator you guys heard about? Did the Postmodern Essay Generator come after you guys?

SCIgenAMA • 3 karma

Jeremy: I have a vague recollection of POMO from around that time, but I'm pretty sure I didn't know about it when developing SCIgen.

silence7 • 3 karma

To what extent (if any) were you inspired by Alan Sokal?

SCIgenAMA • 3 karma

Jeremy: I will have to plead ignorance on Sokal at the time -- I didn't learn about that hoax until I started doing media interviews and got asked about him. But since then I have learned of his awesomeness, and I consider myself retrospectively inspired.

ankscricholic • 2 karma

What's your favorite programming language? Who came up with this idea and how long did it take to implement it?

SCIgenAMA • 5 karma

Max here: The original version was programmed in ..... Perl! I ripped the code off from TheSpark.com's high-school English paper generator, which was also written in Perl. It's since been modernized. All of the magic is in the grammar rules, though. Those are in what I guess you'd call a DSL nowadays, or a "data file" as we called it back then.

SCIgenAMA • 3 karma

Actually the SCIgen code -- still available via CVS! -- is still in Perl, and it's a disaster. But the new SCIpher code (https://github.com/strib/scipher) has been upgraded to Python so it can leverage NLTK.

The original SCIgen took about 2 weeks for the three of us. The media frenzy that followed took much longer to deal with.

heroltz998 • 2 karma

EDIT: The link was just messed up for me. No concerns for anyone else. I don't know the reason and apologize for the false info.

Did you know that your last link to the gist doesn't work? At least in my browser, it has extra symbols that are not shown directly in the browser but are in the actual link.

My actual question for you: has anyone tried to get payback because you created this program, like threatening to sue you or something like that?

SCIgenAMA • 4 karma

Jeremy: weird about the gist. The html of the post looks right, I don't know where the extra symbol is coming from. Anyway, here's the link: https://gist.github.com/strib/c0aa6b1a0f1a39e168d8

As for your question: the worst "payback" we got was when the original conference that accepted our paper retracted the acceptance after the media attention. Boo! And then when we set up our own fake conference next to theirs, they tried to keep their attendees from coming.

We've gotten very little negative feedback for SCIgen in general. I think the anti-SCIgen position is pretty hard to defend.

amc22004 • 2 karma

What's the strangest conference that's accepted a SCIgen paper? Do you know of any non-CS journals that have accepted SCIgen papers?

SCIgenAMA • 3 karma

Jeremy: you can see http://pdos.csail.mit.edu/scigen/#relwork for a few weird ones. Honestly I stopped keeping track of the success stories a while back so it's a bit out of date. I do particularly like the Russian story though: http://pdos.csail.mit.edu/scigen/blog/

(EDIT: original link was broken, sorry.)

Especially since the Russian word for "Rooter" now implies low-quality science. And that was really our goal from the start.

samipjain • 1 karma

Is there any kind of software that can detect, at submission time, that a paper was created by SCIgen?

SCIgenAMA • 4 karma

Jeremy: Yes! http://scidetect.forge.imag.fr/

Springer is positioning it as a positive thing, but it seems like just a way for them to avoid having real peer review. I guess one good thing about it is you can use it to catch the resume-padders out there just trying to exploit the system. But besides that, there are better ways to solve the problems exposed by SCIgen than just having a detector.

SCIgenAMA • 3 karma

Max here: it really calls into question the purpose Springer is even serving in a modern world where publishing papers is free and easy, and the editorial oversight is largely provided free of charge by academics working on public research grants.

10-10withrice • 1 karma

What's your favorite video game?