Highest Rated Comments


halfak · 14 karma

Good Q. So, all of the vandal-fighting systems in Wikipedia rely on a machine learning model that predicts which edits are likely problematic. There's ClueBot NG that automatically reverts very, very bad edits and tools like Huggle/STiki that prioritize likely bad edits for human review. Before ORES, each of these tools used their own machine learning model. This would have been fine, but it's actually quite a lot of work to stand up one of those models and maintain it so that it runs in real time. I think if it weren't so difficult, we'd see a lot more vandal-fighting tools that use an AI. That's where ORES comes in.

ORES centralizes the problem of machine prediction so that tool/bot developers can think about the problem space and the user interactions that they want to support, rather than having to do the heavy lifting of the machine learning modeling stuff. Instead, developers only need to figure out how to use a simple API in order to get predictions into their tools. Currently, Huggle has switched over to using ORES, but I don't think ClueBot NG has. The developer of STiki was one of our key collaborators during the development of ORES. There are now many new tools that have come out in the past few years that use ORES.
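
To make the "simple API" point concrete, here's a minimal sketch of asking ORES to score a single edit, written against the public ORES v3 scores endpoint. The revision ID is made up, and the exact response layout should be checked against the ORES docs rather than taken from this example.

    import requests

    # Ask ORES for a "damaging" prediction on one English Wikipedia revision.
    # The endpoint layout and response nesting follow the public ORES v3 API,
    # but treat the field names here as an illustration, not a spec.
    ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"

    def score_revision(rev_id, model="damaging"):
        response = requests.get(ORES_URL, params={"models": model, "revids": rev_id})
        response.raise_for_status()
        data = response.json()
        # Drill down to the score object for this revision and model.
        return data["enwiki"]["scores"][str(rev_id)][model]["score"]

    score = score_revision(123456789)  # hypothetical revision ID
    print(score["prediction"], score["probability"])

A tool developer just gets a probability back and decides how to act on it; none of the feature extraction or model training happens on their side.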

halfak · 13 karma

It does, but we have to be cautious here. Predictions can affect people's judgement. If we have an AI with a little bit of bias, that can direct people to perpetuate that bias. Then if we re-learn on their behavior, we'll learn the bias even more strongly! So we're very cautious about training on past behavior. Instead, we ask people to use a custom "labeling interface" that removes them from the context of Wikipedia and asks them to make a single independent judgement based on the edit/article/whatever we're modeling. A cool thing about this is that we can ask users to give us more nuanced and specific judgements. E.g. rather than just predicting "Will this edit be reverted?", we can predict "Was this edit damaging?" independently of "Was the damage probably intentional?"

Edit: Here's some docs about our labeling system: https://meta.wikimedia.org/wiki/Wiki_labels
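
For a sense of what those more nuanced judgements look like as data, here's a hypothetical sketch of a single labeled example; the field names are illustrative and not the actual Wiki Labels schema.

    # Hypothetical shape of one human judgement collected through a
    # labeling interface. Field names are illustrative, not the real schema.
    label = {
        "rev_id": 123456789,   # the revision being judged (made-up ID)
        "damaging": True,      # did this edit hurt the article?
        "goodfaith": True,     # was it probably an honest mistake?
    }

Keeping the two judgements separate is what lets us train a "damaging" model independently of a "goodfaith" model.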

halfak · 11 karma

I don't think that I have. Most of the vandalism we deal with is not very clever. Fun story, there's a cycle that is really obvious in the data that corresponds to the beginning and end of the school year in the West. It looks like quite a lot of vandalism in Wikipedia comes from kids editing from school computers!

Searching around, my colleagues found https://meta.wikimedia.org/wiki/Vandalbot which seems to be the kind of thing that you're asking about.

halfak · 11 karma

I think that, with better models and the better interfaces that people will develop on top of them, we'll have fewer people who are focusing on counter-vandalism interacting with good-faith newcomers. Instead, we'll be able to route good-faith newcomers to people who focus on supporting and training people when they show up at Wikipedia. I'm working with product teams at the Wikimedia Foundation now to start imagining what such a future routing system will look like. But I look forward to what our volunteer developer community comes up with. A big part of my job is making sure that they have the tools that they need to experiment with better technologies. I'm betting that, by providing easier-to-use machine learning tools, our tool developer community will be able to more easily dream up better ways to route and support good-faith newcomers.
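
As a thought experiment, here's roughly how a routing rule could combine the two kinds of scores described above; the thresholds and queue names are invented for illustration, not how any real Wikimedia system is configured.

    # A toy routing rule combining "damaging" and "goodfaith" probabilities.
    # Thresholds and queue names are made up; a real system would tune these.

    def route_edit(p_damaging, p_goodfaith):
        if p_damaging > 0.9:
            return "auto-revert candidate"           # ClueBot NG territory
        if p_damaging > 0.5 and p_goodfaith < 0.5:
            return "counter-vandalism review queue"  # Huggle/STiki-style triage
        if p_damaging > 0.5 and p_goodfaith >= 0.5:
            return "mentoring queue"                 # good-faith newcomer who needs help
        return "no action needed"

    print(route_edit(0.72, 0.85))  # -> "mentoring queue"

The point is that routing good-faith newcomers toward mentors only needs the scores; the hard part is building the interfaces and community workflows around them.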

halfak · 10 karma

Ladsgroup_Wiki's answer is great, but I just wanted to take the opportunity to share my favorite example of a protected page: Elephant

This page has been protected since 2006 when Colbert vandalized it on-air. Check out this awesome Wikipedia article: https://en.wikipedia.org/wiki/Cultural_impact_of_The_Colbert_Report#Wikipedia_references (Because of course there's a Wikipedia article about that)