The FIR Podcast Network Everything Feed

FIR #477: Deslopifying Wikipedia


User-generated content is at a turning point. With generative AI models cranking out tons of slop, content repositories are being polluted with low-quality, often useless material. No website is more vulnerable than Wikipedia, the open-source reference site populated entirely with articles created (and revised) by users. How Wikipedia is handling the issue — in light of its strict governance policies — is worth watching, especially for organizations that also rely on user-generated content.

Links from this episode:

  • Wikipedia Editors Adopt ‘Speedy Deletion’ Policy for AI Slop Articles
  • How Wikipedia is fighting AI slop content
  • From the technology community on Reddit: Volunteers fight to keep ‘AI slop’ off Wikipedia
  • Wikipedia:WikiProject AI Cleanup
  • Wikipedia loses challenge against Online Safety Act verification rules
  • Wikipedia can challenge Online Safety Act if strictest rules apply to it, says judge
  • The next monthly, long-form episode of FIR will drop on Monday, August 25.

    We host a Communicators Zoom Chat most Thursdays at 1 p.m. ET. To obtain the credentials needed to participate, contact Shel or Neville directly, request them in our Facebook group, or email [email protected].

    Special thanks to Jay Moonah for the opening and closing music.

    You can find the stories from which Shel’s FIR content is selected at Shel’s Link Blog. Shel has started a metaverse-focused Flipboard magazine. You can catch up with both co-hosts on Neville’s blog and Shel’s blog.

    Disclaimer: The opinions expressed in this podcast are Shel’s and Neville’s and do not reflect the views of their employers and/or clients.

    Raw Transcript:

    Shel Holtz (00:00)

    Hi everybody, and welcome to episode number 477 of For Immediate Release. I’m Shel Holtz.

    @nevillehobson (00:08)

    And I’m Neville Hobson. Wikipedia has long been held up as one of the internet success stories, a vast collaborative knowledge project that has largely resisted the decline and disorder we’ve seen on so many other platforms. But it’s now facing a new kind of threat, the flood of AI generated content. Editors have a name for it, not just editors by the way, we do as well. It’s called AI slop. And it’s becoming harder to manage as large language models make it easy.

    to churn out articles that look convincing on the surface, but are riddled with fabricated citations, clumsy phrasing, or even remnants of chatbot prompts like “as a large language model.” Until now, the process of removing bad articles from Wikipedia has relied on long discussions within the volunteer editor community to build consensus, sometimes lasting weeks or more. That pace is no match for the volume of junk AI can generate.

    So Wikipedia has now introduced a new defense, a speedy deletion policy that lets administrators immediately remove articles if they clearly bear the hallmarks of AI generation and contain bogus references. It’s a pragmatic fix, they say, not perfect, but enough to stem the tide and signal that unreviewed AI content has no place in an encyclopedia built on human verification and trust. This development is more than just an internal housekeeping matter.

    It highlights the broader challenge of how open platforms can adapt to the scale and speed of generative AI without losing their integrity. And it comes at a moment when Wikipedia is under pressure on another front: regulation. Just this month, it lost a legal challenge to the UK’s Online Safety Act, a ruling that raises concerns about whether its volunteer editors could be exposed to identity checks or new liabilities. The court left some doors open for future challenges, but the signal is clear:

    the rights and responsibilities of platforms like Wikipedia are being redrawn in real time. Put together, these two stories, the fight against AI slop and the battle with regulators, show us that even the most resilient online communities are entering a period of profound change. And that makes Wikipedia a fascinating case study for what lies ahead for all digital knowledge platforms. For communicators, these developments at Wikipedia matter deeply. They touch on questions of credibility,

    how we can trust the information we rely on and share, and on the growing role of regulation in shaping how online platforms operate. And there are other implications too, from reputation risks when organizations are misrepresented, to the lessons in governance that communicators can draw from how Wikipedia responds. So, Shel, there’s a lot here for communicators to grapple with. What do you see as the most pressing for communicators right now?

    Shel Holtz (02:52)

    Well, I think the most pressing is being able to trust that the content you see is accurate and authentic and can be used in whatever project you’re using it for. And Wikipedia, we know, based on how it’s configured, has always been a good source for accurate information because it is community edited; errors are usually caught.

    We have talked in past episodes about the fact that more obscure articles can have inaccuracies that sit for a long time because nobody reads them, especially not the people who would have the right information to correct them. But by and large, it is a self-correcting mechanism based on community, which is great. It does seem that the shoe is on the other foot here, because when Wikipedia first launched,

    I’m sure you’ll recall that schools and businesses banned it. You can’t use this, you can’t trust it. It’s written by regular people and not professional encyclopedia authors. Therefore, you’re going to be marked down if you use Wikipedia; it’s banned. And they fought that fight for a long time and finally became a recognized authoritative site. And here they are now banning something new

    that we’re still trying to grapple with. We do need to grapple with it. The AI slop issue is certainly an issue. I worry that they’re going to pick up false positives here. Some of the hallmarks of AI writing are also hallmarks of human writing. I mean, if I hear one more person say an em dash is absolutely a sign that it was written by AI, I’m gonna throw my computer out the window.

    I’ve been using dashes my entire career. I was using dashes back when I was doing part-time typesetting to make extra money when I was in college. And dashes are routine. There is nothing about them that makes them a hallmark of AI. That is ridiculous. But we are going to see some legit articles tossed out with the slop. The other thing is some of the slop may have promise. It may be

    the kernel of a good article, and this is a community platform, so wouldn’t people be able to go in and say, wow, this is really badly written, yeah, AI may have done this, but there’s not an article on this topic yet, and I have expertise, so I’m gonna go in and start to clean this up? It’s a conundrum. What are you gonna do at this point? We haven’t had the time to develop the kinds of solutions to this issue that might take root.

    And yet the volume of AI slop is huge. The number of people producing it is equally large. And you have to do something. So I think it’s trial and error at this point to see what works. And there will be some negative fallout from some of these actions. But you got to try stuff to take it to the next level and try the next thing.

    @nevillehobson (05:52)

    Yeah,

    I think there’s a really big issue generally that this highlights, and part of it is based on my own experience of editing Wikipedia articles in a couple of cases for an organization, working with people like Beutler Ink (Bill Beutler has been an interview guest on this podcast a few times), which is

    the speed of things. The one memorable thing that stays in my mind about using Wikipedia, or trying to progress a change or addition, is the humongous length of time it takes with the volunteer editor community. The defense typically is, well, they’re volunteers, they’re not full-time, they’re not employees, they’re not dedicated, you’ve got to be patient, they’re doing it of their own free will to help things. I get all that, I’m a volunteer myself in many other areas, but…

    That’s great. But as they themselves are saying, things are moving at light speed with AI slop generation. You can’t afford three to four weeks where you, the person editing, have asked the community, is this good? Are you okay with this? What else? And three weeks go by before you get a reply, and often you have to nudge and so forth to get one. That ain’t going to work today. So it needs something better. They have this

    really interesting-looking project called WikiProject AI Cleanup, which is well defined on Wikipedia. They’re also developing a non-AI-powered tool called Edit Check that’s geared towards helping new contributors fall in line with the policies. So part of the problem a lot of the time, I think, is the elaborate policies and procedures you’ve got to follow.

    It’s not user friendly for people who don’t know all this, and they do have a history of not welcoming newbies readily. So all that’s in the background. But Edit Check is quite interesting: it’s geared towards helping new contributors fall in line with the policies and writing guidelines. They’re also working on adding a paste check to the tool, which will ask users who’ve pasted a large chunk of text into an article whether they’ve actually written it. So it’s kind of helping with that kind of focus.

    I get what you say, and I don’t disagree, by the way, on the discovery of things, that there might be something good in there; I get all that, and I hope that continues. But this is urgent, this really does require attention. And I think one of the points in the “why this matters to communicators” section is

    the big one, I think: reputation risk. I mean, some of the research I did, and this is going back a couple of years now when I was working on a particular project, showed that when you, let’s say as a communicator, look into something related to your employer, your client, or an organization you’re doing work about, the first place you typically go is Wikipedia, or it’s one of the first things that shows up in the search results, at least in that old traditional way of searching we’ve now moved past, but as it was back then.

    You get your, you know, above-the-fold screen full of results on Google. And the results you’d go to would ideally be the organization’s website first and foremost, maybe someone else talking about the organization second, and the third result is going to be the Wikipedia entry. And then you have a little box on the right-hand side which summarizes everything, and that’s taken from the Wikipedia entry.

    So if you have not updated your Wikipedia entry or it is inaccurate, that’s what’s going to show up there. So getting this right is good. But unfortunately, that won’t work in the day of AI slop, because things change so fast. And just wait till agentic AI gets on the case and you’ve got all these bots creating content as well. I think the point about dashes and so forth, that isn’t going to stop anytime soon, I don’t think. And I believe

    that presents a big challenge to Wikipedia, where you have human verifiers checking things: this article has got 15 dashes, hey, come on, that’s got to be AI-generated. All that kind of thing, they still have to figure out how to handle. I think your point that they’ve got to do something about this, absolutely. And this is probably the one thing they’re doing, but there’s more they still need to do, and that

    is likely to be quite a challenge because of the speed with which this is evolving. So I think it comes down to, I suppose, you, the information consumer, the user of the stuff you’re finding: you absolutely need to do your own due diligence more than ever before. Don’t just believe Wikipedia because, hey, it’s Wikipedia, it’s a community-generated site. I hear all the stories, I’m sure you have too, about how that’s the reason why it’s no good,

    because it’s community generated. I don’t buy that argument. I’ve rarely had any issues. And one thing I do use a lot myself as part of the verification is the talk pages and the edit history pages, and you get a feel for, you know, what’s been happening and so forth. Plus, there are services. Beutler has one. There’s another one, Wiki Alerts (it’s owned by that guy we interviewed in Israel who was behind it),

    that will notify you whenever a page you’re paying attention to has changes and tells you what the changes are. And Wikipedia itself has some pretty good analytics now, and alerts and so forth. So communicators can help with this as well, reviewing pages and content they know about to make sure they don’t have any issues they should be concerned about. But that’s the regular climate.

    Then there’s AI literacy. Communicators need to know, or need to know how to get help in, recognizing AI-generated text. There isn’t a single guide; there are lots of people with opinions out there. Your own common sense will often help you. Pitfalls like fabricated citations: how do you really check those? Responsible use in a professional context. This brings all of that to the forefront again, like it was originally, as you mentioned, with

    organizations, schools, and academia banning the use of all this back in the 2000s. Now we’re in the 2020s and this is becoming more urgent, it seems to me.

    Shel Holtz (12:05)

    Yeah, and you talked about the difficulty in having action taken when you propose a change to an article. Where I work, I check our Wikipedia entry. The first time I did, I saw that the earnings hadn’t been updated in about six years. So I left a note saying, I can’t make the change, I’m a representative of the company, but these are earnings from six years ago. Here are our most recent. This is the doc.

    @nevillehobson (12:21)

    Yeah.

    Shel Holtz (12:32)

    I heard crickets. Absolutely nothing. So I wonder if agentic AI might actually be a solution for Wikipedia down the road. When a challenge like that comes up, an agent could go out and find the correct information and maybe send a note to the editor saying, I have confirmed this information, or, I have found this information to be inaccurate. Just put that step in there to speed things up.

    The other issue that I think is going to be interesting is that the quality of the output is only going to improve. And whereas you can tell bad writing from a bad prompt right now, well, first of all, you’re going to have more people learn how to prompt well, which will make it harder to identify that it was done by AI, especially if somebody takes five minutes to go through it, edit it, and make a few revisions. And the AI is just going to crank out better stuff

    @nevillehobson (13:09)

    Thank

    Shel Holtz (13:25)

    as new models appear, because the work that they’re doing is just designed to produce better outputs. That’s going to make it harder to find these things. So again, finding a way to identify it and address it has to be top of mind. And the current process, what they’re doing now, I think is just a first step. It’s not going to scale.

    @nevillehobson (13:46)

    No, totally, I agree. So that would help. But they would have to make significant changes to their structure and the whole policies and procedures setup. So one of the things at the front of this, and you mentioned what you’d found for your company, I’ve come across it many times as well: it’s no good you, the representative of the company, sending them information, even though you’ve disclosed that fact. Great. Doesn’t matter. You’ve sent them information that is not from a neutral point of view, no matter

    Shel Holtz (13:54)

    yeah.

    @nevillehobson (14:14)

    how you see it. If it’s your own website, for instance, you could even say, well, this is what it says on the SEC website as well, and that might help. But they explain in excruciating detail what neutral point of view means and how you can provide reliable, verifiable sources from a third party, in other words, one that is totally neutral.

    The famous example I’ve heard so many times is from probably the very early 2000s: a British author asked Wikipedia to correct his entry because his date of birth was wrong. And they said, sorry, you’re not a neutral point of view, we can’t do that. And they refused to make the change. That struck a lot of people as utterly absurd. But if you actually read the policies on this, it’s not absurd at all.

    Shel Holtz (15:03)

    No, it’s just what the policy is.

    @nevillehobson (15:03)

    So what that author needs to do

    is find a source that is neutral with respect to him or her, tell Wikipedia, and provide the source proving it, which could be, here’s a copy of the birth certificate from Somerset House or whatever it was at the time. It’s clear, and again in the context of communicators, when the first round of guides for PR people came from the CIPR back in 2012,

    that the need for this was apparent very, very quickly: to educate PR folks in particular, who did not grasp that concept of neutrality and neutral point of view. So they’d have to change a lot if agentic AI got into the mix there. I think the more pressing realization, perhaps, is not agentic AI as an ally of yours, not at all; it’s a tool for the bad guys to create really questionable content. You’d never spot that.

    And that’s where you need another AI to pay attention. So all these things are probably in the mix there somewhere. The project they’ve got, the AI Cleanup project as they’re calling it, is advice, editing advice for community members. It’s a little light on the detail, and I haven’t drilled into all the stuff on the menu you can go to, but the guidance on how to do this is actually very well thought through,

    where it goes into detail you wouldn’t think of, you know, broken links and how to fix them so they’re not broken, finding something that does work properly, fixing common mistakes with free images in Wikimedia Commons and the copyright issues surrounding that. So all that’s in there too. But what strikes me at the end of it, Shel, is that there are smart people thinking about all of this, and it’s great. But I’m not sure that

    humans doing this alone are going to be able to grasp the speed with which this is upon us. And that’s where I think AI generally, whether it’s agentic or not, I don’t know, could be a huge aid, in the sense of: here’s something that needs reviewing and checking fast, boom, put your AI on it, not a human. Although you’ve got to have the human as the kind of

    ultimate arbiter of whether something is to be removed or not. So it’s a challenge for the very methodology of Wikipedia, it seems to me, and this reliance on community-generated content, because the bad guys, and I don’t mean only people deliberately doing this, but the bad guys embracing bad agentic AI, are moving faster than you can think. And that’s the bigger threat to Wikipedia, it seems to me.

    Shel Holtz (17:39)

    One closing thought, and that is that Wikipedia is the poster child for user-generated content, but they’re not the only ones. User-generated content has been embraced by a lot of organizations. And if yours is one of them, you’re probably facing, at a smaller scale, the same issue with people contributing content that they used AI to produce. And some of it may be quite bad. So you might want to keep an eye on what Wikipedia is doing in order to inform your own

    @nevillehobson (17:55)

    Yeah.

    Shel Holtz (18:07)

    processes for addressing this problem in your own organization. And that’ll be a 30 for this episode of For Immediate Release.

    @nevillehobson (18:11)

    Total sense.

     

    The post FIR #477: Deslopifying Wikipedia appeared first on FIR Podcast Network.
