Monday, July 14, 2014

Nothing nefarious in anonymous Norwegian government Wikipedia edits.

There was a post on BoingBoing referencing a list of anonymous Wikipedia edits made from Norwegian government IP addresses.  Cory Doctorow wanted the anonymous editors and the Norwegian government to go public and involve the Norwegian Wikipedia community in their editing, and his request carried a touch of suspicion.  Subsequent commenters tried to figure out whether there was an issue by randomly exploring some of the data.  The original supplier of the data merely plotted edits as a function of time and did no further work to describe the data and see whether there was a story worth telling.  Plugging that data into Tableau lets us look at many variables vs. time, and at which articles were edited, when, and from which IP addresses.  What follows is my analysis from the BoingBoing comments.

Without the appropriate data set I can only speculate, but this number of edits may simply be the average for an organization of the Norwegian government's size. It might also be interesting to compare the total number of edits on these titles with the Norwegian government edits; the poster did say that the list covers only the anonymous edits. The view below suggests that rather than editing at lunchtime, people edit most first thing in the morning and at 1pm (maybe lunch is at 1?), and that editing occurs throughout the workday, though there are variations between years (not shown).
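For anyone who wants to reproduce the time-of-day view without Tableau, here is a minimal sketch in Python. The sample rows, the column order, and the function name edits_per_hour are my own placeholders, not the real data set, and I am assuming the timestamps are already in local Norwegian time; if they are recorded in UTC they would need shifting first.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (timestamp, article title, IP address).
# These are placeholders for illustration, not rows from the real data set.
sample_edits = [
    ("2009-03-02T08:15:00", "Vox Humana", "192.0.2.10"),
    ("2009-03-02T08:40:00", "Vox Humana", "192.0.2.10"),
    ("2010-05-11T13:05:00", "Jan Egeland", "192.0.2.22"),
    ("2011-09-20T15:30:00", "QuizDan", "198.51.100.7"),
]


def edits_per_hour(edits):
    """Return a 24-element list: number of edits in each hour of the day."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts, _title, _ip in edits)
    return [hours.get(h, 0) for h in range(24)]


hourly = edits_per_hour(sample_edits)

# Crude text histogram, skipping hours with no edits.
for hour, count in enumerate(hourly):
    if count:
        print(f"{hour:02d}:00  {'#' * count}")
```

Grouping the same counts by year as well as by hour would show the year-to-year variation mentioned above.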

As to the topics, the top few that I see that possibly could be more open are the edits for Jan Egeland and Karin Yrvin, both Labour party politicians. What is the etiquette on editing (or having staff edit) your own Wikipedia article again? But then the second most-edited title over these years is a template for football club odds. Other non-government-related edits are to a quiz show, QuizDan, and some organ enthusiast edited the Vox Humana topic 7 times in 2009. Titles with more than 4 edits are shown below; these 40 titles, with 271 edits, represent less than 20% of the 1437 total edits across all 919 titles.
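The "more than 4 edits" tabulation is easy to sketch outside Tableau too. The title list below is made up for illustration (only the 7 Vox Humana edits echo a figure from the data), and frequent_title_share is a name I invented; the last line just checks the share quoted above from the post's own totals.

```python
from collections import Counter


def frequent_title_share(titles, min_edits=5):
    """Return (titles with at least min_edits edits, their share of all edits)."""
    counts = Counter(titles)
    frequent = {title: n for title, n in counts.items() if n >= min_edits}
    share = sum(frequent.values()) / len(titles) if titles else 0.0
    return frequent, share


# Hypothetical list with one entry per edit; the counts are placeholders.
sample_titles = ["Vox Humana"] * 7 + ["QuizDan"] * 5 + ["Jan Egeland"] * 2 + ["Oslo"]

frequent, share = frequent_title_share(sample_titles)
print(frequent, f"{share:.1%}")

# Share quoted in the post: 271 edits on 40 titles out of 1437 total edits.
post_share = 271 / 1437
print(f"{post_share:.1%}")  # about 18.9%, i.e. less than 20%
```

With the real export, passing in the column of article titles (one entry per edit) should reproduce the 40-title list.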

There are only 35 IP addresses in total, 29 from government offices and 6 from Parliament. On the chart below, the thin ones are Parliament IPs and the thick ones are government office IPs.

I agree there is nothing nefarious going on here, but a few more charts than a single plot of total edits can help to show that.

Just a few plots of variables against other variables to find relationships, plus a tabulation of some quantities, show more than a boring plot of amount vs. time.  One could go further and classify the articles and perhaps include the total edits.  It really depends on the questions you are trying to answer or the hypothesis you are investigating.
