The core functionality of WikiWho is to parse the complete set of historical revisions (versions) of a Wikipedia article in order to find out who wrote and/or removed which exact text at what point in time. Given a specific revision of an article (e.g., the current one), WikiWho can determine for each word and special character which user first introduced it, and whether and how it was deleted and reintroduced afterwards. This functionality is not offered by Wikipedia itself. WikiWho was shown to perform this task with very high accuracy (~95%) and very efficiently, and it is the only tool that has been scientifically shown to perform this task that well (cf. the paper).
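To make token-level attribution concrete, here is a minimal Python sketch. It is not WikiWho's algorithm: it swaps in a naive diff of each revision against its immediate predecessor (using the standard library's difflib), and the input format, the tokenizer regex and the Token/attribute names are all hypothetical. Unlike WikiWho, it credits a reintroduced word to the revision that reintroduces it rather than to its original author, a case WikiWho tracks explicitly.

```python
import difflib
import re
from dataclasses import dataclass

# Assumed, simplified tokenizer: words plus each special character as its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class Token:
    value: str
    origin_rev: int      # revision credited with introducing the token
    origin_editor: str   # editor of that revision


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def attribute(revisions: list[tuple[int, str, str]]) -> list[Token]:
    """Attribute each token of the last revision to the revision that introduced it.

    revisions: chronologically ordered (rev_id, editor, text) tuples (hypothetical format).
    Naive: only compares each revision with its direct predecessor.
    """
    current: list[Token] = []
    for rev_id, editor, text in revisions:
        new_values = tokenize(text)
        old_values = [t.value for t in current]
        matcher = difflib.SequenceMatcher(a=old_values, b=new_values, autojunk=False)
        updated: list[Token] = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                # Unchanged tokens keep their earlier authorship.
                updated.extend(current[i1:i2])
            elif op in ("replace", "insert"):
                # Tokens not present in the predecessor are credited to this revision.
                updated.extend(Token(v, rev_id, editor) for v in new_values[j1:j2])
            # "delete": the tokens simply disappear; WikiWho would also record the deletion.
        current = updated
    return current
```

For example, with the revisions "Cologne is a city." (1, alice), "Cologne is a large city." (2, bob) and "Cologne is a large city in Germany." (3, carol), "large" is credited to revision 2, "in" and "Germany" to revision 3, and all other tokens to revision 1.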
On top of the generated authorship and change data, further data can be mined and other tools can be built. We have extended the original model to also provide relationships between the editors of an article, such as "delete" or "reintroduce", based on the words they delete or add. We are currently working on visualizations of these networks, as well as other visualizations of metrics and word authorship, for end users interested in exploring the collaborative writing dynamics of Wikipedia.
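The editor relations described above could be derived from per-token change histories roughly as follows. This is a hypothetical sketch, not the extended WikiWho code: the history format (chronological (editor, action) pairs with actions "add", "delete" and "reintroduce") and the rule that a reintroduction points at the editor whose deletion it undoes are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical per-token history: chronologically ordered (editor, action) events,
# where action is "add" (first introduction), "delete" or "reintroduce".
TokenHistory = list[tuple[str, str]]


def editor_relations(histories: list[TokenHistory]) -> Counter:
    """Count (source_editor, target_editor, relation) edges across all tokens."""
    edges: Counter = Counter()
    for history in histories:
        last_adder = None    # editor who most recently added or reintroduced the token
        last_deleter = None  # editor who most recently deleted the token
        for editor, action in history:
            if action == "add":
                last_adder = editor
            elif action == "delete":
                # Deleting someone else's token creates a "delete" edge to them.
                if last_adder is not None and last_adder != editor:
                    edges[(editor, last_adder, "delete")] += 1
                last_deleter = editor
            elif action == "reintroduce":
                # Undoing someone else's deletion creates a "reintroduce" edge to them.
                if last_deleter is not None and last_deleter != editor:
                    edges[(editor, last_deleter, "reintroduce")] += 1
                last_adder = editor
            else:
                raise ValueError(f"unknown action: {action}")
    return edges
```

Self-interactions (an editor deleting their own word) are skipped here, since they do not connect two editors; whether the real model counts them is not stated.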
We offer an API for word provenance/authorship for the English Wikipedia: for each word/token you can get the revision its content originated from (and thereby which editor originally authored it), as well as all changes the token was ever subject to. Try the graphical interface at: https://api.wikiwho.net/api/v1.0.0-beta/rev_content/Cologne/?o_rev_id=true&editor=true&token_id=true&out=true&in=true
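A minimal Python client for the endpoint above, using only the standard library. The base URL and the five query parameters are copied from the example link; the helper names are hypothetical, and the sketch makes no assumption about the structure of the returned JSON. A generous timeout is used because responses for large articles can be slow.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Base URL and query parameters taken from the example link.
BASE_URL = "https://api.wikiwho.net/api/v1.0.0-beta/rev_content/"
DEFAULT_PARAMS = {"o_rev_id": "true", "editor": "true", "token_id": "true", "out": "true", "in": "true"}


def build_rev_content_url(article: str, params: Optional[dict[str, str]] = None) -> str:
    """Build the rev_content URL for an article title."""
    query = urllib.parse.urlencode(params if params is not None else DEFAULT_PARAMS)
    # safe="" also encodes "/" so the title stays a single path segment.
    return f"{BASE_URL}{urllib.parse.quote(article, safe='')}/?{query}"


def fetch_rev_content(article: str, timeout: float = 120.0) -> Any:
    """Download and decode the JSON response (structure not assumed here)."""
    with urllib.request.urlopen(build_rev_content_url(article), timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_rev_content("Cologne") requests the same URL as the link above.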
IF YOU CAN: Let me know if you use it, like it or don't like it, find any specific errors, or want any specific features. Email: f.floeck-youknowwhat-gmail.com
We are continuously updating our database with incoming changes to the articles. However, for large articles the delivery of the resulting JSON can take some time.
CREDIT: Kenan Erdogan, Maribel Acosta, Philipp Singer, Pavan Kumar Pandappa.
The original code, plus some variants that contain extensions (especially a new function extracting relations between editors). Note that the extended versions might include additional computational steps that can lead to longer runtimes than the original. All are available under the MIT license at:
"WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content" – Fabian Flöck & Maribel Acosta, WWW2014, research track.