[personal profile] lovingboth
Are there any websites / Linux programs for helping determine if the same person wrote two different sets of message board posts?

I can find plagiarism checkers, but these tend to be looking for whether the texts have a common source, rather than a common author. (If I write two texts, one about soft fruit and one about deckchairs, they are liable to pass a plagiarism test, but should still have enough in common for someone to be able to say that the same person wrote both...)

(no subject)

Date: 2012-08-07 06:55 am (UTC)
From: [identity profile] drdoug.livejournal.com
What you're after is stylometry (sometimes called stylometrics). One widely used tool that's as good a starting place as any is JGAAP: http://evllabs.com/jgaap/w/index.php/Main_Page

Your problem might be a very hard one, though. These things work best with decent corpora of training text - like dozens of novels' worth. And they can generally only tell you, for a given test text, which of the authors in the training corpus is most likely to have written it - they assume that it must have been one of the authors in the training materials, not any random writer.
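
To make the "which of the known authors is closest" idea concrete, here is a rough sketch in Python. It is not how JGAAP works internally; it just illustrates one common stylometric approach, comparing how often each text uses a handful of function words. The word list, the function names and the choice of cosine similarity are all my own assumptions for illustration, and with samples as short as message board posts the scores will be very noisy.

```python
import math
import re
from collections import Counter

# A small hand-picked list of English function words. This list is an
# assumption for illustration; real tools use far larger feature sets.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it",
                  "is", "but", "for", "with", "as", "on", "not"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_likely_author(test_text, training):
    """Pick the candidate whose function-word profile is closest to the test text.

    `training` maps an author label to a sample of that author's writing.
    This is closed-set: it always returns one of the candidates, even if
    none of them actually wrote the test text.
    """
    test_profile = profile(test_text)
    scores = {author: cosine_similarity(test_profile, profile(sample))
              for author, sample in training.items()}
    return max(scores, key=scores.get), scores
```

You would call it with something like `most_likely_author(mystery_posts, {"poster_a": posts_a, "poster_b": posts_b})`, where those variables are plain strings of concatenated posts. Note that it hands back a winner no matter what, which is exactly the closed-set limitation described above: a high score for one candidate only means "closest of the ones supplied", not "definitely the same person".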

(no subject)

Date: 2012-08-07 06:59 am (UTC)
From: [identity profile] drdoug.livejournal.com
Oh, and it's also worth noting that it's even harder to detect impersonation - e.g. determining whether a text is really Dickens or a skilled writer pretending to be Dickens.
