lybrary wrote: You are looking at fine grained symmetry which is immaterial here. As I wrote earlier, you can apply the math, but you do not understand what it calculates. The method is obviously symmetric because if "A wrote B" then it follows that "B wrote A". That the specific numbers for each function word individually do change is simply a fact of the shape of the Poisson distribution which is not symmetric. But this asymmetry of the Poisson distribution does not change the symmetry of the MoWa method as a whole. If MoWa would give different results if applied from both ends then the method would be complete non-sense, which it is not. It has been shown to work surprisingly well in several cases.
You are confusing the conclusion drawn by the user of the method with the output of the method. Obviously, if A wrote B, then B wrote A. That's not an insight into MW's methods, it's a tautology.
But the methodology described by MW doesn't give you "A wrote B". It gives you a probability number. MW didn't decide that Madison wrote the disputed papers because that was the output of their method. They applied the method many times to multiple words, taken from large groups of texts, and got numerous probabilities. Only after examining all of these probabilities, and subjecting them to Bayesian analysis (which is the real point of their paper -- the demonstration of Bayesian analysis on a statistical problem), did they come to the conclusion that Madison wrote the papers.
The case that's causing all the discussion, whether or not Erdnase is more similar to Gallaway than Teale, based on (?), doesn't have multiple data points. There's just the single marker -- (?). Applying the MW method to it in one direction (the correct one) suggests that Erdnase writes more like Teale than Gallaway (in that they both have non-zero usages of (?)). The actual probability that he would use it the way he did if his baseline were that of Teale is still quite small, though, so that single test suggests that they are not the same person. If you apply it in the wrong direction (like you did), it suggests that Gallaway writes more like Erdnase than Teale does. But the probability is still small, so it also suggests that they are not the same (again, the conclusion is symmetric, but the output of the calculations is not).
Look at that again. Applying it in one direction tells you which of two authors a disputed author writes more like. Applying it in the other direction tells you which of two authors writes more like a disputed author. These are two different questions: similar, but not symmetrical. The "amount" that Erdnase writes like Teale is not necessarily the same as the "amount" that Teale writes like Erdnase.
(I suspect that there are, but haven't found any, data sets that would indicate that it is likely that A wrote like B (probability > 50%), but it is unlikely that B wrote like A (probability < 50%). Such a situation would clearly be non-symmetric.)
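To make the direction issue concrete, here is a rough sketch using a plain Poisson model for a single marker word. The counts are made up for illustration (they are not taken from Erdnase, Teale, or Gallaway), the three texts are assumed to be the same length, and the names candidate A and candidate B are just placeholders. It is not MW's full procedure, which combines many words and a Bayesian analysis; it only shows that swapping which text supplies the rate and which supplies the observed count can change which candidate looks closer.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k occurrences when lam occurrences are expected."""
    return exp(-lam) * lam ** k / factorial(k)


# Hypothetical counts of one marker word in three equally long texts.
# Made-up numbers, chosen only to illustrate the asymmetry.
count_disputed = 6
count_a = 0    # candidate A never uses the word
count_b = 20   # candidate B uses it heavily

# Forward direction: treat each candidate's count as the expected rate and
# ask how probable the disputed text's count is under that rate.
forward_a = poisson_pmf(count_disputed, float(count_a))
forward_b = poisson_pmf(count_disputed, float(count_b))

# Reverse direction: treat the disputed text's count as the expected rate and
# ask how probable each candidate's count is under that rate.
reverse_a = poisson_pmf(count_a, float(count_disputed))
reverse_b = poisson_pmf(count_b, float(count_disputed))

print("forward: A =", forward_a, " B =", forward_b)
print("reverse: A =", reverse_a, " B =", reverse_b)
print("forward favors", "A" if forward_a > forward_b else "B")
print("reverse favors", "A" if reverse_a > reverse_b else "B")
```

With these invented counts the forward calculation favors B and the reverse calculation favors A, while every one of the four probabilities is small. A real analysis would also need to handle unequal text lengths and zero rates more carefully than this sketch does.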
Off on a tangent:
In Fred Mosteller's autobiography, The Pleasures of Statistics, there is a chapter devoted to magic. It says:
When I worked in New York City in the 1940s, I came across an erudite book by Erdnase on card magic, or possibly how to cheat at cards. Jimmie Savage [a colleague] wanted to learn a little about magic, and so I lent him this book. Jimmie loved it for two reasons. First, the author said he wrote the book because he needed the money. Second, Erdnase treats each problem much like a mathematics text would. He has names for various devices such as false shuffling (riffling cards but leaving them in the original order), and he tells the reader exactly the sequence of use of these devices to produce the desired result. For anyone who hadn’t learned to manipulate cards at mother’s knee, too much skill is required. But for those who have it, Erdnase potentially moves their performance up many notches.
Too bad he wasted all that time on The Federalist Papers when, if things had gone just a little differently, he could have told us who wrote EATCT.