Leonard Hevia wrote: Political cartoons . . . are stylized art and not meant to be lifelike renditions.
Except in this case, the cartoons of real people are lifelike renditions. (Add to the list John James Ingalls, who is sprawled on the ground in front of the cannons.)
Dalrymple did accurate likenesses when he was depicting real people. "Montana" is not an accurate likeness of Sanders, so it wasn't intended to be W. F. Sanders.
W. E. Sanders presumably knew what his father looked like. He would not have perceived "Montana" to have been his father, and there's no reason to think that a decade later, he would have said to Smith that the cartoon referred to his father.
The rest of this post is math-heavy, boring, and probably should be skipped. TL;DR: I disagree (once again) with Chris.
lybrary wrote: I am doing it exactly as Mosteller-Wallace.
Not at all.
What MW did (paper online here) was examine several of the Federalist papers whose authorship was unknown, but presumed to be either Madison or Hamilton, and compare them to works which were known to have been written by Madison, and by Hamilton.
So MW had disputed works, and works of known authorship. We have a disputed work (Expert), and works of known authorship for comparison (Gallaway and Teale). So far, so good.
MW did this:
1. Derived a list of function words that would be used in the analysis. Cross-checked and tested that list to ensure it was valid.
2. Measured the usage rates of these words in the disputed papers (individually), and in the collected works of Madison, and the collected works of Hamilton.
3. Calculated the probability (for the words in the list, as they appear in each of the disputed papers) that an author with Madison's "native" usage rates would have used the word as it was used in each disputed paper, and then did likewise for Hamilton.
4. Looked for patterns, drew conclusions.
You did this:
1. Used a single punctuation marker instead of a list of words. Did no tests or cross-checks to determine if this marker is subject to the same statistical patterns as words are, or if it is otherwise valid.
2. Measured the usage rate of the marker in Erdnase (3/52k), Teale (58/85k), and Gallaway (0/30k).
3. Calculated the probability that Teale would have used the marker 58 times if he had Erdnase's native usage rate, and the probability that Gallaway would have used it 0 times if he had Erdnase's native usage rate. (see where you did it differently?)
4. Drew conclusions.
You have worked the problem from the wrong direction. MW calculated to what extent the unknown author wrote like Madison and Hamilton; you calculated to what extent Teale and Gallaway wrote like the unknown author.
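You used λ_Erdnase = 3, and then for Teale and Gallaway, you scaled it so λ_Teale = 3*85/52 = 4.9 and λ_Gallaway = 3*30/52 = 1.72 (that should be 1.73, BTW). Writing P(k; λ) for the Poisson probability of seeing the marker k times when λ uses are expected, that gave
P(58; λ_Erdnase scaled upwards = 4.9) ≈ 0; and
P(0; λ_Erdnase scaled downwards = 1.73) = 0.177.
If you had approached the problem like MW, you would have calculated
P(3; λ_Teale scaled downwards = 58*52/85 = 35.4) = 3.12e-12; and
P(3; λ_Gallaway = 0) = 0.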
So, the probability that a person who uses "(?)" 58 times in 85k words (Teale) would use it 3 times in 52k words is small: 3 parts in 10^12.
The probability that a person who never uses "(?)" (Gallaway) would use it 3 times in 52k words is nonexistent.
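For anyone who wants to check the arithmetic, here is a minimal sketch of both directions of the calculation. It assumes the simple Poisson model implied by the λ values above and uses only the counts and word totals quoted in this post; the function and variable names (poisson_pmf, expected_count, counts, kwords) are mine, not anything from MW or from Chris's work.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when lam occurrences are expected."""
    if lam == 0:
        # A writer with a zero usage rate can only ever produce zero uses.
        return 1.0 if k == 0 else 0.0
    # Work in log space so that k = 58 does not need a huge factorial.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# Uses of "(?)" and text length in thousands of words, as quoted in the post.
counts = {"Erdnase": 3, "Teale": 58, "Gallaway": 0}
kwords = {"Erdnase": 52, "Teale": 85, "Gallaway": 30}

def expected_count(source, target):
    """Expected uses in a text of target's length, at source's usage rate."""
    return counts[source] * kwords[target] / kwords[source]

# Direction used in the quoted analysis: Erdnase's rate applied to each candidate.
for name in ("Teale", "Gallaway"):
    lam = expected_count("Erdnase", name)
    p = poisson_pmf(counts[name], lam)
    print(f"Erdnase rate -> {name}: lambda = {lam:.2f}, P({counts[name]}) = {p:.3g}")

# MW-style direction: each candidate's rate applied to the disputed text.
for name in ("Teale", "Gallaway"):
    lam = expected_count(name, "Erdnase")
    p = poisson_pmf(counts["Erdnase"], lam)
    print(f"{name} rate -> Erdnase: lambda = {lam:.2f}, P({counts['Erdnase']}) = {p:.3g}")
```

Because the sketch does not round λ before plugging it in, the Teale figure comes out slightly different from the one above (which used λ = 35.4), but it is still a few parts in 10^12.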
The takeaways should be:
1. Using "(?)" as a measure of stylistic similarity is unverified, and doesn't really get you anywhere. MW spent a large portion of their paper (pp. 279-286) just determining what markers to use, and how much to weight each of them. Nothing like that was done with "(?)"; we just used it because it stood out to a casual reader.
2. If it is a valid marker to use, however, the probability (based solely on the usage rates of "(?)") that Teale wrote Expert is vanishingly small, and the probability that Gallaway wrote it is zero.
3. But in those extremes, Teale (who uses the marker) is more likely than Gallaway (who doesn't use the marker) to have written a book that occasionally uses the marker (Expert). Teale beats Gallaway.