ERDNASE

Bill Mullins
Posts: 4330
Joined: January 17th, 2008, 12:00 pm
Location: Huntsville, AL

Re: ERDNASE

Post by Bill Mullins » September 10th, 2017, 2:02 am

lybrary wrote:
Bill Mullins wrote: I'd welcome the specific references to the stylometry literature. Particularly since, again, Olsson has a table of the frequency of use of phrases containing personal pronouns on p. 51.

I am not seeing that table. On page 51 is a table for synonyms of 'learn', 'study', etc. No personal pronouns I can see.


I am referring to Table 5.3 on p. 51 of his book Forensic Linguistics (2nd edition, 2008). It includes statistics on phrases containing "I," "he," "she," "we," and "they." I should have made it clear that I was referring to his textbook.

lybrary wrote: Read: D. L. Hoover, “Delta prime?,” Literary and Linguistic Computing, vol. 19, no. 4, pp. 477–495, 2004, and D. L. Hoover, “Testing Burrows’s Delta,” Literary and Linguistic Computing, vol. 19, no. 4, pp. 453–475, 2004.


Thanks for the references. But a 24-hour rental of these papers is $42 each, so I'll take some time to get them through interlibrary loan (ILL).

"For example, if the document of interest is a novel written in third person, the distribution of pronouns will be radically different than that of a novel written in first person, not by virtue of an authorship difference, but simply from genre."


But we aren't comparing two novels written in different voices. We are comparing two books designed to give instruction to the reader (as I mentioned in the original discussion). Erdnase uses the third person to refer to the reader, and Gallaway uses the second person. This isn't an issue of stylometry; it is an observation that the two writers, who are doing the same thing, do so in different linguistic styles.

lybrary wrote: However, the more fundamental problem is that function word analysis doesn't work for texts on such different subjects written 25-30 years apart. Some authors do modify their style over time.


This is an argument against the analysis that Olsson did as well. If the author's style stays the same, you should be able to compare the works based on style. If it evolves, how do you account for it? As near as I can tell, Olsson's conclusions are based on the style being constant between 1902 and 1927.

Bill Mullins

Re: ERDNASE

Post by Bill Mullins » September 10th, 2017, 2:48 am

lybrary wrote: You have a completely flawed and incorrect understanding of what stylometry is and what it can do. What you have done is taken an author A (Gallaway), calculated some word frequencies, then taken the document/author in question X (Erdnase), calculated the frequencies of the same words, and then concluded that A is not equal to X, or A is unlikely to be X, or made some other statement about the likelihood of A being X.


What I have done is much simpler than that. I have shown that the author of Expert and the author of Estimating for Printers use language in different ways.

Either they are the same author, and there are reasons that they did so, or they are different authors. If they are the same author, two possible reasons for the difference in language are:
1. Authorial style changes over time. If this is the reason, I don't understand how Olsson controls for this in his analysis. Has he only measured time-invariant features? How does he know they are time-invariant?
2. The genre is different. Possible, but I tried to make sure that the facets of language I investigated would not obviously be sensitive to the topic of the work. If I had said, "Erdnase has a lot of words about fingers, and Gallaway has a lot of words about paper, they must be different authors", then this would be an obvious criticism. But there's no reason that I can see that the subject of card table artifice vs. printing would make a difference in addressing the reader in 3rd person vs 2nd person, or the relative usage of "that is," "i.e.," and "viz.", or whether the author refers to himself as "the writer" or "the author".

As for "but for" and "[Erdnase] [transitive verb] "no" [object]", these strike me as examples of what Olsson called "markedness" in his text: "What we appear to have is simply an uncommon or unusual formulation rather than a non-standard one. [p. 51]" As such, they may be useful in analyzing text.

Here's another difference: the relative use of "center" vs. "middle". Gallaway uses "center" 16 times, and "middle" none. Erdnase uses "middle" over 50 times and "center" once.
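A count like this is easy to reproduce. Here is a minimal sketch in Python, assuming each book is available as plain text. The function names and the two sample strings are made up for illustration; they are not the actual texts and will not reproduce the counts quoted above. It matches exact word forms only, so "centre" or "centered" would need to be added to the variant list.

```python
import re
from collections import Counter

def variant_counts(text, variants):
    """Count whole-word, case-insensitive occurrences of each variant."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {v: words[v] for v in variants}

def preference(counts, a, b):
    """Share of variant `a` among all uses of `a` or `b` (None if neither occurs)."""
    total = counts[a] + counts[b]
    return counts[a] / total if total else None

# Placeholder snippets, NOT the real books; load the full texts to get real counts.
sample_a = "Hold the deck in the middle. The middle finger rests on the center."
sample_b = "Place the cut in the center of the sheet, then mark the center line."

counts_a = variant_counts(sample_a, ["center", "middle"])
counts_b = variant_counts(sample_b, ["center", "middle"])
print(counts_a, preference(counts_a, "middle", "center"))
print(counts_b, preference(counts_b, "middle", "center"))
```

Running the same two functions over a set of other authors would show how unusual a given preference is.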

lybrary
Posts: 677
Joined: March 31st, 2013, 4:59 pm

Re: ERDNASE

Post by lybrary » September 10th, 2017, 9:44 am

Bill Mullins wrote: What I have done is much simpler than that. I have shown that the author of Expert and the author of Estimating for Printers use language in different ways.

You are still not understanding the differences and challenges of the various methods used:

1) When you use function word stylometry you are dealing with a lot of noise and only a small signal. Take for example five works by Hoffmann, spread out over time and subject, and then calculate the function word frequencies in all of these books. What you will see is a fairly large variation in the frequencies of many function words, even though all of these books were written by Hoffmann. You can't just take a couple of function words, note some variation, and then argue there is a difference. Of course there is always a difference, even when you look at the same author. That is why one has to use many such words, often hundreds of them, to have any hope of teasing out some signal from all that noise. And one always has to compare this against a group of authors, which you fail to do. Take your example with 'i.e.', 'viz', etc. and do the same analysis over a dozen other magic, gambling and print authors and then compare them. Unless you do that, your numbers are meaningless noise.

2) When looking at books from two different subject areas, say magic and printing, you have to account not only for obvious subject words, like your example of fingers versus paper, but also for industry and subject norms and phraseology. I haven't yet looked carefully at the use of synonyms, but some of the differences you noted could very well be caused by industry conventions for what things are typically called, and not be a choice of the author. But studying synonyms is certainly an interesting area. I have shown that the use of single-/one- and double-/two- is more similar between Gallaway and Erdnase than between Erdnase and other magic and gambling authors. You always have to compare against other authors to get a sense of how significant that particular aspect is. You need to do that in your own examples; otherwise they are meaningless.

3) The advantage of looking at rare words and phrases is that one sidesteps many of the noise and high-dimensionality problems that function word stylometry has. With rare words the signal is large and their significance is much more robust against the influences of time and subject. The significance of each rare word can also be estimated. Olsson's Erdnase analysis is a mix of methods. He looked at usage patterns of rare words. He also looked at punctuation and conjunctions, but it is much more nuanced than simply calculating frequencies, setting up a high-dimensional space, and then trying to make a comparison in that space. He uses his sense for language and what he has learned over the decades about what is significant and what is not. He is not using computational stylometry.

4) Forensic linguistics and authorship attribution require attention to detail and an appreciation of nuances and subtleties. Small changes in any of the input parameters and boundary conditions can cause large changes in the results. Your approach so far is way too black and white, way too simplistic to be of any value. For example, the influence of time is complex. Some authors can maintain their style over a long time. Others change it. The influence of different subjects can cause a number of things to change. Again, study the things you notice across a dozen other authors. That will give you an idea of whether you are looking at noise or have discovered a significant aspect. And even if you have found a significant aspect, you then need to line up many of them to be able to formulate a strong argument against the already established findings by Olsson and myself, which show that Gallaway writes much more similarly to Erdnase than all the other authors we have looked at.
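
To make point 1 concrete, here is a rough sketch in Python of a Delta-style comparison in the spirit of Burrows's Delta, the measure tested in the Hoover papers cited earlier in the thread. It is a simplified illustration, not Olsson's method and not a faithful reimplementation of the published procedure: the function names, the tiny vocabulary and the toy texts are all assumptions made up for this example. Frequencies are z-scored across the whole group of texts, which is why a reference group of other authors is needed at all.

```python
import re
from collections import Counter
from statistics import mean, stdev

def rel_freqs(text, vocab):
    """Relative frequency of each vocab word among all word tokens (text must be non-empty)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def delta_distances(target, candidates, vocab):
    """Mean absolute z-score difference between `target` and each candidate text."""
    names = list(candidates)
    rows = [rel_freqs(target, vocab)] + [rel_freqs(candidates[n], vocab) for n in names]
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # sample standard deviation over the group
    # Words with zero spread in this group carry no information; skip them.
    keep = [i for i, sd in enumerate(sds) if sd > 0]
    if not keep:
        raise ValueError("no vocabulary word varies across the texts")
    z = [[(row[i] - mus[i]) / sds[i] for i in keep] for row in rows]
    return {n: mean(abs(a - b) for a, b in zip(z[0], zr)) for n, zr in zip(names, z[1:])}

# Toy texts only; a real run would use hundreds of function words and full books.
vocab = ["the", "and", "of", "to"]
target = "the cat and the dog and the bird"
candidates = {
    "A": "the cat and the dog and the fish",
    "B": "of of of of to to the and",
}
distances = delta_distances(target, candidates, vocab)
print(distances)
```

A smaller distance means the candidate's function word profile is closer to the target's, relative to the spread within the group. With only a handful of words or authors the numbers mean very little, which is the point being made above.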
Lybrary.com https://www.lybrary.com/
preserving magic one book at a time

lybrary

Re: ERDNASE

Post by lybrary » September 10th, 2017, 11:05 am

Bill Mullins wrote: This is an argument against the analysis that Olsson did as well. If the author's style stays the same, you should be able to compare the works based on style. If it evolves, how do you account for it? As near as I can tell, Olsson's conclusions are based on the style being constant between 1902 and 1927.

First of all, Olsson's analysis does not look only at style. The use of rare vocabulary doesn't have anything to do with style. The use of religious vocabulary in the prefaces of Erdnase and Gallaway is not an issue of style. It is an issue of background: how they acquired their vocabulary, what other books they were reading, what questions and subjects they were interested in. That is very different from style. Some of his tests do include aspects of style, for example where he looks at certain conjunctions and conjunctions together with punctuation. But Olsson did carefully consider the impact of time. For example, I remember a call where we were talking about punctuation and in particular the use of semicolons. One thought was to calculate the frequency of semicolons. Olsson noted that the use of semicolons changed strongly over time, so their frequency is not necessarily an author indicator but may simply reflect how the popularity of the semicolon changed. In the 19th century semicolons were much more heavily used. Today it is quite rare to see them. That is why, for example, Olsson doesn't compare frequencies of use, which can be heavily affected by how language changes over time, but instead compares whether the author used that feature or not. That aspect is much more robust over time.

That expert input, the weighing of features and of how they are evaluated, is one of the important differences between mechanical function-word computational stylometry, where one simply dumps the text in at one end and then hopes that some sensible result comes out at the other end, and an expert forensic linguist who has decades of experience and who thinks carefully about each feature and how it can or cannot be used.
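
The difference between comparing frequencies and comparing mere use of a feature can be illustrated with a small Python sketch. This is a hypothetical illustration only, not Olsson's actual procedure; the function name and both sample sentences are invented for the example.

```python
def semicolon_features(text):
    """Return a presence flag and a rate per 1000 words for semicolons."""
    words = len(text.split())
    n = text.count(";")
    rate = 1000 * n / words if words else 0.0
    return {"used": n > 0, "per_1000_words": rate}

# Invented sample sentences, not taken from either book.
older_style = "He dealt; he cut; he shuffled; he won."
later_style = "He set the type; then he printed the whole run of pages today."

older = semicolon_features(older_style)
later = semicolon_features(later_style)
print(older)
print(later)
```

Both samples use the feature, so the presence flag agrees, while the rates differ widely. A rate that drifts with changing fashion would separate the two samples even if they came from one writer; the presence flag would not.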

Bill Mullins

Re: ERDNASE

Post by Bill Mullins » September 15th, 2017, 1:54 am

In Lybrary.com's most recent newsletter, Chris says:
I was researching the company McKinney & Gallaway which is mentioned in an ad Richard Hatch sent to me. The ad appeared in 1903. There are also incorporation notices in the press about the same company in 1903. Some on the Genii forum argued that this wasn't Edward Gallaway but somebody else.

The "some" would be me; I brought up McKinney & Gallaway in 2008. The Secretary of State of Illinois reported on the incorporation in 1904 here. Note the discrepancy in the capital of the company: the newspaper item I mentioned in 2008 and the ad Chris reports say $2500; the Secretary of State report says $25,000.

Chris goes on to say:
I found the incorporation and dissolution documents for the McKinney & Gallaway company which clearly show that the Gallaway mentioned in the company name is indeed the Edward Gallaway I am researching.


As much as I've criticized him and his theory in the past, it is appropriate now to commend him for putting to rest my earlier speculation. I hope he makes these documents available, as he did the McKinney bankruptcy documents.

The 1908 Lakeside City Directory for Chicago has in its listings:
JAMES McKINNEY CO.
Successors to McKINNEY & GALLAWAY CO.
Printers and Binders
We run our plant night and day
Tel. Harrison-3854
79-81 W. VAN BUREN STREET

lybrary

Re: ERDNASE

Post by lybrary » September 15th, 2017, 12:17 pm

Documents will be added to my ebook over the next couple of weeks. I found one other set of incorporation and dissolution documents for another company Edward Gallaway was involved with. Details will also be added to my ebook. Here is something for you guys to discuss. I believe that the $1200 (about $30k in today's money) Gallaway invested in the McKinney & Gallaway company in 1903 could very well be the profit from selling the stock and plates of Expert.

