Comparison Between Different Types of Speech

The corpus contains two types of speeches. In the chart below, the "Inaugurals" are represented by purple bars and the "State of the Union (SoTU)" addresses are represented by orange bars.

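[Bar chart: topic share by speech type for the super-topics War/Peace, Economy, Law, and Sovereignty; y-axis from 10% to 60%. Bar values as listed in the chart, one series per speech type, in that topic order: 44.95%, 8.02%, 28.63%, 7.3% and 34.61%, 20.57%, 27.27%, 14.83%.]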

The topic percentages are weighted in two ways. First, the topic modeling output gives, for each unit of text, the percentage of that unit that belongs to each topic. Second, each unit of text is weighted by its total word count. For each speech type, both weightings were applied to generate the chart above.
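
As an illustration only, and not the procedure actually used for the chart, the two weightings could be combined along the following lines. The sketch multiplies each unit's topic proportion by its word count, sums the result per speech type, and divides by the total word count for that type. The record layout, the field names, and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, one per unit of text. Field names and values are
# illustrative assumptions, not data from the corpus.
units = [
    {"speech_type": "Inaugural", "words": 100, "topics": {"War/Peace": 0.6, "Economy": 0.4}},
    {"speech_type": "Inaugural", "words": 300, "topics": {"War/Peace": 0.2, "Economy": 0.8}},
    {"speech_type": "SoTU", "words": 200, "topics": {"War/Peace": 0.5, "Economy": 0.5}},
]


def weighted_topic_shares(units):
    """Return {speech_type: {topic: percent}}, weighted by unit word count."""
    weighted = defaultdict(lambda: defaultdict(float))
    total_words = defaultdict(int)
    for unit in units:
        kind = unit["speech_type"]
        total_words[kind] += unit["words"]
        for topic, proportion in unit["topics"].items():
            # Weight 1: the model's topic proportion for this unit.
            # Weight 2: the unit's word count.
            weighted[kind][topic] += proportion * unit["words"]
    return {
        kind: {topic: 100 * value / total_words[kind] for topic, value in topics.items()}
        for kind, topics in weighted.items()
    }


shares = weighted_topic_shares(units)
for kind, topics in shares.items():
    print(kind, {topic: round(pct, 2) for topic, pct in topics.items()})
```

Weighting by word count means a long unit of text moves the percentage more than a short one, which seems to be the intent of the second weighting described above.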

With the constant caveat that this is based on a limited corpus, it is not possible to say in general that the two types of speeches cover different topics. However, the results above do show some differences for the speeches in this corpus. The difference is most striking for the "Economy" topic, at approximately 12.55 percentage points (20.57% versus 8.02%), and least obvious for the "Law" topic, at only 1.36 percentage points (28.63% versus 27.27%).

Comparison Between Presidents

The corpus contains the speeches of eight presidents. In the charts below, each super-topic is represented by a different color: "War/Peace", "Economy" (purple), "Law", and "Sovereignty" (orange).

Washington: War/Peace 35.7%, Economy 5%, Law 34.04%, Sovereignty 18.89%
Madison: War/Peace 45.86%, Economy 5.47%, Law 22.25%, Sovereignty 21.85%
Lincoln: War/Peace 26.62%, Economy 9.35%, Law 40.71%, Sovereignty 20.91%
Cleveland: War/Peace 21.42%, Economy 18.05%, Law 37.93%, Sovereignty 21.13%
Wilson: War/Peace 41.07%, Economy 10.95%, Law 34.78%, Sovereignty 8.31%
Roosevelt: War/Peace 60.5%, Economy 11.67%, Law 17.02%, Sovereignty 6.6%
Nixon: War/Peace 53.59%, Economy 15.09%, Law 14.73%, Sovereignty 4.35%
Obama: War/Peace 29.55%, Economy 52.02%, Law 8.91%, Sovereignty 4.21%

The topic percentages were determined using a method very similar to the one used for the speech-type comparison above. As can be seen from the pie charts above, there are fairly large differences in which topics were prominent for which president.

It's perhaps not surprising that 60.5% of FDR's speeches fell under the "War/Peace" topic, given that a large portion of his time in office was during WWII. Similarly, it is not too surprising to see 52.02% of Obama's speeches focused on "Economy", as he took office during the economic crisis of '08.

One of the issues revealed by these pie charts concerns the unused topic (represented in grey). During the process of grouping topics into super-topics, topic 2 was disregarded because it seems to consist mostly of greetings and salutations. This might be a fair assumption for Cleveland's corpus, where it represents only 1.45% of his speeches. However, for Nixon's corpus, it represents a very substantial 12.25%.

One possible way to ameliorate this problem is to use the 25-topic results before grouping them into super-topics. This way the "greetings and salutations" topic might be separated more cleanly from the rest of the topics, so that its percentage of the corpus is represented more accurately.
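
The size of the unused share discussed above can be checked for each president with a small grouping step. The following is a minimal sketch, not the grouping actually used: only the exclusion of topic 2 comes from the text, while the other topic numbers, their super-topic assignments, and the sample weights are hypothetical.

```python
# Hypothetical mapping from topic id to super-topic. Only the exclusion of
# topic 2 comes from the text; the other assignments are made up.
SUPER_TOPICS = {0: "War/Peace", 1: "Economy", 3: "Law", 4: "Sovereignty"}
UNUSED = "Unused"


def group_into_super_topics(topic_weights):
    """Collapse {topic_id: weight} into {super_topic: percent}.

    Topics without a super-topic are kept under "Unused" so that their
    share stays visible instead of silently disappearing.
    """
    total = sum(topic_weights.values())
    grouped = {}
    for topic_id, weight in topic_weights.items():
        label = SUPER_TOPICS.get(topic_id, UNUSED)
        grouped[label] = grouped.get(label, 0.0) + weight
    return {label: 100 * weight / total for label, weight in grouped.items()}


# Illustrative word-weighted totals for one president (not corpus data).
example = {0: 50.0, 1: 20.0, 2: 10.0, 3: 15.0, 4: 5.0}
result = group_into_super_topics(example)
print(result)
```

Running the same function on finer-grained topic results would show whether the unused share shrinks once the greetings topic is more cleanly separated.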

The "Sovereignty" super-topic (in orange) also shows some change over time based on the pie charts. The four earlier presidents have approximately 20% of their speeches in this category, while the four later presidents have less than 10%, with Nixon and Obama having less than 5% in this topic. The "Economy" super-topic (in purple) shows somewhat the opposite trend.

Comparison Through Time

When analysing the results of topic modeling, it was initially somewhat baffling why certain key words were grouped into different topics. Only after taking "time" into account did this become easier to understand. This concept was well demonstrated in the article "The Language of the State of the Union" by Benjamin Schmidt and Mitch Fraas in The Atlantic. Although their interactive chart graphs words rather than topics, the concept is the same. The two examples of their chart below show that selecting the word "Treasury" or "Budget" reveals when each is most frequently used.


A similar pattern can be seen when looking at topics. For example, in the 14-topic results, topics 7 and 8 both concern "money" matters. Here are the key words:

Topic #: Key Words
7: year wa fiscal expenditure revenue june silver number treasury total increase pension government amount receipt day cent sum money
8: job american year tax business work ve family make time reform million home school economy tonight america back deficit

Topic 7, with key words like "treasury" and "expenditure", was consistently marked for speeches from earlier parts of the corpus, while topic 8, with key words like "economy" and "tax", was more consistently marked for speeches from later parts of the corpus.
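
One way to check this kind of pattern is to average each topic's weight by period. The sketch below assumes a hypothetical list of (year, topic weights) pairs and groups them by decade. The years and proportions are invented for illustration and are not taken from the corpus.

```python
from collections import defaultdict

# Hypothetical (year, {topic_id: proportion}) pairs; values are illustrative.
speeches = [
    (1886, {7: 0.30, 8: 0.00}),
    (1888, {7: 0.20, 8: 0.02}),
    (2010, {7: 0.01, 8: 0.40}),
    (2012, {7: 0.03, 8: 0.30}),
]


def mean_topic_weight_by_decade(speeches, topic_id):
    """Return {decade: mean proportion of topic_id}, in chronological order."""
    by_decade = defaultdict(list)
    for year, weights in speeches:
        decade = (year // 10) * 10
        # A topic missing from a speech counts as zero weight.
        by_decade[decade].append(weights.get(topic_id, 0.0))
    return {
        decade: sum(values) / len(values)
        for decade, values in sorted(by_decade.items())
    }


topic7 = mean_topic_weight_by_decade(speeches, 7)
topic8 = mean_topic_weight_by_decade(speeches, 8)
print("topic 7:", topic7)
print("topic 8:", topic8)
```

With real data, a topic that is high in early decades and near zero later (or the reverse) would confirm that two topics cover the same subject in the vocabulary of different eras.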