
Advances, Systems and Applications

Time-series topic analysis using singular spectrum transformation for detecting political business cycles

Abstract

Herein, we present a novel topic variation detection method that combines a topic extraction method and a change-point detection method. It extracts topics from time-series text data as features of each time point and detects change points from the changing patterns of the extracted topics. We applied this method to a valuable, albeit underutilized, text dataset containing the Japanese Prime Minister’s (PM’s) detailed daily activities over 32 years. The proposed method and data provide novel insights into the empirical analysis of political business cycles, a classical issue in economics and political science. In particular, because our approach enables us to directly observe and analyze the PM’s actions, it can overcome the empirical challenges that previous research encountered owing to the unobservability of the PM’s behavior. Our empirical observations are largely consistent with recent theoretical developments on this topic. Despite its limitations, by employing an entirely new method and dataset, our approach enhances our understanding of this classic issue and provides new insights into it.

Introduction

An increasing number of studies in political science and economics have adopted advanced text mining methods to analyze large-scale text datasets, and many of them have provided novel insights into classical issues in these fields. This study aims to make a similar contribution. Extending our earlier paper (Kato et al. [18]), we propose a new topic variation detection method and apply it to a valuable, albeit underutilized, text dataset containing the Japanese Prime Minister’s (PM’s) detailed daily schedule over 32 years. We aim to enhance our understanding of the empirical analysis of the political business cycle (PBC), a classic issue that economists and political scientists have been addressing for over four decades (Dubois [5]).

Research on the PBC began with the pioneering work of William Nordhaus [28]. The PBC theory posits that an incumbent government will stimulate the economy immediately before an election to increase its chances of reelection. Earlier theoretical and empirical studies on the PBC, including that of Nordhaus, focused on presidential democracies, where the election date is fixed exogenously. However, most present-day democracies have a parliamentary system, in which the election date is determined endogenously by the PM’s decision to dissolve the legislature and call an early election (Schleiter & Tavits [34]). In parliamentary democracies, the PBC is a more complex issue. To increase the chances of winning an election, the PM can either “manipulate” the economy prior to an election, as the standard PBC theory indicates (manipulative hypothesis), or call an early election to “surf” on favorable economic conditions (surfing hypothesis) (Inoguchi [12]; Kayser [19]).

Past empirical analyses of the PBC with endogenous election timing (EET) that examined whether the manipulative or surfing hypothesis is valid have encountered several empirical challenges. The core of these challenges was the unobservability of the PM’s behavior. For example, even when an apparent correlation exists between election timing and favorable economic conditions (e.g., low unemployment, high income growth), researchers encounter challenges in determining whether the PM has manipulated or surfed.

In this study, we use a novel dataset and method to overcome the empirical challenges encountered by previous research. The primary dataset is a text dataset that tracks the Japanese PM’s schedule (where he went and whom he met) 365 days a year for over 32 years. Known in Japanese as shushō dōsei [the PM’s movements], it is compiled by reporters who are officially permitted to record the PM’s detailed daily activities (Nippon Hōsō Kyōkai [27]). This dataset offers researchers a rare opportunity to directly and systematically observe the actions of the head of the Japanese government, who is generally also the leader of the ruling party. However, its use has been limited to sporadic references in qualitative research, owing to the scarcity of advanced text-based research on Japanese politics. This study is the first to apply machine-learning techniques extensively to this highly valuable dataset. In addition, we present a new topic variation detection method to analyze the shushō dōsei dataset. The method combines topic extraction with change-point detection, enabling us to extract topics from time-series text data as features of each time point and to detect change points from the patterns of those topics. We used this method to empirically examine whether the PM manipulated or surfed in past elections. Because it allows us to directly observe and analyze the PM’s actions, this approach overcomes the challenges encountered in past empirical studies of the PBC with EET. Furthermore, it enables us to empirically assess the complex, strategic decision-making of the PM implied by recent theoretical arguments on this topic.

The main contributions of this study are as follows.

  • We propose a new topic variation detection method for time-series text data. The method comprises feature extraction, in which features are represented as topics, and change-point detection focusing on the patterns of those topics.

  • Using our proposed method, we empirically examine whether PBCs occur in Japan by analyzing a valuable, albeit underutilized, time-series text dataset recording the Japanese PM’s daily activities. This approach allows us to examine the PM’s actions directly, helping to overcome the empirical challenges encountered in past studies.

The remainder of this paper is organized as follows. In the next section, we define the problem of PBCs in parliamentary democracies. An overview of related studies is presented in the following section. Next, we present our method, a topic variation detection method for time-series text data. We then apply the method to the shushō dōsei dataset, and in the subsequent section we empirically assess whether and how PBCs occur in Japan. We conclude by indicating the scope for future research.

Problem definition

Past empirical analyses of the PBC with EET that examined whether the PM manipulated or surfed encountered several problems. The first is a causal-relation problem. Even when election timing coincides with economic conditions favorable to the incumbent, it is challenging for researchers to determine whether those conditions motivated the PM to call an early election (surfing hypothesis) or whether the PM manipulated the economy to create favorable conditions at election time (manipulative hypothesis) (Rogoff & Sibert [31]).

The second empirical problem is a measurement problem: how can one measure the PM’s manipulation? Earlier studies on the PBC used macroeconomic outcomes, such as gross domestic product (GDP) growth, unemployment, and inflation rates, as proxies for manipulation. However, it is uncertain whether the PM can control such macroeconomic outcomes with the manipulation tools at his disposal. More recent studies measure manipulation through economic policy tools, such as taxes (Yoo [39]) and government spending (Rogoff & Sibert [31]; Kohno & Nishizawa [20]). Although the PM has more direct influence over these policy tools than over macroeconomic outcomes, it is still uncertain whether he has complete control over them. Various studies of policy processes, including those on Japanese politics (e.g., Johnson [16], Mabuchi [22]), have revealed that the PM’s actions and the actual use of these policy tools are not directly linked. For example, bureaucratic autonomy can severely limit the PM’s influence over these tools. Therefore, there may be cases in which the PM attempts to manipulate the economy but fails to activate the relevant policy tools or to influence macroeconomic outcomes. Past empirical studies, however, could not identify such cases.

The third problem is that most past empirical research presupposes that the PM always surfs or always manipulates in every election. However, in line with recent theoretical research employing dynamic optimization models (Kayser [19], Kato & Inui [17]), it is more natural to assume that the PM strategically chooses to surf or manipulate depending on the specific circumstances.

The root cause of these empirical problems is the difficulty of directly observing the PM’s actions. As indicated in Fig. 1, the lack of data that directly track the PM’s actions led previous researchers to gather political and economic data around election times and to infer from them whether the PM had always manipulated or always surfed. As Fig. 1 also indicates, the method and data of this study have an advantage over those of previous studies: they can directly track and analyze the actions of the PM, who is the primary decision maker around election time. This is a novel approach to a classic issue in political science and economics, and it can overcome several serious empirical problems encountered in past studies.

Fig. 1

Diagram of previous studies and the current one

We used the results of past research as a baseline against which to compare and evaluate our approach. For example, most earlier studies assumed that the PM always manipulated or always surfed in every election, whereas our approach allows us to categorize whether the PM surfed or manipulated in each election. We evaluated our approach by assessing how well our results, compared with those of past studies, fit the expectations of state-of-the-art theoretical work and realistic anticipations. For methodological problems that cannot be assessed quantitatively, such as the causal inference problem, we describe how our approach could overcome them relative to previous studies.

Related studies

Related studies on PBC

Previous empirical studies of the PBC with EET have attempted to investigate whether the manipulative or surfing hypothesis is valid in parliamentary democracies, such as Japan (Ito & Park [13], Ito [14], Kohno & Nishizawa [20], Cargill & Hutchison [3]), the UK (Smith [36], Smith [37]), and India (Chowdhury [4]). They used macroeconomic indicators and policy tools as proxies for the PM’s actions and analyzed the relationship between these proxies and electoral timing. Although the results vary, they primarily support the surfing hypothesis; that is, a typical PBC in line with the manipulative hypothesis is often not observed in parliamentary democracies.Footnote 1

On the theoretical side, only a limited number of formal analyses have been conducted on this topic. Recent theoretical analyses, such as that of Kayser [19], model the PBC with EET as a dynamic optimization problem for the PM and examine when the PM should manipulate or surf. In contrast to previous empirical studies that implicitly assumed that the PM always manipulates or always surfs, recent theoretical developments show that the PM strategically opts to manipulate or surf depending on the political or economic conditions at hand (Saito [32]; Kato & Inui [17]).

This study is the first attempt to analyze a text dataset that directly records the PM’s behavior. By directly observing the PM’s actions, it overcomes several of the empirical problems encountered in previous studies.

Related studies on topic variation detection

Text segmentation divides texts into topically coherent units (e.g., Hearst [7], Sun et al. [38], Eisenstein [6], Riedl & Biemann [30], and Jameel & Lam [15]). These methods regard a text as a sequence of subtopics and create a new segment whenever they detect a change of subtopic within the text. Text segmentation can therefore be considered a method for determining where segment boundaries should be placed within each text. By contrast, our method focuses on each subtopic and detects changes in the pattern with which that subtopic appears over time.

Studies on topic bursts are also related to the method introduced herein. Mane and Börner [23] used Kleinberg’s burst detection algorithm to identify topics that experienced an abrupt increase in usage, which they called “topic bursts.” Their method can be considered an anomaly detection approach for topic usage that focuses primarily on when a topic’s share increases. In contrast, our method seeks to detect not only increases in topic share but also changes in the pattern with which a topic appears. Studies on time-dependent topic models (e.g., Hong et al. [10]) are also related to our method. Whereas such models examine associations among topics using a time-dependent function that characterizes each topic’s trend over time, our method focuses on detecting change points in the pattern of each topic.

Topic variation detection method for time-series text data

In this section, we propose a new method, topic variation detection (TVD) for time-series text data, which we apply in our analysis. The method detects changes in the patterns of content in text data delivered as a time series. It first uses latent Dirichlet allocation (LDA) to extract topics from the time-series text data as features of each time point. It then uses singular spectrum transformation (SST) to detect change points where the patterns of the extracted features change. TVD thus enables us to detect changes in the semantic patterns of topics in time-series text data.

We first describe the basic concept of TVD and then briefly explain LDA and SST, the two methods that TVD combines.

Basic concept

Our new method, TVD, combines LDA (Blei, Ng, & Jordan [1]) to extract topics from text data with SST (Idé & Inoue [11]) to detect changes in topic patterns. It seeks to detect change points in the pattern of events recorded in text data delivered as a time series, such as daily reports. A simple way to achieve this is to tabulate the frequency of certain keywords over time. Such an approach is effective when the texts consistently express similar content with similar words. It is not effective, however, when the texts include a wide variety of keywords and content. Another limitation of the simple approach is that word-frequency oscillations must be inspected visually on a graph. By combining LDA and SST, TVD enables us to detect patterns of events more systematically and clearly in time-series text data with a variety of keywords and content. By applying TVD to the underutilized shushō dōsei dataset, we introduce a novel approach to the PBC, a classic issue in political science and economics.
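To make the limitation of the keyword-counting baseline concrete, here is a minimal Python sketch of it. The function name `keyword_share` and the English token glosses are hypothetical illustrations; real shushō dōsei entries would first have to be tokenized into Japanese nouns.

```python
def keyword_share(documents, keywords):
    """Return, for each tokenized document in time order, the share of tokens in `keywords`."""
    series = []
    for tokens in documents:
        hits = sum(1 for token in tokens if token in keywords)
        # Empty documents (e.g., periods with no recorded activity) get a share of 0.
        series.append(hits / len(tokens) if tokens else 0.0)
    return series


# Hypothetical weekly documents; week 2 concerns fiscal policy but contains no chosen keyword.
weeks = [["budget", "cabinet"], ["fiscal", "council"], []]
print(keyword_share(weeks, {"budget"}))  # [0.5, 0.0, 0.0]
```

The second week scores zero despite its fiscal content. This is the kind of vocabulary variation that motivates tracking LDA topics rather than a fixed keyword list.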

LDA and SST

In this section, we briefly summarize LDA and SST, which we combined in the TVD method.

LDA

LDA (Blei, Ng, & Jordan [1]) is a topic model that extracts topics from a set of documents. In LDA, each document is represented as a mixture of latent topics, and each topic is characterized by a distribution over words. LDA assumes that a document is generated by repeatedly selecting a topic according to the document’s topic distribution and then selecting a word according to that topic’s word distribution. Let N be the number of words in a document and K the number of topics. The variables are defined as follows: α is the K-dimensional hyperparameter of the Dirichlet prior on the topic distribution, β is the parameter of the word distributions, θ is the topic distribution, z is the set of topic assignments, and w is the set of words. Given α and β, the joint distribution of the topic mixture θ, the topic assignments z, and the words w is expressed as follows:

$$ p\left(\theta, \boldsymbol{z},\boldsymbol{w}|\alpha, \beta \right)=p\left(\theta |\alpha \right)\prod \limits_{n=1}^Np\left({z}_n|\theta \right)p\left({w}_n|{z}_n,\beta \right). $$

Here, θ and β are latent (unobserved) parameters. Once the documents are observed, the topics of each document can be inferred by estimating θ and β.
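The generative process behind the joint distribution above can be made concrete with a short NumPy sketch: it samples one document (θ, z, w) and evaluates log p(θ, z, w | α, β) term by term. The helper names are hypothetical, and the sketch illustrates the model only; it is not the inference procedure used in this study, which estimates θ and β from observed documents rather than sampling them.

```python
import math

import numpy as np


def sample_document(alpha, beta, n_words, rng):
    """Draw (theta, z, w) for one document from the LDA generative process.

    alpha: length-K Dirichlet hyperparameter; beta: K x V topic-word matrix (rows sum to 1).
    """
    theta = rng.dirichlet(alpha)                                     # theta ~ Dir(alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)                # z_n ~ Multinomial(theta)
    w = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])  # w_n ~ Multinomial(beta[z_n])
    return theta, z, w


def log_joint(theta, z, w, alpha, beta):
    """log p(theta, z, w | alpha, beta): Dirichlet log-density plus the per-word terms."""
    alpha = np.asarray(alpha, dtype=float)
    log_dirichlet = (math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
                     + float(np.sum((alpha - 1.0) * np.log(theta))))
    per_word = sum(math.log(theta[zn]) + math.log(beta[zn, wn]) for zn, wn in zip(z, w))
    return log_dirichlet + per_word


rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])
beta = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
theta, z, w = sample_document(alpha, beta, n_words=20, rng=rng)
print(z[:5], w[:5])
print(log_joint(theta, z, w, alpha, beta))
```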

SST

SST (Idé & Inoue [11]) detects change points by calculating a change-point score z(t) at each time point. SST can thus be regarded as a transformation from a time series T into a new time series Tc of change-point scores.

More formally, let the time series be S = {x(1), x(2), x(3), …}. SST proceeds as follows:

  1. The w past values before time t are arranged into the column vector s(t − 1) = (x(t − w), …, x(t − 1))^T. Arranging n such column vectors forms the matrix H_t = [s(t − n), …, s(t − 2), s(t − 1)].

  2. Similarly, the matrix H_{t+L} is formed by shifting the time series L steps toward the future: H_{t+L} = [s(t + L + w − 1), …, s(t + L + w + m − 2), s(t + L + w + m − 1)]. The singular value decomposition (SVD) of each of H_t and H_{t+L} is computed to obtain their left singular vectors.

  3. The change-point score z(t) is calculated; it represents the degree of difference between the left singular vectors obtained from the two matrices in step 2.

  4. Sliding t and repeating steps 1–3 yields the change-point score z(t) at each time point. Points with high z(t) are regarded as change points.
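Steps 1 to 4 can be sketched in NumPy as follows. The sketch uses one common formulation of Idé & Inoue’s score, z(t) = 1 − ‖U_r^T q‖², where U_r holds the top r left singular vectors of H_t and q is the top left singular vector of the future matrix. As simplifying assumptions, the future matrix here has the same n columns as H_t shifted L steps ahead (rather than the exact column range given in step 2), and the default values of w, n, L (`lag`), and r are arbitrary illustrations, not the settings used in this study.

```python
import numpy as np


def lagged_matrix(x, first_end, n, w):
    """w x n matrix whose columns are the length-w windows ending at first_end, ..., first_end + n - 1."""
    return np.column_stack([x[end - w + 1 : end + 1] for end in range(first_end, first_end + n)])


def sst_scores(x, w=8, n=8, lag=4, r=2):
    """Change-point score z(t) in [0, 1] for each t; 0 where the windows do not fit."""
    x = np.asarray(x, dtype=float)
    scores = np.zeros(len(x))
    # t needs n past windows of length w (history) and room for the lag-shifted future windows.
    for t in range(n + w - 1, len(x) - lag + 1):
        history = lagged_matrix(x, t - n, n, w)     # H_t = [s(t-n), ..., s(t-1)]
        future = lagged_matrix(x, t - n + lag, n, w)  # same layout, shifted `lag` steps ahead
        u_hist, _, _ = np.linalg.svd(history, full_matrices=False)
        u_future, _, _ = np.linalg.svd(future, full_matrices=False)
        # Squared length of the dominant future direction projected onto the top-r history subspace.
        proj = u_hist[:, :r].T @ u_future[:, 0]
        scores[t] = 1.0 - float(proj @ proj)
    return scores


# Synthetic series whose oscillation period and amplitude change at index 60.
steps = np.arange(120)
series = np.where(steps < 60, np.sin(2 * np.pi * steps / 10), 5 * np.sin(2 * np.pi * steps / 3))
scores = sst_scores(series)
print(int(np.argmax(scores)))  # a peak shortly after the change at index 60
```

Because each regime is a pure sinusoid, its lagged windows span a two-dimensional subspace, so the score stays near zero within a regime and rises only where the history and future windows straddle the change.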

Applying the topic variation detection model

In this section, we first introduce the primary dataset used in this study, shushō dōsei. Subsequently, we apply TVD to the data.

Data

The primary dataset to which we applied the TVD method for analysis in this study was time-series text data called shushō dōsei. It tracks the Japanese PM’s schedule daily (see Fig. 2 for a sample). While the list was launched around 1970, this paper focuses on the period from July 1, 1986 to November 30, 2018, for which electronic data from the Nihon Keizai Shimbun (Nikkei) is currently available.

Fig. 2

Prime Minister’s schedule (source: Shushō kantei in Nihon Keizai Shimbun, 2019, February 27)

Shushō dōsei is a succinct summary (frequently without verbs) of the PM’s activities that accurately lists the names and affiliations of the people he meets and the places he visits. For example, the entry for January 21, 2009 reads: “8:31 Upper House Budget Committee. Brief exchange later with the committee’s Chief Director Iwanaga.” Nouns are the most important elements of these entries, and “latent semantic features” can be discerned from the co-occurrence of certain names and topics. This study analyzes such co-occurrences by grouping the entries into documents, each covering a 7-day period.

Because the PM’s schedule is extremely tight, the daily allocation of his time should reveal his strategic priorities ex ante. The right to dissolve the Diet is an exclusive prerogative of the PM, and selecting the election date may be “the most important single decision” a PM makes (Newton [26]). We therefore assume that signs of crucial decisions, such as pre-electoral macroeconomic stimulation closely related to calling an early election, can be detected by analyzing the PM’s daily schedule. Hence, the PM’s daily schedule offers researchers an invaluable opportunity to directly observe the PM’s manipulating or surfing behavior around the calling of an early election.

Implementation of topic variation detection

We applied TVD to the shushō dōsei dataset. Because there are days (e.g., holidays) on which no political activity takes place, we grouped one week (7 days) of text data into a single document and let D = {d1, d2, d3, …, dn} be the resulting time series of documents. Each document di comprises the PM’s seven-day schedule as recorded in shushō dōsei.
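The weekly grouping can be sketched as below. As assumptions not stated in the source, weeks are counted from the first day of the data (July 1, 1986), a week with no entries becomes an empty document so that the series stays evenly spaced, and the function name and tokens are hypothetical.

```python
from datetime import date


def weekly_documents(entries, start):
    """Group (date, tokens) daily entries into consecutive 7-day documents d_1, ..., d_n.

    Days without an entry (e.g., holidays) contribute no tokens; a week with no
    entries at all becomes an empty document.
    """
    weeks = {}
    for day, tokens in sorted(entries, key=lambda entry: entry[0]):
        index = (day - start).days // 7
        if index >= 0:  # ignore entries dated before `start`
            weeks.setdefault(index, []).extend(tokens)
    n_weeks = max(weeks) + 1 if weeks else 0
    return [weeks.get(i, []) for i in range(n_weeks)]


entries = [
    (date(1986, 7, 1), ["Diet", "Cabinet"]),
    (date(1986, 7, 15), ["Diet"]),
    (date(1986, 7, 3), ["Finance"]),
]
print(weekly_documents(entries, start=date(1986, 7, 1)))
# [['Diet', 'Cabinet', 'Finance'], [], ['Diet']]
```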

By applying TVD to shushō dōsei, we aim to detect change points in the PM’s behavioral patterns that appear in the PM’s weekly schedule D. The formal procedure for implementing TVD is as follows:

  1. By extracting K topics from all the documents in D through LDA, we obtained a K-dimensional multivariate time series V = {v1, v2, v3, …, vn}. Here, each element of the vector vi is the share of one topic in document di.

  2. Taking a target element (topic) of all the vectors vi in V, we constructed a real-valued time series x = {x(1), x(2), x(3), …, x(n)}. Here, x is the time series of the share of one extracted topic.

  3. We transformed x into change-point scores s = {s(1), s(2), s(3), …, s(n)} through SST.

  4. We detected topic variations by setting criteria for the change-point scores at each peak and applying them.
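Step 4 depends on the criterion chosen for the problem at hand. The study’s own criterion combines the change-point score with the rate of increase in topic share and is not reproduced here; the sketch below shows only a generic placeholder rule that keeps local maxima of the scores above a threshold. The function name and the threshold value are hypothetical.

```python
def detect_peaks(scores, threshold):
    """Return indices of local maxima in `scores` whose value is at least `threshold`."""
    peaks = []
    for i in range(1, len(scores) - 1):
        # Strict rise into i and no rise out of i, so a plateau is reported only once.
        is_local_max = scores[i - 1] < scores[i] >= scores[i + 1]
        if is_local_max and scores[i] >= threshold:
            peaks.append(i)
    return peaks


print(detect_peaks([0.0, 0.2, 0.9, 0.4, 0.1, 0.5, 0.3], threshold=0.45))  # [2, 5]
```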

Figure 3 gives an overview of TVD. Procedures 1 and 2 extract topics from the documents and yield the time-series share of each topic. In this study, we extracted the PM’s behavioral patterns as real-valued time series over K topics. Procedures 3 and 4 detect changes in the topic patterns. When an extracted real-valued time series exhibits strong oscillation and superficially random movements, it is difficult to detect a change in the topic pattern directly. SST transforms the real-valued time series of a topic into a time series of change-point scores, which reveal the extent of change in the topic pattern. By generating change-point scores, SST enables us to create a more systematic criterion for detecting changes. For example, in a later section of this paper, we describe a criterion that we created to assess whether the PM manipulated the economy.

Fig. 3

Overview of topic variation detection (TVD) method for time-series text data

When employing TVD, care is required at every step of preprocessing and hyperparameter setting. In step 1, topics are extracted with LDA. To make the topics clearer, high-frequency words and stop words are deleted from the documents before LDA is applied. Because the number of topics K and the two parameters β (the word distribution parameter) and θ (the topic distribution parameter) significantly affect topic extraction, they require further tuning. If topic extraction in step 1 is successful, the topics of interest can be determined in step 2. In step 3, the SST hyperparameters must be selected carefully to detect changes accurately. In step 4, the criteria are defined according to the problem at hand, and their robustness is verified. In the subsequent sections, we examine these steps in more detail as we empirically assess PBCs.

Empirical assessment

In this section, we assess the empirical results obtained by applying TVD to the shushō dōsei dataset. We first give an overview of our experimental system and examine its possible application to a cloud environment; the subsequent subsections present the actual empirical assessment. We extract 11 topics using LDA and identify those that represent the PM’s use of economic policy. Next, we use SST to transform the topic share into change-point scores. Finally, we create a criterion based on both the change-point score and the rate of increase in topic share to assess when the PM manipulated the economy. Using this criterion, we categorize each Japanese national election held during the period covered by the data as “manipulation,” “surfing,” or “other.” We compare our results with those of previous empirical and theoretical studies on the PBC with EET and evaluate how our model fares against them.

Implementation and application to a cloud environment

In this subsection, we first show how we implemented the methods described above in our experimental system. We then explore the possible application of the system to a cloud environment.

Overview of experimental system

Table 1 displays the specifications of the hardware and software used to implement our experimental system. Figure 4 shows an overview of the experimental system. As shown in Fig. 4, the experimental system consists of six modules: 1) a morphological analysis module, 2) a stop word elimination module, 3) a topic extraction module, 4) a word cloud creation module, 5) a change point detection module, and 6) a visualization module.

Table 1 The specifications for implementation of the experimental system
Fig. 4

Overview of the experimental system

The initial dataset used in this experiment is referred to here as the shushō dōsei database. It covers the PM’s daily activities from July 1, 1986 to November 30, 2018, and contains an average of 39.6 words per daily entry and 444,852 words in total. The morphological analysis module segmented the database into individual words. The next module, the stop word elimination module, removed words unnecessary for topic extraction, such as stop words, high-frequency words, and low-frequency words. These first two modules constitute the preprocessing stage of topic extraction. The third module, the topic extraction module, extracted topics from the preprocessed data; these were displayed as word clouds by the word cloud creation module and simultaneously stored in the time-series topic database. Finally, the change-point scores of each topic in the time-series topic database were computed by the change point detection module and displayed by the visualization module. Overall, these processes enabled us to extract topics from the shushō dōsei database and, concurrently, to visualize their change points.
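The frequency filtering performed by the stop word elimination module can be sketched in plain Python as below, using the thresholds reported for this study’s LDA setup (removing words that appear in over 30% of documents or in fewer than 10 documents). The function name is hypothetical. In Gensim, `Dictionary.filter_extremes(no_below=10, no_above=0.3)` applies comparable document-frequency thresholds, although it additionally caps the vocabulary size through its `keep_n` argument.

```python
from collections import Counter


def filter_vocabulary(documents, stop_words, no_below=10, no_above=0.3):
    """Remove stop words, words in fewer than `no_below` documents,
    and words in more than `no_above` (a fraction) of all documents."""
    # Document frequency: each word is counted at most once per document.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    n_docs = len(documents)
    keep = {
        word
        for word, count in doc_freq.items()
        if count >= no_below and count / n_docs <= no_above and word not in stop_words
    }
    return [[word for word in doc if word in keep] for doc in documents]
```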

The computational complexity of our Python code is dominated by the TVD algorithm, the core of our experimental system. TVD consists of two steps. In the first step, K topics are extracted from each document d in the list D of the PM’s schedules using LDA. We used the Python scikit-learn package to implement LDA. According to Hoffman et al. [8], the time complexity of an LDA calculation is O(NLK) for each document d, where N is the total number of time points in the PM’s schedule, L is the number of unique words, and K is the total number of topics. In the second step, we used SST to calculate the change-point score of each topic, treated as time-series data. To do so, we calculated three SVDs at each time point of every time series. Based on the Golub–Van Loan method, the complexity of an SVD is O(w²n) (Hogben [9]). Thus, the time complexity of change-point detection is O(NKw²n), where K is the number of topics, N is the length of each time series, w is the window size, and n is the number of column vectors in the SVD matrix. Table 2 shows the actual batch processing time for preprocessing, LDA, and SST on the shushō dōsei database.

Table 2 Batch processing time for preprocessing, LDA, and SST

Possible application to a cloud environment

Our experimental system could be developed further by deploying 4) the word cloud creation module and 6) the visualization module as local applications, and 1) the morphological analysis module, 2) the stop word elimination module, 3) the topic extraction module, and 5) the change point detection module as application programming interfaces (APIs) on a cloud system. Because shushō dōsei is updated on the website of the Nihon Keizai Shimbun (the Nikkei newspaper) every night at around 11:00 PM,Footnote 2 a shushō dōsei data scraping module could collect the data automatically instead of storing them through a user interface (UI), as shown in Fig. 4. By running the process depicted in Fig. 4 and calculating the change-point score of each topic from the daily scraped shushō dōsei data, we can detect changes in the PM’s daily activities. Table 2 shows that this can be batch processed every day within a reasonable time.

Topic extraction

We first extracted nouns from shushō dōsei using Python with MeCab (Kudo, Yamamoto, & Matsumoto [21]) and the mecab-ipadic-NEologd dictionary (Sato [33]). Our implementation used Python 3.5.6, MeCab 0.996, Gensim 3.6.0, and mecab-ipadic-NEologd with seed file v0.0.6.

Adjusting the hyperparameter for LDA

Our corpus was created using Python with Gensim (Rehurek & Sojka [29]). Before applying LDA, we excluded stop words, rare words, and high-frequency words and performed stemming. After morphological analysis with MeCab, words in the list of Japanese stop words developed in the SlothLib project were deleted from the documents.Footnote 3 Words with high frequency, appearing in over 30% of documents, and words with extremely low frequency, appearing in fewer than 10 documents, were both removed. To determine the number of topics K, we used perplexity and coherence values. Perplexity is widely used to evaluate the predictive performance of models, whereas coherence reflects the human interpretability of the models (Mimno et al. [25]). Among the several available coherence indices, we used UMass coherence. However, perplexity and coherence do not necessarily agree. We therefore selected topics regarded as having a definite meaning while balancing the two indices. Adjusting β and θ is important because topics are affected by both the word and topic distributions. We first set β and θ to auto, allowing only K to vary. Because this did not yield clear topics, we subsequently conducted a grid search for values of K, β, and θ that exhibit good perplexity and coherence. To balance perplexity and coherence, we examined the actual topics sequentially and selected K = 11, β = 2, and θ = 0.09. Figure 5 shows the perplexity and coherence scores for each K.
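For reference, UMass coherence for a single topic can be computed as in the sketch below, following the formulation of Mimno et al. [25]: for the topic’s top words w_1, …, w_M ordered by probability, C = Σ_{m=2}^{M} Σ_{l=1}^{m−1} log((D(w_m, w_l) + 1) / D(w_l)), where D(·) counts the documents containing the given words. The function and example tokens are hypothetical. Gensim’s `CoherenceModel` with `coherence='u_mass'` may use a different smoothing constant and aggregation, so its values need not match this sketch.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; higher values indicate a more coherent topic."""
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(word in d for word in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for prev in range(m):  # condition each word on every higher-ranked word
            co_count = doc_count(top_words[m], top_words[prev])
            score += math.log((co_count + 1) / doc_count(top_words[prev]))
    return score


docs = [["tax", "budget"], ["tax", "budget"], ["tax"], ["summit"]]
print(umass_coherence(["tax", "budget"], docs))  # 0.0
print(umass_coherence(["tax", "summit"], docs))  # log(1/3), about -1.0986
```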

Fig. 5

Perplexity and coherence indexes with different numbers of topics K

Election related topics and their appropriateness

Before starting our empirical analysis, we evaluated the appropriateness of the model generated by LDA by examining whether the share of an election-related topic matched the actual election dates. Among the 11 topics, we selected a topic consisting of words related to last-minute election campaigns. Its word cloud, shown in Fig. 6, features frequent words such as “outdoor speech,” “street speech,” “TV program,” “Tokyo Station,” “ANA,” “JAL,” and “TV appearance.”Footnote 4

Fig. 6

Word cloud of election topic 1 (last-minute election campaign)

Figure 7 illustrates the variation in the share of this topic over the timespan of the dataset. The dotted lines represent the dates of Lower House elections. As shown in Fig. 7, high shares of this topic coincide with the Lower House election dates. Other high-share points correspond to other elections (e.g., Upper House and local elections).

Fig. 7

Topic share of election topic 1 (last-minute election campaign)

The next election-related topic extracted by LDA consists of words related to electoral preparations. Figure 8 shows the word cloud of this topic, which includes words such as “candidacy,” “electoral affairs office (of the party),” “campaign office (of the party),” and “support group (koenkai).”Footnote 5 The share of this topic in each document is shown in Fig. 9. As shown, the topic share rises at times that differ slightly from the election dates. This movement implies that the topic contains information different from that of the last-minute election campaign topic. From these results on the election-related topics, we can reasonably conclude that LDA has appropriately extracted topics from shushō dōsei.

Fig. 8

Word cloud of election topic 2 (Electoral preparations)

Fig. 9

Topic share of election topic 2 (Electoral preparations)

Other topics

Among the 11 topics extracted by LDA, two were the election-related topics described above and five were related to the economy and finance; the latter are analyzed in detail in the next subsection. Figure 10 shows the word clouds of the remaining four topics: “internal affairs,” “national security,” “Western diplomacy,” and “Asian diplomacy.” The words within each topic are largely coherent, and each topic represents an important policy area of the Japanese government. These four interpretable topics further support the conclusion that LDA has appropriately extracted topics from shushō dōsei. The next section presents our main analysis, in which we examine whether the PM surfed or manipulated prior to elections.

Fig. 10

Word clouds of other topics

Economic policy topic

When the PM seeks to manipulate the economy before an election to increase the incumbent’s chance of winning, he is likely to allocate more of his limited time to activities related to economic and fiscal policy. Among the 11 topics extracted by LDA, five were related to the economy and finance. Because these five topics were difficult to distinguish from one another, we merged them into a single topic, following the procedure adopted in past political science research using LDA (Martin & McCrain [24]), and termed it the “economic policy topic.”
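Because LDA assigns each document a proportion for every topic, one straightforward way to merge topics is to add up their proportions, which keeps each document's proportions summing to one. The sketch below assumes this additive merge (the exact procedure of Martin & McCrain [24] may differ); `theta` stands for an LDA document–topic matrix, and the column indices used in any call are hypothetical.

```python
import numpy as np

def merge_topics(theta, merge_cols):
    """Replace the columns in merge_cols with a single column holding their sum.

    theta: (n_docs, K) document-topic proportions (each row sums to 1).
    Returns an (n_docs, K - len(merge_cols) + 1) array in which the untouched
    topics keep their original order and the merged topic is the last column.
    """
    theta = np.asarray(theta, dtype=float)
    merge_cols = sorted(set(merge_cols))
    keep = [k for k in range(theta.shape[1]) if k not in merge_cols]
    # Proportions of disjoint topics add, so row sums are preserved.
    merged = theta[:, merge_cols].sum(axis=1, keepdims=True)
    return np.hstack([theta[:, keep], merged])
```

With K = 11 and five economic topics, the result has 7 columns, the last being the combined economic policy topic.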

Figure 11 shows the word cloud of the economic policy topic. Words such as “finance,” “Cabinet Office,”Footnote 6 “Council on Economic and Fiscal Policy,” “financial services,” “Ministry of Finance,” and “Ministry of Economy, Trade, and Industry” appear frequently.

Fig. 11

Word cloud of economic policy topic

Figure 12 shows the share of the economic policy topic within documents over the timespan of the dataset. Because this share is highly volatile, it is difficult to detect visually how its pattern changes as election dates approach.

Fig. 12

Topic share of economic policy topic

Change point score

We applied SST to detect the PM’s manipulation through changes in the pattern of the economic policy topic. As stated previously, the change point score is small when a change is negligible relative to past patterns and large when the current pattern differs substantially from past ones.
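For readers unfamiliar with SST, the following NumPy sketch shows one common formulation in the spirit of Idé & Inoue [11]; it is an illustrative assumption, and the paper's exact construction (matrix sizes, lag, number of singular vectors) may differ. For each day t, it builds a “past” trajectory matrix from sliding windows of length w ending just before t and a “future” matrix from windows starting at t, extracts their leading left singular vectors via SVD, and scores the change as 1 minus the cosine of the smallest principal angle between the two subspaces. The parameter names n, r, and m are chosen here for illustration.

```python
import numpy as np

def hankel(x, start, w, n):
    """Trajectory matrix whose n columns are the windows x[start+j : start+j+w]."""
    return np.column_stack([x[start + j : start + j + w] for j in range(n)])

def sst_scores(x, w, n=None, r=1, m=1):
    """Change point score in [0, 1] for each day t (0 where undefined).

    w: window size; n: number of windows per matrix (defaults to w);
    r, m: numbers of leading singular vectors kept for past and future.
    """
    x = np.asarray(x, dtype=float)
    n = w if n is None else n
    span = w + n - 1                      # samples covered by one trajectory matrix
    scores = np.zeros(len(x))
    for t in range(span, len(x) - span + 1):
        past = hankel(x, t - span, w, n)  # uses x[t - span : t]
        future = hankel(x, t, w, n)       # uses x[t : t + span]
        u_past = np.linalg.svd(past, full_matrices=False)[0][:, :r]
        u_future = np.linalg.svd(future, full_matrices=False)[0][:, :m]
        # Largest singular value of U_past^T U_future = cosine of smallest principal angle.
        cos_max = np.linalg.svd(u_past.T @ u_future, compute_uv=False)[0]
        scores[t] = max(0.0, 1.0 - cos_max)  # clip tiny negative rounding errors
    return scores
```

For a series that jumps from one constant level to another, the score stays near 0 on flat stretches and becomes positive while the windows straddle the jump, because SST reacts to changes in the shape of the windowed series rather than to its level.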

To apply SST, we must first set its hyperparameters. Choosing a small value for the window size w increases the sensitivity of SST, which makes the model suitable for investigating changes over short time spans. Conversely, a large w lowers the sensitivity and makes the model suitable for measuring changes over longer periods. As an election date approaches, the share of election-related topics increases significantly, suppressing the shares of other topics. We therefore tuned the SST parameters so that changes in topics other than the election-related ones could be observed around elections. Specifically, we performed a grid search over w and selected the value that maximized the sum of the economic policy topic's change point scores immediately before each election.
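The grid search over w can be sketched as follows. This is a hypothetical helper, not the authors' code: `score_fn` stands for any function that maps a daily series and a window size to daily change point scores (for example, an SST implementation), and because the text does not quantify “immediately before,” the `horizon` of 45 days is an assumption borrowed from the N used later in the criterion.

```python
import numpy as np

def select_window_size(x, election_idx, candidates, score_fn, horizon=45):
    """Return (best_w, totals): the w maximizing the summed change point scores
    over the `horizon` days immediately before each election, and all totals.

    x: daily share of the economic policy topic.
    election_idx: indices of election days in x.
    candidates: window sizes to try.
    score_fn: callable (x, w) -> array of daily change point scores.
    """
    totals = {}
    for w in candidates:
        g = np.asarray(score_fn(x, w), dtype=float)
        # Sum the scores in the pre-election window of every election.
        totals[w] = sum(g[max(0, e - horizon):e].sum() for e in election_idx)
    best_w = max(totals, key=totals.get)  # ties resolve to the earliest candidate
    return best_w, totals
```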

Figure 13 shows the share and change point scores of the economic policy topic before and after the Lower House election held on February 18, 1990. As Fig. 13 shows, the change point score is small where the share of the economic policy topic is relatively stable and large where the share varies substantially. Because the change point score quantifies variation in the share of the economic policy topic, it allows us to define an objective criterion for identifying likely economic policy manipulation. The Appendix shows the topic share and change point scores of the economic policy topic around the other elections.

Fig. 13

Topic share and change point score of economic policy topics

Empirical results and implications

To empirically assess whether the PM surfed or manipulated, we first examined, for each election, whether the economic policy topic had changed substantially before it became apparent that the PM would call an early election. If so, this implies that the PM conducted a pre-election manipulation of the economy, consistent with the manipulative hypothesis. If not, this implies that the PM either surfed on favorable economic conditions (consistent with the surfing hypothesis) or was compelled by other political or economic conditions to call an early election.

We used the change point score to detect significant changes in the patterns of the economic policy topic. Examining the change point scores over the 45 days before the call for an early election became apparent in the media, we checked whether any peak point(s) existed in that span. If so, we also examined whether the share of the economic policy topic increased near the peak point(s). Based on this assessment, we categorized the 10 elections into two groups: 1) elections in which the economic policy topic increased at the peak point(s) (manipulation group); and 2) elections in which no peak point existed or the economic policy topic decreased at the peak point(s) (non-manipulation group).

More formally, our criterion for distinguishing elections with and without prior manipulation is as follows:

Criterion

Let day t be any day less than N days prior to the call for an early election in the media, let f(t) be the share of the economic policy topic on day t, and let g(t) be the change point score on day t. We categorize an election as “manipulation” if, for some such day t, the following three conditions are all satisfied:

  1. g(t) > g(t - 1) and g(t) > g(t + 1).

  2. g(t) > k1 · g(t - 1) or g(t) > k1 · g(t - 2), where k1 is a threshold value.

  3. f(t) > k2 · f(t - 1) or f(t) > k2 · f(t - 2), where k2 is a threshold value.

Condition 1 states that the change point score g(t) is a peak. Conditions 2 and 3 state that g(t) and f(t), respectively, rose sharply relative to at least one of the two preceding days. Together, these conditions require day t to be a time when the pattern of the economic policy topic changes substantially and the share of the economic policy topic increases substantially. The entire procedure of TVD for PBCs is summarized as pseudocode in Algorithm 1.

Algorithm 1 (pseudocode of the TVD procedure for PBCs)
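The three conditions translate directly into code. The sketch below is an illustrative reading of the criterion, not the authors' Algorithm 1: it assumes that f and g are daily arrays indexed by day, that `call_idx` is the index of the day the early-election call became apparent in the media, and that “less than N days prior” means call_idx - N < t < call_idx. The function names are hypothetical.

```python
import numpy as np

def manipulation_days(f, g, call_idx, N=45, k1=1.5, k2=1.5):
    """Days t within the N days before call_idx that satisfy conditions 1-3.

    f: daily share of the economic policy topic.
    g: daily change point score.
    call_idx: index of the day the early-election call became apparent.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    days = []
    # Each t needs two predecessors (t - 2) and one successor (t + 1).
    for t in range(max(2, call_idx - N + 1), min(call_idx, len(g) - 1)):
        is_peak = g[t] > g[t - 1] and g[t] > g[t + 1]          # condition 1
        g_jump = g[t] > k1 * g[t - 1] or g[t] > k1 * g[t - 2]   # condition 2
        f_jump = f[t] > k2 * f[t - 1] or f[t] > k2 * f[t - 2]   # condition 3
        if is_peak and g_jump and f_jump:
            days.append(t)
    return days

def classify_election(f, g, call_idx, N=45, k1=1.5, k2=1.5):
    """Label an election by whether any qualifying day exists."""
    if manipulation_days(f, g, call_idx, N, k1, k2):
        return "manipulation"
    return "non-manipulation"
```

An election with at least one qualifying day is labeled “manipulation”; otherwise it is labeled “non-manipulation.”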

We set the following values: N = 45 and k1 = k2 = 1.5; the robustness of these choices is examined later. After categorizing the Lower House elections into “manipulation” and “non-manipulation” groups using the criterion, we moved the 1993 election from the non-manipulation group into a separate “other” category. The 1993 election was a rare case in Japanese politics in which a no-confidence motion against the cabinet passed the Lower House, compelling the serving PM, Kiichi Miyazawa, to dissolve the chamber.Footnote 7 Hence, in 1993 the PM could not strategically choose to surf or manipulate, which makes this case a clear outlier. We examined each of the other elections substantively for possible outliers and found none. We also checked statistically: as shown in Fig. 14, all elections in the non-manipulation group other than the 1993 election lie within the boxplot whiskers (first quartile - 1.5 × IQR to third quartile + 1.5 × IQR) of the relevant indicators. We then labeled the remaining non-manipulation elections the “surfing group.” Table 3 presents the categorization of the 10 elections.

Fig. 14

Boxplot of the “Non-manipulation” Group, excluding the 1993 Election
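The statistical outlier check above corresponds to Tukey's boxplot rule. A minimal sketch, assuming that each indicator is checked separately and that quartiles are computed with NumPy's default linear interpolation (plotting software may compute them slightly differently); the function name is hypothetical, and no values from the paper's tables are used.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot whiskers)."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])  # linear interpolation by default
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(v) if x < lo or x > hi]
```

Applied to one indicator at a time, an empty result for every indicator means no election in the group falls outside the whiskers.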

Table 3 Categorization of Elections from 1986 to 2018

Although earlier empirical studies of PBCs with EET assumed that the PM always manipulated or always surfed, recent theoretical studies assume that the PM strategically chooses to manipulate or surf depending on the political and economic conditions encountered. How, then, do the classifications in Table 3 fit with or differ from recent empirical and theoretical analyses of PBCs with EET? To assess this, we used three commonly used indicators: 1) time spent in office as a percentage of the full term (i.e., 4 years); 2) the GDP growth rate; and 3) the cabinet approval rate, each measured at the point nearest to each election. The average value of each indicator for each group is presented in Table 4 (detailed data for each election are presented in Table 3).
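The indicator averages in Table 4 amount to simple group means. The sketch below illustrates the arithmetic under assumptions not stated in the text: elapsed term time is measured from the previous general election, the full term is taken as 4 × 365.25 days, and the record format and helper names are hypothetical. No values from Table 3 are reproduced here.

```python
from datetime import date
from statistics import mean

FULL_TERM_DAYS = 4 * 365.25  # assumption: a 4-year term measured in days

def term_elapsed_pct(prev_election: date, election: date) -> float:
    """Share of the 4-year term elapsed between two elections, in percent."""
    return 100.0 * (election - prev_election).days / FULL_TERM_DAYS

def group_means(records):
    """records: iterable of (group, {indicator_name: value}) pairs.

    Returns {group: {indicator_name: mean value}}.
    """
    by_group = {}
    for group, indicators in records:
        for name, value in indicators.items():
            by_group.setdefault(group, {}).setdefault(name, []).append(value)
    return {g: {n: mean(vs) for n, vs in d.items()} for g, d in by_group.items()}
```

An “All” row, as in Table 4, can be obtained by computing the same means over the records with every group label set to “All.”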

Table 4 Average Indicators Values for Each Group with Threshold k1 = k2 = 1.5

The results presented in Table 4 are largely consistent with the theoretical and empirical expectations of recent studies. For example, recent studies predicted that the PM would be more likely to surf when a longer period remained in the term and more likely to manipulate when less time remained (Schultz [35]; Kayser [19]; Kato & Inui [17]). They also predicted that the PM would be more likely to surf when his cabinet’s approval rate was higher and economic conditions were better (Saito [32]; Kato & Inui [17]).

To evaluate our model and empirical results, we compared them with earlier empirical studies on this topic. Earlier studies assumed that the PM always manipulated or always surfed and tested which of the “surfing” and “manipulation” hypotheses holds (e.g., Inoguchi [12]; Ito & Park [13]). In terms of Table 4, past studies implicitly treated every election in the row “All” as either “surfing” or “manipulation.” In either case, our results are more intuitive and fit better with recent theoretical developments, in which the PM strategically chooses whether to manipulate or surf. In sum, our empirical results generated by TVD are broadly consistent with recent theoretical models of PBCs with EET and fit realistic expectations better than previous empirical studies.

We now examine the robustness of these results. The criterion used the following values: N = 45 days before the dissolution of parliament, and thresholds k1 = k2 = 1.5. When we kept N = 45 and increased the thresholds substantially to k1 = k2 = 2, only one case, the 2000 election, moved from the manipulation group to the surfing group. As shown in Table 5, this change in the thresholds does not substantially alter the overall results. Although the percentage of the term elapsed became indistinguishable between the manipulation and surfing groups, the quarterly GDP growth rates supported the findings of previous studies more strongly (Saito [32]; Kato & Inui [17]). As for N, no election moved between the two groups when N was changed to 60 or 90 days. Hence, notwithstanding the small sample size, we conclude that our findings are relatively robust.
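The robustness check can be expressed as a sweep over (N, k1 = k2) that reports which elections change label relative to the baseline (N = 45, k1 = k2 = 1.5). The sketch below re-implements the three conditions compactly so that it runs on its own; it assumes “less than N days prior” means call_idx - N < t < call_idx, and the data layout (a dictionary mapping each election to its daily f, g, and call-day index) and all names are hypothetical. Any data passed to it in examples are synthetic.

```python
import numpy as np

def is_manipulation(f, g, call_idx, N, k1, k2):
    """True if some day t with call_idx - N < t < call_idx satisfies conditions 1-3."""
    for t in range(max(2, call_idx - N + 1), min(call_idx, len(g) - 1)):
        if (g[t] > g[t - 1] and g[t] > g[t + 1]                       # condition 1
                and (g[t] > k1 * g[t - 1] or g[t] > k1 * g[t - 2])    # condition 2
                and (f[t] > k2 * f[t - 1] or f[t] > k2 * f[t - 2])):  # condition 3
            return True
    return False

def sensitivity(elections, settings, baseline):
    """elections: {name: (f, g, call_idx)}; settings and baseline: (N, k) with k1 = k2 = k.

    Returns {(N, k): sorted names of elections whose label differs from baseline}.
    """
    def labels(N, k):
        return {name: is_manipulation(np.asarray(f, float), np.asarray(g, float), c, N, k, k)
                for name, (f, g, c) in elections.items()}

    base = labels(*baseline)
    changed = {}
    for N, k in settings:
        current = labels(N, k)
        changed[(N, k)] = sorted(n for n in current if current[n] != base[n])
    return changed
```

Raising k tightens conditions 2 and 3, so for a fixed N an election can only move from “manipulation” to “non-manipulation” as k increases, which matches the direction of the single change reported above.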

Table 5 Average Indicator Values for Each Group with Threshold k1 = k2 = 2

Conclusions

Herein, we introduced a novel dataset and method to empirically assess whether the PM surfed or manipulated in PBCs with EET, a classical research topic in political science and economics. We proposed TVD to detect the PM’s manipulation by examining his daily schedule.

Our approach enabled us to observe the PM’s behavior directly and to overcome significant methodological challenges encountered in past empirical research on PBCs with EET. Our empirical results generated by TVD fit well with state-of-the-art theoretical models of PBCs with EET. However, our study has some limitations. For example, because only 10 Lower House elections occurred during the period covered by the dataset (1986–2018), the small sample size prevented us from empirically determining whether the surfing or the manipulative hypothesis is valid. Past empirical studies on this topic faced the same small-sample problem. Nevertheless, in our view, our approach enhances understanding of this classical topic from a new perspective and can at least serve as a strong test of the robustness of previous empirical studies of PBCs with EET.

In future studies, because the number of elections within the timespan of our dataset was severely limited, the sample size should be expanded, for example by extending the timespan of the dataset. Analyzing Upper House elections, whose dates are fixed, may also enrich our understanding of this topic. As for the method, TVD can be adapted and improved depending on the characteristics of the dataset analyzed; for example, LDA can be replaced by other topic models such as the dynamic topic model (Blei & Lafferty [2]). Finally, regarding deployment of our system in a cloud environment, a module that scrapes shushō dōsei every day would enable changes in the PM’s daily activities to be detected automatically.

Availability of data and materials

We used the newspaper text data service of Nikkei Media Marketing Inc. The number of text files was 11,242. The total number of characters in the text files was 2,658,687.

Notes

  1. A few empirical studies have indicated that both manipulative and surfing hypotheses can co-exist (e.g., Cargill & Hutchinson [3]).

  2. For example, shushō dōsei on April 3, 2020, can be viewed online at https://www.nikkei.com/article/DGXMZO57666160T00C20A4EA3000/.

  3. The list of Japanese stop words created by SlothLib can be downloaded from the following link:

    http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt

  4. The English keywords in the word clouds displayed in this paper were translated by the authors from the original Japanese terms. ANA is the largest airline in Japan and JAL the second largest. Their frequent mention, together with “Yamabiko” and “Nozomi” (the names of bullet train services) and “Tokyo Station,” shows how the PM traveled around the country during the election campaign.

  5. The support group is called “koenkai” in Japanese. Because of its unique structure and role in Japanese politics, it is typically referred to as “koenkai” rather than “support group,” even in political science literature written in English.

  6. The Cabinet Office, which houses the Council on Economic and Fiscal Policy, is in charge of macroeconomic policymaking in the Japanese government.

  7. When a no-confidence motion against the cabinet passes the Lower House (House of Representatives), the Constitution of Japan (Article 69) requires the cabinet to resign en masse unless the Lower House is dissolved within ten days.

Abbreviations

PM: Prime Minister

PBC: Political business cycle

EET: Endogenous election timing

GDP: Gross domestic product

TVD: Topic variation detection

LDA: Latent Dirichlet allocation

SST: Singular spectrum transformation

SVD: Singular value decomposition

References

  1. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  2. Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international conference on machine learning, pp 113–120

  3. Cargill TF, Hutchinson MM (1991) Political business cycles with endogenous election timing: evidence from Japan. Rev Econ Stat 73(4):733–739

  4. Chowdhury A (1993) Political surfing over economic waves: parliamentary election timing in India. Am J Political Sci 37:1100–1118

  5. Dubois E (2016) Political business cycles 40 years after Nordhaus. Public Choice 166(1–2):235–259

  6. Eisenstein J (2009) Hierarchical text segmentation from multi-scale lexical cohesion. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 353–361

  7. Hearst MA (1994) Multi-paragraph segmentation of expository text. In: Proceedings of the 32nd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, USA, pp 9–16. https://dl.acm.org/doi/proceedings/10.5555/981732

  8. Hoffman MD, Blei DM, Bach FR (2010) Online learning for latent Dirichlet allocation. In: NIPS. Curran Associates, Inc., USA, pp 856–864. https://papers.nips.cc/paper/3902-online-learning-for-latent-dirichlet-allocation

  9. Hogben L (ed) (2014) Handbook of linear algebra, 2nd edn. Chapman and Hall/CRC, USA. https://www.taylorfrancis.com/books/9780429185533

  10. Hong L, Dom B, Gurumurthy S, Tsioutsiouliklis K (2011) A time-dependent topic model for multiple text streams. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 832–840

  11. Idé T, Inoue K (2005) Knowledge discovery from heterogeneous dynamic systems using change-point correlations. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp 571–575

  12. Inoguchi T (1979) Political surfing over economic waves: a simple model of the Japanese political economic system in comparative perspective. In: Eleventh World Congress of the International Political Science Association, Moscow, pp 1960–1980

  13. Ito T, Park JH (1988) Political business cycles in the parliamentary system. Econ Lett 27(3):233–238

  14. Ito T (1990) The timing of elections and political business cycles in Japan. J Asian Econ 1(1):135–156

  15. Jameel S, Lam W (2013) An unsupervised topic segmentation model incorporating word order. In: Proceedings of the 36th international ACM SIGIR conference on Research and Development in information retrieval, pp 203–212

  16. Johnson C (1982) MITI and the Japanese miracle. Stanford University Press, Stanford

  17. Kato S, Inui M (2013) How valuable is Prime Minister's dissolution option?: Black-Scholes approach to parliamentary dissolution. APSA 2013 Annual Meeting Paper. SSRN: https://ssrn.com/abstract=2303520

  18. Kato S, Nakanishi T, Shimauchi H, Ahsan B (2019) Topic variation detection method for detecting political business cycles. In: Proceedings of the IEEE/ACM 6th international conference on big data computing, applications and technologies, pp 85–93

  19. Kayser MA (2005) Who surfs, who manipulates? The determinants of opportunistic election timing and electorally motivated economic intervention. Am Political Sci Rev 99(1):17–27

  20. Kohno M, Nishizawa Y (1990) A study of the electoral business cycle in Japan: elections and government spending on public construction. Comparative Politics 22:151–166

  21. Kudo T, Yamamoto K, Matsumoto Y (2004) Applying conditional random fields to Japanese morphological analysis. In: Proceedings of the 2004 conference on empirical methods in natural language processing, pp 230–237

  22. Mabuchi M (1994) Ōkurashō tōsei no seiji keizaigaku (the political economy of the Ministry of Finance's control). Chūō Kōronsha, Tokyo

  23. Mane KK, Börner K (2004) Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci 101(suppl 1):5287–5290

  24. Martin GJ, McCrain J (2019) Local news and national politics. Am Political Sci Rev 113(2):372–384

  25. Mimno D, Wallach HM, Talley E, Leenders M, McCallum A (2011) Optimizing semantic coherence in topic models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp 262–272

  26. Newton K (1993) Caring and competence: the long, long campaign. In: King A (ed) Britain at the polls, 1992. Chatham House Publishers, New Jersey, pp 129–170

  27. Nippon Hōsō Kyōkai. Shushō dōsei nan no tameni. https://www3.nhk.or.jp/news/web_tokushu/2018_0712.html. Retrieved Mar 1, 2019

  28. Nordhaus WD (1975) The political business cycle. Rev Econ Stud 42(2):169–190

  29. Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks

  30. Riedl M, Biemann C (2012) TopicTiling: a text segmentation algorithm based on LDA. In: Proceedings of ACL 2012 student research workshop, pp 37–42

  31. Rogoff K, Sibert A (1988) Elections and macroeconomic policy cycles. Rev Econ Stud 55(1):1–16

  32. Saito J (2010) Jimintō chōki seiken no seiji keizaigaku: Rieki yūdō seiji no jiko mujun (the political economy under the LDP's longtime rule: the paradox of patronage-driven politics). Keisō Shobō

  33. Sato T (2015) Neologism dictionary based on the language resources on the Web for Mecab. https://github.com/neologd/mecab-unidic-neologd

  34. Schleiter P, Tavits M (2016) The electoral benefits of opportunistic election timing. J Politics 78(3):836–850

  35. Schultz KA (1995) The politics of the political business cycle. Br J Political Sci 25(1):79–99

  36. Smith A (2003) Election timing in majoritarian parliaments. Br J Political Sci 33(3):397–418

  37. Smith A (2004) Election timing. Cambridge University Press, Cambridge

  38. Sun Q, Li R, Luo D, Wu X (2008) Text segmentation with LDA-based Fisher kernel. In: Proceedings of the 46th annual meeting of the Association for Computational Linguistics on human language technologies: short papers, pp 269–272

  39. Yoo K (1998) Intervention analysis of electoral tax cycle: the case of Japan. Public Choice 96(3–4):241–258


Acknowledgements

We thank Shun Ibaragi and Bashar Khayrul for their invaluable assistance. We also thank Kiminori Matsuyama, Keiichiro Kobayashi, Shigeki Morinobu, Charles Crabtree, and the participants of a research seminar at the Tokyo Foundation for Policy Research, the annual meeting of the Midwest Political Science Association (MPSA), and the IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT) for their useful comments and suggestions.

Author information

Contributions

S. Kato and T. Nakanishi initiated the research and contributed to the design and implementation of the research, the analysis of the results, and the writing of the manuscript. B. Ahsan and H. Shimauchi contributed to the design and implementation of the research, the analysis of the results, and the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sota Kato.

Ethics declarations

Competing interests

N/A.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Fig. 15

Topic Share and Change Point Scores of all Lower House Elections from 1986 to 2018

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kato, S., Nakanishi, T., Ahsan, B. et al. Time-series topic analysis using singular spectrum transformation for detecting political business cycles. J Cloud Comp 10, 21 (2021). https://doi.org/10.1186/s13677-020-00197-4

  • DOI: https://doi.org/10.1186/s13677-020-00197-4

Keywords