Takaaki Ohnishi

  • Gaussian Hierarchical Latent Dirichlet Allocation: Bringing Polysemy Back

    Abstract

    Topic models are widely used to discover the latent representation of a set of documents. The two canonical models are latent Dirichlet allocation and Gaussian latent Dirichlet allocation: the former uses multinomial distributions over words as the latent topic representations, and the latter uses multivariate Gaussian distributions over pre-trained word embedding vectors. Compared with latent Dirichlet allocation, Gaussian latent Dirichlet allocation is limited in the sense that it does not capture the polysemy of a word such as “bank.” In this paper, we show that Gaussian latent Dirichlet allocation can recover the ability to capture polysemy by introducing a hierarchical structure in the set of topics that the model can use to represent a given document. Our Gaussian hierarchical latent Dirichlet allocation significantly improves polysemy detection compared with Gaussian-based models and provides more parsimonious topic representations compared with hierarchical latent Dirichlet allocation. Our extensive quantitative experiments show that our model also achieves better topic coherence and held-out document predictive accuracy over a wide range of corpora and word embedding vectors.

    Introduction

    Topic models are widely used to identify the latent representation of a set of documents. Since latent Dirichlet allocation (LDA) [4] was introduced, topic models have been used in a wide variety of applications. Recent work includes the analysis of legislative text [24], detection of malicious websites [33], and analysis of the narratives of dermatological disease [23]. The modular structure of LDA, and graphical models in general [17], has made it possible to create various extensions to the plain vanilla version. Significant works include the correlated topic model (CTM), which incorporates the correlation among topics that co-occur in a document [6]; hierarchical LDA (hLDA), which jointly learns the underlying topic and the hierarchical relational structure among topics [3]; and the dynamic topic model, which models the time evolution of topics [7].

     

     

    WP017

  • Detecting Stock Market Bubbles Based on the Cross‐Sectional Dispersion of Stock Prices

    Abstract

    A statistical method is proposed for detecting stock market bubbles that occur when speculative funds concentrate on a small set of stocks. The bubble is defined by stock price diverging from the fundamentals. A firm’s financial standing is certainly a key fundamental attribute of that firm. The law of one price would dictate that firms of similar financial standing share similar fundamentals. We investigate the variation in market capitalization normalized by fundamentals that is estimated by Lasso regression of a firm’s financial standing. The market capitalization distribution has a substantially heavier upper tail during bubble periods, namely, the market capitalization gap opens up in a small subset of firms with similar fundamentals. This phenomenon suggests that speculative funds concentrate in this subset. We demonstrated that this phenomenon could have been used to detect the dot-com bubble of 1998-2000 in different stock exchanges. 

    Introduction

    It is common knowledge in macroeconomics that, as Federal Reserve Board Chairman Alan Greenspan said in 2002, “...it is very difficult to identify a bubble until after the fact; that is, when its bursting confirms its existence.” In other words, before a bubble bursts, there is no way to establish whether the economy is in a bubble or not. In economics, a stock bubble is defined as a state in which speculative investment flows into a firm in excess of the firm’s fundamentals, so the market capitalization (= stock price × number of shares issued) becomes excessively high compared to the fundamentals. Unfortunately, it is exceedingly difficult to precisely measure a firm’s fundamentals and this has made it nearly impossible to detect a stock bubble by simply measuring the divergence between fundamentals and market capitalization [1–3]. On the other hand, we empirically know that market capitalization and PBR (= market capitalization / net assets) of some stocks increase during bubble periods [4–7]. However, they are also buoyed by rising fundamentals, so it is not always possible to figure out if increases can be attributed to an emerging bubble.

     

    WP010

  • House Price Dispersion in Boom-Bust Cycles: Evidence from Tokyo

    Abstract

    We investigate the cross-sectional distribution of house prices in the Greater Tokyo Area for the period 1986 to 2009. We find that size-adjusted house prices follow a lognormal distribution except for the period of the housing bubble and its collapse in Tokyo, for which the price distribution has a substantially heavier upper tail than that of a lognormal distribution. We also find that, during the bubble era, sharp price movements were concentrated in particular areas, and this spatial heterogeneity is the source of the fat upper tail. These findings suggest that, during a bubble, prices increase markedly for certain properties but to a much lesser extent for other properties, leading to an increase in price inequality across properties. In other words, the defining property of real estate bubbles is not the rapid price hike itself but an increase in price dispersion. We argue that the shape of cross-sectional house price distributions may contain information useful for the detection of housing bubbles. 

    Introduction

    Property market developments are of increasing importance to practitioners and policymakers. The financial crises of the past two decades have illustrated just how critical the health of this sector can be for achieving financial stability. For example, the recent financial crisis in the United States in its early stages reared its head in the form of the subprime loan problem. Similarly, the financial crises in Japan and Scandinavia in the 1990s were all triggered by the collapse of bubbles in the real estate market. More recently, the rapid rise in real estate prices - often supported by a strong expansion in bank lending - in a number of emerging market economies has become a concern for policymakers. Given these experiences, it is critically important to analyze the relationship between property markets, finance, and financial crisis.

     

    WP008

  • Power Laws in Market Capitalization during the Dot-com and Shanghai Bubble Periods

    Abstract

    The distributions of market capitalization across stocks listed on the NASDAQ and Shanghai stock exchanges have power law tails. The power law exponents associated with these distributions fluctuate around one, but show a substantial decline during the dot-com bubble in 1997-2000 and the Shanghai bubble in 2007. In this paper, we show that the observed decline in the power law exponents is closely related to the deviation of the market values of stocks from their fundamental values. Specifically, we regress market capitalization of individual stocks on financial variables, such as sales, profits, and asset sizes, using the entire sample period (1990 to 2015) in order to identify variables with substantial contributions to fluctuations in fundamentals. Based on the regression results for stocks listed on the NASDAQ, we argue that the fundamental value of a company is well captured by the value of its net assets, and therefore the price-to-book-value ratio (PBR) is a good measure of the deviation from fundamentals. We show that the PBR distribution across stocks listed on the NASDAQ has a much heavier upper tail in 1997 than in the other years, suggesting that stock prices deviate from fundamentals for a limited number of stocks constituting the tail part of the PBR distribution. However, we fail to obtain a similar result for Shanghai stocks.

    Introduction

    Since B. Mandelbrot identified the fractal structure of price fluctuations in asset markets in 1963 [1], statistical physicists have been investigating the economic mechanism through which a fractal structure emerges. Power laws are an important characteristic of the fractal structure. For example, some studies found that the size distribution of asset price fluctuations follows a power law [2,3]. It has also been shown that the firm size distribution (e.g., the distribution of sales across firms) follows a power law [4–8]. The power law exponent associated with firm size distributions has been close to one over the last 30 years in many countries [9, 10]. The situation in which the exponent is equal to one is special in that it is the critical point between the oligopolistic phase and the pseudoequal phase [11]. If the power law exponent is less than one, a finite number of top firms occupy a dominant share in the market even if there are an infinite number of firms.
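
    As a rough illustration of how a tail exponent like the one discussed above might be estimated, the sketch below applies the Hill estimator to a synthetic Pareto sample. The excerpt does not say which estimator the paper uses, so the estimator, the function name `hill_exponent`, the sample size, and the cutoff `k` are all assumptions made for illustration.

```python
import math
import random


def hill_exponent(values, k):
    """Hill estimate of the tail exponent mu in P(X > x) ~ x**(-mu),
    computed from the k largest observations (k is a tuning choice)."""
    ordered = sorted(values, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = ordered[k]  # the (k+1)-th largest value serves as the tail cutoff
    log_excess = [math.log(x / threshold) for x in ordered[:k]]
    return k / sum(log_excess)


random.seed(0)
# Synthetic stand-in for market capitalization data: Pareto with exponent 1,
# i.e. P(X > x) = 1/x for x >= 1 (the "exponent close to one" case).
sample = [random.paretovariate(1.0) for _ in range(20000)]
mu_hat = hill_exponent(sample, k=2000)
print(f"estimated tail exponent: {mu_hat:.2f}")
```

    In practice the estimate depends on the cutoff `k`, so it is usually inspected over a range of cutoffs rather than at a single value.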

  • The Gradual Evolution of Buyer-Seller Networks and Their Role in Aggregate Fluctuations

    Abstract

    Buyer–seller relationships among firms can be regarded as a longitudinal network in which the connectivity pattern evolves as each firm receives productivity shocks. Based on a data set describing the evolution of buyer–seller links among 55,608 firms over a decade and structural equation modeling, we find some evidence that interfirm networks evolve reflecting a firm’s local decisions to mitigate adverse effects from neighbor firms through interfirm linkage, while enjoying positive effects from them. As a result, link renewal tends to have a positive impact on the growth rates of firms. We also investigate the role of networks in aggregate fluctuations.

    Introduction

    The interfirm buyer–seller network is important from both the macroeconomic and the microeconomic perspectives. From the macroeconomic perspective, this network represents a form of interconnectedness in an economy that allows firm-level idiosyncratic shocks to be propagated to other firms. Previous studies have suggested that this propagation mechanism interferes with the averaging-out process of shocks, and possibly has an impact on macroeconomic variables such as aggregate fluctuations (Acemoglu, Ozdaglar and Tahbaz-Salehi (2013), Acemoglu et al. (2012), Carvalho (2014), Carvalho (2007), Shea (2002), Foerster, Sarte and Watson (2011) and Malysheva and Sarte (2011)). From the microeconomic perspective, a network at a particular point in time is the result of each firm’s link renewal decisions made in order to avoid (or share) negative (or positive) shocks with its neighboring firms. These two views of a network are related by the fact that both concern the propagation of shocks. The former view stresses the fact that idiosyncratic shocks propagate through a static network, while the latter provides a more dynamic view where firms have the choice of renewing their link structure in order to share or avoid shocks. What is not clear is how the latter view affects the former. Does link renewal increase aggregate fluctuation due to firms forming new links that convey positive shocks, does it decrease aggregate fluctuation due to firms severing links that convey negative shocks, or does it have a different effect?

  • Novel and topical business news and their impact on stock market activities

    Abstract

    We propose an indicator to measure the degree to which a particular news article is novel, as well as an indicator to measure the degree to which a particular news item attracts attention from investors. The novelty measure is obtained by comparing the extent to which a particular news article is similar to earlier news articles, and an article is regarded as novel if there was no similar article before it. On the other hand, we say a news item receives a lot of attention and thus is highly topical if it is simultaneously reported by many news agencies and read by many investors who receive news from those agencies. The topicality measure for a news item is obtained by counting the number of news articles whose content is similar to an original news article but which are delivered by other news agencies. To check the performance of the indicators, we empirically examine how these indicators are correlated with intraday financial market indicators such as the number of transactions and price volatility. Specifically, we use a dataset consisting of over 90 million business news articles reported in English and a dataset consisting of minute-by-minute stock prices on the New York Stock Exchange and the NASDAQ Stock Market from 2003 to 2014, and show that stock prices and transaction volumes exhibited a significant response to a news article when it is novel and topical.

    Introduction

    Financial markets can be regarded as a non-equilibrium open system. Understanding how they work remains a great challenge to researchers in finance, economics, and statistical physics. Fluctuations in financial market prices are sometimes driven by endogenous forces and sometimes by exogenous forces. Business news is a typical example of exogenous forces. Casual observation indicates that stock prices respond to news articles reporting on new developments concerning companies’ circumstances. Market reactions to news have been extensively studied by researchers in several different fields [1]–[13], with some researchers attempting to construct models that capture static and/or dynamic responses to endogenous and exogenous shocks [14], [15]. The starting point for neoclassical financial economists typically is what they refer to as the “efficient market hypothesis,” which implies that stock prices respond at the very moment that news is delivered to market participants. A number of empirical studies have attempted to identify such an immediate price response to news but have found little evidence supporting the efficient market hypothesis [16]–[21].
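
    The novelty measure described in the abstract compares each article with the articles that came before it. A minimal sketch of that idea follows, assuming a bag-of-words representation and cosine similarity; the excerpt does not specify the similarity measure actually used, and the function names and example headlines are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def novelty_scores(articles):
    """articles: texts in chronological order.
    Novelty = 1 - (highest similarity to any earlier article),
    so an article with no similar predecessor scores close to 1."""
    vectors = [bag_of_words(text) for text in articles]
    scores = []
    for i, vec in enumerate(vectors):
        most_similar = max(
            (cosine_similarity(vec, earlier) for earlier in vectors[:i]),
            default=0.0,
        )
        scores.append(1.0 - most_similar)
    return scores


# Hypothetical headlines, oldest first.
articles = [
    "Acme Corp announces record quarterly profit",
    "Acme Corp announces record quarterly profit",
    "Central bank leaves interest rates unchanged",
]
scores = novelty_scores(articles)
print([round(s, 3) for s in scores])
```

    A topicality count could be sketched the same way by counting, for each article, the similar articles delivered by other agencies, which would require an agency label per article.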

  • Structure of global buyer-supplier networks and its implications for conflict minerals regulations

    Abstract

    We investigate the structure of global inter-firm linkages using a dataset that contains information on business partners for about 400,000 firms worldwide, including all the firms listed on the major stock exchanges. Among the firms, we examine three networks, which are based on customer-supplier, licensee-licensor, and strategic alliance relationships. First, we show that these networks all have scale-free topology and that the degree distribution for each follows a power law with an exponent of 1.5. The shortest path length is around six for all three networks. Second, we show through community structure analysis that the firms comprise a community with those firms that belong to the same industry but different home countries, indicating the globalization of firms’ production activities. Finally, we discuss what such production globalization implies for the proliferation of conflict minerals (i.e., minerals extracted from conflict zones and sold to firms in other countries to perpetuate fighting) through global buyer-supplier linkages. We show that a limited number of firms belonging to some specific industries and countries play an important role in the global proliferation of conflict minerals. Our numerical simulation shows that regulations on the purchases of conflict minerals by those firms would substantially reduce their worldwide use.

    Introduction

    Many complex physical systems can be modeled and better understood as complex networks [1, 2, 3]. Recent studies show that economic systems can also be regarded as complex networks in which economic agents, like consumers, firms, and governments, are closely connected [4, 5]. To understand the interaction among economic agents, we must uncover the structure of economic networks.

  • Detecting Real Estate Bubbles: A New Approach Based on the Cross-Sectional Dispersion of Property Prices

    Abstract

    We investigate the cross-sectional distribution of house prices in the Greater Tokyo Area for the period 1986 to 2009. We find that size-adjusted house prices follow a lognormal distribution except for the period of the housing bubble and its collapse in Tokyo, for which the price distribution has a substantially heavier right tail than that of a lognormal distribution. We also find that, during the bubble era, sharp price movements were concentrated in particular areas, and this spatial heterogeneity is the source of the fat upper tail. These findings suggest that, during a bubble period, prices go up prominently for particular properties, but not so much for other properties, and as a result, price inequality across properties increases. In other words, the defining property of real estate bubbles is not the rapid price hike itself but an increase in price dispersion. We argue that the shape of cross sectional house price distributions may contain information useful for the detection of housing bubbles.

    Introduction

    Property market developments are of increasing importance to practitioners and policymakers. The financial crises of the past two decades have illustrated just how critical the health of this sector can be for achieving financial stability. For example, the recent financial crisis in the United States in its early stages reared its head in the form of the subprime loan problem. Similarly, the financial crises in Japan and Scandinavia in the 1990s were all triggered by the collapse of bubbles in the real estate market. More recently, the rapid rise in real estate prices - often supported by a strong expansion in bank lending - in a number of emerging market economies has become a concern for policymakers. Given these experiences, it is critically important to analyze the relationship between property markets, finance, and financial crisis.

  • High quality topic extraction from business news explains abnormal financial market volatility

    Abstract

    Understanding the mutual relationships between information flows and social activity in society today is one of the cornerstones of the social sciences. In financial economics, the key issue in this regard is understanding and quantifying how news of all possible types (geopolitical, environmental, social, financial, economic, etc.) affects trading and the pricing of firms in organized stock markets. In this paper we seek to address this issue by performing an analysis of more than 24 million news records provided by Thomson Reuters and of their relationship with trading activity for 205 major stocks in the S&P US stock index. We show that the whole landscape of news that affects stock price movements can be automatically summarized via simple regularized regressions between trading activity and news information pieces decomposed, with the help of simple topic modeling techniques, into their “thematic” features. Using these methods, we are able to estimate and quantify the impacts of news on trading. We introduce network-based visualization techniques to represent the whole landscape of news information associated with a basket of stocks. The examination of the words that are representative of the topic distributions confirms that our method is able to extract the significant pieces of information influencing the stock market. Our results show that one of the most puzzling stylized facts in financial economics, namely that at certain times trading volumes appear to be “abnormally large,” can be explained by the flow of news. In this sense, our results prove that there is no “excess trading” if the news is genuinely novel and provides relevant financial information.

    Introduction

    Neoclassical financial economics based on the “efficient market hypothesis” (EMH) considers price movements as almost perfect instantaneous reactions to information flows. Thus, according to the EMH, price changes simply reflect exogenous news. Such news - of all possible types (geopolitical, environmental, social, financial, economic, etc.) - leads investors to continuously reassess their expectations of the cash flows that firms’ investment projects could generate in the future. These reassessments are translated into readjusted demand/supply functions, which then push prices up or down, depending on the net imbalance between demand and supply, towards a fundamental value. As a consequence, observed prices are considered the best embodiments of the present value of future cash flows. In this view, market movements are purely exogenous without any internal feedback loops. In particular, the most extreme losses occurring during crashes are considered to be solely triggered exogenously.

  • On the Nonstationarity of the Exchange Rate Process

    Abstract

    We empirically investigate the nonstationarity property of the dollar-yen exchange rate by using a high-frequency data set spanning eight years. We perform a statistical test of strict stationarity based on the two-sample Kolmogorov-Smirnov test for the absolute price changes and Pearson’s chi-square test for the number of successive price changes in the same direction, and find statistically significant evidence of nonstationarity. We further study the recurrence intervals between the days in which nonstationarity occurs, and find that the distribution of recurrence intervals is well-approximated by an exponential distribution. Also, we find that the mean conditional recurrence interval 〈T|T0〉 is independent of the previous recurrence interval T0. These findings indicate that the recurrence intervals are characterized by a Poisson process. We interpret this as reflecting the Poisson property regarding the arrival of news.

    Introduction

    Financial time series data have been extensively investigated using a wide variety of methods in econophysics. These studies tend to assume, explicitly or implicitly, that a time series is stationary, since stationarity is a requirement for most of the mathematical theories underlying time series analysis. However, although this assumption is nearly universal, there are few previous studies that seek to test stationarity in a reliable manner (Toth et al. (2010)).
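
    The abstract above tests stationarity by comparing the distribution of absolute price changes across periods with a two-sample Kolmogorov-Smirnov test. The sketch below shows the mechanics on synthetic data, using the standard asymptotic approximation for the p-value. The two "days", their volatilities, and the function names are assumptions for illustration, not the paper's data or code.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (large-sample approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.05:  # the alternating series converges poorly here; p is essentially 1
        return 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * series))


random.seed(42)
# Synthetic absolute price changes for two hypothetical days with different volatility.
day_a = [abs(random.gauss(0.0, 1.0)) for _ in range(2000)]
day_b = [abs(random.gauss(0.0, 1.5)) for _ in range(2000)]
d = ks_statistic(day_a, day_b)
p = ks_pvalue(d, len(day_a), len(day_b))
print(f"D = {d:.3f}, p = {p:.3g}")
```

    A small p-value indicates that the two samples are unlikely to come from the same distribution, which is the sense in which such a test can flag nonstationarity between periods.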

  • On the Evolution of the House Price Distribution

    Abstract

    Is the cross-sectional distribution of house prices close to a (log)normal distribution, as is often assumed in empirical studies on house price indexes? How does the distribution evolve over time? To address these questions, we investigate the cross-sectional distribution of house prices in the Greater Tokyo Area. We find that house prices (Pi) are distributed with much fatter tails than a lognormal distribution and that the tail is quite close to that of a power-law distribution. We also find that house sizes (Si) follow an exponential distribution. These findings imply that size-adjusted house prices, defined by lnPi − aSi, should be normally distributed. We find that this is indeed the case for most of the sample period, but not the bubble era, during which the price distribution has a fat upper tail even after adjusting for size. The bubble was concentrated in particular areas in Tokyo, and this is the source of the fat upper tail.

    Introduction

    Researchers on house prices typically start their analysis by producing a time series of the mean of prices across different housing units in a particular region by, for example, running a hedonic or repeat-sales regression. In this paper, we pursue an alternative research strategy: we look at the entire distribution of house prices across housing units in a particular region at a particular point of time and then investigate the evolution of such cross-sectional distribution over time. We seek to describe price dynamics in the housing market not merely by changes in the mean but by changes in some key parameters that fully characterize the entire cross-sectional price distribution.
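
    The size adjustment in the abstract (lnPi − aSi) can be illustrated with a small simulation: if house sizes are exponentially distributed and log prices are normal conditional on size, raw log prices are right-skewed while size-adjusted log prices look normal. The parameter values, the OLS estimate of a, and the helper names below are assumptions for illustration only.

```python
import math
import random


def skew_and_excess_kurtosis(xs):
    """Sample skewness and excess kurtosis (both are 0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


random.seed(1)
n = 20000
true_a = 1.0  # hypothetical size coefficient
sizes = [random.expovariate(1.0) for _ in range(n)]  # exponential house sizes
log_prices = [true_a * s + random.gauss(0.0, 0.5) for s in sizes]

a_hat = ols_slope(sizes, log_prices)
adjusted = [lp - a_hat * s for lp, s in zip(log_prices, sizes)]  # ln P_i - a S_i

raw_skew, raw_kurt = skew_and_excess_kurtosis(log_prices)
adj_skew, adj_kurt = skew_and_excess_kurtosis(adjusted)
print(f"a_hat = {a_hat:.3f}")
print(f"raw:      skew = {raw_skew:.2f}, excess kurtosis = {raw_kurt:.2f}")
print(f"adjusted: skew = {adj_skew:.2f}, excess kurtosis = {adj_kurt:.2f}")
```

    In this setup the heavy upper tail of raw prices comes entirely from the size distribution, so a tail that survives the adjustment, as reported for the bubble era, points to something other than size.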

  • On the Evolution of the House Price Distribution

    Abstract

    Is the cross-sectional distribution of house prices close to a (log)normal distribution, as is often assumed in empirical studies on house price indexes? How does it evolve over time? What does it look like during the period of housing bubbles? To address these questions, we investigate the cross-sectional distribution of house prices in the Greater Tokyo Area. Using a unique dataset containing individual listings in a widely circulated real estate advertisement magazine from 1986 to 2009, we find the following. First, the house price, Pit, is characterized by a distribution with much fatter tails than a lognormal distribution, and the tail part is quite close to that of a power-law or a Pareto distribution. Second, the size of a house, Si, follows an exponential distribution. These two findings about the distributions of Pit and Si imply that the price distribution conditional on the house size, i.e., Pr(Pit | Si), follows a lognormal distribution. We confirm this by showing that size-adjusted prices indeed follow a lognormal distribution, except for periods of the housing bubble in Tokyo when the price distribution remains asymmetric and skewed to the right even after controlling for the size effect.

    Introduction

    Research on house prices typically starts by producing a time series of the mean of prices across housing units in a particular region by, for example, running a hedonic regression or by adopting a repeat-sales method. In this paper, we propose an alternative research strategy: we look at the entire distribution of house prices across housing units in a particular region at a particular point of time, and then investigate the evolution of such cross-sectional distributions over time. We seek to describe price dynamics in a housing market not merely by changes in the mean but by changes in some key parameters that fully characterize the entire cross-sectional price distribution. Our ultimate goal is to produce a new housing price index based on these key parameters.

  • Random Walk or A Run ―Market Microstructure Analysis of the Foreign Exchange Rate Movements based on Conditional Probability―

    Abstract

    Using tick-by-tick data of the dollar-yen and euro-dollar exchange rates recorded on an actual transaction platform, we show that a “run” (continuous increases or decreases in deal prices over the past several ticks) does carry some predictive information on the direction of the next price movement. Deal price movements, which are consistent with order flows, tend to continue a run once it has started; i.e., the conditional probability that deal prices move in the same direction as in the last several ticks in a row is higher than 0.5. However, quote prices do not show such a tendency. Hence, the random walk hypothesis is refuted in a simple test of runs using the tick-by-tick data. In addition, a longer continuous increase of the price tends to be followed by a larger reversal. The findings suggest that those market participants who have access to real-time, tick-by-tick transaction data may have an advantage in predicting the exchange rate movement. Findings here also lend support to the momentum trading strategy.

    Introduction

    The foreign exchange market remains sleepless around the clock. Someone is trading somewhere all the time—24 hours a day, 7 days a week, 365 days a year. Analyzing the behavior of the exchange rate has become a popular sport of international finance researchers, while global financial institutions are spending millions of dollars to build real-time computer trading systems (program trading). High-frequency, reliable data are the key in finding robust results for good research for academics or profitable schemes for businesses.
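
    The run test summarized in the abstract asks whether, after several consecutive moves in one direction, the next move is more likely than 0.5 to go the same way. A minimal sketch of that conditional probability follows, using a synthetic tick series with built-in persistence. The function name, the persistence value, and the choice to drop unchanged ticks are assumptions for illustration.

```python
import random


def run_continuation_probability(prices, run_length):
    """Estimate P(next move has the same sign | last run_length moves share one sign)."""
    signs = [(b > a) - (b < a) for a, b in zip(prices, prices[1:])]
    signs = [s for s in signs if s != 0]  # assumption: ignore ticks with no price change
    hits = total = 0
    for t in range(run_length, len(signs)):
        window = signs[t - run_length:t]
        if all(s == window[0] for s in window):  # a run of the required length
            total += 1
            hits += signs[t] == window[0]
    return hits / total if total else float("nan")


random.seed(7)
persistence = 0.6  # hypothetical probability that a move repeats the previous direction
price, step = 10000, 1  # price in integer ticks to avoid floating-point ties
prices = [price]
for _ in range(50000):
    if random.random() > persistence:
        step = -step
    price += step
    prices.append(price)

p3 = run_continuation_probability(prices, run_length=3)
print(f"P(continue | run of 3) = {p3:.3f}")
```

    For a pure random walk the same estimate would hover around 0.5, so a value clearly above 0.5 on real deal prices is what the abstract interprets as evidence against the random walk hypothesis.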
