DATA DRIVEN PERSPECTIVE ON STOCK PRICE - MACROECONOMIC VARIABLES: INDONESIA ECONOMY 2016-2020

The theory-driven perspective is very common, especially in economics research, and has even become an almost inevitable approach. Problems arise when data, as a form of reality, do not agree with theory; the resulting conclusion is then very likely to differ from the theoretical statement. One data-driven method is the Vector Auto-Regressive (VAR) model, which treats all the variables involved as endogenous variables. This study seeks to identify a statistically more accurate relationship between stock prices, the consumer price index, the Jakarta Interbank Offered Rate (JIBOR), the exchange rate, and the Net Balance Trade. Observations cover January 2016 to December 2020. The study finds evidence of a recursive relationship between stock price variables and macroeconomic variables. The VAR model identifies the Net Balance Trade variable as an endogenous variable for the 3 types of sectoral stocks, and among the stock price variables only manufacturing sector stocks behave in the same way. These results have two theoretical consequences. First, setting stock prices without differentiating sectors carries the risk of generalization errors. Second, setting stock prices as the endogenous variable means assuming that the market is perfect and efficient and that market participants behave rationally.


INTRODUCTION
Research on the stock market and macroeconomic variables is one of the most popular topics, as a simple web search shows. On google.co.id, the phrase "stock price, macroeconomic variable" returned 34.9 million results. Adding "multiple regression" returned 56.2 million results, while replacing "multiple regression" with "Vector Autoregression" returned 1.7 million. In Indonesian, the phrase "Stock Prices and Macroeconomic Variables" returned 155 thousand results; replacing "price" with "market" raised this to 233 thousand, and replacing it with "return" gave 108 thousand. The same holds for the method used: "multiple regression macroeconomic variable stock prices" returned 57,700 results, and replacing "multiple regression" with "VAR" returned 22,800 (searches were run in Google Chrome on February 22, 2021).
The search results show that research on the relationship between stock prices and macroeconomic variables gives mixed results. Theoretically, inflation has a negative effect on stock prices, which many researchers confirm (Camilleri et al., 2019; Megaravalli & Sampagnaro, 2018; Singhal et al., 2019). However, several other researchers reach different conclusions (Ali et al., 2018; Caruso, 2018; Chang et al., 2019; Mohamed & Ahmed, 2018; Shmueli & Koppius, 2010; Wu et al., 2018).
The stronger the exchange rate, the higher the stock price (Ali et al., 2018; Demir, 2019; Megaravalli & Sampagnaro, 2018; Wu et al., 2018), although some researchers conclude differently (Chang et al., 2019; Singhal et al., 2019). An increase in the trade balance strengthens stock prices (Caruso, 2018; Demir, 2019), whereas Wu et al. (2018) and Ali et al. (2018) reach different conclusions. Similarly, for the interest rate, Wu et al. (2018), Demir (2019), Chang et al. (2019), and Ali et al. (2018) conclude that the effect is negative, in contrast to Camilleri et al. (2019), Singhal et al. (2019), and Mohamed and Ahmed (2018). This diversity is caused by case-specific conditions, both in the profile of the objects observed and in the sample period used. However, the diversity may also stem from changes in modeling, in which the relationship between variables departs from the mainstream relationship (Jansen, 2018; Sembel, 2015).
The relationship between variables is very likely to change because, empirically, no variable consistently occupies a single position, for example as the dependent variable. Theory is closed because of the assumptions that accompany it, whereas empirical data are an open phenomenon, the resultant of many factors not limited by assumptions. Unfortunately, many domestic researchers have not yet responded to these possibilities. The lack of effort to explore alternatives to established theories can also be seen in the limited use of VAR.
Many economists have begun to doubt the robustness of economic theory. One of them is Kling (2022), who concludes that, in the end, economists cannot really have an effective economic theory, where an effective theory is one that is verifiable and reliable for prediction and control. Instead, economic theory deals only with speculative interpretations and continues to search for more convincing forms. Many economists are now starting to grapple with mental and cultural factors that are thought to strongly influence economic behavior.
In other words, economists today think more about why one variable affects another than about which variables affect which. When researchers focus their analysis on proving theories and building models based on theory, the research perspective is called Theory-Driven. This perspective begins by building a theoretical model, then uses a limited sample to build a statistical model according to the theory and analyzes it; the results of the analysis are directed at verifying the theory used. The alternative is the Data-Driven perspective, in which researchers collect data and use theory only to determine the variables to be analyzed. The statistical model places all variables as independent variables and then leads to the most appropriate statistical relationship to be analyzed (Jagadish, 2015; Shmueli & Koppius, 2010).
Efforts to open up to alternative theoretical explanations date back to 1958, with Forrester, Professor Emeritus of management at the MIT Sloan School. His article addresses the lack of references on building mathematical models in economics, especially those related to industrial firms (Forrester, 1958). The scarcity of references illustrates how little serious attention economists (at that time) gave to the industrial concept, even though, theoretically, the concept of the individual firm relies heavily on mathematical logic to analyze the relationships among factors. In addition, Forrester observes that many analytical models are still built in too simple a form, ignoring the influence of external factors on the behavior of individual industrial firms. In Forrester's assessment, many analyses are therefore unsuccessful in describing reality and producing good policies.
In a different era, Robert Lucas made a similar critique in the field of macroeconomics, known as the Lucas Critique (Gabaix, 2019; Haldane & Turrel, 2017; Lucas Jr. & Sargent, 1979). Lucas stated that Keynes's thinking was unable to produce appropriate macroeconomic policies, because the structural model can be replaced with a model based on the rational behavior of individual economic actors (micro-foundation), who are able to react rationally to policies. Lucas is thus regarded as the originator of mathematical models that emphasize the role of expectations, which makes the role of 'time' very important. For Lucas, policies taken today are aimed at future economic actors.
The implication of Lucas' critique is that the role of the individual in shaping policy is very important (micro-foundation). A focus on individual dynamics must be balanced with an economic model that is itself dynamic (forward-looking behavior). Policy makers therefore need to carry out time profiling, which gives mathematics, and time series analysis in particular, a very important role from Lucas' point of view.
The very rapid developments in time series analysis have had little impact on existing research, particularly in finance. Most research on finance uses a cross-sectional approach, and even when it involves the element of time, the analysis still assumes static time conditions. Treating the relationship between stock price variables and macroeconomic variables as static has produced varied findings without an explanation of their causes. It is therefore important to open up the possibility of new relationships or new explanations through a data-driven approach, which is expected to reveal relationships between stock prices and macroeconomic variables that differ from mainstream theory.

LITERATURE REVIEW
A model is an attempt to simplify a complex reality. A model is not reality itself, but an attempt to construct reality so that it is easier to understand. In general, a model has input elements (information and information processing) and expected outputs. Mathematically, a model has the following elements: (1) variables, the elements involved in the construction of reality, consisting of the dependent variable and the independent variables; (2) parameters, measures in the form of constants and coefficients; and (3) assumptions, which simplify the scope of the discussion.
On the other hand, the economic model is a model that relates complex economic realities (facts, data) with economic theory that is full of simplifications. Therefore, the critical point of the economic model is to use a confirmatory theoretical framework to explain open and complex facts and data.

Theory Driven vs Data Driven
The concept of parsimony makes complex realities easier to understand. Behind this convenience, however, theory requires a series of assumptions in order to have explanatory power. These assumptions often become a problem when theory is confronted with facts. On the one hand, assumptions help to focus thinking; on the other hand, they make thinking more closed. Facts, by contrast, are the result of many factors that are not limited by assumptions, so facts are more open. This tension between theory and facts leads to resistance to the theory that is precisely the researchers' foothold. Modeling is therefore needed when research problems involve case-specific situations. This case-specific nature may arise from the objects being observed, which may have various profiles, from the time of observation, or from the relationship between variables over time, which may differ from mainstream theory.
Theory-based (theory-driven) research builds a model based on the selected theory with a set of assumptions, while data-driven research builds a model based on the results of statistical processing.
Data-driven research uses an exploratory approach to analyze big data (Kitchin, 2014). Because of the complexity of the phenomena and processes that generate the data, the results may differ from a theoretical environment constrained by a set of assumptions. Data-driven research proceeds in the following stages (Jagadish, 2015; Shmueli & Koppius, 2011): Stage-1: identify research questions based on theoretical gaps in the observed field; Stage-2: create or obtain data sources related to relevant phenomena in the field; Stage-3: clean, extract, and annotate data streams to prepare for analysis; Stage-4: integrate, combine, and represent data to detect relationships between variables (e.g., correlation patterns); Stage-5: analyze and model the data; and Stage-6: interpret patterns to arrive at a model-building solution.
Data-driven research has been popular in several natural sciences. This scientific method is considered effective because the size of the data is quite large. Although such research starts from a standard theory, the interpretation of the findings is based on the objective results of data observation.
The contributions of data-based research are: (1) patterns taken from the results of big data analysis are more accurate in describing facts; and (2) the explanations obtained are more illustrative of the facts than a mere justification of the theory.

Stock Prices and Macroeconomic Variables
Theoretically, macroeconomic variables are thought to explain stock price movements, along with company internal variables. The following are some standard references that are often cited in research on stocks and macroeconomics. Fama and French (1990) show that the expected return from changes in stock prices is lower when economic conditions are strong and higher when conditions are weak. Fama (1990) suggests in more detail that the annual variance of stock returns can be traced to estimates of variables such as industrial production, real GNP, and investment, which are important in determining cash flows to firms, and states that stock returns in the United States are correlated with aggregate real activity.
Chen, Roll and Ross (1986) put forward hypotheses and evidence that economic variables such as interest rates, inflation, industrial production, and bonds are important factors for the stock market. They concluded that macroeconomic variables and changes in stock prices are strongly related; however, they could not determine the appropriate macroeconomic factors for asset pricing, which remains a core gap in the literature. Cheung and Ng (1998) provide evidence of long-term co-movement between five stock indices (in Canada, Germany, Italy, Japan, and the US) and country-specific measures of aggregate real activity, such as real oil prices, real GNP, money supply, and real consumption, which were previously not fully captured by Fama (1990). Flannery and Protopapadakis (2002) provide empirical evidence that changes in stock prices are negatively correlated with inflation (measured by both the consumer price index (CPI) and the producer price index (PPI)) and with money supply growth. They further provide empirical evidence for price factor variables such as the trade balance, employment, and housing ownership. Rapach, Wohar and Rangvid (2005) examined whether macroeconomic factors such as interest rates, inflation, industrial production, and unemployment can predict stock returns in twelve countries. They stressed that interest rates and inflation appear to be significant factors in the relationship, whereas the ability of industrial production and unemployment to predict changes in stock prices is relatively limited.
In the past decade, multifactor studies of the relationship between stock prices and macroeconomic variables have been conducted frequently. Dubravka and Posedel (2010) suggest that the market index is a very strong factor, and that interest rates, oil prices, and production have a positive relationship with changes in stock prices on the Croatian stock market, while inflation has a negative relationship. Hsing (2011) examines the relationship between stock market indices on the Croatian stock market and relevant macroeconomic variables such as real GDP, stock market developments, government bond returns, real interest rates, exchange rates, and expected inflation. Similar research was conducted by Hsing and Hsieh (2012) with similar results. Their results state that the stock market in Poland is positively related to industrial sector production, real GDP, the stock market index, the interest rate, the nominal effective exchange rate, the inflation rate, and government bond yields.
In addition, several studies have begun to extend their independent variables to those of an open economy, on the consideration that the object of observation is a country with an open economy. These variables include interactions with the capital markets of other countries (Cieslak & Pang, 2020; Megaravalli & Sampagnaro, 2018), oil prices on the world market, which directly affect domestic prices (Adiwibowo & Sihombing, 2019; Benakovic & Posedel, 2010; Demir, 2019; Moelands, 2017; Smyth & Narayan, 2018), and periods of shock (Machmuddah et al., 2020; Megaravalli & Sampagnaro, 2018).
The basic model built in this study draws on theoretical considerations from several sources, namely Flannery and Protopapadakis (2002), Benakovic and Posedel (2010), Hsing and Hsieh (2012), Mohamed and Ahmed (2018), Wu et al. (2018), Camilleri et al. (2019), and Chang et al. (2019). In general, the stock price variable is associated with the Balance Trade, Exchange Rate, Interest Rate, and Inflation variables. These four variables were selected because they are macro fundamentals with complete and easily accessible data, and because they account for the most differences in the results of previous analyses.
The general pattern of relationships adapted from these sources is shown in Figure 1.

Data and Data Sources
The object of observation in this study is the stock market in Indonesia from January 2016 to December 2020. The sample uses the latest data available before the research began and covers a short period, to avoid the risk of structural change. The stock sectors were selected on the consideration that they are real sectors and relatively dominate stock price movements. The variables involved in this analysis are as follows.
First, stock prices are measured by sectoral composite stock price indexes, namely the stock price index of the construction sector (JKCON), the infrastructure sector (JKINFRA), and the manufacturing industry sector (JKMAN).
Second, the exchange rate is measured by the weighted average middle rate of IDR/USD, formulated as:
$$\text{Weighted Average Rate} = \frac{(\text{Selling Rate} \times \text{Selling Volume}) + (\text{Buying Rate} \times \text{Buying Volume})}{\text{Selling Volume} + \text{Buying Volume}} \qquad (1)$$
Third, the interest rate is measured by the Jakarta Interbank Offered Rate for one month (JIBOR1) and three months (JIBOR3). JIBOR is the average of the indicative interest rates on unsecured loans that contributing banks offer to other contributing banks for lending rupiah in Indonesia (for tenors above overnight).
Fourth, the price level or inflation is measured by the Consumer Price Index (CPI), with 2018 = 100. Since 2014 the index has been based on consumption patterns in 82 cities, and in 2020 on consumption patterns in 90 cities.
Fifth, the foreign sector is measured by Net Balance Trade (NBT), the difference between exports and imports of goods and services.
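As a small illustration of Equation (1), the sketch below computes the weighted average middle rate in Python. The function name and the sample rates and volumes are hypothetical and are not taken from the study's data.

```python
def weighted_average_rate(selling_rate, selling_volume, buying_rate, buying_volume):
    """Volume-weighted average of the selling and buying rates, as in Equation (1)."""
    total_volume = selling_volume + buying_volume
    if total_volume <= 0:
        raise ValueError("total traded volume must be positive")
    weighted_sum = selling_rate * selling_volume + buying_rate * buying_volume
    return weighted_sum / total_volume


# Hypothetical IDR/USD quotes and volumes, for illustration only.
example_rate = weighted_average_rate(
    selling_rate=14100.0, selling_volume=300.0,
    buying_rate=14000.0, buying_volume=100.0,
)
print(example_rate)  # 14075.0
```

Because the selling side carries three times the volume of the buying side in this example, the result lies closer to the selling rate than a simple midpoint would.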
Stock price and exchange rate data were obtained from finance.yahoo.com; the interest rate and price level (inflation) data were sourced from Bank Indonesia; and the net balance trade data were taken from Badan Pusat Statistik (BPS). The model in this study is formed from the results of statistical processing (data-driven), not from theory (theory-driven). Data processing uses the EViews 10 application, which offers various alternatives for dynamic analysis.

Research Stages
Time series analysis is used to build models that can forecast economic and business activities (such as stock market prices, sales, and turnover). It allows decision makers to understand time patterns in the data and analyze their trends. The testing process in time series analysis is therefore very detailed and includes testing the level of the variables, testing the level of the relationship between variables, and testing the stability of that relationship.
Time series analysis aims to identify natural phenomena through observations over time and leads to forecasting. A time series has four components: (1) trend, which describes the long-term direction of the data; (2) seasonal variation, which describes changes that are seasonal in nature; (3) cyclical fluctuations, namely changes that recur but are not tied to fixed calendar periods; and (4) irregular variations, namely changes in the data or variables that are random. The research stages start from the data, followed by the stationarity test (unit root test), determination of the optimum lag, determination of the causality relationship, and finally construction of the VAR model.

Unit Roots
Unit roots in time series data are blamed for inaccuracy in estimating the model. The cause lies not only in the accuracy of the data but also in the lag (autoregressive) structure and the time trend. The decision is based on the hypotheses: H0: the series has a unit root (non-stationary); Ha: the series has no unit root (stationary).
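A unit root test of this kind can be sketched as a simple Dickey-Fuller regression, in which the null hypothesis is that the series has a unit root. The version below is a minimal illustration in plain Python, assuming a regression with a constant and no lagged differences; it is not the EViews procedure used in the study. The simulated series, the function name, and the approximate 5% critical value of about -2.9 (for roughly 50 observations) are assumptions for illustration.

```python
import random


def dickey_fuller_t(series):
    """t-statistic on rho in the regression: diff(y)_t = a + rho * y_{t-1} + e_t."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    rho = sxy / sxx
    a = mean_y - rho * mean_x
    ssr = sum((d - a - rho * x) ** 2 for x, d in zip(y_lag, dy))
    se_rho = (ssr / (n - 2) / sxx) ** 0.5
    return rho / se_rho


# Simulated random walk (has a unit root) and its first differences (white noise).
random.seed(1)
shocks = [random.gauss(0.0, 1.0) for _ in range(61)]
levels = []
total = 0.0
for shock in shocks:
    total += shock
    levels.append(total)
differences = [b - a for a, b in zip(levels[:-1], levels[1:])]

CRITICAL_5PCT = -2.9  # approximate value, assumed for illustration
t_levels = dickey_fuller_t(levels)
t_differences = dickey_fuller_t(differences)
print("levels:     ", round(t_levels, 2), "reject H0:", t_levels < CRITICAL_5PCT)
print("differences:", round(t_differences, 2), "reject H0:", t_differences < CRITICAL_5PCT)
```

A t-statistic more negative than the critical value rejects the null of a unit root. A series that is non-stationary in levels but stationary after first differencing is said to be integrated of order one.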

Optimum Lag
A lag of k refers to the observation k periods before a given time t, denoted $Y_{t-k}$. The most commonly used lag is 1, which gives the first-order lag plot. The lag is generally kept small (k = 1 or 2) to avoid losing too many observations. However, it is possible to plot multiple lags in separate groups.
The criteria used to determine the optimal lag length are the AIC (Akaike Information Criterion) and the SIC (Schwarz Information Criterion); the optimal VAR lag is the one with the smallest AIC or SIC value. The calculation of AIC and SIC follows Enders (2015, pp. 69-70). Candidate values of k for the lag length are examined in advance, starting from the stable VAR equation, up to the maximum lag that the VAR system can generate, and the selected value is the k used.
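For reference, a common textbook statement of the two criteria is given below. It is assumed here to match the form in Enders; the symbols $T$ (number of usable observations), $\text{SSR}$ (sum of squared residuals), and $n$ (number of estimated parameters) are introduced for illustration.

```latex
\begin{aligned}
\text{AIC} &= T \ln(\text{SSR}) + 2n \\
\text{SIC} &= T \ln(\text{SSR}) + n \ln(T)
\end{aligned}
```

In a multivariate system such as a VAR, $\ln(\text{SSR})$ is usually replaced by the log determinant of the residual covariance matrix, and $n$ by the total number of parameters across all equations. Since $\ln(T) > 2$ whenever $T \geq 8$, the SIC penalizes additional lags more heavily than the AIC and tends to select a more parsimonious model.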

Granger Causality
Granger (1969) introduced the concept of causality that eventually became a popular topic in econometrics. Granger defines the variable $y_{2t}$ as causal for the variable $y_{1t}$ in period t if $y_{2t}$ helps to improve the forecast of $y_{1t}$ (Lutkepohl & Kratzig, 2004).
$y_{2t}$ is not Granger-causal for $y_{1t}$ if removing $y_{2t}$ from the information set does not change the optimal forecast of $y_{1t}$ at any horizon; conversely, $y_{2t}$ is Granger-causal for $y_{1t}$ if the optimal forecast changes for at least one forecast horizon h (h ≥ 1).
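In practice, this definition is usually tested with an F-test on the lagged values of $y_{2t}$ in a regression for $y_{1t}$. The sketch below illustrates the idea with NumPy on simulated data; the function names, the lag length of 2, and the data-generating process are assumptions for illustration and do not reproduce the study's EViews output.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(y1, y2, p=2):
    """F-statistic for H0: the p lags of y2 do not help to forecast y1."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    T = len(y1) - p                      # usable observations
    target = y1[p:]
    const = np.ones((T, 1))
    own_lags = np.column_stack([y1[p - k:-k] for k in range(1, p + 1)])
    other_lags = np.column_stack([y2[p - k:-k] for k in range(1, p + 1)])
    ssr_restricted = ssr(np.hstack([const, own_lags]), target)
    ssr_unrestricted = ssr(np.hstack([const, own_lags, other_lags]), target)
    dof = T - (2 * p + 1)
    return ((ssr_restricted - ssr_unrestricted) / p) / (ssr_unrestricted / dof)


# Simulated example: y1 depends on the previous value of y2, not the reverse.
rng = np.random.default_rng(0)
n = 200
y2 = rng.standard_normal(n)
y1 = np.zeros(n)
for t in range(1, n):
    y1[t] = 0.8 * y2[t - 1] + 0.5 * rng.standard_normal()

f_y2_to_y1 = granger_f(y1, y2)   # should be large: y2 helps forecast y1
f_y1_to_y2 = granger_f(y2, y1)   # should be small: y1 does not help forecast y2
print("F (y2 -> y1):", round(f_y2_to_y1, 2))
print("F (y1 -> y2):", round(f_y1_to_y2, 2))
```

Each statistic is compared with the F distribution with p and T - 2p - 1 degrees of freedom; a probability value below 0.05 leads to rejecting the null of no Granger causality in that direction.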

Vector Auto-Regressive (VAR)
Several researchers who focus on asset price dynamics, such as Megaravalli and Sampagnaro (2018), Singhal, Choudhary, and Biswal (2019), and Cieslak and Pang (2020), recommend the VAR model in their observations of the sources of asset price variation, because VAR is considered the most capable of tracking asset price dynamics. VAR is basically a tool for forecasting systems of interrelated time series and for analyzing the impact of random disturbances on the system of variables. The VAR approach produces a structural model built by placing all variables as endogenous variables, with time lags, in a system of equations.
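In its standard reduced form, a VAR of order p is written as:
$$y_t = A_0 + \sum_{i=1}^{p} A_i \, y_{t-i} + \varepsilon_t \qquad (6)$$
where $y_t$ is the vector of endogenous variables, $A_0$ is a vector of intercepts, $A_i$ are coefficient matrices, and $\varepsilon_t$ is a vector of error terms.
Even though VAR can produce several structural equations, it is not a simultaneous model, so the least squares approach can still be expected to yield estimates that are consistent, efficient, and equivalent to Generalized Least Squares (GLS), as long as every equation has the same regressors.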
One of the important tests for the consistency of the model generated by VAR is stability (stationarity). The test is based on the inverse roots of the characteristic AR polynomial, presented in graphic form (Agung, 2009; Anastasiou & Kapopolous, 2021; Juselius, 2006; Moelands, 2017). The VAR estimate is considered stable or stationary if all the roots have a modulus less than one.
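The stability condition can be illustrated with a short NumPy sketch: estimate a VAR by equation-by-equation least squares, stack the coefficient matrices into the companion matrix, and check that every eigenvalue has a modulus below one. The function names, the simulated bivariate data, and the matrix `A_true` are hypothetical; this is not the EViews routine used in the study.

```python
import numpy as np


def fit_var_ols(data, p):
    """Estimate a VAR(p) with intercept by equation-by-equation OLS.

    data has shape (T, k). Returns (intercept, [A_1, ..., A_p]).
    """
    data = np.asarray(data, dtype=float)
    T, k = data.shape
    Y = data[p:]                                            # rows t = p, ..., T-1
    lagged = [data[p - i:T - i] for i in range(1, p + 1)]   # y_{t-1}, ..., y_{t-p}
    X = np.hstack([np.ones((T - p, 1))] + lagged)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)               # shape (1 + k*p, k)
    intercept = B[0]
    coefs = [B[1 + i * k:1 + (i + 1) * k].T for i in range(p)]
    return intercept, coefs


def companion_roots(coefs):
    """Eigenvalues of the companion matrix built from A_1, ..., A_p."""
    k = coefs[0].shape[0]
    p = len(coefs)
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(coefs)
    if p > 1:
        companion[k:, :-k] = np.eye(k * (p - 1))
    return np.linalg.eigvals(companion)


# Simulated stable bivariate VAR(1); the coefficient values are hypothetical.
rng = np.random.default_rng(42)
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(2)

intercept, coefs = fit_var_ols(y, p=2)
roots = companion_roots(coefs)
is_stable = bool(np.all(np.abs(roots) < 1.0))
print("root moduli:", np.round(np.abs(roots), 3))
print("stable:", is_stable)
```

Plotting these eigenvalues in the complex plane against the unit circle gives the kind of graphic check described above: the system is stable when every point falls inside the circle.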

General Description
In general, the variables are normally distributed, except for the JKINFRA and KURS variables. The normality test is based on the Jarque-Bera statistic and its probability value, as reported in Table 1.
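The Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 × (S² + (K − 3)²/4), and under normality it is asymptotically chi-square distributed with two degrees of freedom. The sketch below computes it in plain Python; the function name and the small sample are hypothetical and serve only to show the calculation.

```python
import math


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its chi-square(2) probability value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For two degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical symmetric sample, for illustration only.
jb_stat, jb_prob = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb_stat, 4), round(jb_prob, 4))
```

A probability value above 0.05 means that normality is not rejected at the 5% level; a value below 0.05 indicates a departure from normality.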

Unit Roots Test
In probability theory, a unit root is a feature of some stochastic processes that can cause problems in statistical inference involving time series data. Table 2 shows the probability values of the unit root tests at level, I(0), and at first difference, I(1). At level none of the variables is stationary, but at first difference all variables are stationary. This conclusion follows from the probability values of the unit root tests (Table 2), which are < 0.05 based on the MacKinnon statistic. Thus, all the data can be analyzed dynamically by taking changes over time into account, including in a VAR analysis.

Optimum Lag
Optimum lag is used to increase the effectiveness of the relationship between time-series variables. The optimum lag is measured using several alternative information criteria, and the decision is based on the smallest value among the 5 criteria used. In this study there are three basic models, namely: (1) JKCON = f(JIBOR, IHK, KURS, NBT), (2) JKINFRA = f(JIBOR, IHK, KURS, NBT), and (3) JKMAN = f(JIBOR, IHK, KURS, NBT), where IHK is the consumer price index (CPI), KURS is the exchange rate, and NBT is the Net Balance Trade. The three models use two proxies for JIBOR, namely JIBOR1 and JIBOR3. Based on Table 3, the information criteria generally indicate an optimum lag at lag-2, except for the relationship between JKCON and JIBOR3, which has an optimum lag at lag-1.

VAR Stability
Stability is an important element in this analysis, especially in connection with forecasting. The stability of the VAR system is inspected visually by evaluating the roots of the VAR system. The dots represent the roots of the characteristic polynomial. If no root lies outside the unit circle, the VAR system satisfies the stability condition. The figure below shows that all alternative models have good stability, so analysis using VAR is possible.

Granger Causality
VAR analysis requires the identification of relationships between variables, which are estimated using Granger causality. This causality relationship is not intended to determine the analysis model. Table 4 shows the bivariate relationship between all analyzed variables. A probability value < 0.05 indicates that the two variables have a causal relationship. Thus it can be concluded that a causal relationship occurs between:

VAR Model
The VAR estimation yields 6 models, each with 6 alternative models. To determine the best model, the alternatives were compared using the highest $R^2$ and the lowest values of the AIC and SIC information criteria. The best VAR model for each is as follows:

Model-1: D(JKCON) D(IHK) D(JIBOR1) D(KURS) D(NBT)
For model-1, the best alternative model is the model using NBT as an endogenous variable, with the equation:

Model-2: D(JKCON) D(IHK) D(JIBOR3) D(KURS) D(NBT)
For model-2, the best alternative model is the model using NBT as an endogenous variable, with the equation:

Model-4: D(JKINFRA) D(IHK) D(JIBOR3) D(KURS) D(NBT)
Likewise for model-4, the best alternative model is a model that uses NBT as an endogenous variable, with the equation:

Model-5: D(JKMAN) D(IHK) D(JIBOR1) D(KURS) D(NBT)
For model-5, there are 2 best alternative models, namely the models that place JKMAN and NBT as endogenous variables, with the following equation:
The results of the comparison of the VAR models are shown in Table 5. The best model for model-1 to model-4 is the one that places NBT as an endogenous variable, with a higher $R^2$ value than the other alternative models.
For model-5 and model-6, there are two possible alternative models, namely the model that places JKMAN as the dependent variable and the model that places NBT as the dependent variable. The difference in $R^2$ between the two is small, so two equations are possible.
However, it should be noted that this VAR model is not intended for simultaneous model identification. Table 5 illustrates that, among the 3 stock price variables (JKCON, JKINFRA, and JKMAN), only JKMAN positions itself as an endogenous or dependent variable. In other words, based on the distribution of the data, only manufacturing sector stocks have movements that are influenced by macroeconomic variables.
The results of these calculations have two theoretical consequences. (1) Setting stock prices in general (without distinguishing sectors) carries a risk of generalization errors. This sectoral distinction is necessary because market behavior, on both the institutional side and the side of market players, has different characteristics.
(2) From the point of view of the Miller-Modigliani irrelevance theory (Aboura & Lepinette, 2017; Cline, 2015; Gersbach et al., 2015), setting stock prices as the dependent (endogenous) variable means assuming that the market is perfect and efficient and that market participants behave rationally. Thus, manufacturing sector stocks fall into the category described by the Miller-Modigliani irrelevance theory. It would be interesting to study further whether the consumption and infrastructure sector shares reflect an inefficient market or irrational market players.

CONCLUSION AND RECOMMENDATION
Conclusion
Theory is a simple way to explain facts (data), while facts (data) are the resultant of many factors and are much more complex than theoretical explanations. This difference in complexity allows for a diversity of observations (research gap), and a difference between experience and expectations (phenomena gap) is very possible.
The theory on the relationship between stock prices and several macro variables, which places stock prices as endogenous variables and macroeconomic variables as exogenous variables, gives no certainty that the facts (data) will say the same. This is because the role of time and the role of behavior or market characteristics are very important to consider.
The results of this study indicate that the only stock price variable that acts as an endogenous variable is the stock of the manufacturing sector. Shares of the consumption and infrastructure sectors lean more toward the role of exogenous variables.

Recommendation
Based on the conclusions above, it is suggested that analyses, especially those related to the capital market, pay attention to the time variable. Beyond making full use of the data that can be accessed, the time factor can reveal the dynamics of market institutional behavior and the dynamic behavior of market participants. The dynamics of market behavior must be a basis for consideration in forming the model.