mrau1.github.io
The aim of this project is to examine major coronavirus events and their impact on various economies.
Coronavirus Event Data:
Market Data:
During drastic times when things go wrong, people scramble; they become uncertain and tend to make rash decisions. These scenarios are all too common in our society and recently played out during the COVID-19 pandemic. The stock market and foreign exchange rates are perfect vehicles for this channeled uncertainty and provide insights into investor sentiment and economic strength. To understand this project, we need a brief background in economics.
People are never comfortable with uncertainty and tend to seek certainty whenever possible. This is especially prevalent in financial markets, where people, countries, and firms want certainty around how their money is allocated. Most investors prefer reliable returns that meet their goals, and countries allocate their wealth to currencies or commodities that hold value. During major events throughout history, such as September 11, 2001, the Great Recession, and President John F. Kennedy's assassination, markets tumbled because investors did not know what would happen next.
Another concept that should be understood is that a stock or currency acts like a voting machine. The more people buy the stock, the higher the price goes, and vice versa. In foreign exchange markets, the more people demand one currency relative to another, the more the demanded currency increases in value (appreciates) while the other decreases in value (depreciates). If more Americans want to travel to Britain than Britons to the United States, and they demand more of Britain's currency (the British pound), then the pound will appreciate in value and the United States dollar will depreciate.
Lastly, investors psychologically tend to act like cartels, precisely because a stock is a voting machine. Even if a company or currency has no intrinsic value, as long as people are willing to keep buying the underlying at its current price, its value will not drop. All of the market players act rationally so that nobody loses their investment. However, if one major investor (or enough smaller investors) sells enough of the underlying at once, the value will drop and other investors will panic and close their positions. This downward trend causes prices to drop rapidly, and it only stops when enough people have the confidence to buy the underlying at its reduced price.
The purpose of this project is to analyze the uncertainty of market participants. To do so, I'll analyze the foreign exchange rates and major stock market indices of 14 large countries. These countries are a mix of developing and developed economies that have active stock markets and are relevant to the world economy.
This project aims to analyze which markets were the most volatile, dependable, and risky during a time of crisis. Furthermore, sentiment around COVID-19 events is analyzed to capture investor reactions to major announcements.
!pip install mapclassify
!pip install pycountry-convert
!pip install geopy
import geopandas as gpd
import requests
import csv
import time
import pandas as pd
import scipy.stats as sp
import logging
import threading
import json
import os
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import math
from sklearn.linear_model import LinearRegression
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))
#Show all the rows of a dataframe
pd.options.display.max_rows = 10000
pd.options.display.width = 1000
Extraction: multiple datasets were extracted from finance.yahoo.com. The code below uses RESTful APIs to get historical data for thousands of financial tickers, instruments, and currencies. This code uses the requests library to pull the data and pandas to convert it into a pandas DataFrame. Each historical dataset can be web-scraped or downloaded as a .csv file. Each DataFrame has the same column structure, with columns for Date, Open, Close, High, Low, Adj Close, and Volume. Volume, however, is not regularly recorded and, for the sake of this project, will be dropped from consideration.
# Url of Historical Data from yahoo.finance.com
historical_url = "https://finance.yahoo.com/quote/%s/history?p=%s"
"""Adjusted to automatically take in symbols for automation"""
# Creates urls to grab current price of the underlying
def generateUrl(symbol=False):
url_1 = "https://finance.yahoo.com/quote/"
url_2 = "?p="
url_3 = "&.tsrc=fin-srch"
if symbol == False:
# Allows user to input stock symbol and returns the Yahoo finance URL
symbol = input("Enter a ticker symbol for a stock: ")
url = url_1 + symbol + url_2 + symbol + url_3
else:
url = url_1 + symbol + url_2 + symbol + url_3
return url
"""Adjusted to allow symbols and different URL's to pass through
Allows us to grab from historical, options, and other data sources"""
def dataGrabber(symbol=False, url=None):
if url == None:
# Obtain server's response to HTTP request
url = generateUrl(symbol)
src = requests.get(url)
# Converts source code to string
data = src.text
return data
# Returns the current price of the underlying
def getCurrentPrice(symbol=False):
# Isolates the currentPrice variable
data = dataGrabber(symbol)
split = data.split("currentPrice\":", 1)
halfList = split[1]
split2 = halfList.split("}", 1)
currentPrice = split2[0] + "}"
finalPrice = currentPrice.split("""{"raw":""")[1].split(",")[0]
# Print and return current price
print(symbol, finalPrice)
return round(float(finalPrice),3)
# Returns historical data in Pandas DataFrame format
def get_historical(symbol=None, to_csv=False):
# Gets CSV of historical data on stock for the past 5 years
t = str(int(time.time()))
url_1 = "https://query1.finance.yahoo.com/v7/finance/download/"
url_2 = "?period1=1433980800&period2=" + t
url_3 = "&interval=1d&events=history"
def fixNull(x):
if x == 'null':return 0
else:return x
if symbol == None:
symbol = input("Enter a ticker symbol for a stock: ")
url = url_1 + symbol + url_2 + url_3
# Keeps track of url that could be used as a download link
with requests.Session() as s:
download = s.get(url)
decoded_content = download.content.decode('utf-8')
cr = csv.reader(decoded_content.splitlines(), delimiter=',')
historical_data = list(cr)
c = True
data = []
for row in historical_data:
row = list(map(fixNull, row))
if c:
columns = row
c=False
else:
data.append(row)
df = pd.DataFrame(data=data, columns=columns)
# Resets Parsing Error
if symbol == "^KS11":
df = df.set_index('Date')
df.loc['2020-05-06', 'Low'] = str(float(df.loc['2020-05-06', 'Low']) / 10)
df = df.reset_index()
# Allows us to save as a CSV
if to_csv != False:
df.to_csv(symbol + ".csv")
return df
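For a quick sanity check, get_historical() can be called directly with any Yahoo Finance ticker (a minimal usage sketch, assuming network access; '^GSPC' is Yahoo's ticker for the S&P 500):
# Quick usage check of get_historical(): pull daily S&P 500 data.
# '^GSPC' is Yahoo Finance's ticker for the S&P 500; this call requires network access.
sp500 = get_historical('^GSPC')
sp500.tail()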
Firstly, I want to load each country's index and currency ticker. To do this I compiled a csv file (data/Stock Market Indices by Country - Sheet1.csv) to hold this information. I use the pandas .read_csv() function to load the data into a pandas DataFrame. After this step I select only Symbol, Country, and Currency as variables. These are the only variables needed to tidy this DataFrame.
# Load country index and currency tickers
# Note that only `Symbol`, `Country`, and `Currency` will be used.
country_indices = pd.read_csv("data/Stock Market Indices by Country - Sheet1.csv")[['Symbol', 'Country', 'Currency']]
print(country_indices)
The CoronaNet project features over 1000 researchers worldwide who compiled event data about Covid-19 dating back to January 2020.
I downloaded their dataset as a .csv file and stored it in my Jupyter repository. I use the pandas .read_csv() function to load the data and store it in a pandas DataFrame. This dataset will be used to classify major COVID-19 announcements.
# Download COVID-19 event data
# Data set goes from January 2020 until the end of August, thus only those portions will be analyzed.
corona = pd.read_csv("data/coronanet_release.csv")
corona.head()
Yahoo Finance provides great baseline statistics for historical data; however, I need to expand on that data in order to gain the insights I seek. To do this, I created the clean_df() function to convert the data returned from API calls into workable datasets.
I do this by converting Open, Close, High, Low, and Adj Close into floats, and Date into a datetime class.
I then manipulate the data further to create Range, Change, AbsoluteChange, percentChange, AbsolutePercentChange, and Volatility to provide key statistics on price movement. This data will be used further on to gain insights into how each index performed relative to its peers.
# Historical Data From yahoo.finance.com is packaged the same way, so it makes sense to create functions to process these DataFrames.
# `Open`, `Close`, `High`, `Low`, and `Adj Close` are converted into floats
# `Date` is converted into a datetime class
# `Range`, `Change`, `AbsoluteChange`, `percentChange`, `AbsolutePercentChange`, and `Volatility` are created from the columns above.
def clean_df(df:pd.DataFrame, start_date=dt.datetime(2020, 1, 2), end_date=dt.datetime(2020, 11, 18)):
    df['Date'] = df['Date'].fillna(0)  # assign back so the fill actually applies
df['Date'] = pd.to_datetime(df['Date'])
df = df[df['Date'] >= start_date]
df = df[df['Date'] <= end_date]
columns = ['Open', 'Close', 'High', 'Low', 'Adj Close']
for col in columns:
        df[col] = df[col].fillna(0)  # assign back so the fill actually applies
try:df[col] = df[col].astype('float')
except:pass
df['Range'] = df['High'] - df['Low']
df['Change'] = df['Close'] - df['Open']
df['AbsoluteChange'] = df['Change'].abs()
df['percentChange'] = df['Change'] / df['Open']
df['AbsolutePercentChange'] = df['percentChange'].abs()
df['Volatility'] = df['Range'].abs() / df['Open']
return df
Index data comes from finance.yahoo.com and has the same file structure as listed above. The load_indices() function creates a master pandas DataFrame with each of the columns below and their corresponding country. This allows me to keep a tidy DataFrame that is easy to manipulate.
# load_indices() creates a dictionary holding each country's designated index data.
def load_indices(df=country_indices):
master = pd.DataFrame(columns=['Country',
'Date',
'Open',
'High',
'Low',
'Close',
'Adj Close',
'Volume',
'Range',
'Change',
'AbsoluteChange',
'percentChange',
'AbsolutePercentChange',
'Volatility'])
for index in range(len(df)):
new_df = clean_df(get_historical(df.loc[index].Symbol))
new_df['Country'] = df.loc[index,"Country"]
master = pd.concat([master, new_df])
return master
indices = load_indices()
# Display the S&P 500 index which represents the United States stock market
# Note that the top index of each nation will be represented.
# The United States also has the Russell-2000, Dow Jones, VIX, among many others.
indices.head()
# Note that the number of trading days varies as a result of national holidays.
indices.Country.value_counts()
get_forex() is built off the get_historical() function and takes in two currencies to compare. A finance.yahoo.com DataFrame will be returned with the same columns seen for the index data.
"""
functions to set up forex data
Clean data to accurately merge
"""
def get_forex(cur1:str, cur2:str):
return clean_df(get_historical(cur1+cur2+'=X'))
get_forex('USD','GBP').head()
The CoronaNet Project's dataset has 41 different columns; however, we'll only use country, target_country, init_country_level, date_announced, and type. We'll use this dataset to track COVID-19 events throughout 2020 and will eventually merge it with the index data. Note that this dataset is mainly meant to provide baselines and dummy variables. Metadata analyses are drawn for observational purposes and for understanding the COVID-19 timeline.
In this section I plot the number of events per country, using only events at the national level. These events have backing from the national government and have a greater magnitude than local or provincial events. The variable init_country_level indicates the level of each event. Furthermore, in the first cell, the United States is listed as United States of America, so the name is changed to properly join with the index data.
corona.columns
# Replace `United States of America` with `United States`
corona['target_country'] = corona['target_country'].str.replace('United States of America','United States')
corona['country'] = corona['country'].str.replace('United States of America','United States')
corona.country.unique()
# Views the number of events in each category
corona.init_country_level.value_counts()
# `No, it is at the national level` looks like it is the same as `National`
# View the source of the supposed error
corona[corona['init_country_level']=='No, it is at the national level'].country.value_counts()
# Finds that the error is from China!
# Replace `No, it is at the national level` values with `National`
corona['init_country_level'] = corona['init_country_level'].str.replace('No, it is at the national level', 'National')
corona.init_country_level.value_counts()
# Create a national events count to isolate each country's national events
national_events = corona[corona['init_country_level']=='National'].country.value_counts()
national_events = national_events.to_frame().reset_index()
national_events.columns = ['country', 'events_count']
national_events.head(15).plot.bar(x='country', y='events_count', title='Amount of National COVID-19 Events per Country', figsize=(10, 7))
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode,iplot,plot
init_notebook_mode(connected=True)
# Set up specific data structure for Map
data = dict(
type = 'choropleth',
colorscale = 'Viridis',
locations = national_events['country'],
locationmode = "country names",
z = national_events['events_count'],
text = national_events['country'],
colorbar = {'title' : 'National Covid Events'},
)
layout = dict(title = 'National Covid Event Counts by Country',
geo = dict(projection = {'type':'mercator'})
)
# Merge National Event counts back into the main `corona` DataFrame
corona = pd.merge(national_events, corona, how='right', on='country')
corona['date_announced'] = pd.to_datetime(corona['date_announced'])
corona.head()
# Display Map
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap,validate=False)
# Build Dummy Variables for Event Type
corona = corona.merge(pd.get_dummies(corona['type']), left_index=True, right_index=True)
corona.columns
# Determine the amount of events per day during 2020
# `date_announced` is the variable indicating a new event
# Note that `date_announced` does not start a new event, but rather indicates that an event will happen
dd = corona[corona['date_announced'] <= dt.datetime.today()].date_announced.value_counts().to_frame()
dd.columns = ['Count']
plt.xlabel("Date")
plt.ylabel('Count')
dd.plot(title='Number of National COVID-19 Events Per Day', figsize=(10, 7))
The VIX is a measure of market volatility derived from S&P 500 option spreads. The higher the VIX, the more uncertainty in the marketplace.
The chart above closely resembles the VIX (Volatility Index). Below I quickly plot the VIX and run a regression between its closing price (Close) and the number of COVID-19 announcements per day (date_announced) between January and August 2020.
# Web scrape VIX historical data and clean the DataFrame
VIX = clean_df(get_historical('^VIX'))
# Plot the VIX closing price throughout 2020
VIX.set_index('Date')['Close'].plot(title='VIX Closing Price 2020 Levels', figsize=(10,7))
# Merge VIX and events data
VX = pd.merge(VIX[['Close', 'Date']], dd.reset_index(), left_on='Date', right_on='index').set_index('index')
VX.sort_values('Close', ascending=False).head()
# Set up a Linear Regression using statsmodels.api
X = sm.add_constant(VX[['Count']])
Y = VX['Close']
output = sm.OLS(Y, X).fit()
# Output the statistical summary of the linear regression
print(output.summary())
The correlation between the closing price of the VIX and the number of COVID-19 announcements per day is statistically significant. This model, with an R^2 of 0.792, explains 79.2% of the variance in the VIX's closing price. This model could be an accurate indicator of the VIX; however, we would like to increase our R^2 if possible. More independent variables, such as event magnitude, could be added to create a more representative model.
Regarding the beta coefficients, both the constant and the count of date_announced events are statistically significant. Count has a t-stat of 23.847, which is significant at the 0.01% level.
A t-statistic beyond +-1.96 is statistically significant at the 95% confidence level.
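These quantities can also be read directly off the fitted model object (a small sketch, assuming the statsmodels result stored in output above):
# Pull the quantities discussed above directly from the fitted statsmodels result
print("R-squared:", round(output.rsquared, 3))             # share of variance in VIX Close explained by Count
print("t-statistics:", output.tvalues.round(2).to_dict())  # constant and Count
print("p-values:", output.pvalues.round(4).to_dict())      # |t| > 1.96 corresponds to p < 0.05 (two-sided)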
The VIX formula accounts for option spreads and is a measure of market uncertainty. Feel free to learn more about the VIX formula in the link below.
I plot the number of events for each nation using plotly's graphing capabilities. I generate a world map and use the National event counts we accumulated to view the number of national events for each nation.
As we can see above, Russia had the most national events from January 2020 to August 2020, followed by China, Bangladesh, and Guatemala. China and Russia both have highly centralized governments. Guatemala and Bangladesh also have centralized control, but their governments are generally poorly run. Government uncertainty may also be a reason for the quantity of events: these governments may have enacted, then retracted, numerous policies.
# Display Index data for the United States of America
indices[indices['Country'] == 'United States'].tail()
In this section I build a matrix() function to calculate the relationships between DataFrames. I use this function to analyze the relationships between different indices and currency pairs. The function takes in a dependent column and computes the sample mean of that column over a time period. These time periods are specified by start_date and end_date. The means are sampled weekly, but the frequency can be changed with period.
# `col` is the dependent column in the matrix
# `period` is the resampling/grouping method
# `values` is the Pandas DataFrame
# `start_date` and `end_date` are date parameters
# `country_indices` are the list of tracked countries
def matrix(col,
period='W',
values=indices,
start_date=dt.datetime(2020, 2, 15),
end_date=dt.datetime(2020, 11, 2),
country_indices=country_indices['Country'].to_list()):
count = 0
# Master DataFrame
df = pd.DataFrame()
for index in country_indices:
# Builds dataFrame if it is null
if count == 0:
df = values[values['Country'] == index].set_index('Date').resample(period).mean()[col].to_frame().reset_index()
df.columns = ['Date', index]
count += 1
else: # Continually Resamples and Appends to master DataFrame
df2 = values[values['Country'] == index].set_index('Date').resample(period).mean()[col].to_frame().reset_index()
df2.columns = ['Date', index]
df = pd.merge(df,
df2,
how='left',
on='Date')
count += 1
# Returns DataFrame between the selected dates
df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
return df.set_index('Date')
# Gets variance of volatility over March and April 2020: the main effect of COVID-19 on markets
variance = matrix(col='Volatility', period='W', start_date=dt.datetime(2020, 3, 1), end_date=dt.datetime(2020, 5, 1)).var().to_frame()
variance.columns = ['Variance']
# Plots the Variance in ascending order based on total variance in volatility
variance.sort_values('Variance').plot.bar(title='Variance of Weekly Volatility', figsize=(12, 6))
India had the highest variance of any stock market in the sample set, closely followed by Australia, Indonesia, and the United States. The East and Southeast Asian countries were expected to have higher volatility because the pandemic hit them first; however, China, the first country infected, had the second lowest volatility. This may be a result of its decisive actions to mitigate the virus, or of the world's relative certainty in the Chinese economy. India is known to be a volatile developing economy and is seen as such in this analysis.
# Create Matrix
vol = matrix(col='Volatility')
variance = vol.var().to_frame()
fig, ax = plt.subplots(figsize=(20,20))
# Plot each country's mean weekly volatility over the sample period
for country in vol.columns:
plt.scatter(x=vol.index, y=vol[country], label=country)
plt.legend()
plt.title("Sample Nation Volatility 2020")
plt.ylabel("Volatility")
plt.xlabel("Date")
build_forex_matrix() is built to create a matrix comparing each of the currency pairs. Below, volatility is compared among the currency pairs.
def build_forex_matrix(data=country_indices.to_dict('list'), GET='matrix'):
    # Re-categorizes currencies so that euro data is not pulled multiple times
try:
currency = data['Currency']
country = data['Country']
country.remove('Germany')
country.remove('Netherlands')
country.remove('France')
country.append('European Union')
currency.remove('EUR')
currency.remove('EUR')
except: pass
master = {}
master_df = pd.DataFrame(columns=['Country',
'Date',
'Open',
'High',
'Low',
'Close',
'Adj Close',
'Volume',
'Range',
'Change',
'AbsoluteChange',
'percentChange',
'AbsolutePercentChange',
'Volatility'])
    # Iterate through countries and use each country as a dependent variable
for index, c in enumerate(country):
main_currency = currency[index]
currencies_to_use = (currency[0:index] + currency[index+1:-1])
country_df = pd.DataFrame(columns=['Country',
'Date',
'Open',
'High',
'Low',
'Close',
'Adj Close',
'Volume',
'Range',
'Change',
'AbsoluteChange',
'percentChange',
'AbsolutePercentChange',
'Volatility'])
        # Iterate through the other countries and use them as independent variables
for curr in currencies_to_use:
try:
# Returns forex data from yahoo.finance.com
temp_df = get_forex(main_currency, curr)
temp_df['Country'] = curr
# Appends to master DataFrame
country_df = pd.concat([country_df, temp_df])
except:
pass
        # Creates a final matrix of Volatility for each dependent currency
final = matrix(col='Volatility',
period='W',
values=country_df,
start_date=dt.datetime(2020, 2, 15),
end_date=dt.datetime(2020, 4, 1),
country_indices=currencies_to_use).mean()
country_df['currency_pair'] = main_currency + '/' + country_df['Country']
master_df = pd.concat([master_df, country_df])
final[main_currency] = 0
master[main_currency] = final
# Returns DataFrame of Raw data if selected
if GET == 'DF':
return master_df
# Returns Matrix data
return pd.DataFrame(master)
foreign = build_forex_matrix()
foreign = foreign.fillna(0)
foreign = foreign.reindex(sorted(foreign.columns), axis=1)
foreign * 100
Note that the IDR/KRW value is an outlier skewing our results. To correct this, I reset the value to 0 so that the other values display correctly on the heatmap's color scale.
foreign.loc['IDR', 'KRW'] = 0
import seaborn as sns
fig, ax = plt.subplots(figsize=(20,20))
colormap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(foreign, cmap=colormap, annot=True, fmt = ".2f")
plt.xticks(range(len(foreign.columns)), foreign.columns)
plt.yticks(range(len(foreign.columns)), foreign.columns)
plt.title("Currency Pair Volatility Correlations")
plt.show()
forex = build_forex_matrix(GET='DF')
forex_df = forex[(forex['Date'] >= dt.datetime(2020, 2, 15)) & (forex['Date'] <= dt.datetime(2020, 4, 4)) & (forex['currency_pair'] != 'IDR/JPY')]
forex_df = forex_df.groupby(['currency_pair']).mean().sort_values('AbsolutePercentChange', ascending=False).dropna()
forex_df.head(10)['AbsolutePercentChange'].plot.bar(title='Average Daily Percent Changes During Initial Outbreaks')
def winners_losers(country_indices=forex, target='Close', p_value=1.96):
currencies = forex.Country.unique()
master = {}
scores = {}
for main_currency in currencies:
master[main_currency] = {'Score':0, 'Wins':list(), "Loses":list(), 'Ties':list()}
scores[main_currency] = 0
for curr in currencies:
if curr != main_currency:
try:
k = get_forex(main_currency, curr)
k = k[(k['Date'] < dt.datetime(2020, 5, 1)) &(k["Date"] > dt.datetime(2020, 2, 10))]
y=np.array(k[target].dropna().values, dtype=float)
x=np.array(pd.to_datetime(k['Date'].dropna()).index.values, dtype=float)
                    slope, intercept, r_value, p_val, std_err = sp.linregress(x, y)  # renamed so the p_value threshold parameter is not shadowed
t_stat = slope / std_err
if t_stat > p_value:
master[main_currency]['Score'] += 1
scores[main_currency] += 1
master[main_currency]['Wins'].append(curr)
elif math.sqrt(t_stat * t_stat) <= p_value:
master[main_currency]['Ties'].append(curr)
else:
scores[main_currency] -= 1
master[main_currency]['Score'] -= 1
master[main_currency]['Loses'].append(curr)
except:
pass
return master, scores
data, scores = winners_losers()
pd.Series(scores).sort_values().plot.bar(figsize=(16, 6))
plt.title("Statistical Significance Scores of Currencies")
plt.ylabel("Score")
From the chart above we can see that the Indian rupee was the most volatile currency during the heart of the COVID-19 surge. This could be a result of India's developing economy and its large, vulnerable population. India also has one of the most heavily traded stock markets in Asia. The sheer growth of India's democratic economy, with few to no restrictions on investment, has spurred heavy investment. However, India, unlike China, South Korea, and Japan, did not stem the spread of COVID-19. COVID-19 still runs rampant in India, while other countries have now reopened.
In general, the most stable currencies become reserve currencies. Reserve currencies are stable currencies in which nations keep their money. For instance, unstable South American countries store their wealth in United States dollars, and former British colonies (like the US in 1783) would keep stores of the British pound or gold. In troubling times, reserve currencies tend to appreciate in value. In this scenario, the Japanese yen appreciated the most, followed by the United States dollar, the Hong Kong dollar, and the euro. Japan was the most successful in combating COVID-19, compared to the United States, which is facing a massive outbreak and more shutdowns. Furthermore, we can see that developing Asian countries are depreciating the most compared to their developed counterparts. What is interesting is that the British pound saw net depreciation. This may be because Prime Minister Boris Johnson ignored COVID-19 warnings, or a sign that Britain's financial dominance has deteriorated after Brexit.
In this section I analyze the effect different COVID-19 events had on their respective stock market indices. For each of the events listed below, I create dummy variables and then combine the results into distinct groups. I then run regression analyses to look for correlation among the variables. I hypothesize that days with more events will have higher volatility, and that the days following event-heavy days will see some relief in volatility.
corona.type.unique()
# This function creates dummy variables for each type of event announcement
def build_dummies(df:pd.DataFrame=corona, countries:list=country_indices['Country'].tolist()):
df = df[df['init_country_level'] == 'National']
dates = df['date_announced'].unique()
df = df.groupby(['country', 'date_announced'])[df['type'].unique()].sum()
df['next_day'] = 0
return df
master = build_dummies()
master.head()
The dummy data is left-joined onto the indices to match each country's trading days with its COVID-19 events. NaN events are set to 0. The reasoning for this is to set up a dummy-variable regression on Volatility. This regression aims to see how each event type affects consumer sentiment.
m = pd.merge(indices[['Country', 'Date']], master, left_on=['Country','Date'], right_on=['country', 'date_announced'], how='left')
m = m.fillna(0)
m.head()
# Head of indices to show the stock market data to be merged.
indices.head()
The next_day column: this dummy variable allows us to analyze the day-after change due to an event. We can use this data to run a difference-in-differences regression model. We will run a pooled regression on Date and Country. The aim is to see if individual nations' dummy variables can be combined to provide insight into volatile markets.
We will merge the indices DataFrame with dummy on Country and Date to properly match our data. Then we will combine similar dummy variables into groups (Health, Restrictions, Lockdown, and Information).
# Length of the merged index
count = len(m.index)
# Creates temporary DataFrame
dummy = m
# Groups index data in temporary dataFrame
index_data = indices.groupby(['Country', 'Date']).mean()
for ind in m.index:
# Appends next day
get = dummy.loc[ind].tolist()
get[1] = dummy.loc[ind+1].tolist()[1]
# Sets day after variable to next market day
get[-1] = 1
dummy.loc[count] = get
count += 1
# Creates table of dummy variables
dummy.sort_values('Date').head()
# Merge Coronavirus dummy data on market days for respective indices
# Create four broad columns representing the categories for each event type
dummy['Health'] = dummy['Health Testing'] + dummy['Health Resources'] + dummy['Health Monitoring']
dummy['Restrictions'] = (dummy['Restriction and Regulation of Government Services'] + dummy['Restriction and Regulation of Businesses'] +
dummy['Restrictions of Mass Gatherings'] + dummy['External Border Restrictions']
+ dummy['Internal Border Restrictions'] + dummy['Social Distancing'])
dummy['Lockdown'] = (dummy['Declaration of Emergency'] + dummy['Lockdown'] + dummy['Curfew'])
dummy['Information'] = (dummy['Other Policy Not Listed Above'] + dummy['Public Awareness Measures'] +
dummy['New Task Force, Bureau or Administrative Configuration'] +
dummy['Anti-Disinformation Measures'] + dummy['Hygiene'])
dummies_to_delete = ['Health Testing', 'Health Resources', 'Restriction and Regulation of Government Services',
'Restriction and Regulation of Businesses', 'Restrictions of Mass Gatherings',
'Closure and Regulation of Schools', 'External Border Restrictions', 'Other Policy Not Listed Above',
'Internal Border Restrictions', 'Public Awareness Measures', 'Declaration of Emergency',
'Health Monitoring', 'Quarantine', 'Social Distancing', 'New Task Force, Bureau or Administrative Configuration',
'Anti-Disinformation Measures', 'Hygiene', 'Curfew']
for dumb in dummies_to_delete:
del dummy[dumb]
dummy.head()
# Merge Dummy data with indices
master = pd.merge(dummy, indices, on=['Date', 'Country'], how='left').sort_values(['Country', 'Date'])
# Drop null results
master = master.dropna()
# Event_Count is the amount of dummies on a given day
master['Event_Count'] = (master['Lockdown'] + master['Health'] + master['Restrictions'] + master['Information'])
# Multiplying by -1 lets next_day sort correctly (event day before its next-day row); found by trial and error.
master['next_day'] = master['next_day'] * -1
# Sort values by Country, then Date, and lastly next_day
master = master.sort_values(['Country', 'Date', 'next_day'], ascending=True)
master['next_day'] = master['next_day'] * -1
# Save dataset for other uses
master.to_excel("MatthewRau_IndexData.xlsx", index=False)
master.tail()
# Full Regression of all countries on volatility
temp_df = master
independents = ['Lockdown', 'Health', 'Restrictions', 'Information']
X = sm.add_constant(temp_df[independents])
Y = temp_df['Volatility']
output = sm.OLS(Y, X).fit()
output.summary()
# Create differences in Volatility and the absolute change in Volatility
# Needed for the first differences model
master['Volatility_Change'] = master['Volatility'].diff(1)
master['Absolute_Volatility_Change'] = master['Volatility_Change'].abs()
master.head()
# Creates immediate effect DataFrame
# Only takes next_day == 1 (Day After Events)
master_next = master.loc[master['next_day'] == 1.0].dropna()
master_next.head()
# Makes each dummy variable binary instead of numerical
# Normally dummy variables are binary but I combined numerous dummies for certain days.
binary_master = master
binary_master['Lockdown'] = (binary_master['Lockdown'] > 0)
binary_master['Health'] = (binary_master['Health'] > 0)
binary_master['Restrictions'] = (binary_master['Restrictions'] > 0)
binary_master['Information'] = (binary_master['Information'] > 0)
binary_master.tail()
A pooled first differences regression is a regression on panel data and its changes over time. I take the difference in volatility for the day after an event happens, capturing the immediate effect, and group the data by its country index. This model is split to provide clarity among groups while looking at the overall effect.
# Pooled OLS package for first differences model.
from linearmodels.panel import PooledOLS
# Create temporary DataFrame
temp_df = binary_master
# Only use February 15 - May 1. Aims to capture COVID-19 surge.
temp_df = temp_df[(temp_df['Date'] > dt.datetime(2020, 2, 15)) & (temp_df['Date'] < dt.datetime(2020, 5, 1))]
# Categorical Variable for Country
co = pd.Categorical(temp_df.Country)
# Set double index for first differences
temp_df = temp_df.set_index(['Country', 'Date'])
temp_df['Country'] = co
# Signify exogenous variables
exog_vars = ['Lockdown', 'Health', 'Restrictions', 'Information']
exog = sm.add_constant(temp_df[exog_vars])
# Regress and Fit model
mod = PooledOLS(temp_df.Volatility_Change, exog)
pooled_res = mod.fit()
pooled_res
# Built to analyze regressions on chosen dependent and independent variables for ALL nations.
def highest_corr(dependant='Volatility', df:pd.DataFrame=master, independants=['Lockdown', 'Health', 'Restrictions', 'Information'] ):
data = {}
    for country in df.Country.unique():
temp_df = df.loc[df['Country'] == country].dropna()
X = temp_df[independants]
y = temp_df[dependant]
#Fit the linear regression
regr = LinearRegression()
regr.fit(X, y)
data[country] = (regr.score(X, y))
return pd.Series(data)
# Bar Chart to show overall correlation of Dummy Variables on Volatility
fig1, ax = plt.subplots(1,3, figsize=(20,5))
final = pd.DataFrame()
final['Volatility'] = highest_corr().sort_values().plot.bar(ax=ax[0], color='blue', title="R2 of Dummy Variables Regressed on Volatility")
final['Volatility_Change'] = highest_corr(dependant='Volatility_Change').sort_values().plot.bar(ax=ax[1], color='purple', title="R2 of Dummy Variables Regressed on Volatility Change")
final['Absolute_Volatility_Change'] = highest_corr(dependant="Absolute_Volatility_Change").sort_values().plot.bar(ax=ax[2], color='green', title="R2 of Dummy Variables Regressed on Absolute Volatility Change")
# Bar Chart to show overall correlation of Dummy Variables on Volatility
fig1, ax = plt.subplots(1,3, figsize=(20,5))
final = pd.DataFrame()
final['Volatility'] = highest_corr(independants=['Event_Count']).sort_values().plot.bar(ax=ax[0], color='blue', title="R2 of Events/Day Regressed on Volatility")
final['Volatility_Change'] = highest_corr(independants=['Event_Count'], dependant='Volatility_Change').sort_values().plot.bar(ax=ax[1], color='purple', title="R2 of Events/Day Regressed on Volatility Change")
final['Absolute_Volatility_Change'] = highest_corr(independants=['Event_Count'], dependant="Absolute_Volatility_Change").sort_values().plot.bar(ax=ax[2], color='green', title="R2 of Events/Day Regressed on Absolute Volatility Change")
# Create simple regressions to calculate which dummy variable has the largest effect on Volatility
last_df = pd.DataFrame()
for var in ['Lockdown', 'Health', 'Restrictions', 'Information']:
last_df[var] = highest_corr(independants=[var])
last_df.sort_values('Restrictions').plot.bar(figsize=(16, 9))
Overall, each coronavirus event had little to no effect on overall volatility. Consumers were unfazed by constant COVID-19 updates; their level of uncertainty did not change much. However, some nations had higher correlation than others. For instance, the United States had higher correlation between its dummy variables and volatility than other nations. Malaysia and Japan had the highest R-squared for Volatility Changes and Absolute Volatility Changes respectively; however, the R-squared was nowhere near significant. The United States had an R-squared around 0.25, meaning the model explained only about 25% of the variation in volatility. The United States, however, did relate the most to the dummy variables. COVID-19 events shaped part of day-after consumer sentiment about markets, but there is significant omitted variable bias that needs to be explained.
The pooled first differences regression model and the cumulative multiple linear regression model were not explanatory of market volatility in any form. The pooled regression model had an extremely low R-squared and its independent variables did not show any significant t-scores. On the other hand, the full regression including all countries and the four dummy variables yielded significant t-scores for all dummy variables. The t-scores were significant past the 1% level, showing that Information, Restrictions, Lockdown, and Health should each be included in a full regression model.
Taken individually, the dummy variable regressions showed that Restrictions was the most explanatory dummy variable in singular linear regressions on Volatility. This means that Restrictions probably weighs more on Volatility than the other dummy variables.
People put their wealth in safe places in times of uncertainty, and in the foreign exchange market Japan proved to be the safest place. Japan did not have a robust financial system until the 1960s and has grown ever since. Japan's ability to control the pandemic better than its developed partners proved fruitful, resulting in an appreciation of the yen. Developing nations like India and Indonesia, though seen as major world economies, were not as fortunate: their currencies saw net depreciation and may mean-revert in the future. The Indian rupee and Indonesian rupiah may be solid investments on the prospect of a near-future mean reversion.
Although COVID-19 event data did not fully capture volatility movement in major stock market indices, it did show that some event types would play a significant role in a more robust model. Data that should be included next time are COVID-19 case rates and death rates. These rates are the baseline for lockdowns and may be better indicators of both the events themselves and of volatility. The events have some direct effect on markets, and COVID-19 case rates could cause uncertainty in markets before restrictions are put in place. This pandemic has confused market participants and everyone alike, and no one model will capture the overall effect this pandemic has had on humanity.
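As a rough sketch of how such case data could be folded into the existing panel (the file path and column names below are placeholders for illustration, not part of this project's data):
# Hypothetical extension: merge daily case counts onto the master panel.
# 'data/covid_daily_cases.csv' with Country, Date, new_cases columns is a placeholder file.
cases = pd.read_csv("data/covid_daily_cases.csv", parse_dates=['Date'])
master_ext = pd.merge(master, cases[['Country', 'Date', 'new_cases']], on=['Country', 'Date'], how='left')
master_ext['new_cases'] = master_ext['new_cases'].fillna(0)
# new_cases could then join Lockdown, Health, Restrictions, and Information as an additional regressor.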
Papers you might enjoy