GROUPBY IN FINANCIAL DATA¶
In Lecture 5.1, we used the groupby() method on the tips dataset. In this session, we are going to use groupby function in a panel datasets and other areas of financial research.
# Import Pandas library
import pandas as pd
import numpy as np
1. Import Data from CSV¶
# Load the stocks datasets
# Load the data from the given url into Pandas Dataframe, call it stocks
stocks = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/03/prices.csv')
stocks.head()
stocks.info()
If you notice, the date variable is an object, which means it is a text column, and note a datetime variable of Pandas. For further use, we need to convert it to a datetime variable
2. Convert string to datetime¶
stocks['date'] = pd.to_datetime(stocks['date'])
stocks.info()
3. Calculate stock returns¶
For stock returns, we can use the pct_change method with the bygroups method as the returns need to be calculated for each symbol. If we do not use the groupby function, then the percentage changes of one firm will be mixed with other firms.
stocks["returns"] = stocks.groupby("symbol")["close"].pct_change(1)
stocks.returns.describe()
4. Extract month variable¶
.dt is a Pandas Series method and provides more options related to datetime. Therefore, when we need to extract year from datetime variable, we shall add .dt.year to the Series or dt.month to extract month
#Create a month variable in the dataframe
stocks['month'] = stocks['date'].dt.month
stocks.head()
5. Extract Monthly Date¶
To create a combination of year month dates, we need to use the to_period(“M”) method of the .dt method. This is necessary if we need to find group statistics separately on each month of each year.
stocks['monthdate'] = stocks['date'].dt.to_period("M")
stocks.head()
6. Find Monthly returns for each stock¶
Monthly returns is the product of (1+ daily returns) groupby monthdate foreach stock
6.1 How to find product of a Series¶
# Let us find product of the first 5 rows the returns sereis
target = [1,2,3,4,5],['returns']
subset = stocks.loc[target]
subset
# add 1 to the series
subset = subset + 1
# Find Product
subset.product()
6.2 Back to main problem¶
# First create the 1+returns column
stocks['return1'] = stocks['returns'] + 1
#Create a groupby object
group = stocks.groupby(['symbol', 'monthdate'])
# From the group object, apply the product function to return1 column
stocks['mret'] = group["return1"].transform(np.product)
# Deduce one from the mret
stocks['mret'] = stocks['mret'] - 1
stocks.head()
So how did we find the group statistics, steps?¶
- Create the group object
- From the group object, apply transform method the selected variable
7. Find standard deviation of the daily returns for each firm¶
# 7.1 Find the group object
group = stocks.groupby(['symbol', 'monthdate'])
#7.2 From the group object, apply sd function on the return column,
# and create a new variable sd
stocks['sd'] = group['returns'].transform(np.std)
stocks.head()
8. Reduce the data to a monthly frequency¶
Since we have generated monthly returns and sd, we can now keep only one observation for each month for each firm. For this, we can use the duplicates_drop() function
mstocks = stocks.drop_duplicates(subset ='monthdate', keep = 'last')
mstocks.head()
9. Write this data to Excel¶
mstocks.to_excel('Monthly Data.xls')