Lecture 5.2 – Pandas Groupby in Financial Data


GROUPBY IN FINANCIAL DATA

In Lecture 5.1, we used the groupby() method on the tips dataset. In this session, we are going to use groupby function in a panel datasets and other areas of financial research.

In [19]:
# Import Pandas library
import pandas as pd
import numpy as np

1. Import Data from CSV

In [2]:
# Load the stocks datasets

# Load the data from the given url into Pandas Dataframe, call it stocks
stocks = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/03/prices.csv')
In [3]:
stocks.head()
Out[3]:
symbol date close
0 AABS 2013-01-07 103.500000
1 AABS 2013-01-14 103.000000
2 AABS 2013-01-16 103.000000
3 AABS 2013-01-17 105.750000
4 AABS 2013-02-25 96.699997
In [4]:
stocks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
symbol    10000 non-null object
date      10000 non-null object
close     10000 non-null float64
dtypes: float64(1), object(2)
memory usage: 234.5+ KB

If you notice, the date variable is an object, which means it is a text column, and note a datetime variable of Pandas. For further use, we need to convert it to a datetime variable

2. Convert string to datetime

In [5]:
stocks['date'] = pd.to_datetime(stocks['date'])
In [6]:
stocks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
symbol    10000 non-null object
date      10000 non-null datetime64[ns]
close     10000 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 234.5+ KB

3. Calculate stock returns

For stock returns, we can use the pct_change method with the bygroups method as the returns need to be calculated for each symbol. If we do not use the groupby function, then the percentage changes of one firm will be mixed with other firms.

In [7]:
stocks["returns"] = stocks.groupby("symbol")["close"].pct_change(1)
In [8]:
stocks.returns.describe()
Out[8]:
count    9999.000000
mean        0.007478
std         0.368225
min        -0.971447
25%        -0.015200
50%         0.000000
75%         0.017206
max        33.855074
Name: returns, dtype: float64

4. Extract month variable

.dt is a Pandas Series method and provides more options related to datetime. Therefore, when we need to extract year from datetime variable, we shall add .dt.year to the Series or dt.month to extract month

In [9]:
#Create a month variable in the dataframe
stocks['month'] = stocks['date'].dt.month
In [10]:
stocks.head()
Out[10]:
symbol date close returns month
0 AABS 2013-01-07 103.500000 NaN 1
1 AABS 2013-01-14 103.000000 -0.004831 1
2 AABS 2013-01-16 103.000000 0.000000 1
3 AABS 2013-01-17 105.750000 0.026699 1
4 AABS 2013-02-25 96.699997 -0.085579 2

5. Extract Monthly Date

To create a combination of year month dates, we need to use the to_period(“M”) method of the .dt method. This is necessary if we need to find group statistics separately on each month of each year.

In [11]:
stocks['monthdate'] = stocks['date'].dt.to_period("M")
In [12]:
stocks.head()
Out[12]:
symbol date close returns month monthdate
0 AABS 2013-01-07 103.500000 NaN 1 2013-01
1 AABS 2013-01-14 103.000000 -0.004831 1 2013-01
2 AABS 2013-01-16 103.000000 0.000000 1 2013-01
3 AABS 2013-01-17 105.750000 0.026699 1 2013-01
4 AABS 2013-02-25 96.699997 -0.085579 2 2013-02

6. Find Monthly returns for each stock

Monthly returns is the product of (1+ daily returns) groupby monthdate foreach stock

6.1 How to find product of a Series

In [13]:
# Let us find product of the first 5 rows the returns sereis
target = [1,2,3,4,5],['returns']
subset = stocks.loc[target]
In [14]:
subset
Out[14]:
returns
1 -0.004831
2 0.000000
3 0.026699
4 -0.085579
5 0.044467
In [15]:
# add 1 to the series
subset = subset + 1
In [16]:
# Find Product
subset.product()
Out[16]:
returns    0.975845
dtype: float64

6.2 Back to main problem

In [17]:
# First create the 1+returns column
stocks['return1'] = stocks['returns'] + 1
In [23]:
#Create a groupby object
group = stocks.groupby(['symbol', 'monthdate'])

# From the group object, apply the product function to return1 column
stocks['mret'] = group["return1"].transform(np.product)
In [24]:
# Deduce one from the mret
stocks['mret'] = stocks['mret'] - 1


stocks.head()
Out[24]:
symbol date close returns month monthdate return1 mret
0 AABS 2013-01-07 103.500000 NaN 1 2013-01 NaN 0.021739
1 AABS 2013-01-14 103.000000 -0.004831 1 2013-01 0.995169 0.021739
2 AABS 2013-01-16 103.000000 0.000000 1 2013-01 1.000000 0.021739
3 AABS 2013-01-17 105.750000 0.026699 1 2013-01 1.026699 0.021739
4 AABS 2013-02-25 96.699997 -0.085579 2 2013-02 0.914421 -0.044917

So how did we find the group statistics, steps?

  1. Create the group object
  2. From the group object, apply transform method the selected variable

7. Find standard deviation of the daily returns for each firm

In [29]:
# 7.1 Find the group object
group = stocks.groupby(['symbol', 'monthdate'])

#7.2 From the group object, apply sd function on the return column, 
# and create a new variable sd

stocks['sd'] = group['returns'].transform(np.std)
In [30]:
stocks.head()
Out[30]:
symbol date close returns month monthdate return1 mret sd
0 AABS 2013-01-07 103.500000 NaN 1 2013-01 NaN 0.021739 0.016982
1 AABS 2013-01-14 103.000000 -0.004831 1 2013-01 0.995169 0.021739 0.016982
2 AABS 2013-01-16 103.000000 0.000000 1 2013-01 1.000000 0.021739 0.016982
3 AABS 2013-01-17 105.750000 0.026699 1 2013-01 1.026699 0.021739 0.016982
4 AABS 2013-02-25 96.699997 -0.085579 2 2013-02 0.914421 -0.044917 0.091957

8. Reduce the data to a monthly frequency

Since we have generated monthly returns and sd, we can now keep only one observation for each month for each firm. For this, we can use the duplicates_drop() function

In [33]:
mstocks = stocks.drop_duplicates(subset ='monthdate', keep = 'last') 
In [35]:
mstocks.head()
Out[35]:
symbol date close returns month monthdate return1 mret sd
8975 AEL 2017-07-31 14.90 0.000000 7 2017-07 1.000000 -0.067001 0.045379
8979 AEL 2017-08-31 14.30 0.056911 8 2017-08 1.056911 -0.040268 0.064265
8986 AEL 2017-09-21 12.60 -0.030769 9 2017-09 0.969231 -0.118881 0.061009
8995 AEL 2017-10-26 12.75 0.062500 10 2017-10 1.062500 0.011905 0.050943
9013 AGIC 2013-01-30 16.25 0.000000 1 2013-01 1.000000 0.274510 0.031025

9. Write this data to Excel

In [38]:
mstocks.to_excel('Monthly Data.xls')
In [ ]: