GROUPBY IN FINANCIAL DATA¶

In Lecture 5.1, we used the groupby() method on the tips dataset. In this session, we are going to use groupby function in a panel datasets and other areas of financial research.

# Import Pandas library
import pandas as pd
import numpy as np

1. Import Data from CSV¶

# Load the stocks datasets

# Load the data from the given url into Pandas Dataframe, call it stocks
stocks = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/03/prices.csv')

stocks.head()

stocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
symbol    10000 non-null object
date      10000 non-null object
close     10000 non-null float64
dtypes: float64(1), object(2)
memory usage: 234.5+ KB

If you notice, the date variable is an object, which means it is a text column, and note a datetime variable of Pandas. For further use, we need to convert it to a datetime variable

2. Convert string to datetime¶

stocks['date'] = pd.to_datetime(stocks['date'])

stocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
symbol    10000 non-null object
date      10000 non-null datetime64[ns]
close     10000 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 234.5+ KB

3. Calculate stock returns¶

For stock returns, we can use the pct_change method with the bygroups method as the returns need to be calculated for each symbol. If we do not use the groupby function, then the percentage changes of one firm will be mixed with other firms.

stocks["returns"] = stocks.groupby("symbol")["close"].pct_change(1)

stocks.returns.describe()

count    9999.000000
mean        0.007478
std         0.368225
min        -0.971447
25%        -0.015200
50%         0.000000
75%         0.017206
max        33.855074
Name: returns, dtype: float64

4. Extract month variable¶

.dt is a Pandas Series method and provides more options related to datetime. Therefore, when we need to extract year from datetime variable, we shall add .dt.year to the Series or dt.month to extract month

#Create a month variable in the dataframe
stocks['month'] = stocks['date'].dt.month

stocks.head()

5. Extract Monthly Date¶

To create a combination of year month dates, we need to use the to_period(“M”) method of the .dt method. This is necessary if we need to find group statistics separately on each month of each year.

stocks['monthdate'] = stocks['date'].dt.to_period("M")

stocks.head()

6. Find Monthly returns for each stock¶

Monthly returns is the product of (1+ daily returns) groupby monthdate foreach stock

6.1 How to find product of a Series¶

# Let us find product of the first 5 rows the returns sereis
target = [1,2,3,4,5],['returns']
subset = stocks.loc[target]

subset

# add 1 to the series
subset = subset + 1

# Find Product
subset.product()

returns    0.975845
dtype: float64

6.2 Back to main problem¶

# First create the 1+returns column
stocks['return1'] = stocks['returns'] + 1

#Create a groupby object
group = stocks.groupby(['symbol', 'monthdate'])

# From the group object, apply the product function to return1 column
stocks['mret'] = group["return1"].transform(np.product)

# Deduce one from the mret
stocks['mret'] = stocks['mret'] - 1


stocks.head()

So how did we find the group statistics, steps?¶

Create the group object
From the group object, apply transform method the selected variable

7. Find standard deviation of the daily returns for each firm¶

# 7.1 Find the group object
group = stocks.groupby(['symbol', 'monthdate'])

#7.2 From the group object, apply sd function on the return column, 
# and create a new variable sd

stocks['sd'] = group['returns'].transform(np.std)

stocks.head()

8. Reduce the data to a monthly frequency¶

Since we have generated monthly returns and sd, we can now keep only one observation for each month for each firm. For this, we can use the duplicates_drop() function

mstocks = stocks.drop_duplicates(subset ='monthdate', keep = 'last')

mstocks.head()

9. Write this data to Excel¶

mstocks.to_excel('Monthly Data.xls')

	symbol	date	close
0	AABS	2013-01-07	103.500000
1	AABS	2013-01-14	103.000000
2	AABS	2013-01-16	103.000000
3	AABS	2013-01-17	105.750000
4	AABS	2013-02-25	96.699997

	symbol	date	close	returns	month
0	AABS	2013-01-07	103.500000	NaN	1
1	AABS	2013-01-14	103.000000	-0.004831	1
2	AABS	2013-01-16	103.000000	0.000000	1
3	AABS	2013-01-17	105.750000	0.026699	1
4	AABS	2013-02-25	96.699997	-0.085579	2

	symbol	date	close	returns	month	monthdate
0	AABS	2013-01-07	103.500000	NaN	1	2013-01
1	AABS	2013-01-14	103.000000	-0.004831	1	2013-01
2	AABS	2013-01-16	103.000000	0.000000	1	2013-01
3	AABS	2013-01-17	105.750000	0.026699	1	2013-01
4	AABS	2013-02-25	96.699997	-0.085579	2	2013-02

	returns
1	-0.004831
2	0.000000
3	0.026699
4	-0.085579
5	0.044467

	symbol	date	close	returns	month	monthdate	return1	mret
0	AABS	2013-01-07	103.500000	NaN	1	2013-01	NaN	0.021739
1	AABS	2013-01-14	103.000000	-0.004831	1	2013-01	0.995169	0.021739
2	AABS	2013-01-16	103.000000	0.000000	1	2013-01	1.000000	0.021739
3	AABS	2013-01-17	105.750000	0.026699	1	2013-01	1.026699	0.021739
4	AABS	2013-02-25	96.699997	-0.085579	2	2013-02	0.914421	-0.044917

	symbol	date	close	returns	month	monthdate	return1	mret	sd
8975	AEL	2017-07-31	14.90	0.000000	7	2017-07	1.000000	-0.067001	0.045379
8979	AEL	2017-08-31	14.30	0.056911	8	2017-08	1.056911	-0.040268	0.064265
8986	AEL	2017-09-21	12.60	-0.030769	9	2017-09	0.969231	-0.118881	0.061009
8995	AEL	2017-10-26	12.75	0.062500	10	2017-10	1.062500	0.011905	0.050943
9013	AGIC	2013-01-30	16.25	0.000000	1	2013-01	1.000000	0.274510	0.031025

Groupby function of Pandas for Financial Data | Python