Dataframes of Pandas: Practice Session

This is practice session on the Pandas's DataFrame. If you have not already taken the first two introductory session on Panda's Series (can be accessed here ) and Panda's DataFrame (can be accessed here), please conver those first.
1. Import comma-separated (CSV) or tab separated (TSV) file into Python as a DataFrame object of the Pandas libray
2. Explore the DataFrame properties
3. Getting subsets of the data.
4. Apply groupby() functions 5. Use the reset_index() function.
To download the datasets used in this tutorial, pleas see the following link
tips.csv

In [2]:
# Import the Data
import pandas as pd
In [6]:
cd "D:\Dropbox\CLASSES\Data Science for Finance\PYTHON 1"
D:\Dropbox\CLASSES\Data Science for Finance\PYTHON 1
In [7]:
tips = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/02/tips.csv')
In [8]:
tips.head()
Out[8]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Filter data for smokers

In [20]:
# First get all the row numbers where smoker is equal to No
rownumbers = tips['smoker'] =='No'
In [34]:
# Now get the actual rows
non_smoker = tips.loc[rownumbers]
In [35]:
non_smoker.head()
Out[35]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Filter Female records in the nonsmoker group

In [36]:
# Get the rows index first
femalerows = non_smoker['sex'] == 'Female'
In [41]:
# Now get the actual rows
nonsmoker_females = non_smoker.loc[femalerows]
nonsmoker_females.head()
Out[41]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
11 35.26 5.00 Female No Sun Dinner 4
14 14.83 3.02 Female No Sun Dinner 2
16 10.33 1.67 Female No Sun Dinner 3
In [38]:
# Interestingly, the following also works
nononsmoker_females = non_smoker[femalerows]
In [40]:
nonsmoker_females.head()
Out[40]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
11 35.26 5.00 Female No Sun Dinner 4
14 14.83 3.02 Female No Sun Dinner 2
16 10.33 1.67 Female No Sun Dinner 3

Filter Non Smoker Females in one go

In [43]:
tips.head()
Out[43]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [46]:
nonsmoker_females = tips.loc[(tips['smoker'] == 'No') & (tips['sex'] == 'Female')]
In [47]:
nonsmoker_females.head()
Out[47]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
11 35.26 5.00 Female No Sun Dinner 4
14 14.83 3.02 Female No Sun Dinner 2
16 10.33 1.67 Female No Sun Dinner 3

AVERAGE BY GROUPS

Average total_bill by smoker, day, and time

In [50]:
tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean()
Out[50]:
smoker  day   time  
No      Fri   Dinner    19.233333
              Lunch     15.980000
        Sat   Dinner    19.661778
        Sun   Dinner    20.506667
        Thur  Dinner    18.780000
              Lunch     17.075227
Yes     Fri   Dinner    19.806667
              Lunch     12.323333
        Sat   Dinner    21.276667
        Sun   Dinner    24.120000
        Thur  Lunch     19.190588
Name: total_bill, dtype: float64

Get the results as a DataFrame

In [52]:
tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean().reset_index()
Out[52]:
smoker day time total_bill
0 No Fri Dinner 19.233333
1 No Fri Lunch 15.980000
2 No Sat Dinner 19.661778
3 No Sun Dinner 20.506667
4 No Thur Dinner 18.780000
5 No Thur Lunch 17.075227
6 Yes Fri Dinner 19.806667
7 Yes Fri Lunch 12.323333
8 Yes Sat Dinner 21.276667
9 Yes Sun Dinner 24.120000
10 Yes Thur Lunch 19.190588
In [ ]: