Pandas DataFrames: Practice Session

This is a practice session on the Pandas DataFrame. If you have not already taken the first two introductory sessions on the Pandas Series (can be accessed here) and the Pandas DataFrame (can be accessed here), please cover those first.
In this session, we will:
1. Import a comma-separated (CSV) or tab-separated (TSV) file into Python as a DataFrame object of the Pandas library.
2. Explore the DataFrame properties.
3. Get subsets of the data.
4. Apply groupby() functions.
5. Use the reset_index() function.
To download the dataset used in this tutorial, please see the following link:
tips.csv

# Import the Data
import pandas as pd

cd "D:\Dropbox\CLASSES\Data Science for Finance\PYTHON 1"

D:\Dropbox\CLASSES\Data Science for Finance\PYTHON 1

tips = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/02/tips.csv')

tips.head()
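
The outline lists exploring the DataFrame properties as a step, so a minimal sketch of the usual inspection attributes may help here. It assumes a small hand-typed frame called sample (a hypothetical name) holding five rows of the tips data, so that it runs without a download; the same attributes should work on the full tips object.

```python
import pandas as pd

# Hand-typed stand-in for the full file: five rows of the tips data
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3, 2, 4],
})

n_rows, n_cols = sample.shape        # (number of rows, number of columns)
column_names = list(sample.columns)  # column labels
column_types = sample.dtypes         # one dtype per column
summary = sample.describe()          # summary statistics for the numeric columns

# Caution: sample.size is the DataFrame attribute (total number of cells),
# not the column named 'size'. Use brackets to reach that column.
n_cells = sample.size
party_sizes = sample["size"]

print(n_rows, n_cols)
print(column_names)
print(column_types)
print(summary)
```

Because this dataset has a column named size, bracket access such as tips['size'] is the safer habit than attribute access.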

Filter data for non-smokers

# First build a Boolean mask that is True for every row where smoker equals 'No'
rownumbers = tips['smoker'] == 'No'

# Now get the actual rows
non_smoker = tips.loc[rownumbers]

non_smoker.head()
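
To make the two-step filter concrete, here is a minimal sketch on a small made-up frame (the rows and the names df, mask, and non_smoker_rows are illustrative assumptions, not taken from tips.csv). It shows that the comparison produces a Series of True/False values rather than row numbers, and that .loc keeps the rows where the value is True.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "total_bill": [16.99, 38.01, 10.34, 11.24],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "Yes", "No", "Yes"],
})

# Step 1: the comparison returns a Boolean Series with the same index as df
mask = df["smoker"] == "No"

# Step 2: .loc keeps only the rows where the mask is True
non_smoker_rows = df.loc[mask]

print(mask.tolist())                    # [True, False, True, False]
print(non_smoker_rows.index.tolist())   # [0, 2]
print(int(mask.sum()))                  # 2 (True counts as 1)
```

Note that the filtered rows keep their original index labels (0 and 2 here) rather than being renumbered.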

Filter Female records in the nonsmoker group

# Build the Boolean mask for Female rows first
femalerows = non_smoker['sex'] == 'Female'

# Now get the actual rows
nonsmoker_females = non_smoker.loc[femalerows]
nonsmoker_females.head()

# Interestingly, plain bracket indexing with the same mask also works
nonsmoker_females = non_smoker[femalerows]

nonsmoker_females.head()
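
As a quick check that the two spellings select the same rows, the sketch below compares them on a small made-up frame (df and the other names are hypothetical). It also shows one thing .loc offers beyond plain brackets: a column selection in the same call.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "sex": ["Female", "Male", "Female"],
    "tip": [1.01, 1.66, 3.61],
})
mask = df["sex"] == "Female"

via_loc = df.loc[mask]     # label-based indexer with a Boolean mask
via_brackets = df[mask]    # plain brackets with the same mask

print(via_loc.equals(via_brackets))   # True

# .loc can also pick columns at the same time: rows from mask, column 'tip'
female_tips = df.loc[mask, "tip"]
print(female_tips.tolist())           # [1.01, 3.61]
```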

Filter Non Smoker Females in one go

tips.head()

nonsmoker_females = tips.loc[(tips['smoker'] == 'No') & (tips['sex'] == 'Female')]

nonsmoker_females.head()
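
The parentheses around each condition matter: & binds more tightly than ==, so without them Python groups the expression differently and it typically fails with an error. The sketch below, on a small made-up frame (all names are illustrative assumptions), shows the element-wise operators & (and), | (or), and ~ (not), plus DataFrame.query() as an alternative spelling of the same filter.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male"],
    "smoker": ["No", "No", "Yes", "Yes"],
    "total_bill": [16.99, 10.34, 11.24, 38.01],
})

# Both conditions must hold (non-smoker AND female)
both = df.loc[(df["smoker"] == "No") & (df["sex"] == "Female")]

# At least one condition holds (non-smoker OR female)
either = df.loc[(df["smoker"] == "No") | (df["sex"] == "Female")]

# Negate a condition with ~ (everyone who is NOT a smoker)
not_smoker = df.loc[~(df["smoker"] == "Yes")]

# The same AND filter written as a query string
same_as_both = df.query("smoker == 'No' and sex == 'Female'")

print(both.index.tolist())          # [0]
print(either.index.tolist())        # [0, 1, 2]
print(not_smoker.index.tolist())    # [0, 1]
print(same_as_both.index.tolist())  # [0]
```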

Average by groups

Average total_bill by smoker, day, and time

tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean()

smoker  day   time  
No      Fri   Dinner    19.233333
              Lunch     15.980000
        Sat   Dinner    19.661778
        Sun   Dinner    20.506667
        Thur  Dinner    18.780000
              Lunch     17.075227
Yes     Fri   Dinner    19.806667
              Lunch     12.323333
        Sat   Dinner    21.276667
        Sun   Dinner    24.120000
        Thur  Lunch     19.190588
Name: total_bill, dtype: float64
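
The result above is a Series whose index has one level per grouping column. A minimal sketch on made-up numbers (chosen so the means are easy to verify by hand; df and means are hypothetical names) shows how such a result can be read back by group.

```python
import pandas as pd

# Made-up rows with round numbers so the averages can be checked by hand
df = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes", "No"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 50.0, 40.0],
})

# Split by (smoker, day), take total_bill, average within each group
means = df.groupby(["smoker", "day"])["total_bill"].mean()

print(means.index.nlevels)         # 2 index levels: smoker and day
print(means.loc[("No", "Sun")])    # 15.0, the mean of 10 and 20
print(means.loc[("Yes", "Sat")])   # 40.0, the mean of 30 and 50
print(means.loc["No"].to_dict())   # {'Sat': 40.0, 'Sun': 15.0}
```

Only combinations that actually occur in the data appear in the result, which is why the output above has no row for smokers at Thursday dinner.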

Get the results as a DataFrame

tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean().reset_index()
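
reset_index() moves the group labels out of the index and into ordinary columns, which turns the grouped Series into a DataFrame with a fresh 0, 1, 2, ... index. The sketch below uses made-up numbers and hypothetical names; it also shows two variants that may be handy: the name argument of Series.reset_index() for labelling the value column, and as_index=False in groupby() as a way to get a DataFrame directly.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes", "No"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 50.0, 40.0],
})

grouped = df.groupby(["smoker", "day"])["total_bill"].mean()  # Series

# Group labels become regular columns
as_frame = grouped.reset_index()
print(list(as_frame.columns))   # ['smoker', 'day', 'total_bill']

# Same, but with a more descriptive name for the value column
renamed = grouped.reset_index(name="avg_total_bill")
print(list(renamed.columns))    # ['smoker', 'day', 'avg_total_bill']

# Alternative: ask groupby not to use the keys as the index at all
direct = df.groupby(["smoker", "day"], as_index=False)["total_bill"].mean()
print(direct)
```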

For reference, the output of tips.head() (the first five rows of the dataset):

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
