Dataframes of Pandas: Practice Session ¶
This is a practice session on the Pandas DataFrame. If you have not already taken the first two introductory sessions, on the Pandas Series (can be accessed here) and the Pandas DataFrame (can be accessed here), please cover those first.
In this session, we will:
1. Import a comma-separated (CSV) or tab-separated (TSV) file into Python as a DataFrame object of the Pandas library.
2. Explore the DataFrame properties.
3. Get subsets of the data.
4. Apply the groupby() function.
5. Use the reset_index() function.
To download the dataset used in this tutorial, please see the following link:
tips.csv
In [2]:
# Import the pandas library
import pandas as pd
In [6]:
cd "D:\Dropbox\CLASSES\Data Science for Finance\PYTHON 1"
In [7]:
tips = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/02/tips.csv')
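The session objectives also mention tab-separated (TSV) files. As a minimal sketch, read_csv() can read those too when given sep='\t'. The text below is a hypothetical stand-in for a .tsv file, with invented values, so that the example runs without any download.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for a .tsv file (values invented)
tsv_text = "total_bill\tsmoker\tday\n10.0\tNo\tSun\n20.0\tYes\tSat\n"

# sep='\t' tells read_csv that the columns are separated by tabs, not commas
sample = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(sample)
```

For a real file, the first argument would be the file path or URL instead of the StringIO object.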
In [8]:
tips.head()
Out[8]:
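Besides head(), a few attributes and methods are commonly used to explore a DataFrame. The sketch below uses a small hypothetical frame with invented values and only some of the column names used in this session, so it runs on its own; the same calls should work on tips.

```python
import pandas as pd

# Hypothetical miniature of the tips data (values invented for illustration)
sample = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0],
    'sex': ['Female', 'Male', 'Female'],
    'smoker': ['No', 'Yes', 'No'],
})

print(sample.shape)       # tuple of (number of rows, number of columns)
print(sample.columns)     # column labels
print(sample.dtypes)      # data type of each column
sample.info()             # prints a summary: columns, non-null counts, dtypes
print(sample.describe())  # summary statistics for the numeric columns
```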
Filter data for non-smokers¶
In [20]:
# First build a Boolean mask that is True for every row where smoker is equal to No
rownumbers = tips['smoker'] == 'No'
In [34]:
# Now get the actual rows
non_smoker = tips.loc[rownumbers]
In [35]:
non_smoker.head()
Out[35]:
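It may help to see what the comparison actually produces. It is not a list of row numbers but a Boolean Series with one True/False value per row, and .loc keeps the rows marked True. A minimal sketch on a hypothetical three-row frame (values invented):

```python
import pandas as pd

# Hypothetical miniature of the tips data (values invented for illustration)
sample = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0],
    'smoker': ['No', 'Yes', 'No'],
})

# Comparing a column with a value gives one Boolean per row
mask = sample['smoker'] == 'No'
print(mask.tolist())   # [True, False, True]

# Summing a Boolean Series counts the True values
print(mask.sum())      # 2

# .loc keeps only the rows where the mask is True
non_smoker_sample = sample.loc[mask]
print(non_smoker_sample)
```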
Filter Female records in the nonsmoker group¶
In [36]:
# Build the Boolean mask first
femalerows = non_smoker['sex'] == 'Female'
In [41]:
# Now get the actual rows
nonsmoker_females = non_smoker.loc[femalerows]
nonsmoker_females.head()
Out[41]:
In [38]:
# Interestingly, plain square brackets with a Boolean mask also work
nonsmoker_females = non_smoker[femalerows]
In [40]:
nonsmoker_females.head()
Out[40]:
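Both forms work because square brackets and .loc each accept a Boolean Series and keep the rows where it is True. The sketch below checks this on a hypothetical frame with invented values.

```python
import pandas as pd

# Hypothetical miniature of the non-smoker subset (values invented)
sample = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0],
    'sex': ['Female', 'Male', 'Female'],
})

mask = sample['sex'] == 'Female'

by_loc = sample.loc[mask]    # explicit label-based selection
by_brackets = sample[mask]   # shorthand with plain square brackets

# equals() is True when both frames have the same shape, labels, and values
print(by_loc.equals(by_brackets))
```

The .loc form is often preferred when rows and columns are selected together, for example sample.loc[mask, 'total_bill'].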
Filter Non-Smoker Females in one go¶
In [43]:
tips.head()
Out[43]:
In [46]:
nonsmoker_females = tips.loc[(tips['smoker'] == 'No') & (tips['sex'] == 'Female')]
In [47]:
nonsmoker_females.head()
Out[47]:
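Two details of the one-step filter are worth noting. Each condition needs its own parentheses, because & binds more tightly than ==, and the conditions are joined with & rather than the Python keyword and. As an alternative sketch, DataFrame.query() expresses the same filter as a string. The frame below is hypothetical, with invented values.

```python
import pandas as pd

# Hypothetical miniature of the tips data (values invented for illustration)
sample = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'sex': ['Female', 'Male', 'Female', 'Female'],
    'smoker': ['No', 'No', 'Yes', 'No'],
})

# Combine two masks with &; each comparison sits in its own parentheses
with_masks = sample.loc[(sample['smoker'] == 'No') & (sample['sex'] == 'Female')]

# The same filter written as a query string
with_query = sample.query("smoker == 'No' and sex == 'Female'")

print(with_masks)
print(with_query)
```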
Average by groups¶
Average total_bill by smoker, day, and time
In [50]:
tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean()
Out[50]:
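To make the grouping logic easy to verify by hand, here is a small sketch on a hypothetical four-row frame with invented values, grouped by two keys instead of three. The result is a Series whose index is built from the group keys, so a single group mean can be looked up with a tuple.

```python
import pandas as pd

# Hypothetical miniature of the tips data (values invented for illustration)
sample = pd.DataFrame({
    'smoker': ['No', 'No', 'Yes', 'Yes'],
    'day': ['Sun', 'Sun', 'Sun', 'Sat'],
    'total_bill': [10.0, 20.0, 30.0, 50.0],
})

# One mean per (smoker, day) combination present in the data
means = sample.groupby(['smoker', 'day'])['total_bill'].mean()
print(means)

# Look up one group: non-smokers on Sunday, (10.0 + 20.0) / 2
print(means.loc[('No', 'Sun')])   # 15.0
```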
Get the results as a DataFrame¶
In [52]:
tips.groupby(['smoker', 'day', 'time'])['total_bill'].mean().reset_index()
Out[52]:
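The grouped result is a Series whose index holds the group keys; reset_index() moves those keys back into ordinary columns, which gives a DataFrame. As a sketch on a hypothetical frame with invented values, passing as_index=False to groupby() is another way to get a similar layout in one step.

```python
import pandas as pd

# Hypothetical miniature of the tips data (values invented for illustration)
sample = pd.DataFrame({
    'smoker': ['No', 'No', 'Yes', 'Yes'],
    'day': ['Sun', 'Sun', 'Sun', 'Sat'],
    'total_bill': [10.0, 20.0, 30.0, 50.0],
})

grouped = sample.groupby(['smoker', 'day'])['total_bill'].mean()
print(type(grouped).__name__)    # Series, indexed by (smoker, day)

# reset_index() turns the index levels into regular columns
as_frame = grouped.reset_index()
print(type(as_frame).__name__)   # DataFrame
print(list(as_frame.columns))    # ['smoker', 'day', 'total_bill']

# Alternative: keep the group keys as columns from the start
alt = sample.groupby(['smoker', 'day'], as_index=False)['total_bill'].mean()
print(alt)
```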