Date and Time - Part 1

Date and Time in Pandas

Generate a dummy date series

We can use the date_range() function of pandas. Option period specifies the number of enteries to be generated. And option freq specifies the date and time frequency, where :
D =    Daily Data
M =    Monthly Data
MS =  Month Start dates
B =    Business Dates

Access the pandas documentation at this page for date and time https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html

In [6]:
import pandas as pd
range = pd.date_range('2002 July 15 10:15', periods = 10, freq = 'M')
In [7]:
range
Out[7]:
DatetimeIndex(['2002-07-31 10:15:00', '2002-08-31 10:15:00',
               '2002-09-30 10:15:00', '2002-10-31 10:15:00',
               '2002-11-30 10:15:00', '2002-12-31 10:15:00',
               '2003-01-31 10:15:00', '2003-02-28 10:15:00',
               '2003-03-31 10:15:00', '2003-04-30 10:15:00'],
              dtype='datetime64[ns]', freq='M')

Create dates from July 15, 2016 to July 24, 2016

In [15]:
range = pd.date_range(start = '2016 july 15', end = '2016 july 24', freq = 'D')
In [16]:
range
Out[16]:
DatetimeIndex(['2016-07-15', '2016-07-16', '2016-07-17', '2016-07-18',
               '2016-07-19', '2016-07-20', '2016-07-21', '2016-07-22',
               '2016-07-23', '2016-07-24'],
              dtype='datetime64[ns]', freq='D')

How many dates are in the above range

len() return the number of items in a container.

In [19]:
len(range)
Out[19]:
10

Which time formats pandas will not accept

'2016 Jul 1' , '7/1/2016', '1/7/2016', 'July 1, 2016', '2016-07-01'

In [21]:
pew = pd.read_csv('https://opendoors.pk/wp-content/uploads/2020/02/pew.csv')
In [22]:
pew.head()
Out[22]:
religion <$10k $10-20k $20-30k $30-40k $40-50k $50-75k $75-100k $100-150k >150k Don't know/refused
0 Agnostic 27 34 60 81 76 137 122 109 84 96
1 Atheist 12 27 37 52 35 70 73 59 74 76
2 Buddhist 27 21 30 34 33 58 62 39 53 54
3 Catholic 418 617 732 670 638 1116 949 792 633 1489
4 Don’t know/refused 15 14 15 11 10 35 21 17 18 116

CONVERT STRING TO DATE TIME FORMAT

In [23]:
# Use the ufo dataset from the web
ufo = pd.read_csv('http://bit.ly/uforeports')
In [24]:
ufo.head()
Out[24]:
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00
In [25]:
# Find the characteristics of the variables in the dataset
ufo.dtypes
Out[25]:
City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object
In [26]:
# As we can se that the Time column is an object, meaning it is a text column

Convert string to date time

We can use the top level function to_datetime() of the pandas library

In [27]:
# Replace the existing Time column
ufo['Time'] = pd.to_datetime(ufo['Time'])
In [28]:
ufo.head()
Out[28]:
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 1930-06-01 22:00:00
1 Willingboro NaN OTHER NJ 1930-06-30 20:00:00
2 Holyoke NaN OVAL CO 1931-02-15 14:00:00
3 Abilene NaN DISK KS 1931-06-01 13:00:00
4 New York Worlds Fair NaN LIGHT NY 1933-04-18 19:00:00
In [29]:
ufo.dtypes
Out[29]:
City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

What are the benefits of datatime format

1. We can pull out hours now

Now we can use the top level dt function and all other low level time functions. To get hours from the column, add dt.hour

In [31]:
hours = ufo['Time'].dt.hour
In [32]:
hours.head()
Out[32]:
0    22
1    20
2    14
3    13
4    19
Name: Time, dtype: int64

2. Get Weekdays Names

In [34]:
weekedays = ufo['Time'].dt.weekday_name
In [35]:
weekedays.head()
Out[35]:
0     Sunday
1     Monday
2     Sunday
3     Monday
4    Tuesday
Name: Time, dtype: object

3. Get Weekdays Numbers

In [38]:
weekedaysn = ufo['Time'].dt.weekday
In [39]:
weekedaysn.head()
Out[39]:
0    6
1    0
2    6
3    0
4    1
Name: Time, dtype: int64

4. Day of the year

In [40]:
dofy = ufo['Time'].dt.dayofyear
In [41]:
dofy.head()
Out[41]:
0    152
1    181
2     46
3    152
4    108
Name: Time, dtype: int64

5. Year

In [43]:
year = ufo['Time'].dt.year
In [44]:
year.head()
Out[44]:
0    1930
1    1930
2    1931
3    1931
4    1933
Name: Time, dtype: int64

6. Month

In [45]:
month = ufo['Time'].dt.month
In [46]:
month.head()
Out[46]:
0    6
1    6
2    2
3    6
4    4
Name: Time, dtype: int64

Full Reference

Function Purpose
Series.dt.date Returns numpy array of python datetime.date objects (namely, the date part of Timestamps without timezone information).
Series.dt.time Returns numpy array of datetime.time.
Series.dt.timetz Returns numpy array of datetime.time also containing timezone information.
Series.dt.year The year of the datetime.
Series.dt.month The month as January=1, December=12.
Series.dt.day The days of the datetime.
Series.dt.hour The hours of the datetime.
Series.dt.minute The minutes of the datetime.
Series.dt.second The seconds of the datetime.
Series.dt.microsecond The microseconds of the datetime.
Series.dt.nanosecond The nanoseconds of the datetime.
Series.dt.week The week ordinal of the year.
Series.dt.weekofyear The week ordinal of the year.
Series.dt.dayofweek The day of the week with Monday=0, Sunday=6.
Series.dt.weekday The day of the week with Monday=0, Sunday=6.
Series.dt.dayofyear The ordinal day of the year.
Series.dt.quarter The quarter of the date.
Series.dt.is_month_start Indicates whether the date is the first day of the month.
Series.dt.is_month_end Indicates whether the date is the last day of the month.
Series.dt.is_quarter_start Indicator for whether the date is the first day of a quarter.
Series.dt.is_quarter_end Indicator for whether the date is the last day of a quarter.
Series.dt.is_year_start Indicate whether the date is the first day of a year.
Series.dt.is_year_end Indicate whether the date is the last day of the year.
Series.dt.is_leap_year Boolean indicator if the date belongs to a leap year.
Series.dt.daysinmonth The number of days in the month.
Series.dt.days_in_month The number of days in the month.
Series.dt.tz Return timezone, if any.
Series.dt.freq
[/fusion_table]

In [ ]: