Lecture 3 - Introduction to Pandas Series

Pandas Series

A Series is a single column of data, whereas a DataFrame is a collection of columns (Series). In the following exercise, we shall learn how to:

  1. Create a Series
  2. Compare two series
  3. Add, subtract, multiply, and divide one series on another
  4. Convert series to Python list
  5. Convert to Numeric
  6. Add Data to an Existing Series
  7. Filtering or Subsetting
  8. Find Mean and Standard Deviation of a Series
  9. Common Elements of two Series
  10. Elements of Series s1 not Present in Series s2
  11. Frequency Count
  12. Find difference between consecutive values of a series
  13. Get Previous values with shift()
  14. Find the Forward Values using the shift(-1)
  15. MAP Function
In [1]:
import pandas as pd

1. Create a Series

A pandas Series is created by calling pd.Series([ ]). Inside the square brackets, we type the values, separated by commas. We can also pass an existing Python list in place of the brackets.

In [2]:
ds = pd.Series([1,2,3,4])
In [3]:
ds2 = pd.Series([1,3,4,5])

2. Compare two Series

Two Series can be compared element by element using the == operator. The result is a boolean Series of True and False values.

In [4]:
ds == ds2
Out[4]:
0     True
1    False
2    False
3    False
dtype: bool
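
Because the result is a boolean Series, it can be summarized directly. The following is a small sketch rather than part of the original exercise; the names matches, n_equal, and all_equal are introduced here only for illustration.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

matches = ds == ds2              # boolean Series, one entry per row
n_equal = int(matches.sum())     # True counts as 1, so this counts matching rows
all_equal = bool(matches.all())  # True only if every row matches

print(n_equal)    # 1
print(all_equal)  # False
```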

3. Math Operations on Series

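Two Series can be added, subtracted, multiplied, or divided with the usual arithmetic operators. The operation is applied element by element, matching values by their index labels.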

In [5]:
ds + ds2
Out[5]:
0    2
1    5
2    7
3    9
dtype: int64
In [6]:
ds - ds2
Out[6]:
0    0
1   -1
2   -1
3   -1
dtype: int64
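
The cells above show addition and subtraction only. Multiplication and division work the same way; a minimal sketch, with product and quotient as illustrative names:

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

product = ds * ds2    # element-wise: 1*1, 2*3, 3*4, 4*5
quotient = ds / ds2   # true division, so the result is float even for integer input

print(product.tolist())  # [1, 6, 12, 20]
print(quotient.dtype)    # float64
```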

4. Convert Series to Python list

We can use the tolist() method to convert a Series to a Python list.

In [7]:
ds.tolist()
Out[7]:
[1, 2, 3, 4]

5. Convert to Numeric

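If a Series contains any text (string) element, pandas stores the whole Series with the generic object dtype rather than a numeric dtype. We can force conversion of such a Series to a numeric Series using the pd.to_numeric(series_name) function. Please note that, with errors='coerce', any element that cannot be parsed as a number is replaced with NaN, which means Not a Number. Because NaN is a floating-point value, the resulting Series has the float64 dtype.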

In [8]:
ds3 = pd.Series([1,2,'Apple',4])

# See the data type of this Series
ds3
Out[8]:
0        1
1        2
2    Apple
3        4
dtype: object
In [9]:
ds3 = pd.to_numeric(ds3,errors='coerce')
ds3
Out[9]:
0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64
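
The errors='coerce' argument matters here. As a hedged sketch of what happens without it: pd.to_numeric uses errors='raise' by default, so the same Series is expected to raise a ValueError instead of producing NaN. The names raw, outcome, and coerced are illustrative.

```python
import pandas as pd

raw = pd.Series([1, 2, 'Apple', 4])

try:
    pd.to_numeric(raw)       # errors='raise' is the default
    outcome = 'converted'
except ValueError:
    outcome = 'ValueError'   # 'Apple' cannot be parsed as a number

# With errors='coerce', the unparseable element becomes NaN instead
coerced = pd.to_numeric(raw, errors='coerce')
n_missing = int(coerced.isna().sum())

print(outcome)    # ValueError
print(n_missing)  # 1
```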

6. Add Data to an Existing Series

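First create the new Series, then join the two with pd.concat(). Older tutorials use the Series.append() method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0.

In [10]:
ds4 = pd.Series([400,200])
In [11]:
ds3 = pd.concat([ds3, ds4])
ds3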
Out[11]:
0      1.0
1      2.0
2      NaN
3      4.0
0    400.0
1    200.0
dtype: float64
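
Notice that the index labels 0 and 1 appear twice in the output above, because each Series keeps its own labels. If a fresh 0..n-1 index is preferred, one option is the ignore_index parameter of pd.concat. A minimal sketch with illustrative data:

```python
import pandas as pd

first = pd.Series([1.0, 2.0, 4.0])
second = pd.Series([400, 200])

# ignore_index=True discards the original labels and renumbers from 0
combined = pd.concat([first, second], ignore_index=True)

print(combined.index.tolist())  # [0, 1, 2, 3, 4]
print(combined.tolist())        # [1.0, 2.0, 4.0, 400.0, 200.0]
```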

7. Filtering or Subsetting

We can filter data on a condition. For example, let us create a Series of values from 100 to 1000 in steps of 100, and keep the values that are greater than 400.

In [12]:
ds5 = pd.Series([100,200,300,400,500,600,700,800,900,1000])
In [13]:
# Remember the comparison of Series values from point 2 above?
# What does the comparison code generate?
# Let us try this code
f = ds5 > 400
In [14]:
# What does the f variable hold?
f
Out[14]:
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool

The f variable holds a True or False value for each row of the ds5 Series. This f variable can now be passed to the ds5 Series to filter the required data. When we write ds5[f], pandas selects only those rows of ds5 where f is True. This is called boolean indexing.

In [15]:
ds5[f]
Out[15]:
4     500
5     600
6     700
7     800
8     900
9    1000
dtype: int64
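
Conditions can also be combined. The sketch below is an illustration that goes slightly beyond the exercise: it uses & (and) and | (or), with each condition wrapped in parentheses because these operators bind more tightly than the comparisons.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# Values strictly between 400 and 800
between = ds5[(ds5 > 400) & (ds5 < 800)]

# Values below 200 or above 900
extremes = ds5[(ds5 < 200) | (ds5 > 900)]

print(between.tolist())   # [500, 600, 700]
print(extremes.tolist())  # [100, 1000]
```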

Descriptive Statistics of a Series

8. Find Mean and Standard Deviation of a Series

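Use the mean() and std() methods. Note that std() returns the sample standard deviation by default, that is, it divides by n - 1 rather than n.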

In [16]:
ds5.mean()
Out[16]:
550.0
In [17]:
ds5.std()
Out[17]:
302.7650354097492
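
The value above is the sample standard deviation. As a brief sketch, the ddof parameter of std() controls the divisor: the default ddof=1 divides by n - 1, while ddof=0 gives the population standard deviation. The variable names are illustrative.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

sample_std = ds5.std()            # default ddof=1, divides by n - 1
population_std = ds5.std(ddof=0)  # divides by n

print(round(sample_std, 2))      # 302.77
print(round(population_std, 2))  # 287.23
```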

9. Common Elements of two Series

Use the isin([value1, value2]) method to check, element by element, whether each value of a Series is among the given values.

In [18]:
s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series([2,4,6,8,10])
In [19]:
# Let us check which elements of series s1 are equal to 1 or 5
f = s1.isin([1,5])
f
Out[19]:
0     True
1    False
2    False
3    False
4     True
dtype: bool
In [20]:
# Instead of passing 1 and 5, let us pass the series s2 to the isin([ ])
f = s1.isin(s2)
f
Out[20]:
0    False
1     True
2    False
3     True
4    False
dtype: bool
In [21]:
# Since the variable f is a boolean Series, we can use it to get the elements which are present in both Series
# The following code shows that the values 2 and 4 are present in both Series
s1[f]
Out[21]:
1    2
3    4
dtype: int64

10. Elements of Series s1 not Present in Series s2

In [22]:
f = ~s1.isin(s2)
s1[f]
Out[22]:
0    1
2    3
4    5
dtype: int64

11. Frequency Count

Use the value_counts() method to count how many times each element appears in the Series.

In [23]:
s1 = pd.Series([1,2,3,4,5, 1,3,5])
In [24]:
s1.value_counts()
Out[24]:
5    2
3    2
1    2
4    1
2    1
dtype: int64
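
If relative frequencies are more useful than raw counts, value_counts() accepts normalize=True. A minimal sketch, where shares is an illustrative name:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 1, 3, 5])

# normalize=True returns each count divided by the total number of elements
shares = s1.value_counts(normalize=True)

print(shares.loc[1])  # 0.25  (1 appears 2 times out of 8)
print(shares.loc[4])  # 0.125 (4 appears once out of 8)
```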

12. Find difference between consecutive values of a series

In [25]:
prices = pd.Series([10,12,13,15,12,16,17])
dif = prices.diff()
dif
Out[25]:
0    NaN
1    2.0
2    1.0
3    2.0
4   -3.0
5    4.0
6    1.0
dtype: float64
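
The first value is NaN because the first price has no previous value to subtract. The diff() method also takes a periods parameter; as a small illustration (not part of the original exercise), periods=2 subtracts the value two rows back, so the first two results are NaN.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

# Difference between each value and the value two rows earlier
dif2 = prices.diff(periods=2)

print(dif2.tolist()[2:])  # [3.0, 3.0, -1.0, 1.0, 5.0]
```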

13. Get Previous values with shift()

We can get the previous values (also called the lagged values) of a Series using the shift() method. So in the prices Series, let's say we want to divide the current price by the previous price. The code will look like this:

In [26]:
prices / prices.shift()
Out[26]:
0         NaN
1    1.200000
2    1.083333
3    1.153846
4    0.800000
5    1.333333
6    1.062500
dtype: float64

1. Stock returns

Let's find the stock returns using the shift() method. Stock returns are defined as:

$ \large \text{returns} = \frac{\text{Current Price} - \text{Previous Price}}{\text{Previous Price}} $

In [27]:
returns = (prices - prices.shift()) / prices.shift()
returns
Out[27]:
0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64

2. Stock returns using the percentage method

We can also use the pct_change() method to find the changes in the values of a Series relative to the previous value. The result is expressed as a fraction, so 0.2 means 20%.

In [28]:
returns2 = prices.pct_change()
returns2
Out[28]:
0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64
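
The two outputs look identical when printed. As a hedged check, the sketch below compares them numerically; the two approaches may differ in the last floating-point digits, so the comparison uses a small tolerance rather than exact equality. The names manual, builtin, and max_gap are illustrative.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

manual = (prices - prices.shift()) / prices.shift()
builtin = prices.pct_change()

# Largest absolute gap between the two results (NaN rows are skipped by max())
max_gap = (manual - builtin).abs().max()

print(max_gap < 1e-12)  # True
```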

14. Find the Forward Values using the shift(-1)

The shift() method uses a default value of 1 for its periods argument, which returns the lagged values of a Series. When we pass -1 instead, it returns the forward (next) values. The last row becomes NaN because it has no next value. See this example:

In [29]:
prices
Out[29]:
0    10
1    12
2    13
3    15
4    12
5    16
6    17
dtype: int64
In [30]:
prices.shift(-1)
Out[30]:
0    12.0
1    13.0
2    15.0
3    12.0
4    16.0
5    17.0
6     NaN
dtype: float64
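
One possible use of the forward values, sketched here as an illustration only, is to compute the change from the current price to the next price. The name next_change is hypothetical.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

# Next price minus current price; the last row has no next value, so it is NaN
next_change = prices.shift(-1) - prices

print(next_change.tolist()[:-1])  # [2.0, 1.0, 2.0, -3.0, 4.0, 1.0]
```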

15. MAP Function

The map() function of Python applies a given function to each element of a list (or any other iterable). Suppose we create a list named s6 that has three words, and we would like to count the characters of each word using the len() function. The manual way would be to apply len() to each of the three elements one by one; the map() function can do it in one line. Note that map() returns a lazy iterator, so we wrap it in list() to see the results.

In [31]:
# Create a list
s6 = ['Khyber', 'Punj', 'KpK']

x = map(len, s6)

# display the contents of x
list(x)
Out[31]:
[6, 4, 3]

So the first word has 6 characters, the 2nd has 4 characters, and the 3rd has 3 characters.

Capitalize the first letter of each word in every item of the following Series.

In [32]:
ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])
plist = ds.tolist()
plist
Out[32]:
['mangoes', 'banans', 'tall person', 'data science']
In [33]:
N = len(plist)
for i in range(N):
    plist[i] = plist[i].title()
In [34]:
plist
Out[34]:
['Mangoes', 'Banans', 'Tall Person', 'Data Science']

Explanation

  1. We converted the pandas Series to a Python list
  2. Since the list has string values, we first counted the number of items in the list and stored the count in the variable N
  3. Then we used a for loop and replaced each item of plist with its title-cased version

Can we do the above using the map() function?

In [35]:
def title(s):
    return s.title()
In [36]:
x = map(title, plist)
In [37]:
list(x)
Out[37]:
['Mangoes', 'Banans', 'Tall Person', 'Data Science']
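
The conversion to a Python list is not strictly needed. As a minimal sketch, a pandas Series has its own map() method, and string Series also expose a vectorized .str.title() accessor; both are assumed here to give the same result as the loop above. The names via_map and via_str are illustrative.

```python
import pandas as pd

ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])

# Series.map applies a function to every element and returns a new Series
via_map = ds.map(str.title)

# The .str accessor offers the same string method in vectorized form
via_str = ds.str.title()

print(via_map.tolist())  # ['Mangoes', 'Banans', 'Tall Person', 'Data Science']
```

Passing str.title directly also removes the need for the small helper function when using the built-in map().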