Lecture 3 - Introduction to Pandas Series

# Pandas Series

A Series is a single column of data, whereas a DataFrame is a collection of columns (that is, of Series). In the following exercises, we shall learn to:

1. Create a Series
2. Compare two series
3. Add, subtract, multiply, and divide one Series by another
4. Convert series to Python list
5. Convert to Numeric
6. Add Data to an Existing Series
7. Filtering or Subsetting
8. Find Mean and Standard Deviation of a Series
9. Common Elements of two Series
10. Elements of Series s1 not Present in Series s2
11. Frequency Count
12. Find the difference between consecutive values of a Series
13. Get previous values with shift()
14. Find the forward values using shift(-1)
15. The map() function
In [1]:
import pandas as pd


### 1. Create a Series

A pandas Series is created with pd.Series([ ]). Inside the square brackets we can type values separated by commas; alternatively, we can pass a variable that already holds a Python list.

In [2]:
ds = pd.Series([1,2,3,4])

In [3]:
ds2 = pd.Series([1,3,4,5])
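As a small sketch of the two ways of creating a Series described above, the snippet below passes a list variable to pd.Series and also shows the optional index= argument. The variable names and the labels 'a' to 'd' are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A Python list stored in a variable can be passed directly to pd.Series
values = [1, 2, 3, 4]
s_from_list = pd.Series(values)

# The optional index= argument replaces the default 0, 1, 2, ... labels
# (the labels 'a'..'d' are arbitrary, chosen for illustration)
s_labeled = pd.Series(values, index=['a', 'b', 'c', 'd'])

print(s_from_list.tolist())   # [1, 2, 3, 4]
print(s_labeled['c'])         # 3
```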


### 2. Compare two Series

Two Series can be compared element by element with the == operator. The result is a boolean Series holding True or False for each position.

In [4]:
ds == ds2

Out[4]:
0     True
1    False
2    False
3    False
dtype: bool
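A few related comparisons may help here. This sketch compares a Series with a single number, uses equals() to get one overall answer, and shows that element-wise == requires both Series to have the same index labels. The Series named other and its labels are hypothetical, made up for this example.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

# Comparing with a single number checks every element against it
print((ds > 2).tolist())   # [False, False, True, True]

# equals() gives one True/False answer for the whole Series
print(ds.equals(ds2))      # False

# Element-wise == needs both Series to carry the same index labels;
# the labels 'a'..'d' here are arbitrary, chosen only to differ from 0..3
other = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
raised = False
try:
    ds == other
except ValueError as err:
    raised = True
    print("ValueError:", err)
```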

### 3. Math Operations on Series

Two Series can be added, subtracted, multiplied, or divided element by element. pandas pairs the values by their index labels, which here are the default 0, 1, 2, 3.

In [5]:
ds + ds2

Out[5]:
0    2
1    5
2    7
3    9
dtype: int64
In [6]:
ds - ds2

Out[6]:
0    0
1   -1
2   -1
3   -1
dtype: int64
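Only addition and subtraction are shown above, so here is a short sketch of multiplication and division on the same two Series, plus a demonstration that arithmetic matches values by index label rather than by position. The Series a and b and their labels are hypothetical.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

product = ds * ds2    # element-wise multiplication
quotient = ds / ds2   # true division always returns floats

print(product.tolist())            # [1, 6, 12, 20]
print(quotient.round(3).tolist())  # [1.0, 0.667, 0.75, 0.8]

# Arithmetic aligns on index labels, not on position:
# a label present in only one of the Series produces NaN
a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])
aligned = a + b
print(aligned)   # x NaN, y 12.0, z NaN
```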

### 4. Convert Series to Python list

We can use the tolist() method to convert a Series to a Python list.

In [7]:
ds.tolist()

Out[7]:
[1, 2, 3, 4]
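For comparison, a Series can also be turned into a NumPy array with to_numpy(). This sketch contrasts the two conversions; which one to use is a judgment call that depends on whether plain Python objects or a NumPy array is more convenient downstream.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])

as_list = ds.tolist()      # plain Python list of Python scalars
as_array = ds.to_numpy()   # NumPy array, handy for numerical work

print(type(as_list).__name__, as_list)     # list [1, 2, 3, 4]
print(type(as_array).__name__, as_array)   # ndarray [1 2 3 4]
print(type(as_list[0]).__name__)           # int
```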

### 5. Convert to Numeric

If a Series contains text (strings) mixed with numbers, pandas stores it with the generic object dtype, as the output below shows. We can force conversion to a numeric Series with pd.to_numeric(series_name, errors='coerce'). With errors='coerce', any element that cannot be parsed as a number is replaced with NaN, which means Not a Number; without it, pd.to_numeric raises an error instead.

In [8]:
ds3 = pd.Series([1,2,'Apple',4])

# See the data type of this Series
ds3

Out[8]:
0        1
1        2
2    Apple
3        4
dtype: object
In [9]:
ds3 = pd.to_numeric(ds3,errors='coerce')
ds3

Out[9]:
0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64
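To see the role of the errors argument, this sketch runs pd.to_numeric on a hypothetical Series once with the default setting and once with errors='coerce'. It also shows that strings which look like numbers, such as '1' and '2.5', are converted rather than replaced with NaN.

```python
import pandas as pd

mixed = pd.Series(['1', '2.5', 'Apple', 4])

# Default errors='raise': an unparseable string stops the conversion
raised = False
try:
    pd.to_numeric(mixed)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# errors='coerce': unparseable strings become NaN,
# while numeric strings such as '1' and '2.5' become numbers
converted = pd.to_numeric(mixed, errors='coerce')
print(converted.tolist())   # [1.0, 2.5, nan, 4.0]
```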

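### 6. Add Data to an Existing Series

First create the new Series, then combine the two with pd.concat(). Older tutorials use the Series.append() method, which was deprecated in pandas 1.4 and removed in pandas 2.0.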

In [10]:
ds4 = pd.Series([400,200])

In [11]:
ds3 = pd.concat([ds3, ds4])
ds3

Out[11]:
0      1.0
1      2.0
2      NaN
3      4.0
0    400.0
1    200.0
dtype: float64
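The output above keeps the original labels, so 0 and 1 appear twice. As a sketch, passing ignore_index=True to pd.concat renumbers the combined Series; here ds3 is rebuilt by hand with the values it held before the concatenation.

```python
import pandas as pd

ds3 = pd.Series([1.0, 2.0, float('nan'), 4.0])
ds4 = pd.Series([400, 200])

# Default: the original labels are kept, so 0 and 1 appear twice
combined = pd.concat([ds3, ds4])
print(combined.index.tolist())     # [0, 1, 2, 3, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([ds3, ds4], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4, 5]
```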

### 7. Filtering or Subsetting

We can filter data on a condition. For example, let us create a Series with the values 100 to 1000 in steps of 100 and keep only the values greater than 400.

In [12]:
ds5 = pd.Series([100,200,300,400,500,600,700,800,900,1000])

In [13]:
# Remember comparing Series values in point 2 above?
# What does a comparison generate?
# Let us try this code
f = ds5 > 400

In [14]:
# What does the f variable hold?
f

Out[14]:
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool

The f variable holds a True or False value for each row of the ds5 Series. We can pass f to the ds5 Series to filter the data: ds5[f] performs boolean indexing and returns only the rows where f is True.

In [15]:
ds5[f]

Out[15]:
4     500
5     600
6     700
7     800
8     900
9    1000
dtype: int64
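Conditions can also be combined. This sketch uses & (and) and | (or), each condition wrapped in parentheses, and the between() method; the thresholds are arbitrary values picked for illustration.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# Combine conditions with & (and) or | (or); each condition needs parentheses
middle = ds5[(ds5 > 400) & (ds5 < 800)]
print(middle.tolist())      # [500, 600, 700]

outer = ds5[(ds5 < 200) | (ds5 > 900)]
print(outer.tolist())       # [100, 1000]

# between() includes both end points by default
inclusive = ds5[ds5.between(400, 700)]
print(inclusive.tolist())   # [400, 500, 600, 700]
```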

### 8. Find Mean and Standard Deviation of a Series

Use the mean() and std() methods. By default, std() returns the sample standard deviation, which divides by n - 1 rather than n.

In [16]:
ds5.mean()

Out[16]:
550.0
In [17]:
ds5.std()

Out[17]:
302.7650354097492
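To check where 302.765 comes from, this sketch recomputes the standard deviation from its definition and contrasts the default (ddof=1, divide by n - 1) with the population version (ddof=0, divide by n).

```python
import math
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

sample_std = ds5.std()             # divides by n - 1 (ddof=1, the default)
population_std = ds5.std(ddof=0)   # divides by n

# The same numbers computed by hand from the definition
n = len(ds5)
squared_dev = ((ds5 - ds5.mean()) ** 2).sum()   # 825000.0
print(sample_std, math.sqrt(squared_dev / (n - 1)))   # both about 302.765
print(population_std, math.sqrt(squared_dev / n))     # both about 287.228
```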

### 9. Common Elements of two Series

Use the isin() method to check, element by element, whether each value of a Series appears in a given list of values (or in another Series). It returns a boolean Series of the same length.

In [18]:
s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series([2,4,6,8,10])

In [19]:
# Which elements of s1 are equal to 1 or 5?
f = s1.isin([1,5])
f

Out[19]:
0     True
1    False
2    False
3    False
4     True
dtype: bool
In [20]:
# Instead of passing 1 and 5, let us pass the Series s2 to isin()
f = s1.isin(s2)
f

Out[20]:
0    False
1     True
2    False
3     True
4    False
dtype: bool
In [21]:
# Since f is a boolean Series, we can use it to select the elements of s1 that are also in s2
# The following code shows that the values 2 and 4 are present in both Series
s1[f]

Out[21]:
1    2
3    4
dtype: int64
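Two variations may be useful. NumPy's intersect1d returns the common values directly as a sorted array of unique values, and reversing the isin() call selects the same values from s2 instead, carrying s2's labels. This is a sketch of alternatives, not the only way to do it.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 4, 6, 8, 10])

# np.intersect1d returns the sorted unique values found in both inputs
common = np.intersect1d(s1.to_numpy(), s2.to_numpy())
print(common)   # [2 4]

# Direction matters: s2.isin(s1) selects from s2, keeping s2's labels (0 and 1)
from_s2 = s2[s2.isin(s1)]
print(from_s2)
```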

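### 10. Elements of Series s1 not Present in Series s2

The ~ operator inverts a boolean Series (True becomes False and vice versa), so ~s1.isin(s2) selects the elements of s1 that do not appear in s2.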

In [22]:
f = ~s1.isin(s2)
s1[f]

Out[22]:
0    1
2    3
4    5
dtype: int64
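Applying the same idea in both directions gives the values that appear in exactly one of the two Series. This sketch combines the two filtered Series with pd.concat and renumbers the result.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 4, 6, 8, 10])

only_in_s1 = s1[~s1.isin(s2)]   # 1, 3, 5
only_in_s2 = s2[~s2.isin(s1)]   # 6, 8, 10

# Values that appear in exactly one of the two Series
in_exactly_one = pd.concat([only_in_s1, only_in_s2], ignore_index=True)
print(in_exactly_one.tolist())  # [1, 3, 5, 6, 8, 10]
```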

### 11. Frequency Count

Use the value_counts() method to count how many times each element appears in a Series. The result is sorted from the most frequent to the least frequent value.

In [23]:
s1 = pd.Series([1,2,3,4,5, 1,3,5])

In [24]:
s1.value_counts()

Out[24]:
5    2
3    2
1    2
4    1
2    1
dtype: int64
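Two options of value_counts() are often handy: normalize=True returns proportions instead of counts, and sort_index() reorders the result by value. A minimal sketch on the same data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 1, 3, 5])

# normalize=True returns each value's share of the total instead of its count
shares = s1.value_counts(normalize=True)
print(shares.loc[1], shares.loc[2])   # 0.25 0.125

# sort_index() orders the result by value rather than by frequency
counts_by_value = s1.value_counts().sort_index()
print(counts_by_value.tolist())       # [2, 1, 2, 1, 2]
```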

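### 12. Find the Difference Between Consecutive Values of a Series

The diff() method subtracts the previous value from each value. The first element has no previous value, so its result is NaN.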

In [25]:
prices = pd.Series([10,12,13,15,12,16,17])
dif = prices.diff()
dif

Out[25]:
0    NaN
1    2.0
2    1.0
3    2.0
4   -3.0
5    4.0
6    1.0
dtype: float64
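As a sketch of what diff() computes, the snippet below checks that it matches subtracting the shifted Series, and shows the periods argument, which compares each value with the one a given number of steps back.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

# diff() is the current value minus the previous value
same = prices.diff().equals(prices - prices.shift())
print(same)   # True

# periods=2 compares each value with the one two steps back
two_back = prices.diff(periods=2)
print(two_back.tolist())   # [nan, nan, 3.0, 3.0, -1.0, 1.0, 5.0]
```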

### 13. Get Previous Values with shift()

We can get the previous values (also called lagged values) of a Series with the shift() method. For example, to divide each price in the prices Series by the previous price, the code looks like this:

In [26]:
prices / prices.shift()

Out[26]:
0         NaN
1    1.200000
2    1.083333
3    1.153846
4    0.800000
5    1.333333
6    1.062500
dtype: float64
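By default, the slot that shift() leaves empty is filled with NaN, which also turns the integers into floats. The fill_value argument lets you choose a different filler; 0 is used here purely as an example.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

lagged = prices.shift()                     # the first value becomes NaN
lagged_filled = prices.shift(fill_value=0)  # the gap is filled with 0 instead

print(lagged.tolist())         # [nan, 10.0, 12.0, 13.0, 15.0, 12.0, 16.0]
print(lagged_filled.tolist())  # [0, 10, 12, 13, 15, 12, 16]
```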

#### 1. Stock returns

Let's compute stock returns with the shift() method. The simple return of a period is:

$$\text{returns} = \frac{\text{Current Price} - \text{Previous Price}}{\text{Previous Price}}$$

In [27]:
returns = (prices - prices.shift()) / prices.shift()
returns

Out[27]:
0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64
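The formula can be rewritten as Current Price / Previous Price - 1, which links it to the ratio computed in section 13. This sketch checks that equivalence and then compounds the period returns, which should recover the total change from the first price (10) to the last (17).

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])
returns = (prices - prices.shift()) / prices.shift()

# (P_t - P_{t-1}) / P_{t-1} equals P_t / P_{t-1} - 1,
# so the ratio from section 13 minus one gives the same returns
alt_returns = prices / prices.shift() - 1
max_gap = (returns - alt_returns).abs().max()   # 0.0 or a tiny rounding error
print(max_gap)

# Compounding the period returns recovers the total change: 17 / 10 - 1 = 0.7
total_return = (1 + returns.dropna()).prod() - 1
print(round(total_return, 6))   # 0.7
```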

#### 2. Stock returns using the pct_change() method

We can also use the pct_change() method, which computes the fractional change between each value and the previous one. It gives the same result as the formula above.

In [28]:
returns2 = prices.pct_change()
returns2

Out[28]:
0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64
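Note that pct_change() returns fractions (0.2, not 20). This sketch confirms that it matches the manual formula and multiplies by 100 to express the changes as percentages; rounding to two decimals is only for readability.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

manual = (prices - prices.shift()) / prices.shift()
built_in = prices.pct_change()

# Both leave the first row as NaN and agree on every other value
print(np.allclose(manual.dropna(), built_in.dropna()))   # True

# Multiply by 100 to express the fractions as percentages
as_percent = (built_in * 100).round(2)
print(as_percent.tolist())   # [nan, 20.0, 8.33, 15.38, -20.0, 33.33, 6.25]
```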

### 14. Find the Forward Values Using shift(-1)

By default, shift() moves the values down by one period (periods=1), which gives the lagged values of a Series. Passing -1 moves them up instead, so each row receives the next (forward) value and the last row becomes NaN. See this example:

In [29]:
prices

Out[29]:
0    10
1    12
2    13
3    15
4    12
5    16
6    17
dtype: int64
In [30]:
prices.shift(-1)

Out[30]:
0    12.0
1    13.0
2    15.0
3    12.0
4    16.0
5    17.0
6     NaN
dtype: float64
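A typical use of forward values is the change from each price to the next one. This sketch computes it with shift(-1) and also shows diff(-1), which performs the reverse subtraction (current minus next).

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

# Change from each price to the next price; the last row has no next value
next_change = prices.shift(-1) - prices
print(next_change.tolist())   # [2.0, 1.0, 2.0, -3.0, 4.0, 1.0, nan]

# diff(-1) subtracts the other way round: current minus next
reverse_diff = prices.diff(-1)
print(reverse_diff.tolist())  # [-2.0, -1.0, -2.0, 3.0, -4.0, -1.0, nan]
```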

### 15. The map() Function

Python's built-in map() function applies a given function to each element of an iterable such as a list. Suppose we create a list named s6 that holds three words, and we want to count the characters in each word with the len() function. Instead of calling len() on each of the three elements by hand, map() does it in one line.

In [31]:
# Create a list
s6 = ['Khyber', 'Punj', 'KpK']

x = map(len, s6)

# map() returns a lazy iterator; list() collects its results
list(x)

Out[31]:
[6, 4, 3]

So the first word has 6 characters, the 2nd has 4 characters, and the 3rd has 3 characters.
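pandas Series have their own map() method, so the same counting can be done without leaving pandas. This sketch wraps the words in a Series and shows both Series.map(len) and the vectorized .str.len() accessor.

```python
import pandas as pd

s6 = pd.Series(['Khyber', 'Punj', 'KpK'])

# Series.map applies the function to every element and returns a new Series
lengths = s6.map(len)
print(lengths.tolist())        # [6, 4, 3]

# For strings, the .str accessor offers a vectorized equivalent
print(s6.str.len().tolist())   # [6, 4, 3]
```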

### Capitalize the First Letter of Each Word in the Following Series

In [32]:
ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])
plist = ds.tolist()
plist

Out[32]:
['mangoes', 'banans', 'tall person', 'data science']
In [33]:
N = len(plist)
for i in range(N):
    plist[i] = plist[i].title()

In [34]:
plist

Out[34]:
['Mangoes', 'Banans', 'Tall Person', 'Data Science']

### Explanation

1. We converted the pandas Series to a Python list.
2. We stored the number of items in the list in the variable N.
3. We then used a for loop to replace each item of plist with its title-cased version (str.title() capitalizes the first letter of every word).

### Can we do the above using the map() function?

In [35]:
def title(s):
    return s.title()

In [36]:
# plist was already title-cased by the loop, so map over the original lowercase values in ds
x = map(title, ds.tolist())

In [37]:
list(x)

Out[37]:
['Mangoes', 'Banans', 'Tall Person', 'Data Science']
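The wrapper function title() is not strictly needed. This sketch shows three shorter alternatives that produce the same list: passing the method str.title straight to map(), a list comprehension, and the pandas .str accessor. Which to prefer is mostly a matter of style.

```python
import pandas as pd

ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])

# Pass the method str.title directly instead of writing a wrapper function
via_map = list(map(str.title, ds.tolist()))

# A list comprehension is another common Python idiom for the same loop
via_comprehension = [item.title() for item in ds.tolist()]

# Staying in pandas: the .str accessor applies title() to every element
via_str = ds.str.title().tolist()

print(via_map)   # ['Mangoes', 'Banans', 'Tall Person', 'Data Science']
```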