Pandas Series¶

A Series is a single column of data, whereas a DataFrame is a collection of columns (Series). The following exercise covers:

Create a Series
Compare two series
Add, subtract, multiply, and divide one series on another
Convert series to Python list
Convert to Numeric
Add Data to an Existing Series
Filtering or Subsetting
Find Mean and Standard Deviation of a Series
Common Elements of two Series
Elements of Series s1 not Present in Series s2
Frequency Count
Find Difference between Consecutive Values of a Series
Get Previous Values with shift()
Find the Forward Values using shift(-1)
MAP Function

import pandas as pd

1. Create a Series¶

A pandas Series is created by calling pd.Series(). Inside the parentheses, we can pass either an existing Python list or values typed inside square brackets and separated by commas.

ds = pd.Series([1,2,3,4])

ds2 = pd.Series([1,3,4,5])
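
As a small sketch of the two options described above, the snippet below builds one Series from an existing Python list and another from values typed inline. The names values, from_list, and from_literal are illustrative and are not part of the original exercise.

```python
import pandas as pd

# An existing Python list (hypothetical example data)
values = [10, 20, 30]

# Option 1: pass the list variable
from_list = pd.Series(values)

# Option 2: type the values directly inside square brackets
from_literal = pd.Series([10, 20, 30])

# Both Series hold the same values with the default index 0, 1, 2
print(from_list.equals(from_literal))  # True
```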

2. Compare two Series¶

Two Series can be compared element by element using the == operator. The result is a boolean Series of True and False values.

ds == ds2

0     True
1    False
2    False
3    False
dtype: bool
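
Because the comparison returns a boolean Series, it can be summarised further. The sketch below, which assumes the same ds and ds2 values as above, counts the matching positions and also checks whether the two Series are identical as a whole. The variable names matches, n_equal, and all_equal are illustrative.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

matches = ds == ds2          # element-wise boolean Series
n_equal = matches.sum()      # True counts as 1, so this counts matching positions
all_equal = ds.equals(ds2)   # a single True/False for the whole Series

print(n_equal, all_equal)
```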

3. Math Operations on Series¶

Two Series can be added, subtracted, multiplied, or divided with the usual arithmetic operators. Each operation is applied element by element.

ds + ds2

0    2
1    5
2    7
3    9
dtype: int64

ds - ds2

0    0
1   -1
2   -1
3   -1
dtype: int64
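
Multiplication and division follow the same pattern as addition and subtraction. A minimal sketch, assuming the same ds and ds2 as above (product and quotient are illustrative names):

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

product = ds * ds2    # element-wise multiplication, stays int64
quotient = ds / ds2   # element-wise division, result is float64

print(product.tolist())   # [1, 6, 12, 20]
print(quotient)
```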

4. Convert Series to Python list¶

We can use the tolist() method to convert a Series to a Python list.

ds.tolist()

[1, 2, 3, 4]

5. Convert to Numeric¶

If a Series contains any text (string) values, pandas stores it with the generic object dtype rather than a numeric one. We can force conversion to a numeric Series using pd.to_numeric(series_name). Please note that with errors='coerce', any element that cannot be parsed as a number is replaced with NaN, which means Not a Number.

ds3 = pd.Series([1,2,'Apple',4])

# See the data type of this Series
ds3

0        1
1        2
2    Apple
3        4
dtype: object

ds3 = pd.to_numeric(ds3,errors='coerce')
ds3

0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64
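
The sketch below contrasts errors='coerce' with the default behaviour of pd.to_numeric, which raises an error on text it cannot parse. The Series raw and the flag raised are hypothetical names used only for illustration.

```python
import pandas as pd

raw = pd.Series(['1', '2.5', 'Apple'])  # hypothetical example data

# errors='coerce': parsable strings become numbers, the rest become NaN
coerced = pd.to_numeric(raw, errors='coerce')
print(coerced)

# Default behaviour (errors='raise'): an unparsable value raises ValueError
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

print(raised)  # True
```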

6. Add Data to an Existing Series¶

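First create the new Series and then join the two with pd.concat(). Older tutorials use the Series.append() method, which was deprecated in pandas 1.4 and removed in pandas 2.0.

ds4 = pd.Series([400,200])

ds3 = pd.concat([ds3, ds4])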
ds3

0      1.0
1      2.0
2      NaN
3      4.0
0    400.0
1    200.0
dtype: float64
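
In the output above, the index labels 0 and 1 appear twice because each Series keeps its own index. If a fresh 0..n-1 index is preferred, one option is pd.concat with ignore_index=True. This is a sketch with illustrative names a, b, and combined:

```python
import pandas as pd

a = pd.Series([1.0, 2.0])
b = pd.Series([400.0, 200.0])

# ignore_index=True discards the original labels and renumbers from 0
combined = pd.concat([a, b], ignore_index=True)

print(combined.index.tolist())  # [0, 1, 2, 3]
```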

7. Filtering or Subsetting¶

We can filter data on a condition. For example, let us create a Series with values from 100 to 1000 and keep only the values that are greater than 400.

ds5 = pd.Series([100,200,300,400,500,600,700,800,900,1000])

# Remember the comparison of Series values from point 2 above?
# What does the comparison code generate?
# Let us try this code
f = ds5 > 400

# What does the f variable hold?
f

0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9     True
dtype: bool

The f variable holds a True or False value for each row of the ds5 Series. It can now be passed to ds5 to filter the required data. When we write ds5[f], pandas keeps only those rows of ds5 where f is True.

ds5[f]

4     500
5     600
6     700
7     800
8     900
9    1000
dtype: int64
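
Several conditions can be combined in the same way. The sketch below uses the & (and) and | (or) operators on boolean Series; the names between and extremes are illustrative.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# Each condition needs its own parentheses because & and |
# bind more tightly than the comparison operators
between = ds5[(ds5 > 400) & (ds5 < 800)]
extremes = ds5[(ds5 < 200) | (ds5 > 900)]

print(between.tolist())   # [500, 600, 700]
print(extremes.tolist())  # [100, 1000]
```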

Descriptive Statistics of a Series¶

8. Find Mean and Standard Deviation of a Series¶

Use the mean() and std() methods. By default, std() returns the sample standard deviation (it divides by n - 1).

ds5.mean()

550.0

ds5.std()

302.7650354097492
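
The value above is the sample standard deviation, since std() uses ddof=1 by default. As a sketch, passing ddof=0 gives the population standard deviation instead, and describe() returns several summary statistics at once. The variable names are illustrative.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

sample_std = ds5.std()            # default ddof=1, divides by n - 1
population_std = ds5.std(ddof=0)  # divides by n
summary = ds5.describe()          # count, mean, std, min, quartiles, max

print(sample_std, population_std)
print(summary)
```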

9. Common Elements of two Series¶

Use the isin([value1, value2]) method to check, element by element, whether each value of a Series appears in the given list of values.

s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series([2,4,6,8,10])

# Let us check which elements of series s1 are equal to 1 or 5
f = s1.isin([1,5])
f

0     True
1    False
2    False
3    False
4     True
dtype: bool

# Instead of passing 1 and 5, let us pass the series s2 to isin()
f = s1.isin(s2)
f

0    False
1     True
2    False
3     True
4    False
dtype: bool

# Since the variable f is a boolean Series, we can use it to get the elements that are present in both Series
# The following code shows that the values 2 and 4 are present in both Series
s1[f]

1    2
3    4
dtype: int64

10. Elements of Series s1 not Present in Series s2¶

f = ~s1.isin(s2)
s1[f]

0    1
2    3
4    5
dtype: int64
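
The ~ operator inverts a boolean Series, so the same pattern works in the other direction. A minimal sketch that finds the elements of s2 not present in s1 (only_in_s2 is an illustrative name):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 4, 6, 8, 10])

# Keep the rows of s2 whose value does NOT appear in s1
only_in_s2 = s2[~s2.isin(s1)]

print(only_in_s2.tolist())  # [6, 8, 10]
```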

11. Frequency Count¶

Use the value_counts() method to count how many times each element appears in the Series.

s1 = pd.Series([1,2,3,4,5, 1,3,5])

s1.value_counts()

5    2
3    2
1    2
4    1
2    1
dtype: int64
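
If relative frequencies are more useful than raw counts, value_counts() accepts normalize=True. The sketch below also sorts the result by value with sort_index() so that the order does not depend on how ties are arranged. The name shares is illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 1, 3, 5])

# normalize=True returns each count divided by the total number of elements
shares = s1.value_counts(normalize=True).sort_index()

print(shares)
```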

12. Find Difference between Consecutive Values of a Series¶

prices = pd.Series([10,12,13,15,12,16,17])
dif = prices.diff()
dif

0    NaN
1    2.0
2    1.0
3    2.0
4   -3.0
5    4.0
6    1.0
dtype: float64
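
The diff() method subtracts the previous element from the current one, which is why the first entry is NaN. As a sketch, the snippet below checks that this matches a manual subtraction of the shifted Series, and shows that diff() also accepts a number of periods. The names manual and dif2 are illustrative.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

dif = prices.diff()
manual = prices - prices.shift()   # same calculation written out by hand
print(dif.equals(manual))          # True

# Difference between each value and the value two rows earlier
dif2 = prices.diff(2)
print(dif2)
```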

13. Get Previous Values with shift()¶

We can get the previous values (also called the lagged values) of a Series using the shift() method. For example, to divide each price in the prices Series by the previous price, the code looks like this:

prices / prices.shift()

0         NaN
1    1.200000
2    1.083333
3    1.153846
4    0.800000
5    1.333333
6    1.062500
dtype: float64

1. Stock returns

Let's find the stock returns using the shift() method. Stock returns are defined as:

$ \large \text{returns} = \frac{\text{Current Price} - \text{Previous Price}}{\text{Previous Price}} $

returns = (prices - prices.shift()) / prices.shift()
returns

0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64

2. Stock returns using the percentage method

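We can also use the pct_change() method to find the fractional change between consecutive values of a Series. The result is the same as the returns computed above; multiply it by 100 to express it as a percentage.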

returns2 = prices.pct_change()
returns2

0         NaN
1    0.200000
2    0.083333
3    0.153846
4   -0.200000
5    0.333333
6    0.062500
dtype: float64
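
As a quick check that the two approaches agree, the sketch below recomputes both versions and measures the largest absolute gap between them. A small tolerance is used because floating-point results may differ in the last digits. The name max_gap is illustrative.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

returns = (prices - prices.shift()) / prices.shift()
returns2 = prices.pct_change()

# max() skips the leading NaN, so this is the largest gap among real values
max_gap = (returns - returns2).abs().max()
print(max_gap < 1e-12)  # True
```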

14. Find the Forward Values using shift(-1)¶

The shift() method uses a default value of 1 for its periods argument, which returns the lagged values of a Series. When we pass -1 as the argument, it returns the forward (next) values instead. See this example:

prices

0    10
1    12
2    13
3    15
4    12
5    16
6    17
dtype: int64

prices.shift(-1)

0    12.0
1    13.0
2    15.0
3    12.0
4    16.0
5    17.0
6     NaN
dtype: float64
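
One possible use of the forward values is to compute how much the price changes in the next period. The sketch below also shows the fill_value argument of shift(), which replaces the NaN created at the end. The names forward_change and filled are illustrative.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

# Next price minus current price; the last row has no next value, so it is NaN
forward_change = prices.shift(-1) - prices
print(forward_change)

# fill_value supplies a replacement for the missing last value
filled = prices.shift(-1, fill_value=0)
print(filled.tolist())  # [12, 13, 15, 12, 16, 17, 0]
```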

MAP Function¶

Python's built-in map() function applies a given function to each element of a list. Suppose we create a list named s6 that holds three words, and we would like to count the characters of each word using the len() function. The manual way would be to apply len() to each of the three elements one at a time; the map() function can do it in one line.

# Create a list
s6 = ['Khyber', 'Punj', 'KpK']

x = map(len, s6)

# Display the contents of x
list(x)

[6, 4, 3]

So the first word has 6 characters, the 2nd has 4 characters, and the 3rd has 3 characters.

Capitalize the first letter of each word in the following series.¶

ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])
plist = ds.tolist()
plist

['mangoes', 'banans', 'tall person', 'data science']

N = len(plist)
for i in range(N) :
    plist[i] = plist[i].title()

plist

['Mangoes', 'Banans', 'Tall Person', 'Data Science']

Explanation¶

We converted the pandas Series to a Python list
We counted the number of items in the list and stored the count in the variable N
Then we used a for loop to replace each item of plist with its title-cased version

Can we do the above using the map() function?¶

def title(s) :
    return s.title()

x = map(title, plist)

list(x)

['Mangoes', 'Banans', 'Tall Person', 'Data Science']
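
A pandas Series also has its own map() method, so the conversion to a Python list is not strictly required. The sketch below is one possible alternative to the approach above: it applies str.title through Series.map, uses the vectorised .str.title() accessor for the same result, and counts characters with len. The names via_map, via_str, and lengths are illustrative.

```python
import pandas as pd

ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])

# Series.map applies a function to each element and returns a new Series
via_map = ds.map(str.title)

# The .str accessor offers the same string method directly
via_str = ds.str.title()

# Any one-argument function works, for example len
lengths = ds.map(len)

print(via_map.tolist())  # ['Mangoes', 'Banans', 'Tall Person', 'Data Science']
print(lengths.tolist())  # [7, 6, 11, 12]
```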
