Pandas Series¶
A Series is a single column of data, whereas a DataFrame is a collection of columns (Series). In the following exercise, we shall learn the following:
- Create a Series
- Compare two series
- Add, subtract, multiply, and divide one series on another
- Convert series to Python list
- Convert to Numeric
- Add Data to an Existing Series
- Filtering or Subsetting
- Find Mean and Standard Deviation of a Series
- Common Elements of two Series
- Elements of Series s1 not Present in Series s2
- Frequency Count
- Find difference between consecutive values of a series
- Get Previous values with shift()
- Find the Forward Values using shift(-1)
- The map() Function
import pandas as pd
1. Create a Series¶
A pandas Series is created with the pd.Series() constructor. Inside the parentheses, we can pass either an existing Python list or values typed directly inside square brackets, separated by commas.
ds = pd.Series([1,2,3,4])
ds2 =pd.Series([1,3,4,5])
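Every Series also carries an index (row labels), which defaults to 0, 1, 2, and so on. The sketch below sets the index explicitly and builds the same Series from a dictionary. The subject names and marks are made up for illustration only.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
marks = pd.Series([85, 90, 78], index=['math', 'physics', 'english'])

# A dictionary works too: keys become the index, values become the data
marks_from_dict = pd.Series({'math': 85, 'physics': 90, 'english': 78})

print(marks['physics'])               # label-based access: 90
print(marks.equals(marks_from_dict))  # True
```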
2. Compare two Series¶
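Two Series can be compared using the == operator. The comparison is made element by element and returns a Series of True and False values. Both Series must have the same index labels; otherwise pandas raises an error.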
ds == ds2
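The result of the comparison can be summarised further. The sketch below, which repeats the two Series so it runs on its own, counts the matching positions and also shows equals(), which returns a single True or False for the whole Series.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

matches = ds == ds2        # element-wise comparison, returns a boolean Series
print(matches.tolist())    # [True, False, False, False]
print(matches.sum())       # 1, because True counts as 1
print(ds.equals(ds2))      # False, the two Series are not identical
```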
3. Math Operations on Series¶
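Two Series can be added, subtracted, multiplied, or divided with the usual arithmetic operators. The operation is applied element by element, matching values by index label.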
ds + ds2
ds - ds2
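Multiplication and division work the same way as addition and subtraction. A minimal sketch with the same two Series, redefined here so the block is self-contained:

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4])
ds2 = pd.Series([1, 3, 4, 5])

product = ds * ds2     # element-wise multiplication
ratio = ds / ds2       # element-wise division, always returns floats

print(product.tolist())   # [1, 6, 12, 20]
print(ratio.iloc[2])      # 0.75
```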
4. Convert Series to Python list¶
We can use the tolist( ) method to convert a series to a Python list.
ds.tolist()
5. Convert to Numeric¶
If a Series mixes numbers with text, pandas stores it with the generic object dtype rather than a numeric one. We can force the conversion to a numeric Series using the pd.to_numeric(series_name) function. Please note that with errors='coerce', any element that cannot be parsed as a number is replaced with NaN, which means Not a Number.
ds3 = pd.Series([1,2,'Apple',4])
# See the data type of this Series
ds3
ds3 = pd.to_numeric(ds3,errors='coerce')
ds3
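Strings that look like numbers are converted rather than replaced. The sketch below uses a made-up Series (named mixed here for illustration) to show which elements survive the conversion and which become NaN.

```python
import pandas as pd

# Hypothetical data: two numeric strings, one word, one integer
mixed = pd.Series(['1', '2.5', 'Apple', 4])
print(mixed.dtype)                  # object

numeric = pd.to_numeric(mixed, errors='coerce')
print(numeric.tolist())             # [1.0, 2.5, nan, 4.0]
print(numeric.dtype)                # float64
```

Only 'Apple' becomes NaN; '1' and '2.5' are parsed as numbers.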
6. Add Data to an Existing Series¶
The Series.append() method was removed in pandas 2.0. To add data, first create the new series and then join the two with pd.concat().
ds4 = pd.Series([400,200])
ds3 = pd.concat([ds3, ds4])
ds3
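When two Series are joined, each keeps its own index labels, so the result can contain duplicate labels. The sketch below uses two small stand-in Series (a and b are illustrative names) and shows how the ignore_index=True argument of pd.concat() renumbers the result.

```python
import pandas as pd

a = pd.Series([1.0, 2.0])
b = pd.Series([400, 200])

stacked = pd.concat([a, b])                         # keeps the original labels
renumbered = pd.concat([a, b], ignore_index=True)   # fresh 0..n-1 index

print(stacked.index.tolist())      # [0, 1, 0, 1]
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```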
7. Filtering or Subsetting¶
We can filter data on a condition. For example, let us create a series from 100 to 1000 and keep only the values that are greater than 400.
ds5 = pd.Series([100,200,300,400,500,600,700,800,900,1000])
# Remember the comparison of Series values from point 2 above?
# What does the comparison code generate?
# Let us try this code
f = ds5 > 400
# What does the f variable hold?
f
The f variable holds a True or False value for each row of the ds5 Series. This f variable can now be passed to the ds5 series to filter the required data. When we write ds5[f], pandas selects only those values of ds5 where f is True.
ds5[f]
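Several conditions can be combined with & (and) and | (or). A minimal sketch on the same data; the thresholds are arbitrary and chosen only for illustration.

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# Parentheses are required because & and | bind tighter than comparisons
between = ds5[(ds5 > 400) & (ds5 < 800)]
extremes = ds5[(ds5 <= 200) | (ds5 >= 900)]

print(between.tolist())    # [500, 600, 700]
print(extremes.tolist())   # [100, 200, 900, 1000]
```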
Descriptive Statistics of a Series¶
8. Find Mean and Standard Deviation of a Series¶
Use the mean() and std() methods. Note that std() returns the sample standard deviation (it divides by n - 1) by default.
ds5.mean()
ds5.std()
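The difference between the sample and the population standard deviation can be checked with the ddof argument of std(). A short sketch on the same data:

```python
import pandas as pd

ds5 = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

sample_std = ds5.std()              # ddof=1 by default: divide by n - 1
population_std = ds5.std(ddof=0)    # divide by n instead

print(ds5.mean())                   # 550.0
print(round(sample_std, 2))
print(round(population_std, 2))
```

The sample value is always the larger of the two, because it divides the same sum of squares by a smaller number.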
9. Common Elements of two Series¶
Use the isin([value1, value2]) method to check, for each element of a series, whether it is one of the given values.
s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series([2,4,6,8,10])
# Let us check which elements of series s1 are equal to 1 or 5
f = s1.isin([1,5])
f
# Instead of passing 1 and 5, let us pass the series s2 to isin()
f = s1.isin(s2)
f
# Since the variable f is a boolean Series, we can use it to get the elements which are present in both Series
# The following code shows that the values 2 and 4 are present in both Series
s1[f]
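10. Elements of Series s1 not Present in Series s2¶
The ~ operator inverts a boolean Series, so ~s1.isin(s2) is True for the elements of s1 that do not appear in s2.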
f = ~s1.isin(s2)
s1[f]
11. Frequency Count¶
Use the value_counts( ) method to count how many times each element appears in the dataset.
s1 = pd.Series([1,2,3,4,5, 1,3,5])
s1.value_counts()
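The counts can also be expressed as proportions by passing normalize=True. A minimal sketch on the same data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 1, 3, 5])

counts = s1.value_counts()                  # index = distinct values, data = counts
shares = s1.value_counts(normalize=True)    # counts divided by the total

print(counts.loc[1])   # 2, the value 1 appears twice
print(shares.loc[1])   # 0.25, i.e. 2 out of 8 elements
```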
12. Find difference between consecutive values of a series¶
prices = pd.Series([10,12,13,15,12,16,17])
dif = prices.diff()
dif
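The first element of the result is NaN because it has no previous value to subtract. The sketch below shows this, along with the periods argument of diff(), which compares each value with the one two positions earlier.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

dif = prices.diff()             # value minus the previous value
dif2 = prices.diff(periods=2)   # value minus the value two positions back

print(dif.tolist())    # [nan, 2.0, 1.0, 2.0, -3.0, 4.0, 1.0]
print(dif2.tolist())   # [nan, nan, 3.0, 3.0, -1.0, 1.0, 5.0]
```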
13. Get Previous values with shift()¶
We can get the previous values of a series (also called the lagged values) using the shift( ) method. For example, to divide each price in the prices series by the previous price, the code looks like this:
prices / prices.shift()
1. Stock returns
Let's find the stock returns using the shift() method. Stock returns are defined as:
$ \text{returns} = \frac{\text{Current Price} - \text{Previous Price}}{\text{Previous Price}} $
returns = (prices - prices.shift()) / prices.shift()
returns
2. Stock returns using the percentage method
We can also use the pct_change() method, which computes the same percentage change between consecutive values of a series.
returns2 = prices.pct_change()
returns2
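As a quick check, the two approaches can be compared directly. The sketch below allows for tiny floating-point differences instead of testing exact equality; the tolerance of 1e-12 is an arbitrary choice.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

returns = (prices - prices.shift()) / prices.shift()
returns2 = prices.pct_change()

# Largest absolute gap between the two results (NaN values are skipped)
gap = (returns - returns2).abs().max()
print(gap < 1e-12)
```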
14. Find the Forward Values using shift(-1)¶
The shift() method uses a default value of 1, which returns the lagged values of a series. When we pass -1 as the argument, it returns the forward (next) values instead, and the last element becomes NaN because nothing follows it. See this example:
prices
prices.shift(-1)
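Forward values are useful when each row should be compared with what comes next. The sketch below computes the change from the current price to the following one; the name forward_change is introduced here only for illustration.

```python
import pandas as pd

prices = pd.Series([10, 12, 13, 15, 12, 16, 17])

following = prices.shift(-1)          # next value on each row, NaN on the last row
forward_change = following - prices   # next price minus current price

print(following.tolist())        # [12.0, 13.0, 15.0, 12.0, 16.0, 17.0, nan]
print(forward_change.tolist())   # [2.0, 1.0, 2.0, -3.0, 4.0, 1.0, nan]
```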
The map() Function¶
Python's built-in map( ) function applies a given function to each element of a list (or any other iterable). Suppose we create a list named s6 that holds three words, and we would like to count the characters of each word using the len( ) function. The manual way would be to apply len( ) to each of the three elements separately; the map( ) function does it in one line. It returns a lazy map object, so we wrap it in list( ) to display the results.
# Create a list
s6= ['Khyber', 'Punj', 'KpK']
x = map(len, s6)
# display the contents of x
list(x)
So the first word has 6 characters, the 2nd has 4 characters, and the 3rd has 3 characters.
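One detail worth knowing is that a map object can be consumed only once. The sketch below shows this on the same words (the list is renamed words here so the block runs on its own), together with an equivalent list comprehension.

```python
words = ['Khyber', 'Punj', 'KpK']

lengths = map(len, words)      # lazy iterator, nothing is computed yet
first_pass = list(lengths)     # [6, 4, 3]
second_pass = list(lengths)    # [] because the iterator is already used up

# A list comprehension gives the same result as the first pass
comprehension = [len(w) for w in words]

print(first_pass, second_pass, comprehension)
```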
Capitalize the first letter of each word in the following series.¶
ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])
plist = ds.tolist()
plist
N = len(plist)
for i in range(N):
    plist[i] = plist[i].title()
plist
Explanation¶
- We converted the pandas Series to a Python list
- We counted the number of items in the list and stored the count in the variable N
- Then we used a for loop to replace each item of plist with its title-cased version
Can we do the above using the map() function?¶
def title(s):
    return s.title()
x = map(title, plist)
list(x)
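The same result can be obtained without leaving pandas. As a sketch of two alternatives, the Series.str accessor offers a vectorised title() method, and Series.map() applies a function to each element much like Python's map().

```python
import pandas as pd

ds = pd.Series(['mangoes', 'banans', 'tall person', 'data science'])

via_str = ds.str.title()      # vectorised string method
via_map = ds.map(str.title)   # apply str.title to each element

print(via_str.tolist())   # ['Mangoes', 'Banans', 'Tall Person', 'Data Science']
print(via_map.tolist())   # same values as above
```

Both calls return a new Series and leave ds unchanged.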