**Course Description**

In this Specialization learners will develop foundational Data Science skills to prepare them for a career or further learning that involves more advanced topics in Data Science. The specialization entails understanding what is Data Science and the various kinds of activities that a Data Scientist performs. It will familiarize learners with various open source tools, like Jupyter notebooks, MS Excel and Stata. It will teach you about methodology involved in tackling data science problems. Learners will complete hands-on labs and projects to apply their newly acquired skills and knowledge.

**Course Outline**

Too often, finance courses stop short of making a connection between textbook finance and the problems of real-world business. This course bridges this gap between theory and practice by providing a nuts-and-bolts guide to solving common financial models with spreadsheets. This course takes the students step by step through financial models and various data management issues, showing how they can be solved.

Week No. |
Description |

Week 1 | Lecture 1: Introduction to Python |

Week 1 | Lecture 1.1: Getting Started with Pandas |

Week 1 | Lecture 1.2 Tidy Data |

**Lecture Notes and Files**

Week No. |
Description |

LECTURE 1: Getting Started with Python | Download |

Lecture 1.1: Practice on Python lists | Download |

Lecture 1.2: Advanced Topic: Python Dictionary | Download |

Lecture 2: Project: Create a Number to Text Function | Access Page |

Lecture 3: Introduction to Pandas Series | Download |

Lecture 4: Pandas DataFrames | Download |

Lecture 4.1: Date and Time in Pandas | Download |

Lecture 5.1 : Pands – Groupby | Download |

Lecture 5.2 : Excercise – Groupby | Download |

Lecture 6: Merging Datasets | Download |

Lecture 7: Portfolios and Returns | Download |

Lecture 8: Tidy Data | Download |

Lecture 9: Data Cleaning | Download |

## Assignment 7 : Research with Stata

Marks = 20

Deadline = July 25, 2020; till 11:59:59 PM

Submission Link = Submit on google classroom

assignment

**Important Note:** Do not submit files that were created during a lecture in the lab. You are supposed to submit files that you have created at home. The purpose is to have as much practice as possible. Use as many comments and examples as possible. Marks are based on the size, examples, and comments inside the submitted assignments.

1. Use data of at least 100 companies (2010-2018). You can download the data

manually or using Python from this page https://opendoors.pk/financials/ or any other site for the following variables.

a. Total assets

b. Net income

c. Dividend Percentage

d. Face value per share

e. Equity

f. sales

g. Financial expenses

h. Total Shares

2. Import the data from Excel files

3. Create the following variables:

a. Firm size = log of total assets

b. Profitability = net income / total assets

c. Dividend payout = dps / eps

d. Leverage = total liabilities / total assets

e. The standard deviation of net income for each firm’s

f. growth = geometric mean of the annual increase in total sales

h. Year dummies

i. Create a dummy if a firm pays a dividend, else zero. call it divdum

4. From the above variable, develop some interesting hypotheses and test them using ttest and regressions.

(Just one example: Using transaction cost hypothesis: Large firms have better access to external finance, hence they pay more dividends as they do not rely desperately on internally generated funds)

More interesting topics can be found here https://opendoors.pk/home/research-topics/research-topics-in-finance/.

I do not believe in change if two students have chosen the same hypothesis :D

5. Arrange data in panel data format and assign numeric identifiers to each firm

6. Declare the data panel data

7. Winsorize the variables at 1st and 99th percentiles

8. Create residual vs fitted values plots and visually confirm influential and outlier observations

9. Remove influential and outlier observations using studentized residuals and Cook’s D

10.Test for panel-level heteroscedasticity

11. Test for panel-level serial correlation

12. Use all appropriate steps in the panel data model selection and use robust standard errors according to steps 10 and 11.

13. Report your descriptive statistics, correlations, VIF, Hausman test, and regression results (use nested models) using asdoc.

14. Make a zip folder and add the following to it:

a. Stata log file

b. Stata Do File

C. Stata Data file

D. Excel file containing the data

E. Word Files containing the results

## Assignment 6 : Merging and Portfolios

Marks = 10

Deadline = June 22, 2020; till 11:59:59 PM

Submission Link = Submit on google classroom

**Important Note:** Do not submit files that were created during a lecture in the lab. You are supposed to submit files that you have created at home. The purpose is to have as much practice as possible. Use as many comments and examples as possible. Marks are based on the size, examples, and comments inside the submitted assignments.

1. Download share prices data of 50 companies either from this site https://opendoors.pk/share-prices/ or any other site for the period 1/1/2010 to 12/30/2018

2. Create a year variable in this dataset

3. Download index points for the same period from this page https://opendoors.pk/od/kse-100-index-points-historical-data-in-excel/ or any other site

4. Download company symbols, year, companies book value per share and shares outstanding from this page https://opendoors.pk/financials/ or any other site for the same 50 companies

5. Arrange data in panel data format.

6. Merge share prices data with the financial data (point 4 above) using symbol and year as merge criteria

7. Merge index points into this new dataset (obtained in point 6) using the date as merge criterion

8. In each year, classify firms into small or big based on median market capitalization (me = shares * price). Small firms should be tagged as 1 while big firms should be tagged as 2.

9. In each year, classify firms into three groups based on book-to-market (beme) ratio: firms with low beme ratio should be tagged as 1; medium as 2; and high as 3

10. Make 6 portfolios and find their monthly returns as per the following definitions:

SL = Monthly average returns of firms that have size =1 and beme = 1

SM = Monthly average returns of firms that have size =1 and beme = 2

SH = Monthly average returns of firms that have size =1 and beme = 3

BL = Monthly average returns of firms that have size =2 and beme = 1

BM = Monthly average returns of firms that have size =2 and beme = 2

BH = Monthly average returns of firms that have size =2 and beme = 3

11. Reduce the dataset to portfolio level and remove duplicates.

## Assignment 5 : Groupby and Filtering

Marks = 5

Deadline = March 27, 2020; till 11:59:59 PM

Submission Link = MS / MBA Evening | MBA Morning

**Important Note:** Do not submit files that were created during a lecture in the lab. You are supposed to submit files that you have created at home. The purpose is to have as much practice as possible. Use as many comments and examples as possible. Marks are based on the size, examples, and comments inside the submitted assignments.

1. Import the following industries.xls file from our website and name it ind

2. Create stock returns for each firm

3. Convert the firms returns to monthly returns

4. Create monthly returns for each industry

5. Remove duplicates and keep only one observation for each industry in each month

6. Report descriptive statistics for each industry

7. Which industry has the highest risk returns ratio (Sharp ratio)?

(hint find the mean and standard deviation for each industry in the whole period)

8. Repeat all of the above steps using weekly returns

## Assignment 4 : Pandas Filtering, Slicing

Marks = 5

Deadline = March 12, 2020

Submission Link = Deadline passed

**Important Note:** Do not submit files that were created during a lecture in the lab. You are supposed to submit files that you have created at home. The purpose is to have as much practice as possible. Use as many comments and examples as possible. Marks are based on the size, examples, and comments inside the submitted assignments.

1. Import a TSV file from a Computer folder and create a DataFrame, name it df2

2. Import Pipe (|) delimited file from this link into DataFrame and name the DataFrame as df3

3. Import data from this csv file into Panda’s DataFrame, name the dataframe as prices

4. Create stock returns for each firm

5. Find mean returns for all firms

6. Create a DataFrame with the name updays from the prices DataFrame by including only those rows where returns are higher than the mean returns

7. Create a DataFrame with the name downdays from the prices DataFrame by including only those rows where returns are lower than the mean returns

8. Find the standard deviation of returns of the updays DataFrame, call this variable as sdup

9. Find the standard deviation of the returns of the downdays DataFrame, call this variable as sddown

10. Find the log(sddown / sdup)

11. Show practice on the following functions using the symbol column from the prices dataframe:

11.1 count()

11.2 value_counts()

11.3 Find min(), max() of the returns column

12. Show practice on isin() method by creating two Pandas Series

14. Show practice on adding columns to a DataFrame

15. Show practice on dropping columns

16. Impress me with something that we have not studied in the class

## Assignment 3 : Pandas Series

Marks = 5

Deadline = March 5, 2020

Submission Link =

**Important Note:** Do not submit files that were created during a lecture in the lab. You are supposed to submit files that you have created at home. The purpose is to have as much practice as possible. Use as many comments and examples as possible. Marks are based on the size, examples, and comments inside the submitted assignments.

1. Make a pandas series manually

2. Make a series from a python list

3. Make a series from another series

4. Practice on filtering data from a series

5. Compare two series

6. Perform mathematical operations on a series

7. Convert a pandas Series to python list

8. Use pd.read_csv() method to import a file into pandas

9. Filter that data, create a subset, and find descriptive statistics

10. Convert an object series to numeric series

11. Append one series to another

12. USe shift(), shift(-1) functions

13. Find percentage changes in a series

14. Practice on changing from on directory to another

15. Write DataFrame to Excel File

16. Import TSV file into DataFrame

17. Slice columns from DataFrame

18. Use two conditions to create a boolean series, and then slice data from dataframe using that boolean series

## Assignment 2: Python Basics II

Marks = 5

Deadline = Feb 27, 2020

Submission Link =

Use Jupyter Notebook for completing the following tasks

1. Make the NumberToText function : minimum values acceptance is 999, though making it to accept any number would be great.

2. String to integers and integers to string practice

3. FOR loop

4. Use DIR command to show available attributes with Python objects

5. Practicing on as many string methods as possible from the dir(str) command

6. Combine two string objects with + symbol

7. Practice on RANGE function

8. Practice on FOR loop using RANGE function

9. USe FOR loop to show odd and even number when looping through RANGE items

10. Show the use of WHILE loop

## Assignment 1: Python Basics

Marks = 5

Deadline = Feb 20, 2020

Submission Link =

Use Jupyter Notebook for completing the following tasks

1. Show the use of all mathametical operations including the PEMDAS rule

2. Show output of integers and floating numbers

3. Use comments

4. Write and print text

5. Show variable assignments

6. Assign some text to variables and then add more text to the same variable

7. Show practice on SLICING of string

8. SLICE from a given index upto a given index

9. Create lists and slice elements from list

10. Show the use of FOR loop

11. Show the use of WHILE loop

12. SHOW IF, ELIF, ELSE statements