About This Course
In this course, our aim is to get students STARTED with using data science in practice, whether for academic projects or for industry problems they may face later in their careers.
The course favors breadth over depth: we cover many topics, but we won’t go too deep into any of them (that would take “forever”!).
We go over all steps of a data problem workflow, namely:
Gathering Data → Cleaning → Preprocessing → Analysis → Visualization
Throughout this process, we work on several sample problems in the lab sessions, as well as in the homeworks and final projects, to help students get ready for the various types of data problems they’ll face later in their careers.
Learning Objectives
By the end of the semester, students will (hopefully!):
- Learn how to prepare the datasets they will use (cleaning, preprocessing and exploratory data analysis).
- Become familiar with different types of problems (e.g. regression/classification) as well as the main learning types (e.g. supervised/unsupervised).
- Practice using some of the most well-known and widely used algorithms for classification/regression problems (including Decision Trees, Neural Networks, etc.).
- Strengthen their reporting/visualization skills to communicate results as effectively as possible.
- Practice the learned skills on different datasets to gain experience with a variety of data problems.
Overall, our main goal is for students to be able to carry out the whole workflow (i.e. from gathering/finding the dataset to preparing it, analyzing it and reporting the results) confidently and smoothly.
Prerequisites
- Familiarity with statistics and the basics of data science.
- Programming knowledge in Python, NumPy and Pandas (homeworks can also be done in R, but we can’t help with any bugs you run into along the way).
- Prior experience with Google Colab is a plus!
Reading Material
There is no required textbook for this course, but students are expected to go over the material introduced during the course.
Grading
- Homeworks: 40%
- Final Project: 40%
    - Notebook (runs correctly and does the task): 70 points
    - Presentation (runs correctly and does the task): 30 points
        - Of the 30 presentation points, 20 come from other groups!
- Final Exam: 20%
- There may be a bonus of up to 10% for participating in a Kaggle contest! More on that later.
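The weights above combine into a simple weighted average. Below is a minimal Python sketch (Python being the lab language), assuming each component is scored out of 100 and leaving out the optional Kaggle bonus, since how that bonus is applied isn’t specified yet. The function names are hypothetical, for illustration only.

```python
# Weights from the Grading section; each component is ASSUMED to be scored out of 100.
WEIGHTS = {"homeworks": 0.40, "final_project": 0.40, "final_exam": 0.20}


def project_score(notebook: float, presentation: float) -> float:
    """Hypothetical helper: final project score out of 100 (notebook max 70 + presentation max 30)."""
    if not (0 <= notebook <= 70 and 0 <= presentation <= 30):
        raise ValueError("notebook must be in [0, 70] and presentation in [0, 30]")
    return notebook + presentation


def course_grade(homeworks: float, final_project: float, final_exam: float) -> float:
    """Hypothetical helper: weighted course grade out of 100 (Kaggle bonus NOT included)."""
    scores = {"homeworks": homeworks, "final_project": final_project, "final_exam": final_exam}
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example with made-up scores
proj = project_score(notebook=63, presentation=25)
print(course_grade(homeworks=90, final_project=proj, final_exam=75))
```

With these made-up scores the project is worth 88 points, and the overall grade is 0.4 × 90 + 0.4 × 88 + 0.2 × 75 = 86.2.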
Logistics
- The course consists of a lecture (on Sundays) and a lab session (on Tuesdays).
- We’ll also hold a session during the week to help you with any issues you encounter while working on your homework/project.
- We will use Python for the hands-on lab sessions, but homeworks/projects can be handed in as already-run Jupyter notebooks (so they can also be done in R).
- Communication takes place in MS Teams chat rooms.