Email: info@araniconsulting.com

**Unit 1: Introduction to Data Science** - What Data Science is, what a data scientist does, examples of Data Science across industries, how Python is used for Data Science applications, the steps in the Data Science process such as data wrangling, data exploration and model selection, understanding data visualization, exploratory data analysis and hypothesis building, plotting and other techniques.

**Unit 2: Introduction to Python** - Introduction to the Python programming language, important Python features, how Python differs from other programming languages, Python installation, the Anaconda Python distribution for Windows, Linux and Mac, how to run a sample Python script, how a Python IDE works, running some basic Python commands, Python variables, data types, and keywords.

**Unit 3: Python basic constructs** - Introduction to the basic constructs in Python, understanding indentation with tabs and spaces, code comments with the pound (#) character, names and variables, Python built-in data types like containers (list, set, tuple and dict), numeric (float, complex, int), text sequence (str), constants (True, False, Ellipsis) and others (classes, instances, modules, exceptions and more), basic operators in Python like logical, bitwise, assignment, comparison and more, slicing and the slice operator, loop and control statements like break, if, for, continue, else, range() and more.
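
As an illustration of the slicing and loop-control topics in this unit, here is a minimal sketch. The variable names and values are made up for the example and are not taken from the course material.

```python
# Slicing: sequence[start:stop:step]; the stop index is exclusive.
letters = ["a", "b", "c", "d", "e", "f"]
first_three = letters[:3]         # items at index 0, 1 and 2
every_other = letters[::2]        # every second item, starting at index 0
reversed_letters = letters[::-1]  # a negative step walks backwards

# Loop control: continue skips one iteration, break leaves the loop,
# and a for loop's else branch runs only when no break occurred.
odd_squares = []
for n in range(10):
    if n % 2 == 0:
        continue                  # skip even numbers
    if n > 7:
        break                     # stop at the first odd number above 7
    odd_squares.append(n * n)
else:
    odd_squares.append(-1)        # not reached here, because the loop breaks
```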

**Unit 4: Writing OOP in Python and connecting to database** - Understanding the OOP paradigm, including encapsulation, inheritance, polymorphism, and abstraction, what access modifiers, instances, class members, classes and objects are, function parameters and return values, lambda expressions, connecting to a database to pull data.
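
The OOP concepts, lambda expressions and database access listed for this unit could be sketched together as follows. This is only an illustration: the `Shape` classes and the `shapes` table are hypothetical, and an in-memory SQLite database from the standard library stands in for whatever database the course actually uses.

```python
import sqlite3
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstraction: every concrete shape must implement area()."""

    def __init__(self, name):
        # Encapsulation by convention: a leading underscore marks the
        # attribute as non-public (Python has no enforced access modifiers).
        self._name = name

    @property
    def name(self):
        return self._name

    @abstractmethod
    def area(self):
        ...


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Inheritance: a square reuses the rectangle's area()."""

    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"


shapes = [Rectangle(2, 3), Square(4)]
areas = [shape.area() for shape in shapes]         # polymorphic call
largest = max(shapes, key=lambda s: s.area())      # lambda as a sort key

# Store the results in a database and pull them back out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shapes (name TEXT, area REAL)")
conn.executemany(
    "INSERT INTO shapes VALUES (?, ?)",
    [(shape.name, shape.area()) for shape in shapes],
)
rows = conn.execute(
    "SELECT name, area FROM shapes ORDER BY area DESC"
).fetchall()
conn.close()
```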

**Unit 5: NumPy for mathematical computing** - Introduction to mathematical computing in Python, what arrays and matrices are, array indexing, array math, the ndarray object, data types, standard deviation, conditional probability in NumPy, correlation, covariance.
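
A small sketch of several topics in this unit (array indexing, array math, standard deviation, covariance, correlation and a conditional probability computed from boolean masks). The data is invented for illustration. Note that `ndarray.std()` defaults to the population formula (`ddof=0`), while `np.cov` defaults to the sample formula (`ddof=1`).

```python
import numpy as np

# A 2-D ndarray: indexing and elementwise array math.
m = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
second_row = m[1, :]              # row with index 1
doubled = m * 2                   # elementwise multiplication

# Descriptive statistics on two perfectly linearly related series.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
std_x = x.std()                      # population standard deviation
cov_xy = np.cov(x, y)[0, 1]          # sample covariance of x and y
corr_xy = np.corrcoef(x, y)[0, 1]    # Pearson correlation coefficient

# Conditional probability for one fair die:
# P(even | roll > 3) = P(even and roll > 3) / P(roll > 3)
rolls = np.array([1, 2, 3, 4, 5, 6])
even = rolls % 2 == 0
greater_than_3 = rolls > 3
p_even_given_gt3 = (even & greater_than_3).sum() / greater_than_3.sum()
```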

**Unit 6: SciPy for scientific computing** - Introduction to SciPy, how it builds on top of NumPy, the characteristics of SciPy, SciPy sub-packages such as signal, integrate, fftpack, cluster, optimize, stats and more, Bayes' Theorem with SciPy.
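
Bayes' Theorem itself is plain arithmetic, so the sketch below uses no SciPy calls at all; how the course combines it with SciPy is not stated here. The diagnostic-test probabilities are made-up numbers chosen only for illustration.

```python
# Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)
# Hypothetical numbers for a diagnostic test (illustration only).
p_disease = 0.01             # prior: P(disease)
p_pos_given_disease = 0.95   # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.05   # false positive rate: P(positive | healthy)

# Law of total probability gives the evidence term P(positive).
p_positive = (
    p_pos_given_disease * p_disease
    + p_pos_given_healthy * (1.0 - p_disease)
)

# Posterior: probability of disease given a positive result.
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")
```

Even with a fairly accurate test, the posterior stays well below one half in this example because the prior is small.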

**Unit 7: Data Analysis and Machine Learning (Pandas)** - Introduction to Machine Learning with Python, Python tools used for Machine Learning like NumPy, Scikit-Learn, Pandas, Matplotlib and more, use cases of Machine Learning, the process flow of Machine Learning, categories of Machine Learning, understanding Linear Regression and Logistic Regression, what gradient descent is in Machine Learning, introduction to Pandas DataFrames, importing data from JSON, CSV, Excel and SQL databases, converting a NumPy array to a DataFrame, data operations like selecting, filtering, sorting, viewing, joining and combining, how to handle missing values, time series analysis.

**Unit 8: Data manipulation** - What a data object is and its basic functionalities, using the Pandas library for data manipulation, the NumPy dependency of the Pandas library, loading and handling data with Pandas, how to merge data objects, concatenation and various types of joins on data objects, exploring and analyzing datasets.

**Unit 9: Data visualization with Matplotlib** - Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more, the Matplotlib API, subplots and Pandas built-in data visualization.

**Unit 10: Supervised learning** - What supervised learning is, classification, Decision Tree, the algorithm for Decision Tree induction, Confusion Matrix, Random Forest, Naïve Bayes, how Naïve Bayes works, how to implement a Naïve Bayes Classifier, Support Vector Machine, how a Support Vector Machine works, what Hyperparameter Optimization is, comparing Random Search with Grid Search, how to implement a Support Vector Machine for classification.

**Unit 11: Unsupervised Learning** - Introduction to unsupervised learning, use cases of unsupervised learning, what K-means clustering is, understanding the K-means clustering algorithm, optimal clustering, hierarchical clustering and how it works, what natural language processing is, working with NLP on text data, setting up the environment using Jupyter Notebook, analyzing sentences, the Scikit-Learn Machine Learning algorithms, the bag-of-words model, extracting features from text, grid search, model training, multiple parameters and building a pipeline.
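
To make the bag-of-words model and text feature extraction concrete, here is a dependency-free sketch using only the standard library. The unit itself names Scikit-Learn; this hand-rolled version is an assumption made for illustration, and the two sample sentences and the `tokenize` and `vectorize` helpers are hypothetical.

```python
import re
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# The vocabulary is the sorted set of all tokens seen in the corpus.
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})


def vectorize(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# One count vector per document; word order is discarded.
vectors = [vectorize(doc) for doc in docs]
```

Each document becomes a fixed-length vector of word counts, which is the kind of numeric feature matrix a clustering or classification algorithm can consume.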

**Unit 12: Web Scraping with Python** - Introduction to web scraping in Python, web scraping libraries, the BeautifulSoup and Scrapy Python packages, installing BeautifulSoup, installing the Python parser lxml, creating a soup object from input HTML, full or partial parsing, searching the tree and printing the output.

**Unit 13: Python integration with Hadoop and Spark** - Why Python needs to be integrated with Hadoop and Spark, the basics of the Hadoop ecosystem, Hadoop Common, the architecture of MapReduce and HDFS, writing Python code for MapReduce jobs on the Hadoop framework, understanding Apache Spark, setting up the Cloudera QuickStart VM, Spark tools, RDDs in Spark, PySpark, integrating PySpark with Jupyter Notebook, introduction to Artificial Intelligence and Deep Learning, deploying Spark code with Python, Spark's Machine Learning library MLlib, using Spark MLlib for classification, clustering and regression.
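
The MapReduce model mentioned in this unit can be illustrated with a word count that runs locally in plain Python. This is a sketch of the programming model only, not of a real Hadoop job: the `mapper` and `reducer` functions and the sample lines are hypothetical, and a call to `sorted` stands in for the shuffle-and-sort step that the framework performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Reduce phase: sum the counts per word.

    The pairs must arrive sorted by key, because groupby only
    merges adjacent items that share the same key.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


lines = ["Spark and Hadoop", "hadoop and Python"]
shuffled = sorted(mapper(lines))          # local stand-in for shuffle and sort
word_counts = dict(reducer(shuffled))
```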

**Python for Data Science Projects**

**Project 1: Analyzing the naming pattern using Python**

**Industry**: General

**Problem Statement**: How to analyze the trends and most popular baby names

**Topics**: In this Python project you will work with data that the United States Social Security Administration (SSA) has made available on the frequency of baby names from 1880 through 2016. The project requires analyzing the data using different methods. You will visualize the most frequent names, determine the naming trends, and come up with the most popular names for a certain year.

**Highlights**:

- Analyzing data using Pandas Library
- Deploying DataFrame manipulation
- Bar & box plots with Matplotlib

**Project 2: Python Web Scraping for Data Science** - In this project you will be introduced to the process of web scraping using Python. It involves installing Beautiful Soup and other web scraping libraries, working with common data and page formats on the web, learning the important kinds of objects such as NavigableString, and searching and navigating the parse tree with the parser, including searching by CSS class, list, function and keyword argument.

**Project 3: Predicting customer churn in Telecom Company**

**Industry**: Telecommunications

**Problem Statement**: How to increase the profitability of a telecom major by reducing the churn rate

**Topics**: In this project, you will work with the telecom company’s customer dataset. This dataset contains the details of the company's telephone subscribers. The columns hold data on phone number, call minutes during various times of the day, the charges incurred, lifetime account duration, and whether or not the customer has churned by unsubscribing from some services. The goal is to predict whether a customer will eventually churn or not.

**Highlights**:

- Deploy Scikit-learn ML library
- Develop code with Jupyter Notebook
- Build a model and evaluate it using performance metrics

**Project 4: Server logs/Firewall logs**

**Objective**: This includes the process of loading the server logs into the cluster using Flume. The data can then be refined using Pig Script, Ambari and HCatalog, and visualized using Elasticsearch and Excel. This project task includes:

- Server logs
- Potential uses of server log data
- Pig script
- Firewall logs
- Workflow editor