Customer Churn Prediction
Last Updated on May 3, 2021
A manager at the bank is concerned that more and more customers are leaving its credit card services. The bank would greatly appreciate a model that predicts which customers are going to churn, so that it can proactively reach out to them, provide better services, and turn their decision around.
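A minimal sketch of what such a churn classifier could look like, using a random forest on a toy stand-in for the bank's data (the column names here are hypothetical, not from the actual dataset):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy stand-in for the bank's credit-card customer data.
df = pd.DataFrame({
    "credit_limit":    [3000, 12000, 5000, 8000, 2000, 15000, 4000, 9000],
    "total_trans_ct":  [40, 120, 55, 90, 20, 130, 35, 95],
    "months_inactive": [4, 1, 3, 1, 5, 0, 4, 2],
    "churned":         [1, 0, 1, 0, 1, 0, 1, 0],
})

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Fit a random forest and report hold-out accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```

On real data the model's probability scores (`clf.predict_proba`) could rank customers by churn risk, so the bank can contact the highest-risk ones first.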
Enterprise AI
Last Updated on May 3, 2021
Enterprise AI is about enhancing the customer satisfaction index and ensuring customer stickiness to your organization by infusing emerging technologies such as artificial intelligence to engage and retain customers. Using AI algorithms, we address business operations such as determining customer sentiment from different media (social media, audio calls, video calls, images, emails, and chats), interacting with customers to provide quick and effortless solutions, analyzing and learning from buying behaviour to generate the next best offer, improving customer retention and reducing churn, deriving AI-based customer segmentation, managing customer touchpoints, and evaluating customer feedback.
We provide a membership card to every customer who purchases stock in the store. By scanning a QR code, the customer can easily submit feedback (Bad, Good, Very Good) after purchasing. We group customers into three categories (Bronze, Gold, and Platinum) based on their purchase history, to estimate purchasing efficiency from the quality of what they buy. Customers who give "Very Good" feedback come under the Platinum category and receive the best offers (for example, a free purchase worth Rs.1000). Notifications about new products and their prices are sent to customers by message, and special offers are also provided on festival occasions. We classify the feedback using classification algorithms such as random forest to separate positive from negative feedback.
Negative feedback is collected and rectified promptly. Through this approach, the shopkeeper gets clear feedback about the shop easily.
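The feedback-classification step described above could be sketched like this: TF-IDF features feeding a random forest. The sample texts and labels here are invented for illustration, not taken from the store's actual feedback:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Made-up feedback texts with positive/negative labels.
texts = ["very good service", "good products", "bad experience",
         "very good offers", "bad billing", "good staff"]
labels = ["positive", "positive", "negative", "positive", "negative", "positive"]

# Vectorize the text and classify it in one pipeline.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(texts, labels)

print(model.predict(["very good store", "bad service"]))
```

Predictions labelled "negative" would then be routed to the shopkeeper for follow-up, matching the rectification step in the description.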
Long Term Tool
Last Updated on May 3, 2021
My previous project was the Long Term Tool, a wind-shear analysis project. This tool is used by wind farm owners who want to know which location is going to give the best profits.
Suppose A wants to start a wind farm business. A has the money but is not aware of the wind speeds at a particular location, so he takes help from B (Wind Pioneers). Wind Pioneers uses a sensor at every wind station to measure wind speed and wind direction; their role is to record data containing wind speeds and wind directions for every hour.
Wind Pioneers measures wind speeds at various sensor heights, such as ws_120m and ws_100m. For each minute we have several observations, so the number of observations per hour grows quickly; the data is far too large to analyze with manual calculations. So we came up with one tool: the Long Term Tool.
I worked on this project with a team. The tool provides interactive software for performing all the analysis: plots, correlation values, and scatter plots for finding the relationship between two variables. You can simply download the files you are working with, and it gives you everything in detail.
As reference data we take NASA data from the past 30 years, which contains wind speed and wind direction, in order to predict the wind speeds at a particular location for the next 30 years using a linear regression model.
We fit the linear model over various averaging periods: 1 hour, 6 hours, 1 day, 3 days, 7 days, 10 days, and 1 month. Sometimes the on-site weather file and the climate (reference) file differ in time; to compensate for this, we apply time shifting to the reference file.
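The long-term correction step can be sketched as a simple measure-correlate-predict style regression: fit the on-site wind speed against the concurrent reference (NASA) wind speed, then apply that fit to the full reference record. The data below is synthetic, and the variable names are assumptions, not the tool's actual API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic concurrent period: reference (e.g. NASA) vs on-site sensor speeds (m/s).
ref_concurrent = rng.uniform(3, 12, size=500)
site_concurrent = 1.1 * ref_concurrent + 0.5 + rng.normal(0, 0.3, size=500)

# Fit site speed as a linear function of reference speed.
model = LinearRegression().fit(ref_concurrent.reshape(-1, 1), site_concurrent)

# Apply the fit to the long-term reference record to estimate long-term site speeds.
ref_longterm = rng.uniform(3, 12, size=2000)
site_longterm_est = model.predict(ref_longterm.reshape(-1, 1))
print(round(site_longterm_est.mean(), 2))
```

In practice the same fit would be repeated at each averaging period (1 hour, 6 hours, 1 day, and so on) after applying any time shift needed to align the two files.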
Web-Based Heart Failure Prediction System
Last Updated on May 3, 2021
Approximately 17 million people die globally per year because of cardiovascular disease, mainly from myocardial infarction and heart failure. Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body.
In this heart-failure prediction problem, we try to predict whether the patient's heart muscle pumps blood properly or not using logistic regression. The dataset, downloaded from the UCI repository, is real: it was collected at a well-known hospital in the United Kingdom (UK) in 2015 and contains 299 patient records with 12 features (attributes) and one label. Based on those 12 features, we predict whether the patient's heart is working properly or not.
In this problem statement, we analyze a dataset of 299 patients with heart failure collected in 2015. We apply several machine learning classifiers both to predict the patients' survival and to rank the features corresponding to the most important risk factors. We also perform an alternative feature ranking analysis by employing traditional biostatistics tests and compare these results with those provided by the machine learning algorithms. Since both feature ranking approaches clearly identify serum creatinine and ejection fraction as the two most relevant features, we then build the machine learning survival prediction models on these two factors alone.
For model building we use library packages such as Pandas, scikit-learn (sklearn), Matplotlib, Seaborn, TensorFlow, and Keras. The workflow is as follows:
- Data description: carry out an initial analysis to understand more about the data, its source, volume, attributes, and relationships. Once these details are documented, any shortcomings should be reported to the relevant personnel.
- Data cleaning: check whether there are any missing values, then split the dataset into training and testing sets with a 70%/30% ratio.
- Model building: this process is also known as training the model using the data and features from our dataset. A combination of data (features) and machine learning algorithms gives us a model that tries to generalize on the training data and produce results in the form of insights and/or predictions. Generally, multiple algorithms are tried on the same data and problem to find the model that performs best against the business success criteria. The key things to track here are the models created, the model parameters used, and their results.
- Result analysis: check the model score and accuracy using a confusion matrix. For this model we got 80% accuracy; in the future we will try to improve that accuracy.
For model deployment we use Python Flask, and on top of that we build the web-based application.
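The split-train-evaluate steps above can be sketched as follows. Since the real UCI heart-failure CSV is not reproduced here, synthetic data of the same shape (299 records, 12 features, one binary label) stands in for it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in for the 299-patient, 12-feature dataset.
X, y = make_classification(n_samples=299, n_features=12, random_state=42)

# 70% training / 30% testing split, as in the write-up.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
```

A Flask app for deployment would simply load the fitted `clf` (e.g. via `pickle`) and call `clf.predict` on the 12 feature values submitted through the web form.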
Retail Analysis Of Walmart Data
Last Updated on May 3, 2021
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day, and sales data is available for 45 Walmart stores. The business faces a challenge due to unforeseen demand and sometimes runs out of stock because its current machine learning approach is inadequate. An ideal ML algorithm would predict demand accurately and ingest factors like economic conditions, including CPI, the unemployment index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays; the four largest are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions is available.
This is the historical data which covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
- Store - the store number
- Date - the week of sales
- Weekly_Sales - sales for the given store
- Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
- Temperature - Temperature on the day of sale
- Fuel_Price - Cost of fuel in the region
- CPI – Prevailing consumer price index
- Unemployment - Prevailing unemployment rate
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Basic Statistics tasks
- Which store has maximum sales
- Which store has the maximum standard deviation, i.e., where sales vary a lot. Also find the coefficient of variation (standard deviation relative to the mean)
- Which store(s) had a good quarterly growth rate in Q3 2012
- Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together
- Provide a monthly and semester view of sales in units and give insights
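The first two statistics tasks can be sketched with pandas on a tiny made-up sample of Walmart_Store_sales (the field names follow the description above; the numbers are invented):

```python
import pandas as pd

# Tiny illustrative sample: two weeks each for three stores.
df = pd.DataFrame({
    "Store":        [1, 1, 2, 2, 3, 3],
    "Weekly_Sales": [150000.0, 160000.0, 90000.0, 300000.0, 120000.0, 125000.0],
})

# Which store has maximum total sales.
totals = df.groupby("Store")["Weekly_Sales"].sum()
print("store with maximum sales:", totals.idxmax())

# Which store's sales vary the most.
stds = df.groupby("Store")["Weekly_Sales"].std()
print("store with maximum standard deviation:", stds.idxmax())

# Coefficient of variation: standard deviation relative to the mean, per store.
cv = stds / df.groupby("Store")["Weekly_Sales"].mean()
print(cv.round(3))
```

On the real file the same `groupby` pattern extends to the quarterly-growth and holiday-comparison tasks by first deriving quarter and holiday columns from `Date` and `Holiday_Flag`.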
For Store 1 – Build prediction models to forecast demand
- Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.
- Change dates into days by creating a new variable.
Select the model that gives the best accuracy.
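The date-restructuring idea for the linear regression task can be sketched like this: number each weekly date starting from 1 at the earliest date, then regress weekly sales on that index plus the economic variables. The rows below are invented stand-ins for Store 1 in Walmart_Store_sales:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for Store 1 rows of Walmart_Store_sales.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19", "2010-02-26"]),
    "Weekly_Sales": [150000.0, 152000.0, 149000.0, 160000.0],
    "CPI": [211.1, 211.2, 211.3, 211.3],
    "Unemployment": [8.1, 8.1, 8.1, 8.1],
    "Fuel_Price": [2.57, 2.55, 2.51, 2.56],
})

# Restructure dates: 1 for the earliest week (5 Feb 2010), 2 for the next, and so on.
df = df.sort_values("Date").reset_index(drop=True)
df["Day_Index"] = df.index + 1

# Regress weekly sales on the date index plus CPI, unemployment, and fuel price.
X = df[["Day_Index", "CPI", "Unemployment", "Fuel_Price"]]
model = LinearRegression().fit(X, df["Weekly_Sales"])
print(model.coef_)
```

Inspecting the fitted coefficients (or comparing models with and without each variable) is one way to test the hypothesis that CPI, unemployment, and fuel price affect sales.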
Loan Prediction
Last Updated on May 3, 2021
A company wants to automate its loan eligibility process (in real time) based on the customer details provided in the online application form: gender, marital status, education, number of dependents, income, loan amount, credit history, and others. To automate this process, the company provided a dataset for identifying the customer segments that are eligible for a loan, so that it can specifically target those customers. So in this project our main objective is to predict whether an individual is eligible for a loan or not, based on the given dataset.
For simplicity, I divided the project into small parts:
- Data Collection: I collected the data from Analytics Vidhya as CSV files. We have two CSV files: the train data, used for training the model, and the test data, used for prediction once the model is trained.
- Import Libraries: I imported different sklearn packages for the algorithms and other tasks.
- Reading Data: I read the data using pandas' read_csv() function.
- Data Preprocessing: I first found the missing values, then either removed a column or imputed a value (mean, mode, or median), according to the amount of data missing in that column.
I checked the unique values in each column. Then I applied label encoding to convert string-typed data to integer values, and used the dummies function to turn each unique value into its own column. I computed the correlation matrix, which shows how the columns correlate with each other.
Then I split the data, after analysing each column and row of the dataset.
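The imputation, label-encoding, and dummy-variable steps can be sketched on a few hypothetical loan columns (the column names mirror the application form, but the rows are invented):

```python
import pandas as pd

# Tiny made-up slice of a loan dataset with missing values.
df = pd.DataFrame({
    "Gender":      ["Male", "Female", None, "Male"],
    "Education":   ["Graduate", "Graduate", "Not Graduate", "Graduate"],
    "LoanAmount":  [120.0, None, 66.0, 141.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Impute: mode for categorical columns, median for numeric ones.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Label-encode the target, then one-hot encode the categorical features.
df["Loan_Status"] = df["Loan_Status"].map({"Y": 1, "N": 0})
df = pd.get_dummies(df, columns=["Gender", "Education"])

print(df.columns.tolist())
```

The choice between mean, median, and mode depends on the column: mode suits categoricals, while the median is more robust than the mean when a numeric column is skewed.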
Here I selected a classification algorithm because this is a classification problem, i.e., the target value is of categorical datatype.
Then I created a model and trained it using the logistic regression algorithm, which is a classification algorithm, by feeding it the training dataset. After creating the model, I applied the same data preprocessing to the test dataset, fed it to the trained model, and obtained predictions for the test dataset. I then evaluated the model's accuracy by comparing the predicted target values against the actual target values.
After this, I used another algorithm, the random forest classifier: I trained the model with it and then calculated its accuracy.
I compared the accuracy of both algorithms and preferred the one with the better accuracy.
In this project, I got 78.03% accuracy when I created the model using the random forest classifier, and 80.06% when I created it using logistic regression.
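The comparison between the two models can be sketched as a small loop over classifiers; synthetic data stands in here for the Analytics Vidhya loan CSV, so the scores will differ from the project's actual numbers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the preprocessed loan data.
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Train both models and record their hold-out accuracy.
scores = {}
for name, clf in [("logistic_regression", LogisticRegression(max_iter=1000)),
                  ("random_forest", RandomForestClassifier(random_state=1))]:
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Keep whichever model scored better.
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Cross-validation (`sklearn.model_selection.cross_val_score`) would give a more stable comparison than a single train/test split, especially on a dataset this small.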