Retail Analysis Of Walmart Data
Last Updated on May 3, 2021
One of the leading retail chains in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays affect sales on each day. Sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because its current machine learning algorithm is not suited to the task. An ideal ML algorithm will predict demand accurately and ingest factors such as economic conditions, including CPI and the Unemployment Index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
The historical data cover sales from 2010-02-05 to 2012-11-01 and are in the file Walmart_Store_sales. The file contains the following fields:
- Store - the store number
- Date - the week of sales
- Weekly_Sales - sales for the given store
- Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
- Temperature - temperature on the day of sale
- Fuel_Price - cost of fuel in the region
- CPI - prevailing consumer price index
- Unemployment - prevailing unemployment rate
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Basic Statistics tasks
- Which store has the maximum sales?
- Which store has the maximum standard deviation, i.e., whose sales vary the most? Also find the coefficient of variation (the ratio of standard deviation to mean).
- Which store(s) had a good quarterly growth rate in Q3 2012?
- Some holidays have a negative impact on sales. Find the holidays that have higher sales than the mean sales in the non-holiday season for all stores together.
- Provide a monthly and semester view of sales in units and give insights.
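The first two statistics tasks can be sketched with the Python standard library alone. This is a minimal illustration, assuming the data have already been reduced to (store, weekly sales) pairs; the sample rows and the helper name `store_stats` are hypothetical and are not real Walmart figures.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (store, weekly_sales) rows, for illustration only.
rows = [
    (1, 100.0), (1, 120.0), (1, 110.0),
    (2, 200.0), (2, 260.0), (2, 140.0),
    (3, 90.0), (3, 95.0), (3, 85.0),
]


def store_stats(rows):
    """Return {store: (total_sales, std_dev, coefficient_of_variation)}."""
    by_store = defaultdict(list)
    for store, sales in rows:
        by_store[store].append(sales)
    stats = {}
    for store, sales in by_store.items():
        sd = stdev(sales)  # sample standard deviation
        stats[store] = (sum(sales), sd, sd / mean(sales))
    return stats


stats = store_stats(rows)
# Store with the highest total sales.
top_sales_store = max(stats, key=lambda s: stats[s][0])
# Store whose weekly sales vary the most in absolute terms.
most_variable_store = max(stats, key=lambda s: stats[s][1])
```

The coefficient of variation divides the standard deviation by the mean, so stores of very different sizes can be compared on how volatile their sales are.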
For Store 1 – Build prediction models to forecast demand
- Linear Regression: use variables like date, and restructure dates as 1 for 5 Feb 2010 (numbering from the earliest date in order). Hypothesize whether CPI, unemployment, and fuel price have any impact on sales.
- Change dates into days by creating a new variable.
Select the model which gives the best accuracy.
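The date restructuring and a one-variable linear fit might look like the sketch below. It assumes a simple least-squares line on the date index only; the weekly figures and the helper names `date_index` and `fit_line` are hypothetical, and a real model would also test CPI, unemployment, and fuel price as predictors.

```python
from datetime import date

# Hypothetical weekly sales for one store (deliberately unsorted).
weekly = [
    (date(2010, 2, 19), 1300.0),
    (date(2010, 2, 5), 1100.0),
    (date(2010, 2, 12), 1200.0),
    (date(2010, 2, 26), 1400.0),
]


def date_index(dates):
    """Map each distinct date to 1, 2, 3, ... in chronological order."""
    return {d: i for i, d in enumerate(sorted(set(dates)), start=1)}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


index = date_index(d for d, _ in weekly)
xs = [index[d] for d, _ in weekly]
ys = [sales for _, sales in weekly]
intercept, slope = fit_line(xs, ys)

# Extrapolate one week past the last observed week (index 5).
forecast_week_5 = intercept + slope * 5
```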
Web-Based Application Heart Failure Prediction System
Last Updated on May 3, 2021
Cardiovascular diseases kill approximately 17 million people worldwide each year, mainly through myocardial infarction and heart failure. Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body.
In this heart prediction problem statement, we try to predict whether the patient's heart muscle pumps blood properly or not using Logistic Regression. The dataset is downloaded from the UCI repository and contains real patient data. It was collected in 2015 at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad, Pakistan, and contains 299 patient records with 12 features (attributes) and one label. Based on those 12 features, we predict whether the patient's heart is working properly or not.
In this problem statement, we analyze a dataset of 299 patients with heart failure collected in 2015. We apply several machine learning classifiers to both predict the patient’s survival and rank the features corresponding to the most important risk factors. We also perform an alternative feature ranking analysis by employing traditional biostatistics tests and compare these results with those provided by the machine learning algorithms. Since both feature ranking approaches clearly identify serum creatinine and ejection fraction as the two most relevant features, we then build the machine learning survival prediction models on these two factors alone.
For model building we use library packages such as Pandas, scikit-learn (sklearn), Matplotlib, Seaborn, TensorFlow, and Keras. We start with data description, which involves an initial analysis of the data to understand its source, volume, attributes, and relationships. Once these details are documented, any shortcomings noted should be reported to the relevant personnel. Next, we clean the dataset, checking for missing values, and split it into training and testing sets in a 70%/30% ratio.
The next step is model building, also known as training the model using data and features from our dataset. A combination of data (features) and machine learning algorithms gives us a model that tries to generalize from the training data and produce the necessary results in the form of insights and/or predictions. Generally, several algorithms are tried on the same data to solve the same problem, in order to find the model whose outputs come closest to the business success criteria. Key things to keep track of here are the models created, the model parameters used, and their results.
The last step is to analyze the results: we check the model's score or accuracy using a confusion matrix and the model score. For this model, we got 80% accuracy, which we will try to improve in the future. For model deployment, we use Python Flask and build the web-based application on top of it.
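The 70%/30% split and the confusion-matrix check can be sketched in plain Python. This is an illustration only: the record ids stand in for the 299 patient rows, and the label lists are made up to show the arithmetic, not taken from the project's model. The helper names are hypothetical.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split off the test fraction."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Share of predictions that match the true label."""
    return (tn + tp) / (tn + fp + fn + tp)


# Stand-in ids for the 299 patient records.
records = list(range(299))
train, test = train_test_split(records)

# Made-up true and predicted labels for ten test patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
matrix = confusion_matrix(y_true, y_pred)
score = accuracy(*matrix)
```

In a survival-prediction setting the false negatives in the matrix usually deserve more attention than the overall accuracy, since they are patients at risk whom the model missed.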
Natural Language Processing
Last Updated on May 3, 2021
The problem statement is about allocating projects using a given dataset. We are provided with project details (project name, project location, and required project skills) and candidate details (candidate id, location, candidate skills, and description). From the given dataset, we have to filter the best candidate based on the requirements and their skills. Our task is to check whether the candidate has the skills required for the project and to determine the evaluation status based on their location. If the candidate has the required skills and matches the location, the candidate is selected for that project; otherwise the candidate is rejected for that project. In that case, the rejected candidates are checked against other projects. The first step is to clean up the data to highlight attributes.
Cleaning (or pre-processing) the data typically consists of a number of steps such as removing punctuation, tokenization, and removing stop words. I have taken a set of keywords most closely related to the skills given in the project, based on certain criteria. To describe the presence of keywords within the cleaned data, we vectorize the data using Bag of Words. We filter the candidate skills according to current trends, and candidates are prioritized by the number of skills (languages) they know. We use the NLP Toolkit to rank the candidates by these priorities. Applying this process to the given dataset, we are able to filter out 50% of the data. If a prioritized candidate's skills match and the candidate is in the same location as the project, the similarity is calculated and the candidate is selected for that project; otherwise the candidate is rejected.
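The cleaning, Bag of Words, and matching steps could be sketched as follows. This uses only the standard library rather than an NLP toolkit; the stop-word list, the sample project and candidates, and the helper names are all hypothetical. Note that stripping punctuation this way would also mangle skill names such as "C++", so a real pipeline would need a more careful tokenizer.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "with", "is", "has"}


def clean(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary keyword occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def evaluate(candidate, project):
    """Select a candidate only if all required skills and the location match."""
    has_skills = set(clean(project["skills"])) <= set(clean(candidate["skills"]))
    same_location = candidate["location"].lower() == project["location"].lower()
    return "selected" if has_skills and same_location else "rejected"


# Hypothetical project and candidates.
project = {"name": "Demo", "location": "Chennai", "skills": "Python, SQL"}
candidates = [
    {"id": 1, "location": "Chennai", "skills": "Experienced in Python, SQL and Java."},
    {"id": 2, "location": "Mumbai", "skills": "Python and SQL"},
    {"id": 3, "location": "Chennai", "skills": "Java, HTML"},
]

vocabulary = ["python", "sql", "java"]
vector = bag_of_words(clean(candidates[0]["skills"]), vocabulary)
results = {c["id"]: evaluate(c, project) for c in candidates}
```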
Real Time Object Detection Using Tensorflow
Last Updated on May 3, 2021
Object detection is a computer vision technique in which a software system can detect, locate, and trace objects in a given image or video. What distinguishes object detection is that it identifies the class of each object (person, table, chair, etc.) and its location-specific coordinates in the given image. The location is indicated by drawing a bounding box around the object. The bounding box may or may not accurately locate the position of the object. The ability to locate the object inside an image defines the performance of the algorithm used for detection. Face detection is one example of object detection.
These object detection algorithms might be pre-trained or can be trained from scratch. In most use cases, we use pre-trained weights from pre-trained models and then fine-tune them as per our requirements and different use cases.
Generally, the object detection task is carried out in three steps:
- Small segments (candidate regions) are generated in the input, producing a large set of bounding boxes that spans the full image.
- Feature extraction is carried out for each segmented rectangular area to predict whether the rectangle contains a valid object.
- Overlapping boxes for the same object are reduced to a single bounding rectangle (Non-Maximum Suppression).
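The last step can be illustrated with a short sketch of greedy Non-Maximum Suppression, which in its usual form keeps the highest-scoring box and discards boxes that overlap it heavily. The boxes, scores, and the 0.5 threshold below are hypothetical, and boxes are assumed to be (x1, y1, x2, y2) corner coordinates.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one object, plus one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Raising the threshold keeps more overlapping boxes, which helps with crowded scenes but produces more duplicate detections.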
TensorFlow is an open-source library, created by the Google Brain team, for numerical computation and large-scale machine learning. It eases the process of acquiring data, training models, serving predictions, and refining future results.
- TensorFlow bundles together Machine Learning and Deep Learning models and algorithms.
- It uses Python as a convenient front-end and runs the computations efficiently in optimized C++.
- TensorFlow allows developers to create a graph of computations to perform.
- Each node in the graph represents a mathematical operation and each connection represents data. Hence, instead of dealing with low-level details like figuring out how to hitch the output of one function to the input of another, the developer can focus on the overall logic of the application.
The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.
- The framework already provides pre-trained models, which are referred to as the Model Zoo.
- It includes a collection of pre-trained models trained on various datasets such as the COCO (Common Objects in Context) dataset, the KITTI dataset, and the Open Images Dataset.
Various models are available, so what differs between them? The models have different architectures and thus provide different accuracies, with a trade-off between speed of execution and accuracy in placing bounding boxes.
Google Brain, the deep learning artificial intelligence research team at Google, developed TensorFlow for Google’s internal use and released it as open source in 2015. This open-source software library is used by the research team to perform several important tasks.
TensorFlow is at present one of the most popular deep learning libraries. Several real-world applications of deep learning make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow finds a role to play in text-based applications, image recognition, voice search, and many more. DeepFace, Facebook’s image recognition system, uses TensorFlow for image recognition. It is used by Apple’s Siri for voice recognition. Many Google apps make use of TensorFlow to improve the user experience.
Here mAP (mean average precision) combines precision and recall on detecting bounding boxes: for each object class, the average precision summarizes the precision-recall curve, and mAP is the mean of these values over all classes. It’s a good combined measure of how sensitive the network is to objects of interest and how well it avoids false alarms. The higher the mAP score, the more accurate the network is, but that comes at the cost of execution speed, which we want to avoid here.
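A simplified way to compute the metric is sketched below. It assumes that detections for each class are already sorted by descending confidence and flagged as true or false positives (the matching against ground-truth boxes is not shown), and it uses plain, non-interpolated average precision. Benchmark implementations such as the COCO evaluation differ in details, and the sample flags are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Average precision for one class.

    is_true_positive: flags for detections sorted by descending confidence.
    num_ground_truth: number of real objects of this class.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(is_true_positive, start=1):
        if is_tp:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


def mean_average_precision(per_class):
    """Mean of the per-class average precision values."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical detection outcomes: (TP/FP flags, ground-truth count).
per_class = {
    "person": ([True, False, True], 2),
    "chair": ([True, True], 2),
}
map_score = mean_average_precision(per_class)
```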
As my PC is a low-end machine with not much processing power, I am using the model ssd_mobilenet_v1_coco, which is trained on the COCO dataset. This model has a decent mAP score and a short execution time. Also, COCO is a dataset of about 300k images covering 90 of the most commonly found objects, so the model can recognise 90 object classes.
This brings us to the end of this project, where we learned how to use the TensorFlow Object Detection API to detect objects in images.
Online Gardening Store
Last Updated on May 3, 2021
This project is built with Node.js, MySQL, and some npm packages. The aim of the project is to give gardeners an easy interface through which they can buy gardening necessities online. Products are organized into various categories from which the user can buy.
Users can add items to the cart, modify them, and delete them. The application also has user authentication. To make payment easier, customers can directly choose a saved card for the payment. Taxes are calculated on the subtotal. As of now, no payment integration is done. Once a user submits an order, they can also see the history of their previous orders.
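The cart operations and the tax calculation on the subtotal can be sketched as below. The project itself is written in Node.js; this illustration uses Python, and the `Cart` class, the product names, and the 10% tax rate are all hypothetical.

```python
from decimal import Decimal

TAX_RATE = Decimal("0.10")  # hypothetical tax rate, not from the project


class Cart:
    """In-memory cart mapping product name to (unit_price, quantity)."""

    def __init__(self):
        self.items = {}

    def add(self, name, unit_price, quantity=1):
        price = Decimal(str(unit_price))
        _, existing = self.items.get(name, (price, 0))
        self.items[name] = (price, existing + quantity)

    def update(self, name, quantity):
        price, _ = self.items[name]
        self.items[name] = (price, quantity)

    def remove(self, name):
        self.items.pop(name, None)

    def subtotal(self):
        return sum((p * q for p, q in self.items.values()), Decimal("0"))

    def total(self):
        """Return (subtotal, tax, total) with tax rounded to cents."""
        sub = self.subtotal()
        tax = (sub * TAX_RATE).quantize(Decimal("0.01"))
        return sub, tax, sub + tax


cart = Cart()
cart.add("trowel", "12.50", 2)
cart.add("seeds", "3.00", 4)
cart.update("seeds", 5)      # modify an item already in the cart
cart.add("gloves", "8.00")
cart.remove("gloves")        # delete an item
subtotal, tax, total = cart.total()
```

Using `Decimal` rather than floats avoids rounding surprises when money amounts are added and taxed.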
When a user registers in the application or confirms an order, a verification of the login or order is sent to the registered email address and mobile number.
For future enhancements we have thought of:
- adding filtering options
- adding a search feature
- taking user input about their requirements through forms and using NLP to retrieve the necessary products
- adding a chatbot for the whole application so customers can ask any queries
Cifar-10 Image Classification Using Tensorflow
Last Updated on May 3, 2021
The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
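As a small illustration of working with a batch, the sketch below decodes one image row. It assumes the layout used by the Python version of the dataset, where each image is a flat row of 3072 values: 1024 red, then 1024 green, then 1024 blue, each channel stored row by row. The helper name `row_to_pixels` and the synthetic row are hypothetical; no real CIFAR-10 file is read here.

```python
SIDE = 32
PLANE = SIDE * SIDE  # 1024 values per colour channel


def row_to_pixels(row):
    """Convert one flat 3072-value row into a 32x32 grid of (r, g, b) tuples."""
    if len(row) != 3 * PLANE:
        raise ValueError("expected 3072 values per image")
    red = row[:PLANE]
    green = row[PLANE:2 * PLANE]
    blue = row[2 * PLANE:]
    return [
        [(red[y * SIDE + x], green[y * SIDE + x], blue[y * SIDE + x])
         for x in range(SIDE)]
        for y in range(SIDE)
    ]


# Synthetic row standing in for one image: a uniform orange picture.
row = [255] * PLANE + [128] * PLANE + [0] * PLANE
image = row_to_pixels(row)

# Batch arithmetic from the description above.
training_images = 5 * 10000
images_per_class_in_training = training_images // 10
```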
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
The CIFAR-10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for machine learning research. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.
Computer algorithms for recognizing objects in photos often learn by example. CIFAR-10 is a set of images that can be used to teach a computer how to recognize objects. Since the images in CIFAR-10 are low-resolution (32x32), this dataset allows researchers to quickly try different algorithms to see what works. Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.