Regression Analysis on Walmart Sales Data

Last Updated on May 3, 2021

About

Walmart, one of the leading retail chains in the US, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day, and sales data are available for 45 Walmart stores. The business is facing a challenge because unforeseen demand sometimes causes it to run out of stock, partly due to an unsuitable machine learning approach. An ideal ML algorithm will predict demand accurately while taking into account economic factors such as the CPI and the unemployment index.

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.

 Dataset Description

The file Walmart_Store_sales contains historical data covering sales from 2010-02-05 to 2012-11-01. Within this file you will find the following fields:

·        Store - the store number

·        Date - the week of sales

·        Weekly_Sales - sales for the given store

·        Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)

·        Temperature - Temperature on the day of sale

·        Fuel_Price - Cost of fuel in the region

·        CPI – Prevailing consumer price index

·        Unemployment - Prevailing unemployment rate

 Holiday Events

Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13

Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13

Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13

Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13

 Analysis Tasks

Basic Statistics tasks

1.     Which store has the maximum sales?

2.     Which store has the maximum standard deviation, i.e., the sales vary a lot? Also, find out the coefficient of variation (standard deviation divided by the mean).

3.     Which store(s) had a good quarterly growth rate in Q3 2012?

4.     Some holidays have a negative impact on sales. Find the holidays that have higher sales than the mean sales during the non-holiday season for all stores together.

5.     Provide a monthly and semester view of sales in units and give insights (a pandas sketch covering several of these tasks follows this list).
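A minimal pandas sketch for several of these tasks is shown below. It assumes the data sits in a file named Walmart_Store_sales.csv with the columns listed above and day-first dates; the file name and date format are assumptions rather than part of the brief.

    import pandas as pd

    df = pd.read_csv("Walmart_Store_sales.csv", parse_dates=["Date"], dayfirst=True)

    # 1. Store with the maximum total sales.
    total = df.groupby("Store")["Weekly_Sales"].sum()
    print("Max total sales:", total.idxmax(), total.max())

    # 2. Store with the maximum standard deviation, plus the coefficient of variation.
    stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])
    stats["cov"] = stats["std"] / stats["mean"]
    print(stats.sort_values("std", ascending=False).head(1))

    # 3. Quarterly growth rate in Q3 2012 (relative to Q2 2012) per store.
    df["Quarter"] = df["Date"].dt.to_period("Q")
    q = df.groupby(["Store", "Quarter"])["Weekly_Sales"].sum().unstack("Quarter")
    growth = (q[pd.Period("2012Q3")] - q[pd.Period("2012Q2")]) / q[pd.Period("2012Q2")]
    print(growth.sort_values(ascending=False).head())

    # 4. Holiday weeks whose mean sales beat the overall non-holiday mean.
    non_holiday_mean = df.loc[df["Holiday_Flag"] == 0, "Weekly_Sales"].mean()
    holiday_means = df[df["Holiday_Flag"] == 1].groupby("Date")["Weekly_Sales"].mean()
    print(holiday_means[holiday_means > non_holiday_mean])

    # 5. Monthly and half-yearly (semester) views of total sales.
    monthly = df.groupby(df["Date"].dt.to_period("M"))["Weekly_Sales"].sum()
    semester = df.groupby([df["Date"].dt.year,
                           (df["Date"].dt.month > 6).astype(int) + 1])["Weekly_Sales"].sum()
    print(monthly.head())
    print(semester.head())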

 Statistical Model

For Store 1 – Build prediction models to forecast demand

·        Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date, in order). Hypothesize whether CPI, unemployment, and fuel price have any impact on sales.

·        Change dates into days by creating a new variable.

Select the model that gives the best accuracy (a hedged sketch of the linear regression step follows).
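A hedged sketch of this step with scikit-learn, reusing the same file and column names as above (again assumptions, not confirmed by the brief):

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    df = pd.read_csv("Walmart_Store_sales.csv", parse_dates=["Date"], dayfirst=True)
    store1 = df[df["Store"] == 1].sort_values("Date").copy()

    # Restructure dates as 1, 2, 3, ... starting from the earliest date (5 Feb 2010 = 1).
    store1["Day_Index"] = (store1["Date"] - store1["Date"].min()).dt.days + 1

    features = ["Day_Index", "CPI", "Unemployment", "Fuel_Price"]
    X_train, X_test, y_train, y_test = train_test_split(
        store1[features], store1["Weekly_Sales"], test_size=0.2, random_state=42)

    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out weeks:", r2_score(y_test, model.predict(X_test)))
    print(dict(zip(features, model.coef_)))  # coefficient signs hint at each variable's impact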

More Details: Regression Analysis on Walmart Sales Data

Car Price Prediction

Last Updated on May 3, 2021

About

It is a complete end-to-end project, from the first stage of data preprocessing to the last stage, model deployment. In this project I first did data wrangling, which includes a data cleaning phase to make the dataset more organized; common steps in my cleaning phase were removing outliers and handling missing values. I then split the dataset into training and testing sets with the train_test_split function and trained my model on the training set. I used a Random Forest Regressor as my model and GridSearchCV for hyperparameter tuning; GridSearchCV helps us find the best parameters for the model, which ultimately improves its accuracy.

After performing all these operations I tested the model on the test set, and fortunately it produces an excellent result: a score of 98.5%. (Since this is a regression model, the score reported is the regressor's R²-style score; accuracy_score, confusion matrices, and classification reports apply to classification models.) Finally, I deployed the model using my basic web development knowledge, which involved files such as a pickle file, app.py, and requirements.txt. As for how the model works, it predicts the selling price of a car from features such as cost price, kilometres driven, and fuel type.
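A hedged sketch of the pipeline described above: cleaning, a train/test split, a Random Forest Regressor tuned with GridSearchCV, and a regression score on the test set. The file name and column names (car_data.csv, Selling_Price, Present_Price, Kms_Driven, Fuel_Type) are illustrative assumptions, not taken from the project.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split, GridSearchCV

    df = pd.read_csv("car_data.csv").dropna()                                  # handle missing values
    df = df[df["Selling_Price"] < df["Selling_Price"].quantile(0.99)]          # crude outlier removal
    X = pd.get_dummies(df[["Present_Price", "Kms_Driven", "Fuel_Type"]], drop_first=True)
    y = df["Selling_Price"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    grid = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
        cv=5,
    )
    grid.fit(X_train, y_train)
    print("Best params:", grid.best_params_)
    print("Test R^2:", grid.best_estimator_.score(X_test, y_test))  # the regression analogue of 'accuracy'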

More Details: Car Price Prediction

Submitted By


Real Time Object Detection Using Tensorflow

Last Updated on May 3, 2021

About

Object detection is a computer vision technique in which a software system can detect, locate, and trace an object in a given image or video. The special attribute of object detection is that it identifies the class of each object (person, table, chair, etc.) and its location-specific coordinates in the given image. The location is indicated by drawing a bounding box around the object. The bounding box may or may not accurately locate the position of the object; the ability to locate the object inside an image defines the performance of the algorithm used for detection. Face detection is one example of object detection.

These object detection algorithms might be pre-trained or can be trained from scratch. In most use cases, we use pre-trained weights from pre-trained models and then fine-tune them as per our requirements and different use cases.

Generally, the object detection task is carried out in three steps:

  • Generate small segments (candidate regions) in the input image; a large set of candidate bounding boxes spans the full image.

  • Carry out feature extraction for each segmented rectangular area to predict whether the rectangle contains a valid object.

  • Combine overlapping boxes into a single bounding rectangle (non-maximum suppression; a minimal sketch of this step follows the list).
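As a minimal sketch of the non-maximum suppression step, using TensorFlow's built-in op on a few made-up boxes and scores:

    import tensorflow as tf

    boxes = tf.constant([
        [0.10, 0.10, 0.50, 0.50],   # [y_min, x_min, y_max, x_max], normalised coordinates
        [0.12, 0.11, 0.52, 0.49],   # heavily overlaps the first box
        [0.60, 0.60, 0.90, 0.95],   # a separate object
    ])
    scores = tf.constant([0.92, 0.85, 0.77])

    # Keep at most 10 boxes, discarding any box whose IoU with a
    # higher-scoring kept box exceeds 0.5.
    keep = tf.image.non_max_suppression(boxes, scores,
                                        max_output_size=10,
                                        iou_threshold=0.5)
    print(keep.numpy())             # e.g. [0 2] – the near-duplicate box is suppressed
    kept_boxes = tf.gather(boxes, keep)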

TensorFlow, developed by the Google Brain team, is an open-source library for numerical computation and large-scale machine learning that eases the process of acquiring data, training models, serving predictions, and refining future results.

  • Tensorflow bundles together Machine Learning and Deep Learning models and algorithms. 
  • It uses Python as a convenient front-end and runs it efficiently in optimized C++.
  • Tensorflow allows developers to create a graph of computations to perform. 
  • Each node in the graph represents a mathematical operation and each connection represents data. Hence, instead of dealing with low-level details like figuring out proper ways to hitch the output of one function to the input of another, the developer can focus on the overall logic of the application (a tiny example follows this list).
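A tiny illustration of this graph-of-computations idea: decorating a Python function with tf.function traces it into a graph whose nodes are operations and whose edges carry tensors.

    import tensorflow as tf

    @tf.function
    def affine(x, w, b):
        # Each call below becomes a node (MatMul, AddV2) in the traced graph.
        return tf.matmul(x, w) + b

    x, w, b = tf.ones((1, 3)), tf.ones((3, 2)), tf.zeros((2,))
    print(affine(x, w, b))  # executes the traced, optimised graph

    graph = affine.get_concrete_function(x, w, b).graph
    print([op.name for op in graph.get_operations()])  # the nodes of the graph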

The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.

  • There are already pre-trained models in the framework, referred to as the Model Zoo. 
  • The Model Zoo includes a collection of pre-trained models trained on various datasets, such as the COCO (Common Objects in Context) dataset, the KITTI dataset, and the Open Images Dataset.

The Model Zoo offers a variety of models, so what is different about them? These models have different architectures and thus provide different accuracies, but there is a trade-off between speed of execution and the accuracy in placing bounding boxes.


Google Brain, the deep learning artificial intelligence research team at Google, developed TensorFlow in 2015 for Google's internal use. The research team uses this open-source software library to perform several important tasks.

TensorFlow is at present one of the most popular machine learning libraries. There are several real-world applications of deep learning that make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow finds a role to play in text-based applications, image recognition, voice search, and many more. DeepFace, Facebook's image recognition system, uses TensorFlow for image recognition. It is used by Apple's Siri for voice recognition. Every Google app that you use has made good use of TensorFlow to make your experience better.

Here mAP (mean average precision) summarizes precision and recall for the detected bounding boxes: it is the mean, over object classes, of the area under each class's precision-recall curve. It is a good combined measure of how sensitive the network is to objects of interest and how well it avoids false alarms. The higher the mAP score, the more accurate the network, but that comes at the cost of execution speed, which we want to keep low here.

As my PC is a low-end machine without much processing power, I am using the model ssd_mobilenet_v1_coco, which is trained on the COCO dataset. This model has a decent mAP score and low execution time. Also, COCO is a dataset of about 300k images covering 90 commonly found object categories, so the model can recognise 90 kinds of objects.
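A hedged sketch of running inference with such a model, assuming a TensorFlow 2-style SavedModel export of a comparable SSD MobileNet detector; the paths and test image are placeholders, and the output keys shown are the ones the Object Detection API's exported models return.

    import numpy as np
    import tensorflow as tf
    import cv2  # opencv-python, used here only to read the image

    detect_fn = tf.saved_model.load("ssd_mobilenet_v1_coco/saved_model")   # placeholder path

    image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)        # placeholder image
    input_tensor = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.uint8)

    detections = detect_fn(input_tensor)
    boxes = detections["detection_boxes"][0].numpy()      # normalised [ymin, xmin, ymax, xmax]
    scores = detections["detection_scores"][0].numpy()
    classes = detections["detection_classes"][0].numpy().astype(int)

    # Mapping class ids to COCO label names (via the label map file) is omitted here.
    for box, score, cls in zip(boxes, scores, classes):
        if score > 0.5:                                    # keep confident detections only
            print(f"class {cls} at {box} with score {score:.2f}")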

This brings us to the end of this project, where we learned how to use the TensorFlow Object Detection API to detect objects in images.

More Details: Real Time Object Detection using Tensorflow

Submitted By


Hacksat

Last Updated on May 3, 2021

About

Imagine a satellite that lets anyone stop thinking about data transfer, energy, and all of those nuisances.

What it does

HackSat is a prototype of a CubeSat blueprint that will allow anyone who wants to run an experiment in outer space to stop worrying about how to send data or how to provide energy, and start thinking about which data will be sent and when to send it.

It is also worth noting that everything will be released under an open-source license.

How we built it

We designed the basic structure based on the CubeSat specs provided by California Polytechnic State University and used by NASA to launch low-cost satellites.

We printed the structure on a couple of 3D printers.

We hand-built all of the electronics around a combination of three Arduinos, which required us to search for low-power components in order to make the most of the battery; we also worked on minimising the energy consumption of the whole satellite.

We opted to use recycled components: solar panels, cables, a battery, a converter...

We worked a lot on the data transfer part so that the Sat can sleep most of the time, in an effort to extend the battery life even further.

And almost 24 hours of nonstop work and a lot of enthusiasm!

Challenges we ran into

We found the electronics the most challenging part, because our main objective was to get the most out of our battery and avoid draining it too fast.

Another point worth mentioning was the data transfer between the experiment section and the Sat section: we wanted to isolate each part as much as possible from the other, so the experiment just needs to tell the Sat to send the data and nothing more.

Accomplishments that we are proud of

We are very proud to have accomplished the objective of making a viable prototype. Even though we faced some issues along the way, we managed to overcome all of them, and as a consequence we have grown wiser and our vision has become broader.

What we learned

During the development of HackSat we learned a lot about radio transmission, a great deal about serial ports, and how to communicate data between three different microcontrollers using two different protocols.

What's next for HackSat

The first improvement that should be made is fixing some issues we encountered with the measurements in our designs, which required some on-site adjustment.

Another obvious improvement is updating the case so it is made of aluminium instead of plastic, which is currently the first blocking issue for HackSat to be launched.

Finally, we would move to more dedicated hardware, which would most likely allow us to further optimise the battery consumption and the overall lifespan of the Sat.

More Details: HackSat

Submitted By


Tic-Tac-Toe Game

Last Updated on May 3, 2021

About

This project is a Tic Tac Toe game played against a simple artificial intelligence. An artificial intelligence (or AI) is a computer program that can intelligently respond to the player's moves. This game doesn't introduce any complicated new concepts; the artificial intelligence that plays Tic Tac Toe is really just a few lines of code.

Two people play Tic Tac Toe with paper and pencil. One player is X and the other player is O. Players take turns placing their X or O. If a player gets three of their marks on the board in a row, column or one of the two diagonals, they win. When the board fills up with neither player winning, the game ends in a draw.

This project doesn't introduce many new programming concepts. It makes use of our existing programming knowledge to make an intelligent Tic Tac Toe player. The player makes their move by entering the number of the space they want to move to. These numbers are in the same places as the number keys on a keyboard's number pad.


First, you must figure out how to represent the board as data in a variable. On paper, the Tic Tac Toe board is drawn as a pair of horizontal lines and a pair of vertical lines, with either an X, O, or empty space in each of the nine spaces.

In the program, the Tic Tac Toe board is represented as a list of strings. Each string will represent one of the nine spaces on the board. To make it easier to remember which index in the list is for which space, they will mirror the numbers on a keyboard’s number keypad.

The strings will either be 'X' for the X player, 'O' for the O player, or a single space ' ' for a blank space.

So if a list with ten strings was stored in a variable named board, then board[7] would be the top-left space on the board. board[5] would be the center. board[4] would be the left side space, and so on. The program will ignore the string at index 0 in the list. The player will enter a number from 1 to 9 to tell the game which space they want to move on.
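A minimal sketch of this representation, with a small drawing helper (the function name is illustrative):

    # Index 0 is unused so the indices line up with the number pad (1-9).
    board = [' '] * 10

    def draw_board(b):
        """Print the 3x3 board using the keypad layout (7 8 9 on top)."""
        print(b[7] + '|' + b[8] + '|' + b[9])
        print('-+-+-')
        print(b[4] + '|' + b[5] + '|' + b[6])
        print('-+-+-')
        print(b[1] + '|' + b[2] + '|' + b[3])

    board[7] = 'X'   # top-left space
    board[5] = 'O'   # centre space
    draw_board(board)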


Creating a program that can play a game comes down to carefully considering all the possible situations the AI can be in and how it should respond in each of those situations. The Tic Tac Toe AI is simple because there are not many possible moves in Tic Tac Toe compared to a game like chess or checkers.

Our AI checks if any possible move can allow itself to win. Otherwise, it checks if it must block the player’s move. Then the AI simply chooses any available corner space, then the center space, then the side spaces. This is a simple algorithm for the computer to follow.

The key to implementing our AI is making copies of the board data and simulating moves on the copy. That way, the AI code can see whether a move results in a win or a loss, and then make that move on the real board. This kind of simulation is effective at predicting which moves are good.
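A hedged sketch of this move-selection strategy; the helper names (is_winner, is_space_free, get_computer_move) are illustrative rather than taken from the project.

    import copy, random

    def is_winner(b, mark):
        # Rows, columns, and diagonals in keypad numbering.
        wins = [(7,8,9), (4,5,6), (1,2,3), (7,4,1), (8,5,2), (9,6,3), (7,5,3), (9,5,1)]
        return any(b[i] == b[j] == b[k] == mark for i, j, k in wins)

    def is_space_free(b, move):
        return b[move] == ' '

    def get_computer_move(board, computer, player):
        # 1. Take a winning move if one exists (simulate it on a copy of the board).
        for move in range(1, 10):
            if is_space_free(board, move):
                trial = copy.copy(board)
                trial[move] = computer
                if is_winner(trial, computer):
                    return move
        # 2. Block the player's winning move.
        for move in range(1, 10):
            if is_space_free(board, move):
                trial = copy.copy(board)
                trial[move] = player
                if is_winner(trial, player):
                    return move
        # 3. Otherwise prefer corners, then the centre, then the sides.
        for group in ([1, 3, 7, 9], [5], [2, 4, 6, 8]):
            free = [m for m in group if is_space_free(board, m)]
            if free:
                return random.choice(free)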

More Details: Tic-Tac-Toe Game

Submitted By


Long Term Tool

Last Updated on May 3, 2021

About

My previous project was the shear project; this one is the Long Term Tool. This tool is used by wind farm owners who want to know which location will give the best profits.

Suppose A wants to start a wind farm business. A has money but is not aware of the wind speeds at a particular location, so he takes help from B (Wind Pioneers). Wind Pioneers uses a sensor at every wind station to measure wind speed and wind direction; its role is to record data containing wind speeds and wind directions for every hour.

Wind Pioneers measures wind speeds at various sensor heights, such as ws_120m and ws_100m. There are several observations per minute, so the number of observations grows every hour and the data quickly becomes too large to analyse with manual calculations. That is why we came up with one tool: the Long Term Tool.

I worked on this project along with a team. The tool provides interactive software for performing all the analysis: plots, correlation values, and scatter plots for finding the relationship between two variables. You can simply download the files you are working on, and it gives you everything in detail.

Here we take NASA data from the past 30 years, which contains wind speed and wind direction, as the reference data in order to predict the wind speeds at a particular location for the next 30 years using a linear regression model.

We fit the linear model over various time periods: 1 hour, 6 hours, 1 day, 3 days, 7 days, 10 days, and 1 month. Sometimes the weather file and the climate (reference) file differ in time; to compensate for this, we apply a time shift to the reference file. A hedged sketch of the regression step is shown below.
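This sketch regresses the on-site wind speed against the NASA reference series over the concurrent period and then uses the fit to extend the record; the file names and column names (site_measurements.csv, nasa_reference.csv, ws_100m, wind_speed) are illustrative assumptions.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    site = pd.read_csv("site_measurements.csv", parse_dates=["timestamp"], index_col="timestamp")
    ref = pd.read_csv("nasa_reference.csv", parse_dates=["timestamp"], index_col="timestamp")

    # Resample both series to a common averaging period (here: 1 day) and align them.
    period = "1D"
    joined = pd.concat(
        [site["ws_100m"].resample(period).mean().rename("site"),
         ref["wind_speed"].resample(period).mean().rename("ref")],
        axis=1,
    ).dropna()

    # Fit site ~ reference over the overlapping (concurrent) period.
    model = LinearRegression().fit(joined[["ref"]], joined["site"])
    print("slope:", model.coef_[0], "intercept:", model.intercept_,
          "R^2:", model.score(joined[["ref"]], joined["site"]))

    # Predict long-term site wind speeds from the full 30-year reference record.
    ref_daily = ref["wind_speed"].resample(period).mean().dropna().to_frame("ref")
    long_term = pd.Series(model.predict(ref_daily), index=ref_daily.index)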



More Details: long term tool

Submitted By