Movie Recommendation System

Last Updated on May 3, 2021

About

It is a content-based recommendation system that uses NLP concepts such as TF-IDF and lemmatization, and it uses cosine similarity to build the machine learning system.
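
A minimal sketch of this pipeline, assuming scikit-learn and an illustrative movies table (the column names and sample data below are placeholders, not the project's actual dataset):

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Illustrative data: each movie gets one text field (plot, genre, cast, etc.)
    movies = pd.DataFrame({
        "title": ["Movie A", "Movie B", "Movie C"],
        "description": ["space adventure hero", "romantic comedy in paris", "space war epic"],
    })

    # Vectorize the (lemmatized) descriptions with TF-IDF
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(movies["description"])

    # Pairwise cosine similarity between all movies
    sim = cosine_similarity(matrix)

    def recommend(title, top_n=2):
        idx = movies.index[movies["title"] == title][0]
        ranked = sim[idx].argsort()[::-1]  # most similar first
        return [movies["title"][i] for i in ranked if i != idx][:top_n]

    print(recommend("Movie A"))  # Movie C ranks first, sharing the "space" theme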

More Details: Movie Recommendation System


Regression Analysis On Walmart Sales Data

Last Updated on May 3, 2021

About

One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. There are certain events and holidays which impact sales on each day, and sales data are available for 45 Walmart stores. The business faces a challenge due to unforeseen demand and sometimes runs out of stock because of an inappropriate machine learning algorithm. An ideal ML algorithm will predict demand accurately and ingest factors like economic conditions, including CPI, the Unemployment Index, etc.

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.

Dataset Description

This is historical data covering sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:

·        Store - the store number

·        Date - the week of sales

·        Weekly_Sales - sales for the given store

·        Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)

·        Temperature - Temperature on the day of sale

·        Fuel_Price - Cost of fuel in the region

·        CPI – Prevailing consumer price index

·        Unemployment - Prevailing unemployment rate

Holiday Events

Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13

Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13

Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13

Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13

Analysis Tasks

Basic Statistics tasks

1.     Which store has the maximum sales?

2.     Which store has the maximum standard deviation, i.e., where do sales vary the most? Also, find the coefficient of variation (standard deviation divided by mean). A pandas sketch of these first two tasks follows this list.

3.     Which store(s) had a good quarterly growth rate in Q3 2012?

4.     Some holidays have a negative impact on sales. Find the holidays with higher sales than the mean sales in the non-holiday season, for all stores together.

5.     Provide a monthly and semester view of sales in units and give insights.
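
A minimal pandas sketch of the first two tasks, assuming the file is saved as Walmart_Store_sales.csv with the column names listed above (the exact file name is an assumption):

    import pandas as pd

    df = pd.read_csv("Walmart_Store_sales.csv")  # assumed file name

    # Task 1: store with maximum total sales
    totals = df.groupby("Store")["Weekly_Sales"].sum()
    print("Store with max sales:", totals.idxmax())

    # Task 2: store with maximum standard deviation,
    # plus the coefficient of variation (std / mean)
    stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])
    stats["cv"] = stats["std"] / stats["mean"]
    print("Store with max std dev:", stats["std"].idxmax())
    print(stats.loc[stats["std"].idxmax()])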

Statistical Model

For Store 1 – build prediction models to forecast demand:

·        Linear Regression – utilize variables like date, restructuring dates as sequential integers (1 for 5 Feb 2010, the earliest date, counting upward). Hypothesize whether CPI, unemployment, and fuel price have any impact on sales.

·        Change dates into days by creating a new variable.

Select the model which gives the best accuracy; a sketch of the regression setup appears below.
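
A minimal sketch of the Store 1 regression under the assumptions above (the file name, the day-first date format, and the feature choice are all assumptions based on the description):

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("Walmart_Store_sales.csv")  # assumed file name
    store1 = df[df["Store"] == 1].copy()
    store1["Date"] = pd.to_datetime(store1["Date"], dayfirst=True)  # assumed date format
    store1 = store1.sort_values("Date")

    # Restructure dates as 1, 2, 3, ... starting from 5 Feb 2010
    store1["Day_Index"] = range(1, len(store1) + 1)

    X = store1[["Day_Index", "CPI", "Unemployment", "Fuel_Price"]]
    y = store1["Weekly_Sales"]

    # Hold out the most recent 20% of weeks for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out weeks:", model.score(X_test, y_test))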

More Details: Regression Analysis on Walmart Sales Data

Natural Language Processing

Last Updated on May 3, 2021

About

The problem statement is about the allocation of projects using a given dataset. We are provided with project details (project name, project location, and required project skills) and candidate details (candidate ID, location, candidate skills, and description). From the given dataset, we have to filter out the best candidate based on the requirements and their skills. Our task is to check whether a candidate has the skills required for the project and to determine the evaluation status based on location. If a candidate has the required skills and matches the location, the candidate is selected for that project; if not, the candidate is rejected for that project. Rejected candidates are then checked against other projects. The foremost step is to clean up the data to highlight attributes.

Cleaning (or pre-processing) the data typically consists of a number of steps, such as removing punctuation, tokenization, and removing stop words. I have taken a set of keywords most closely related to the skills given in the project, chosen according to certain criteria. To describe the presence of keywords within the cleaned data, we vectorize the data using a Bag of Words. We then filter the candidate skills according to current trends: candidates are prioritized by the number of skills (languages) they know, and we use the NLP Toolkit to arrange the candidates by these preferences. By applying this process to the given dataset, we can filter out 50% of the data. If a prioritized candidate’s skills match a project in the same location, the similarities are calculated and the candidate is selected for that project; otherwise the candidate is rejected.
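
A minimal sketch of the cleaning and Bag of Words steps, assuming NLTK and scikit-learn (the sample skill strings are illustrative, and NLTK resource names vary slightly across versions):

    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from sklearn.feature_extraction.text import CountVectorizer

    nltk.download("punkt", quiet=True)      # tokenizer data
    nltk.download("punkt_tab", quiet=True)  # needed on newer NLTK versions
    nltk.download("stopwords", quiet=True)

    stop_words = set(stopwords.words("english"))

    def clean(text):
        # Remove punctuation, tokenize, and drop stop words
        text = text.translate(str.maketrans("", "", string.punctuation))
        tokens = word_tokenize(text.lower())
        return " ".join(t for t in tokens if t not in stop_words)

    # Illustrative candidate skill descriptions
    candidates = ["Python, SQL and machine learning.", "Java and the Spring framework!"]
    cleaned = [clean(c) for c in candidates]

    # Bag of Words marks the presence of each keyword in the cleaned data
    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(cleaned)
    print(vectorizer.get_feature_names_out())
    print(bow.toarray())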

More Details: Natural Language Processing

Telecom Churn Prediction

Last Updated on May 3, 2021

About

This case requires trainees to develop a model for predicting customer churn at a fictitious wireless telecom company and to use insights from the model to develop an incentive plan for enticing would-be churners to remain with the company. Data for the case are available in CSV format. The data are a scaled-down version of the full database generously donated by an anonymous wireless telephone company; there are still 7,043 customers in the database and 20 potential predictors. Candidates can use whatever method they wish to develop their machine learning model.

The data are available in one file with 7,043 rows that combines the calibration and validation customers: a “calibration” database consisting of 4,000 customers and a “validation” database consisting of 3,043 customers. Each database contains (1) a “churn” variable signifying whether the customer had left the company two months after observation, and (2) a set of 20 potential predictor variables that could be used in a predictive churn model. Following usual model development procedures, the model would be estimated on the calibration data and tested on the validation data.

This case requires both statistical analysis and creativity/judgment. I recommend you spend ample time on both fine-tuning and interpreting the results of your machine learning model.
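
A minimal sketch of the calibration/validation workflow with a baseline model, assuming scikit-learn; the file name, the Churn column name, and already-encoded numeric predictors are all assumptions, not details from the case:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("churn.csv")  # hypothetical file name

    # First 4,000 rows as calibration, remaining 3,043 as validation,
    # matching the split described above
    calibration, validation = df.iloc[:4000], df.iloc[4000:]

    # Assumed target column; predictors assumed already numeric/encoded
    features = [c for c in df.columns if c != "Churn"]
    model = LogisticRegression(max_iter=1000)
    model.fit(calibration[features], calibration["Churn"])

    # Estimate on calibration data, test on validation data
    probs = model.predict_proba(validation[features])[:, 1]
    print("Validation AUC:", roc_auc_score(validation["Churn"], probs))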

More Details: Telecom Churn Prediction

Python Snake Game

Last Updated on May 3, 2021

About

This game reminds everyone of their childhood memories.

In this snake game, the player has to move the snake to the fruit in order to eat it. The score increases once the fruit is eaten, and the length of the snake increases as well. The game is over if the snake touches itself.

The turtle and random modules are used in this game project. Both are part of Python’s standard library, so no separate installation with pip is required.

The turtle library allows us to create pictures and diagrams in a virtual form, whereas the random module returns a value within a given range.

There are 3 functions defined in this game: the “change”, “inside”, and “move” functions. In the change function, the x-axis and y-axis movement is defined. In the inside function, the logic of the game is written, and in the move function, the snake is given its movement.

There are 4 keys handled in the code: right, left, up, and down.

If the player presses the right key, the snake moves right; the left key moves it left; the up key moves it up; and the down key moves it down. If the snake touches itself, the game is over. A sketch of these bindings is shown below.
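
A minimal sketch of these key bindings with turtle; the function names follow the description above, but the actual project’s code may differ:

    import turtle

    screen = turtle.Screen()
    snake = turtle.Turtle()
    snake.penup()

    dx, dy = 5, 0  # current direction of travel

    def change(x, y):
        # Set the x-axis and y-axis movement direction
        global dx, dy
        dx, dy = x, y

    def move():
        # Step the snake in the current direction, then schedule the next step
        snake.setposition(snake.xcor() + dx, snake.ycor() + dy)
        screen.ontimer(move, 100)

    # Bind the four arrow keys to direction changes
    screen.onkey(lambda: change(5, 0), "Right")
    screen.onkey(lambda: change(-5, 0), "Left")
    screen.onkey(lambda: change(0, 5), "Up")
    screen.onkey(lambda: change(0, -5), "Down")
    screen.listen()

    move()
    screen.mainloop()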

More Details: Python Snake Game

Cluster AI

Last Updated on May 3, 2021

About

Explore a galaxy of research papers in 3D space using a state-of-the-art machine learning model.

Inspiration

Search engines like Google Scholar make it easy to find research papers on a specific topic. However, it can be hard to branch out from a general starting point to find the more specific topics your research needs. Wouldn’t it be great to have a tool that not only recommends research papers, but does so in a way that makes it easy to explore other related topics and solutions?

What it does

Users input either a text query or a research paper into Cluster AI. Cluster AI uses BERT (Bidirectional Encoder Representations from Transformers), a Natural Language Processing model, to connect users to similar papers. Cluster AI uses the CORE Research API to fetch research articles that may be relevant, then visualizes the similarity of these papers in a 3D space. Each node represents a research paper, and the distances between the nodes show the similarity between those papers. Using this, users can visualize clusters of research papers with close connections in order to quickly find resources that pertain to their topic.
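
A minimal sketch of the embed-then-project pipeline, using the sentence-transformers wrapper around BERT and scikit-learn’s MDS as stand-ins for the project’s exact PyTorch code (the model name and abstracts are illustrative):

    from sentence_transformers import SentenceTransformer
    from sklearn.manifold import MDS

    # Illustrative abstracts; the real app fetches these from the CORE Research API
    abstracts = [
        "Transformer architectures for machine translation.",
        "Pretraining deep bidirectional encoders for language understanding.",
        "A survey of convolutional networks for image classification.",
    ]

    # Encode each abstract into a high-dimensional BERT-style embedding
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    embeddings = model.encode(abstracts)

    # MDS projects the embeddings to 3D while approximately preserving
    # pairwise distances, so nearby nodes are similar papers
    points = MDS(n_components=3, random_state=0).fit_transform(embeddings)
    print(points)  # one (x, y, z) node per paper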

Test Cluster AI here

Note: The demo runs on a CPU-based server, so deploying your own Django server using the instructions in the Source Code is highly recommended. The demo may have delays depending on the query and the number of users at any given point. Queries of 10-100 papers are supported, but requesting up to 20 papers per query is optimal.

Check out the Source Code!

How we built it

We used a multitude of technologies, languages, and frameworks in order to build Cluster AI.

  1. BERT (Bidirectional Encoder Representations from Transformers) and MDS (Multidimensional Scaling) with PyTorch for the Machine Learning
  2. Python and Django for the backend
  3. Javascript for the graph visualizations (ThreeJS/WebGL)
  4. Bootstrap/HTML/CSS/Javascript for the frontend

Challenges we ran into

The CORE Research API did not always provide all the necessary information that was requested. It sometimes returned papers not in English or without abstracts. We were able to solve this problem by validating the results ourselves. Getting the HTML/CSS to do exactly what we wanted gave us trouble.

Accomplishments that we're proud of

We worked with a state-of-the-art natural language processing model which successfully condensed each paper into a 3D point.

The visualization of the graph turned out great and let us see the results of the machine learning techniques we used and the similarities between large amounts of research papers.

What we learned

We learned more about HTML, CSS, and JavaScript, since the frontend required new techniques and knowledge to accomplish what we wanted. We learned more about the BERT model and dimensionality reduction. The semantic analysis of each paper’s abstract provided by the BERT model served as the basis for condensing each paper into a 3D point.

What's next for Cluster AI

We can add filtering to the nodes so that only nodes of a given specification are shown. We can expand Cluster AI to visualize other corpora of text, such as books, movie scripts, or news articles. Some papers are in different languages; we would like to use an API to convert the different languages into a person’s native language, so anyone will be able to read the papers.

More Details: Cluster AI
