Movie Recommendation System
Last Updated on May 3, 2021
This is a content-based recommendation system built with NLP techniques such as TF-IDF and lemmatization; it uses cosine similarity to rank movies and build the recommendation engine.
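The core idea can be sketched in a few lines with scikit-learn. The movie overviews below are placeholder data, not the project's actual dataset:

```python
# Minimal sketch of content-based recommendation: TF-IDF vectors plus
# pairwise cosine similarity. The overviews are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

overviews = [
    "A space crew travels through a wormhole in search of a new home planet",
    "Astronauts leave Earth on a space mission to a distant planet",
    "A detective hunts a serial killer through a rainy city",
]

# Turn each overview into a TF-IDF vector.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(overviews)

# Pairwise cosine similarity between all movies.
sim = cosine_similarity(matrix)

# Movies most similar to movie 0, excluding itself.
ranked = sim[0].argsort()[::-1][1:]
print(ranked)  # the other space movie comes first
```

A full system would also lemmatize the text before vectorizing, which collapses word forms like "travels" and "travel" into one token.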
Regression Analysis on Walmart Sales Data
Walmart, one of the leading retail chains in the US, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day, and sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because its current forecasting approach falls short. An ideal ML algorithm will predict demand accurately and ingest factors like economic conditions, including CPI, the unemployment index, and so on.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
This is the historical data which covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
· Store - the store number
· Date - the week of sales
· Weekly_Sales - sales for the given store
· Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
· Temperature - Temperature on the day of sale
· Fuel_Price - Cost of fuel in the region
· CPI – Prevailing consumer price index
· Unemployment - Prevailing unemployment rate
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Basic Statistics tasks
1. Which store has the maximum sales?
2. Which store has the maximum standard deviation, i.e., where do sales vary the most? Also find the coefficient of variation (standard deviation divided by mean).
3. Which store(s) had a good quarterly growth rate in Q3 2012?
4. Some holidays have a negative impact on sales. Find the holidays with higher sales than the mean sales during the non-holiday season, across all stores together.
5. Provide a monthly and semester view of sales in units and give insights.
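The first two tasks map directly onto pandas groupby aggregations. This is a sketch on a toy table with the same columns as Walmart_Store_sales (the numbers are made up, not the real data):

```python
# Sketch of tasks 1 and 2 on a hypothetical sales table with toy numbers.
import pandas as pd

df = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3],
    "Weekly_Sales": [100.0, 120.0, 300.0, 150.0, 200.0, 210.0],
})

# Task 1: store with the maximum total sales.
totals = df.groupby("Store")["Weekly_Sales"].sum()
best_store = totals.idxmax()

# Task 2: store with the maximum standard deviation, plus the
# coefficient of variation (std / mean) for every store.
stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])
stats["cv"] = stats["std"] / stats["mean"]
most_volatile = stats["std"].idxmax()

print(best_store, most_volatile)
```

The same groupby pattern extends to tasks 3-5 by grouping on quarter, month, or semester columns derived from the Date field.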
For Store 1 – Build prediction models to forecast demand
· Linear Regression – Use the date as a predictor by restructuring dates into sequential numbers (1 for 5 Feb 2010, 2 for the next week, and so on, starting from the earliest date). Hypothesize whether CPI, unemployment, and fuel price have any impact on sales.
· Change dates into days by creating a new variable.
Select the model that gives the best accuracy.
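The regression setup above can be sketched as follows. The dates and figures are invented stand-ins for the Store 1 rows of Walmart_Store_sales:

```python
# Hedged sketch: sequential-date linear regression for Store 1,
# with CPI, unemployment, and fuel price as extra predictors.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12",
                            "2010-02-19", "2010-02-26"]),
    "Weekly_Sales": [1500.0, 1520.0, 1510.0, 1540.0],
    "CPI": [211.0, 211.2, 211.3, 211.5],
    "Unemployment": [8.1, 8.1, 8.0, 8.0],
    "Fuel_Price": [2.57, 2.55, 2.56, 2.58],
})

# Restructure dates as 1, 2, 3, ... starting from the earliest week.
df = df.sort_values("Date")
df["Day"] = range(1, len(df) + 1)

X = df[["Day", "CPI", "Unemployment", "Fuel_Price"]]
y = df["Weekly_Sales"]

model = LinearRegression().fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```

With the real data you would hold out a test period and compare this model's accuracy against alternatives before selecting one.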
Natural Language Processing
The problem statement concerns allocating candidates to projects using a given dataset. We are provided with project details (project name, project location, and required project skills) and candidate details (candidate ID, location, candidate skills, and description). From this dataset, we have to filter the best candidate based on the project requirements and the candidate's skills. Our job is to check whether a candidate has the skills required for the project, and to determine the evaluation status based on location. If a candidate has the required skills and matches the location, the candidate is selected for that project; if not, the candidate is rejected for it, and rejected candidates are checked against other projects. The first step is cleaning up the data to highlight useful attributes.
Cleaning (or pre-processing) the data typically consists of steps such as removing punctuation, tokenization, and removing stop words. I took a set of keywords most closely related to the skills given in the project, chosen by certain criteria. To describe the presence of keywords within the cleaned data, we vectorize the data with a Bag of Words model. We then filter candidate skills according to current trends, and candidates are prioritized by the number of skills (languages) they know, using an NLP toolkit to order candidates by preference. Applying this process to the given dataset lets us filter out about 50% of the data. If a prioritized candidate's skills match a project in the same location, the similarity is calculated and the candidate is selected for that project; otherwise the candidate is rejected.
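The cleaning-and-vectorizing steps can be sketched like this. The skill keywords and candidate texts below are illustrative, not the project's real data:

```python
# Sketch of the pipeline described above: remove punctuation, lowercase,
# then a bag-of-words count restricted to the project's skill keywords.
import string
from sklearn.feature_extraction.text import CountVectorizer

def clean(text):
    # Remove punctuation and lowercase before tokenizing.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.lower()

candidates = [
    "Experienced in Python, SQL and machine learning.",
    "Frontend developer: HTML, CSS, JavaScript.",
]

# Restrict the vocabulary to the required skill keywords (hypothetical set).
skills = ["python", "sql", "machine", "learning", "javascript"]
vectorizer = CountVectorizer(vocabulary=skills)
bow = vectorizer.fit_transform(clean(c) for c in candidates).toarray()

# Each row counts how many required keywords a candidate mentions,
# giving a simple priority score per candidate.
matches = bow.sum(axis=1)
print(matches)  # candidate 0 matches 4 keywords, candidate 1 matches 1
```

Candidates scoring above a threshold would then go on to the location check and similarity calculation.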
Telecom Churn Prediction
This case requires trainees to develop a model for predicting customer churn at a fictitious wireless telecom company, and to use insights from the model to develop an incentive plan for enticing would-be churners to remain with the company. Data for the case are available in CSV format and are a scaled-down version of a full database generously donated by an anonymous wireless telephone company. There are 7043 customers in the database and 20 potential predictors. Candidates can use whatever method they wish to develop their machine learning model. The data come in one file whose 7043 rows combine a "calibration" set of 4000 customers and a "validation" set of 3043 customers. Each set contains (1) a "churn" variable signifying whether the customer had left the company two months after observation, and (2) the 20 potential predictor variables that could be used in a predictive churn model. Following the usual model development procedure, the model would be estimated on the calibration data and tested on the validation data. This case requires both statistical analysis and creativity/judgment; I recommend spending ample time both fine-tuning and interpreting the results of your machine learning model.
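The calibration/validation workflow looks like this in outline. Everything here is synthetic; the real dataset has 7043 customers and its own 20 predictors:

```python
# Illustrative churn workflow: estimate on a calibration split, test on a
# validation split. Data are randomly generated, not the case dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # 20 potential predictors
# In this toy setup, churn depends (noisily) on the first two predictors.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Roughly mirror the 4000/3043 calibration/validation proportions.
X_cal, X_val, y_cal, y_val = train_test_split(
    X, y, test_size=0.43, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_cal, y_cal)
print(model.score(X_val, y_val))  # validation accuracy
```

Any classifier can be slotted in place of logistic regression here; the point is that model selection and tuning happen on the calibration split only.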
Python Snake Game
This game reminds everyone of their childhood memories.
In this snake game, the player has to move the snake to the fruit in order to eat it. The score will increase once the fruit is eaten. Also, the length of the snake will increase if the snake eats the fruit. The game will get over if the snake touches itself.
The turtle and random modules are used in this game project. Both ship with Python's standard library, so no separate installation is needed.
The turtle library lets us create pictures and diagrams on a virtual canvas, while the random module returns values within a given range.
There are three functions defined in this game: "change", "inside", and "move". In the change function, the x-axis and y-axis steps are set; in the inside function, the game logic is written; and in the move function, the snake's movement is applied.
Four keys are handled in the code: right, left, up, and down.
If the player presses the right key, the snake moves right; the left key moves it left; the up key moves it upward; and the down key moves it downward. If the snake touches itself, the game is over.
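Stripped of the turtle graphics, the change/move idea reduces to a few lines. The names below follow the write-up, but the bodies are an assumed minimal version, not the game's exact code:

```python
# Sketch of the "change" and "move" logic: change() sets the direction
# step, move() applies it to the snake's head position.
head = {"x": 0, "y": 0}
direction = {"dx": 0, "dy": 0}

def change(dx, dy):
    # Set the x-axis and y-axis step for the next move.
    direction["dx"] = dx
    direction["dy"] = dy

def move():
    # Advance the head by the current direction.
    head["x"] += direction["dx"]
    head["y"] += direction["dy"]

# In the real game these would be bound to the arrow keys, e.g.:
#   screen.onkey(lambda: change(20, 0), "Right")
change(20, 0)   # right arrow pressed
move()
change(0, 20)   # up arrow pressed
move()
print(head)  # {'x': 20, 'y': 20}
```

The inside function would then check whether the new head position collides with the fruit (grow and re-randomize the fruit with random) or with the snake's own body (end the game).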
Cluster AI
Explore a galaxy of research papers in 3D space using a state-of-the-art machine learning model.
Search engines like Google Scholar make it easy to find research papers on a specific topic. However, it can be hard to branch out from a general starting point and narrow down topics for your research. Wouldn't it be great to have a tool that not only recommends research papers, but does so in a way that makes it easy to explore other topics and solutions related to yours?
What it does
Users input either a text query or a research paper into Cluster AI. Cluster AI uses BERT (Bidirectional Encoder Representations from Transformers), a natural language processing model, to connect users to similar papers. It uses the CORE Research API to fetch research articles that may be relevant, then visualizes the similarity of these papers in 3D space. Each node represents a research paper, and the distances between nodes show the similarity between those papers. Using this, users can visualize clusters of research papers with close connections in order to quickly find resources that pertain to their topic.
Test Cluster AI here
Note: The demo runs on a CPU-based server, so deploying your own Django server using the instructions in the Source Code is highly recommended. The demo may have delays depending on the query and the number of users at any given time. You can request 10-100 papers, but around 20 papers per query is optimal.
Check out the Source Code!
How we built it
We used a multitude of technologies, languages, and frameworks in order to build ClusterAI.
- BERT (Bidirectional Encoder Representations from Transformers) and MDS (Multidimensional Scaling) with PyTorch for the Machine Learning
- Python and Django for the backend
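The embedding-to-3D step can be sketched as follows. In the real pipeline the vectors come from BERT; here random vectors stand in so the MDS projection can be shown on its own:

```python
# Hedged sketch of the visualization step: project high-dimensional paper
# embeddings down to 3D with MDS. Random vectors replace BERT output here.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(10, 768))   # 10 papers, BERT-sized vectors

# Project to 3D while preserving pairwise distances as well as possible;
# each row becomes one node's (x, y, z) position in the galaxy view.
mds = MDS(n_components=3, random_state=42)
points = mds.fit_transform(embeddings)
print(points.shape)  # (10, 3)
```

MDS is a natural fit here because it optimizes for preserving pairwise distances, which is exactly what the node spacing in the visualization is meant to convey.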
Challenges we ran into
The CORE Research API did not always provide all the necessary information that was requested. It sometimes returned papers not in English or without abstracts. We were able to solve this problem by validating the results ourselves. Getting the HTML/CSS to do exactly what we wanted gave us trouble.
Accomplishments that we're proud of
We worked with a state-of-the-art natural language processing model which successfully condensed each paper into a 3D point.
The visualization of the graph turned out great and let us see the results of the machine learning techniques we used and the similarities between large amounts of research papers.
What we learned
What's next for Cluster AI
We can add filtering to the nodes so that only nodes of a given specification are shown. We can expand Cluster AI to visualize other corpora of text, such as books, movie scripts, or news articles. Some papers are in different languages; we would like to use an API to convert the different languages into a person’s native language, so anyone will be able to read the papers.