Academic Project

Last Updated on May 3, 2021


This project was done during my master's degree.

We developed a fitness application for Android. We used MySQL and Java in Android Studio to build it, and Adobe XD to prototype the application's pages.

In this application, the user sets a goal for their desired fitness.

The application helps the user track their calorie consumption and suggests foods to help them reach a daily calorie goal.

The link is provided below.

More Details: academic project

Book Recommendation System

Last Updated on May 3, 2021


In this work, a book recommendation system is built and deployed. Recommendations are derived from users' feedback and ratings: the system analyses users' ratings, comments, and reviews online, and uses opinion mining to determine whether comments are positive or negative. When a user searches for a book, the books that match their interests are displayed at the top of the list, and the user can also read the feedback other people have given about the book or any other searched item. When a user searches a large collection, the number of items displayed can make it hard to decide which one to choose. In that case the recommendation helps by displaying only the items of interest. This is a trustworthy approach, in which the selection is based on the dataset.


This project uses clustering as its central idea. Clustering is based on similarity: similar elements are kept in a single group, and elements that are not similar to them are placed in other groups, based on a similarity value or the maximum size of a cluster. The clustering approach used in our work is K-means clustering, which groups similar users. It is a simple unsupervised learning algorithm that simplifies the mining work by grouping similar elements into clusters. This is done using K centroids, where K is a parameter. The distance between each element and each centroid is calculated to check similarity, and each element is placed in the cluster of its nearest centroid.

In this project, 6 clusters were made.
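
The grouping step can be sketched as follows. This is a minimal pure-Python K-means, not the project's actual code: the `kmeans` helper, the farthest-first choice of starting centroids, and the toy rating vectors are all assumptions made for illustration, and K is set to 2 to keep the toy data readable (the project used 6).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, labels)."""
    # Farthest-first initialisation (an assumption): start from the first
    # point, then keep adding the point farthest from the chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical users, each described by ratings (1-5) for three books.
user_ratings = [
    [5, 4, 1], [4, 5, 1], [5, 5, 2],  # users who like books 1 and 2
    [1, 2, 5], [2, 1, 4], [1, 1, 5],  # users who like book 3
]
centroids, labels = kmeans(user_ratings, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Users that share a label fall in the same cluster, so books rated highly by one of them are natural candidates to recommend to the others.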

The project uses 2 separate datasets in .csv format, taken from Kaggle:

  1. Books dataset
  2. Ratings

This project is GUI based. The output page has 2 options:

  1. Rate books
  2. Recommend books

The user can choose either option as they wish.

Rate books

In this option, the user can rate books.

Recommend books

In this option, books are recommended to the user according to what they have read previously.

More Details: Book Recommendation System

Spotify Data Analysis

Last Updated on May 3, 2021


This project was made using Tableau, an interactive data visualization tool. Tableau can draw many kinds of charts from a single attribute or from multiple attributes, and colour can be added to show variation in a chart or the intensity of a particular attribute. Charts and graphs that can be made include:

1.     Pie chart

2.     Bar graph

3.     Line graph

4.     Waterfall chart

In my project, I used a dataset from Kaggle containing details of songs from the Spotify app. The dataset had 19 different attributes, of which 2 were strings and the rest were numerical. A few of the attributes were:

1.     Song name

2.     Artist name

3.     Danceability

4.     Loudness

5.     Liveliness

6.     Speechiness

7.     Tempo

From these 19 attributes I made a total of 13 visualizations based on different factors, and assembled them into 6 dashboards.

Dashboard 1:

It gives the analysis of danceability. It shows 2 analyses:

1.     Artists who provide most danceability

It is a bar graph with danceability on the y-axis. It shows that the artist Katy Perry had the most danceability in her songs.

2.     Artists in top 10 with the most danceability

It is a bar graph whose colour dims as the bar’s size decreases.

Dashboard 2:

It gives the analysis of the genres of songs. It shows 2 analyses:

1.     How the proportion of genres has changed in 10 years

Canadian pop was popular in 2009 as well as in 2020, while Detroit hip hop is not as popular now.

2.     Least famous artists and the genre of their songs

It is a point chart that shows which artist makes songs in which genre.

Dashboard 3:

It gives the analysis of popularity. It shows 2 analyses:

1.     Most popular artists and their popularity

It shows how the popularity of the artists has changed over the years.

2.     Most popular artists and their songs’ popularity

It shows that the artist Sara Bareilles is the most popular, with an average popularity of 71.

Dashboard 4:

It gives the analysis of positivity. It shows 2 analyses:

1.     Loudness vs energy with respect to positivity

A bar graph whose colour dims as the value decreases.

2.     Artist with most popularity

A bar graph showing that the artist Katy Perry has the most positive songs.

Dashboard 5:

It shows 2 analyses:

1.     Song names that start with question-related phrases

Such songs had a popularity index of only 1055.

2.     Change in speechiness vs beats

A bar graph that shows the change in speechiness vs beats over the years.

Dashboard 6:

It gives the analysis of the most popular artist, Katy Perry. It shows 3 analyses:

1.     Songs sung over the years

It is in tabular format with 2 columns.

2.     Popularity of songs

It shows how popular her songs have been over the years.

3.     Popularity and number of times her songs appeared in top 10

It shows the popularity index of her most popular hit songs.

More Details: Spotify Data Analysis

Iris Flower Prediction

Last Updated on May 3, 2021


Understanding the scenario

Let’s assume that a hobby botanist is interested in distinguishing the species of some iris flowers that she has found. She has collected some measurements associated with each iris, which are:

  • the length and width of the petals
  • the length and width of the sepals, all measured in centimetres.

She also has the measurements of some irises that have been previously identified by an expert botanist as belonging to the species setosa, versicolor, or virginica. For these measurements, she can be certain of which species each iris belongs to. We will consider that these are the only species our botanist will encounter.

The goal is to create a machine learning model that can learn from the measurements of these irises whose species are already known, so that we can predict the species for the new irises that she has found.

Modules imported

  • scikit-learn (sklearn) is a collection of Python modules built for data science applications, including machine learning. Here, we use three of its components:
  • load_iris: loads the classic dataset for the iris classification problem (as NumPy arrays).
  • train_test_split: a function for splitting our dataset into training and test sets.
  • KNeighborsClassifier: a class for classifying with the K-Nearest Neighbors approach.
  • NumPy is a Python library that makes it easier to work with N-dimensional arrays and provides a large collection of mathematical functions. Its base data type is the “numpy.ndarray”.

Building our model

As we have measurements for which we know the correct species of iris, this is a supervised learning problem. We want to predict one of several options (the species of iris), making it an example of a classification problem. The possible outputs (different species of irises) are called classes. Every iris in the dataset belongs to one of the three classes considered in the model, so this is a three-class classification problem. The desired output for a single data point (an iris) is the species of the flower, given its features. For a particular data point, the class or species it belongs to is called its label.

As already stated, we will use the Iris Dataset already included in scikit-learn.

Now, let’s print some interesting data about our dataset:

Accuracy: we get an accuracy of 93%.

Output: in this case we have 2 samples, [[3,5,4,2], [2,3,5,4]], so the iris types predicted by our model from the given features are:

predictions:  ['versicolor', 'virginica']
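
The workflow described above can be sketched as follows. This is a minimal sketch rather than the notebook's actual code: the 75/25 split, `random_state=0`, and `n_neighbors=3` are assumptions, so the accuracy it prints need not match the figure quoted above. The two new samples are the ones given above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out part of the labelled irises to estimate accuracy on unseen data.
# The split ratio and random_state are assumptions, not taken from the write-up.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3 is an assumed value
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Each sample is [sepal length, sepal width, petal length, petal width] in cm.
new_samples = np.array([[3, 5, 4, 2], [2, 3, 5, 4]])
predicted = iris.target_names[knn.predict(new_samples)].tolist()
print("predictions:", predicted)
```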

For more details, see my GitHub repository:

ml-2/iris_flower.ipynb at main · THC1111/ml-2 (

More Details: Iris Flower Prediction

Worklet Allocation System

Last Updated on May 3, 2021


Innovation and Technology have granted numerous opportunities for people around the world who are in need of employment. It has created new marketplaces that offer stable economic benefits which were never thought of before. However, in this modern society, with a plethora of media & mass communication approaches, people offering domestic services still struggle to find jobs on their own and most of them end up joining agencies which take away a significant portion of their income. Services such as home repairs, beauty, and cleaning can be provided at much cheaper rates if the workers are approached directly without any inefficient middlemen.

A web-based home services marketplace is a more convenient and efficient way for people to locate, hire, and provide feedback about nearby domestic employees who are willing to provide their services as per the customer’s requirement. Our proposed system aims to hire skilled workers and connect them to the right clients based on locational proximity. India has a huge demand for these kinds of services and a platform such as this can be used to cater to them.

The aim of this project is to provide a worklet-servicing application capable of managing both its workers, who are employed in a variety of fields, and its clientele, who enlist the services on a day-to-day basis. Our algorithm aims to match each client, in a short time, with the service professional who best fits their need and is closest to their location, with the help of effective allocation algorithms such as Shortest Job First and the Banker’s Algorithm.
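
One way such an allocation could look is sketched below. It is an illustration only, not the project's algorithm: the `Worker` and `Request` classes, the `allocate` function, the sample names, and the flat (x, y) coordinates are all hypothetical. It combines the Shortest Job First idea (requests with the smallest estimated duration are served first) with a nearest-free-worker match, and does not attempt the Banker’s Algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    skill: str
    location: tuple  # (x, y) in arbitrary units, hypothetical


@dataclass
class Request:
    client: str
    skill: str
    location: tuple
    estimated_hours: float


def allocate(requests, workers):
    """Serve the shortest jobs first, giving each the nearest free worker with the right skill."""
    free = list(workers)
    assignments = {}
    for req in sorted(requests, key=lambda r: r.estimated_hours):
        candidates = [w for w in free if w.skill == req.skill]
        if not candidates:
            assignments[req.client] = None  # nobody available for this skill
            continue
        nearest = min(candidates, key=lambda w: math.dist(w.location, req.location))
        free.remove(nearest)
        assignments[req.client] = nearest.name
    return assignments


workers = [
    Worker("Asha", "cleaning", (0, 0)),
    Worker("Ravi", "cleaning", (5, 5)),
    Worker("Meena", "repairs", (1, 1)),
]
requests = [
    Request("client_a", "cleaning", (4, 4), estimated_hours=3),
    Request("client_b", "cleaning", (4, 5), estimated_hours=1),
    Request("client_c", "repairs", (0, 1), estimated_hours=2),
]
print(allocate(requests, workers))
# {'client_b': 'Ravi', 'client_c': 'Meena', 'client_a': 'Asha'}
```

Because the one-hour job is handled first, client_b gets the nearby worker, and the longer job for client_a falls back to the remaining cleaner.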

The project is built with Node.js and MongoDB. At present the algorithm runs in Python; in a future enhancement it needs to be exposed as an API that the application can call.

The application has three interfaces: Admin, Customer, and Client.

A Client is someone who needs the services on a daily basis; a Customer is someone who needs the services for some period of time in a day.

The Admin has access to the workers’ data and to live tracking of the locations where they are working.

To decide how many hours a service is required for, we made a questionnaire that gives a rough estimate of the time, which is then used to allocate the workers.

Future enhancements of the project:

  • We intend to add a feature that lets a worker mark their attendance for the day from within the mobile application, and possibly a chat feature that lets them communicate with the consumer.

More Details: Worklet Allocation System

Dimensionality Reduction

Last Updated on May 3, 2021


What is dimensionality reduction?

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.

Here are some of the benefits of applying dimensionality reduction to a dataset:

  • The space required to store the data is reduced as the number of dimensions comes down.
  • Fewer dimensions lead to less computation and training time.
  • Some algorithms do not perform well when the data has a large number of dimensions.

Dimensionality reduction refers to techniques for reducing the number of input variables in training data. When dealing with high dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data.


Principal Component Analysis (PCA)

PCA is a technique from linear algebra that can be used to automatically perform dimensionality reduction.
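
The linear algebra behind PCA can be sketched with NumPy alone. This is a minimal illustration, assuming a hypothetical `pca` helper and synthetic data: the data is centred, and a singular value decomposition gives the directions of greatest variance onto which the data is projected.

```python
import numpy as np


def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance


rng = np.random.default_rng(0)
# Synthetic data: 100 samples, 3 features, where the second feature is
# almost a copy of the first, so 2 dimensions carry nearly all the variance.
base = rng.normal(size=(100, 2))
X = np.column_stack(
    [base[:, 0], base[:, 0] + 0.01 * rng.normal(size=100), base[:, 1]]
)

X_reduced, components, variance = pca(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```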

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis, or LDA for short, is a predictive modeling algorithm for multi-class classification. It can also be used as a dimensionality reduction technique, providing a projection of a training dataset that best separates the examples by their assigned class.
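
As a small illustration of LDA used for dimensionality reduction, the sketch below relies on scikit-learn's `LinearDiscriminantAnalysis`. It assumes scikit-learn's bundled wine data (`load_wine`) as example input, which is not necessarily the file used in this project.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)  # 178 wines, 13 features, 3 classes

# With 3 classes, LDA can produce at most 3 - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # supervised: the labels guide the projection

print(X_lda.shape)  # (178, 2)
```

Unlike PCA, the fit step takes the class labels, because the projection is chosen to separate the classes rather than to preserve overall variance.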

Kernel – PCA 

PCA linearly transforms the original inputs into new uncorrelated features. Kernel PCA (KPCA) is a nonlinear form of PCA. As the name suggests, the kernel trick is used to make KPCA nonlinear.
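
A minimal sketch of the idea, assuming an RBF (Gaussian) kernel: instead of decomposing the data directly, kernel PCA decomposes the centred kernel matrix of pairwise similarities. The `rbf_kernel_pca` helper, the `gamma` value, and the two-circle data are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Kernel PCA with an RBF kernel; returns the projected training points."""
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * sq_dists)

    # Centre the kernel matrix, i.e. centre the data in the implicit feature space.
    n = len(X)
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    top = np.argsort(eigenvalues)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigenvalues[top], 0, None))
    return eigenvectors[:, top] * scale


# Two concentric circles: a structure that a linear projection cannot untangle.
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3 * inner
X = np.vstack([inner, outer])

Z = rbf_kernel_pca(X, n_components=2, gamma=1.0)
print(Z.shape)  # (40, 2)
```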

Problem description:

The dataset is taken from the UCI ML repository. It describes wines: each row is a different wine with 13 features (Alcohol, Malic_Acid, Ash, Ash_Alcanity, Magnesium, Total_Phenols, Flavanoids, Nonflavanoid_Phenols, Proanthocyanins, Color_Intensity, Hue, OD280, Proline) and a Customer_Segment column.

It is a business case study. I have to apply clustering to identify diverse segments of customers grouped by similar wine preferences, of which there are 3 categories. Then, for the owner of this wine shop, I have to build a predictive model trained on this data, so that for each new wine the owner has in his shop we can apply the predictive model to the dimensionality-reduced data and predict which customer segment the new wine belongs to. Finally, we can recommend the right wine to the right customer to optimise sales and profit.
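
A pipeline of this kind might look like the sketch below. It rests on several assumptions: scikit-learn's bundled copy of the UCI wine data (`load_wine`) stands in for the project's file, its class labels (0, 1, 2) play the role of Customer_Segment, and the choice of 2 principal components, logistic regression, and an 80/20 split is illustrative rather than the project's actual configuration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the wine file: 13 features per wine, 3 segments as labels.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale the features, reduce 13 dimensions to 2, then classify the segment.
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Predict the segment for one "new" wine (here simply a held-out row).
new_wine = X_test[:1]
predicted_segment = model.predict(new_wine)
print("Predicted segment:", predicted_segment[0])
```

Swapping `PCA(n_components=2)` for an LDA or Kernel PCA step leaves the rest of the pipeline unchanged, which makes it easy to compare the three reduction techniques.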