# Data Analysis

Last Updated on May 3, 2021

In this project I analysed data from two popular e-commerce websites, Jiomart and Grofers.

I had scraped data for various similar products from Jiomart and Grofers. The scraping was done in Python using Beautiful Soup and Selenium.

The scraped data contained fields such as Item, Quantity and Price.

I analysed the data, did some preprocessing and removed items that were not required.

My main goal was to find, for a given item, which of the two websites (Jiomart or Grofers) offers the lower price.

This makes it easy for customers to buy each item wherever it is cheaper.

The libraries used were NumPy and pandas.

I loaded the data into a pandas dataframe for analysis.

I also used a lambda function to find the input product or quantity as a substring within the scraped data.

After finding the rows matching the input substring for the same product on both websites, I compared their corresponding prices.

The website offering the product at the lower price was displayed along with that price.

The dataframe was constructed with the headers Website, Item, Quantity and Price.
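The lookup described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the sample rows and prices are made up, and the helper name `cheapest_offer` is hypothetical. Only the column headers come from the description above.

```python
import pandas as pd

# Made-up sample rows standing in for the scraped data (not real prices).
data = pd.DataFrame(
    [
        ("Jiomart", "Iodized Salt", "1 kg", 20.0),
        ("Grofers", "Iodized Salt Pack", "1 kg", 19.0),
        ("Jiomart", "Wheat Flour", "5 kg", 215.0),
        ("Grofers", "Wheat Flour", "5 kg", 220.0),
    ],
    columns=["Website", "Item", "Quantity", "Price"],
)


def cheapest_offer(df, product, quantity):
    """Return the lowest-priced row whose Item and Quantity contain the inputs."""
    product = product.lower()
    quantity = quantity.lower()
    # The lambda does a case-insensitive substring match on each row.
    mask = df.apply(
        lambda row: product in row["Item"].lower()
        and quantity in row["Quantity"].lower(),
        axis=1,
    )
    matches = df[mask]
    if matches.empty:
        return None  # no website lists this product/quantity
    return matches.loc[matches["Price"].idxmin()]


best = cheapest_offer(data, "iodized salt", "1 kg")
print(best["Website"], best["Price"])
```

Returning `None` when nothing matches keeps the "product not found" case separate from a real price comparison.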

The project was written entirely in Python, and the code is freely available on GitHub.

More Details: Data Analysis

# Iris Flower Prediction

Last Updated on May 3, 2021

## Understanding the scenario

Let’s assume that a hobby botanist is interested in distinguishing the species of some iris flowers that she has found. She has collected some measurements associated with each iris, which are:

• the length and width of the petals
• the length and width of the sepals, all measured in centimetres.

She also has the measurements of some irises that have been previously identified by an expert botanist as belonging to the species setosa, versicolor, or virginica. For these measurements, she can be certain of which species each iris belongs to. We will consider that these are the only species our botanist will encounter.

The goal is to create a machine learning model that can learn from the measurements of these irises whose species are already known, so that we can predict the species for the new irises that she has found.

## Modules imported

• scikit-learn (sklearn) is a collection of Python modules built for data science applications, including machine learning. Here, we’ll be using three of its components:
  • load_iris: a function that loads the classic dataset for the iris classification problem (as NumPy arrays).
  • train_test_split: a function for splitting our dataset into training and test sets.
  • KNeighborsClassifier: a class for classifying using the K-Nearest Neighbors approach.
• NumPy is a Python library that makes it easier to work with N-dimensional arrays and has a large collection of mathematical functions at its disposal. Its base data type is the “numpy.ndarray”.

## Building our model

As we have measurements for which we know the correct species of iris, this is a supervised learning problem. We want to predict one of several options (the species of iris), making it an example of a classification problem. The possible outputs (different species of irises) are called classes. Every iris in the dataset belongs to one of three classes considered in the model, so this problem is a three-class classification problem. The desired output for a single data point (an iris) is the species of the flower given its features. For a particular data point, the class / species it belongs to is called its label.

As already stated, we will use the Iris Dataset already included in scikit-learn.

Now, let’s print some interesting data about our dataset:
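A minimal sketch of this inspection step, using the standard attributes of the object returned by `load_iris` (the choice of what to print is an assumption, not necessarily what the notebook shows):

```python
from sklearn.datasets import load_iris

# load_iris returns a dictionary-like Bunch holding the data and metadata.
iris = load_iris()

print("Target names:", iris.target_names)    # the three species
print("Feature names:", iris.feature_names)  # sepal/petal length and width
print("Shape of data:", iris.data.shape)     # (samples, features)
print("First five rows:\n", iris.data[:5])
```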

Accuracy: the model reaches an accuracy of 93%.

Output in this case, where we have 2 new samples:

```
[[3, 5, 4, 2], [2, 3, 5, 4]]
```

The iris types predicted by our model for the given features are:

```
predictions:  ['versicolor', 'virginica']
```
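A sketch of how such a model might be trained and queried with the modules listed earlier. The values of `n_neighbors` and `random_state` are assumptions, so the accuracy and predictions this prints will not necessarily match the figures quoted above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out part of the labelled data to estimate accuracy (default: 25%).
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0  # random_state is an assumption
)

# n_neighbors=3 is an assumed setting, not taken from the notebook.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print("accuracy:", accuracy)

# The two new samples: sepal length, sepal width, petal length, petal width.
samples = np.array([[3, 5, 4, 2], [2, 3, 5, 4]])
predictions = [str(iris.target_names[p]) for p in knn.predict(samples)]
print("predictions:", predictions)
```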

For more details, see my GitHub repository:

ml-2/iris_flower.ipynb at main · THC1111/ml-2 (github.com)

More Details: Iris Flower Prediction

# Vaccine Prediction

Last Updated on May 3, 2021

Can you predict whether people got H1N1 and seasonal flu vaccines using information they shared about their backgrounds, opinions, and health behaviors?

In this challenge, we will take a look at vaccination, a key public health measure used to fight infectious diseases. Vaccines provide immunization for individuals, and enough immunization in a community can further reduce the spread of diseases through "herd immunity."

As of the launch of this competition, vaccines for the COVID-19 virus are still under development and not yet available. The competition will instead revisit the public health response to a different recent major respiratory disease pandemic. Beginning in spring 2009, a pandemic caused by the H1N1 influenza virus, colloquially named "swine flu," swept across the world. Researchers estimate that in the first year, it was responsible for between 151,000 and 575,000 deaths globally.

A vaccine for the H1N1 flu virus became publicly available in October 2009. In late 2009 and early 2010, the United States conducted the National 2009 H1N1 Flu Survey. This phone survey asked respondents whether they had received the H1N1 and seasonal flu vaccines, in conjunction with questions about themselves. These additional questions covered their social, economic, and demographic background, opinions on risks of illness and vaccine effectiveness, and behaviors towards mitigating transmission. A better understanding of how these characteristics are associated with personal vaccination patterns can provide guidance for future public health efforts.

I created two models, one for the H1N1 vaccine and another for the seasonal flu vaccine.
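The two-model setup can be sketched as one independent classifier per target. Everything specific here is an assumption for illustration: the data is synthetic, logistic regression is a stand-in (the write-up does not say which algorithm was used), and the target names `h1n1_vaccine` and `seasonal_vaccine` are assumed labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded survey features (not real survey data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Two synthetic binary targets, one per vaccine (assumed label names).
targets = {
    "h1n1_vaccine": (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int),
    "seasonal_vaccine": (X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int),
}

models = {}
scores = {}
for name, y in targets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    # One separate model per target; the algorithm choice is an assumption.
    model = LogisticRegression().fit(X_train, y_train)
    models[name] = model
    scores[name] = model.score(X_test, y_test)

# Probability of vaccination for the first respondent, per vaccine.
for name, model in models.items():
    print(name, model.predict_proba(X[:1])[0, 1], scores[name])
```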

More Details: Vaccine prediction

# Loan Prediction

Last Updated on May 3, 2021

A company wants to automate its loan eligibility process in real time, based on the details customers provide when filling in an online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, the company has provided a dataset for identifying the customer segments that are eligible for a loan, so that it can specifically target these customers. The main objective of this project is therefore to predict whether an individual is eligible for a loan, based on the given dataset.

For simplicity, I divided the project into small parts:

1. Data Collection: I collected the data from Analytics Vidhya as CSV files. There are two CSV files: the train data, used for training the model, and the test data, used for making predictions with the trained model.
2. Import Libraries: I imported different scikit-learn packages for the algorithms and other tasks.
3. Data Preprocessing: In this part I first found the missing values, then either removed a column or imputed a value (mean, mode or median), depending on the amount of data missing in that column.

I checked the unique values in each column. Then I applied label encoding to convert all string data to integer values. I used the dummies function to convert each unique value into a separate column. I also computed the correlation matrix, which shows the correlation between each pair of columns.
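A rough sketch of these preprocessing steps on a tiny made-up table. The rows are invented and the column names are assumptions for illustration, as is the choice of which columns get label encoding and which get dummy columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented rows; column names are assumed for illustration.
df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", None, "Male"],
        "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
        "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
        "LoanAmount": [120.0, np.nan, 66.0, 141.0],
        "Loan_Status": ["Y", "N", "Y", "Y"],
    }
)

# Step 1: count missing values per column.
print(df.isnull().sum())

# Step 2: impute (mode for a categorical column, median for a numeric one).
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Step 3: label-encode string columns into integer codes.
for col in ["Gender", "Education", "Loan_Status"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Step 4: expand a multi-valued column into one dummy column per value.
df = pd.get_dummies(df, columns=["Property_Area"], dtype=int)

# Step 5: correlation between every pair of (now numeric) columns.
corr = df.corr()
print(corr.round(2))
```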

Then I split the data. I analysed each column and row of the dataset.

I selected a classifier because this is a classification problem, i.e. the target value is of a categorical data type.

Then I created a model. I trained it using logistic regression, which is a classification algorithm, by feeding it the training dataset. After creating the model, I applied similar preprocessing to the test dataset and fed it to the trained model, which predicted the target values for the test dataset. I then measured the accuracy of the model by comparing the actual target values, which are given in the training dataset, with the predicted target values.

After this I used another algorithm, the random forest classifier. I trained a model with it and then calculated its accuracy.

I compared the accuracy of both algorithms and preferred the one with the better accuracy.

In this project I got 78.03% accuracy with the random forest classifier and 80.06% with logistic regression.
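The comparison of the two algorithms can be sketched as below. It runs on synthetic data from `make_classification` rather than the loan dataset, and the hyperparameters are assumptions, so the accuracies it prints are not the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the loan dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters here are assumed, not taken from the project.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Accuracy = share of held-out rows whose predicted label is correct.
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(accuracies, key=accuracies.get)
print(accuracies, "-> preferred:", best_name)
```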

More Details: Loan prediction

# Social Distance Monitoring System (Python, Deep Learning and OpenCV) (Research Paper)

Last Updated on May 3, 2021

Social distancing is one of the community mitigation measures that may be recommended during a pandemic such as Covid-19. It can reduce virus transmission by increasing physical distance or reducing the frequency of congregation in socially dense community settings, such as ATMs, airports or marketplaces.

The Covid-19 pandemic has demonstrated that we cannot expect to geographically contain the next influenza pandemic in the location where it emerges, nor can we expect to prevent the international spread of infection for more than a short period. Vaccines are not expected to be available during the early stage of the next pandemic (1). Therefore, we came up with this system to limit the spread of COVID by ensuring social distancing among people. It uses CCTV camera feeds to identify social distancing violations.

We first apply object detection using a YOLOv3 model trained on the COCO dataset, which has 80 classes. YOLO uses the Darknet framework to process the incoming feed frame by frame. It returns the detections with their IDs, centroids, corner coordinates and confidences in the form of multidimensional ndarrays. We take that information and remove the detections whose IDs are not “person”. We draw bounding boxes to highlight the detections in each frame. Then we use the centroids to calculate the Euclidean distance between people in pixels. If the distance between two centroids is less than the configured value, the system raises an alert with a beeping sound and turns the bounding boxes of the violators red.
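The filtering and distance check can be sketched without the detector itself. In this illustration the detections are hard-coded, hypothetical tuples of `(label, confidence, (x, y, w, h))` rather than real YOLOv3 output, and the 75-pixel threshold and the helper name `find_violations` are assumptions.

```python
import numpy as np

# Hypothetical detections for one frame: (label, confidence, (x, y, w, h)).
detections = [
    ("person", 0.91, (80, 150, 40, 100)),
    ("person", 0.88, (110, 190, 40, 100)),
    ("dog", 0.75, (300, 300, 50, 40)),
    ("person", 0.95, (380, 170, 40, 100)),
]

# Keep only people and compute the centroid of each bounding box.
people = [d for d in detections if d[0] == "person"]
centroids = [(x + w / 2, y + h / 2) for _, _, (x, y, w, h) in people]


def find_violations(centroids, min_distance):
    """Return indices of people closer than min_distance (pixels) to anyone."""
    pts = np.asarray(centroids, dtype=float)
    violators = set()
    if len(pts) < 2:
        return violators
    # Pairwise Euclidean distances between all centroids.
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if dists[i, j] < min_distance:
                violators.update((i, j))
    return violators


violators = find_violations(centroids, min_distance=75)  # assumed threshold

# Box colours in BGR order (as OpenCV uses): red for violators, else green.
colors = [(0, 0, 255) if i in violators else (0, 255, 0) for i in range(len(people))]
print(sorted(violators), colors)
```

Because the distance is measured in pixels, the threshold depends on camera placement and resolution and would need tuning for each feed.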

More Details: Social Distance Monitoring System(Python, Deep Learning And OpenCV)(Research paper)

# Hotel Management System Using Python

Last Updated on May 3, 2021