Wafer Fault Detection

Last Updated on May 3, 2021

About

Problem Statement

To build a classification methodology to predict the quality of wafer sensors based on the given training data.


Data Description

  • The client will send data in multiple sets of files in batches at a given location. The data will contain wafer names and 590 columns of different sensor values for each wafer. The last column will have the "Good/Bad" value for each wafer.
  • The "Good/Bad" column will have two unique values, +1 and -1.
  • "+1" represents a Bad wafer.
  • "-1" represents a Good wafer.
  • Apart from the training files, we also require a "schema" file which contains all the relevant information about the training files, such as:
  • the name of the files, the length of the date value in the file name, the length of the time value in the file name, the number of columns, the names of the columns, and their datatypes (an illustrative sketch follows this list).
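
For illustration, such a schema file might look like the following minimal sketch. The key names and values here are assumptions made for the example, not the client's actual schema.

    # Hypothetical sketch of a training schema file; key names and values are illustrative.
    import json

    schema = {
        "SampleFileName": "wafer_08012020_120000.csv",    # assumed file-naming pattern
        "LengthOfDateStampInFile": 8,
        "LengthOfTimeStampInFile": 6,
        "NumberofColumns": 592,                           # wafer name + 590 sensors + Good/Bad
        "ColName": {"Wafer": "varchar", "Sensor-1": "float", "Good/Bad": "Integer"},
    }

    with open("schema_training.json", "w") as f:
        json.dump(schema, f, indent=4)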


Business Use and Workflow

It automates the work of checking for faulty wafers in a batch, which saves time and working resources.


Approach: Exported the data from the database and converted it to a CSV file, then performed data preprocessing and clustering. After clustering, the best-performing models were Random Forest and XGBoost; Random Forest was selected, with an AUC score of 0.92.
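
The modelling step can be sketched as follows. This is a minimal illustration, assuming a preprocessed CSV is already available; the file name, column handling, and hyperparameters are assumptions, not the project's actual code.

    # Hedged sketch: compare Random Forest and XGBoost and report ROC AUC.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier   # assumes the xgboost package is installed

    df = pd.read_csv("wafer_preprocessed.csv")             # illustrative file name
    X = df.drop(columns=["Wafer", "Good/Bad"])
    y = (df["Good/Bad"] == 1).astype(int)                   # +1 = bad wafer, -1 = good wafer

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
        "XGBoost": XGBClassifier(eval_metric="logloss"),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        print(name, "AUC:", round(roc_auc_score(y_test, proba), 3))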

More Details: Wafer fault detection

Submitted By



Smart Glasses For Visually Impaired

Last Updated on May 3, 2021

About


This is our second-year Hardware & Software Tools project. We wanted to invent something that would benefit handicapped people in some way. One of us came up with this idea for glasses that could help blind people sense if there was an object in front of them that they might hit their head on. The white cane that they use when walking helps them navigate the ground but does not do much for what is up above. Using an Arduino Pro Mini MCU, an ultrasonic sensor, and a buzzer, we created these glasses, which sense the distance of an object in front and beep to alert the person that something is there. They are simple and inexpensive to make. Credit to http://robu.in for some of the parts.

 

These “Smart Glasses” are designed to help blind people read and translate typed text written in English. Inventions of this kind offer a way to motivate blind students to complete their education despite all their difficulties. The main objective is to develop a new way of reading texts for blind people and to facilitate their communication. The first task of the glasses is to scan a text image and convert it into audio, so the person can listen to it through a headphone connected to the glasses. The second task is to translate the whole text, or some words of it, by pressing a button that is also connected to the glasses. The glasses use several technologies to perform these tasks: OCR, gTTS, and Google translation. Detecting the text in the image was done using OpenCV and Optical Character Recognition (OCR) with Tesseract and the Efficient and Accurate Scene Text Detector (EAST). To convert the text into speech, the glasses use Text-to-Speech technology (gTTS). For translating the text, they use the Google Translation API.

The glasses are also fitted with an ultrasonic sensor, which measures the distance between the user and the object carrying the text so that a clear picture can be taken. The picture is taken when the user presses the button. Moreover, a motion sensor, together with a Radio-Frequency Identification (RFID) reader, was used to introduce the user to the locations of the university's halls, classes, and labs. All the computing and processing operations were done on a Raspberry Pi 3 B+ and a Raspberry Pi 3 B. As a result, the combination of OCR with the EAST detector provided very high accuracy, showing that the glasses can recognize almost 99% of the text. However, the glasses have some drawbacks: they support only the English language, and the maximum distance for capturing images is between 40 and 150 cm. As a future plan, it is possible to support more languages and to enhance the design to make it smaller and more comfortable to wear.
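
The read-aloud pipeline can be illustrated with a short sketch. It assumes pytesseract, OpenCV, and gTTS are installed; the file names and the audio player command are assumptions, not the project's exact code.

    # Hedged sketch: OCR a captured text image and speak the result with gTTS.
    import os
    import cv2
    import pytesseract                   # Python wrapper for the Tesseract OCR engine
    from gtts import gTTS

    image = cv2.imread("captured_text.jpg")               # picture taken when the button is pressed
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)         # simple preprocessing before OCR
    text = pytesseract.image_to_string(gray, lang="eng")   # recognize the English text

    if text.strip():
        gTTS(text=text, lang="en").save("speech.mp3")      # convert the recognized text to speech
        os.system("mpg321 speech.mp3")                     # play through the headphone (player command is an assumption)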

 

More Details: Smart Glasses for visually impaired

Submitted By


Identifying Water Sources For Smallholder Farmers With AGRI

Last Updated on May 3, 2021

About

CIAT and the Zamorano Pan-American Agricultural School, in coordination with the United States Agency for International Development (USAID)/Honduras, began the validation and dissemination process of the geographic information system (GIS) tool AGRI (Water for Irrigation, by its Spanish acronym) in March.

What is AGRI?

AGRI was developed in ArcGIS 10.1® for western Honduras with the aim of providing support for decision making in identifying suitable water sources for small drip irrigation systems. These systems cover areas of up to 10 hectares and are part of the U.S. government initiative Feed the Future in six departments of western Honduras (Santa Bárbara, Copán, Ocotepeque, Lempira, Intibucá, and La Paz).

AGRI identifies surface-water sources and sites suitable for rainwater harvesting for agriculture. In addition, AGRI maps the best routes for installing water pipes between the first parcel of the irrigation system and the identified water source. The tool is complemented by deforestation analyses of upstream areas, as an indicator of watershed conservation status.

How was AGRI developed?

Developing this tool required implementing a complex spatial-analysis framework that included correcting the terrain Digital Elevation Model (DEM), using weather information derived from remote sensors, performing hydrological analysis such as estimating runoff and water balance, and modeling the least-cost (lowest-difficulty) path for installing pipes across the landscape. Additionally, it was necessary to do digital soil mapping for some variables.
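
The least-cost routing step can be illustrated with a small sketch. This is not the AGRI/ArcGIS implementation; it is a simplified Dijkstra search over a toy cost raster with 4-connected moves, where higher cell values stand for terrain that is harder to lay pipe across.

    # Hedged sketch: least-cost route over a cost raster between a parcel and a water source.
    import heapq
    import numpy as np

    def least_cost_path(cost, start, goal):
        """Dijkstra over a 2-D cost raster with 4-connected moves."""
        rows, cols = cost.shape
        dist = np.full(cost.shape, np.inf)
        dist[start] = cost[start]
        prev, pq = {}, [(cost[start], start)]
        while pq:
            d, (r, c) = heapq.heappop(pq)
            if (r, c) == goal:
                break
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and d + cost[nr, nc] < dist[nr, nc]:
                    dist[nr, nc] = d + cost[nr, nc]
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
        path, node = [goal], goal
        while node != start:               # walk back from the goal to the start
            node = prev[node]
            path.append(node)
        return path[::-1]

    # Toy cost raster: higher values = steeper slope, protected land, etc.
    cost = np.array([[1, 1, 5, 5],
                     [5, 1, 5, 1],
                     [5, 1, 1, 1],
                     [5, 5, 5, 1]])
    print(least_cost_path(cost, start=(0, 0), goal=(3, 3)))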

What does AGRI offer to its users?

AGRI was developed based on the following needs identified by USAID-Honduras in relation to the implementation of small irrigation systems in the country:

  1. To find the closest water source that permits transportation of the water by gravity to parcels.
  2. To search for “permanent and sufficient” water sources to establish water outlets.
  3. To find suitable sites for building reservoirs for the harvest of runoff water.
  4. To take into account the protection of water sources for human consumption and other protected zones and avoid possible conflicts on water use.
  5. The tool needs to be easy to use for technicians and agronomists.
  6. The tool should use information that is readily available in the country.

This application was developed at the request of USAID-Honduras and responds to the implementation needs of its programs. The implementation was led by the Decision and Policy Analysis (DAPA) area of CIAT, with the participation of the soil area, which contributed the digital soil mapping for the project. Likewise, Zamorano University supported the field validation and the analysis of the legal context related to water use, which serves as a basis for the application of this tool.

More Details: Identifying water sources for smallholder farmers with AGRI

Submitted By


Covid Tracker On Twitter Using Data Science And AI

Last Updated on May 3, 2021

About

Introduction

Hi folks, I hope you are doing well in these difficult times! We are all going through the unprecedented time of the coronavirus pandemic. Some people lost their lives, but many of us successfully defeated this new strain, i.e. Covid-19. The virus was declared a pandemic by the World Health Organization on 11 March 2020. This article will analyze various types of “Tweets” gathered during pandemic times. The study can be helpful for different stakeholders.

For example, the government can use this information in policymaking, as it shows how people are reacting to this new strain and what challenges they are facing, such as food scarcity, panic attacks, etc. Various for-profit organizations can benefit by analyzing sentiments; for instance, one of the tweets tells us about the scarcity of masks and toilet paper, so these organizations can start producing essential items and thereby make profits. Various NGOs can decide their strategy for how to rehabilitate people by using pertinent facts and information.

In this project, we are going to predict the sentiments of COVID-19 tweets. The data was gathered from Twitter, and I am going to use a Python environment to implement this project.

 

Problem Statement

The given challenge is to build a classification model to predict the sentiment of Covid-19 tweets. The tweets have been pulled from Twitter and manually tagged. We are given information such as Location, Tweet At, Original Tweet, and Sentiment.

Approach To Analyze Various Sentiments

Before we proceed further, one should know what is meant by sentiment analysis. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic is Positive, Negative, or Neutral. (Oxford Dictionary)

The following is the standard operating procedure for tackling a sentiment analysis project. We will go through this procedure to predict what we are supposed to predict!

  1. Exploratory Data Analysis.

  2. Data Preprocessing.

  3. Vectorization.

  4. Classification Models.

  5. Evaluation.

  6. Conclusion.

Let’s Guess Some Tweets

I will read a tweet, and you can tell me its sentiment: Positive, Negative, or Neutral. The first tweet is: “Still shocked by the number of #Toronto supermarket employees working without some sort of mask. We all know by now, employees can be asymptomatic while spreading #coronavirus”. What’s your guess? Yes, you are correct. This is a Negative tweet because it contains negative words like “shocked”.

If you couldn’t guess the above tweet, don’t worry, I have another one for you. Let’s guess this tweet: “Due to the Covid-19 situation, we have increased demand for all food products. The wait time may be longer for all online orders, particularly beef share and freezer packs. We thank you for your patience during this time”. This time you are absolutely correct in predicting this tweet as “Positive”. Words like “thank you” and “increased demand” are optimistic in nature, hence they categorize the tweet as positive.

Data Summary

The original dataset has 6 columns and 41,157 rows. In order to analyze the various sentiments, we require just two columns, named Original Tweet and Sentiment. There are five types of sentiments: Extremely Negative, Negative, Neutral, Positive, and Extremely Positive, as you can see in the following picture.

Summary Of Dataset

 

Basic Exploratory Data Analysis

Columns such as “UserName” and “ScreenName” do not give any meaningful insights for our analysis, hence we do not use these features for model building. All the tweet data was collected during March and April 2020. The following bar plot shows the number of unique values in each column.

There are some null values in the Location column, but we don’t need to deal with them, as we are only going to use the two columns “Sentiment” and “Original Tweet”. The largest share of tweets came from London (11.7%), as evident from the following figure.

Some words, like ‘coronavirus’ and ‘grocery store’, have the highest frequency in our dataset, as we can see from the following word cloud. There are various #hashtags in the tweet column, but they are almost the same across all sentiments, hence they do not give us meaningful information.

Word cloud showing the most frequent words in our Tweet column

When we explore the ‘Sentiment’ column, we find that most people have positive sentiments about various issues, which shows their optimism during pandemic times. Very few people have extremely negative thoughts about Covid-19.

 

Data Pre-processing

Preprocessing the text data is an essential step, as it makes the raw text ready for mining. The objective of this step is to clean out noise that is less relevant to finding the sentiment of tweets, such as punctuation (., ?, ” etc.), special characters (@, %, &, $, etc.), numbers (1, 2, 3, etc.), Twitter handles, links (https:/http:), and terms which don’t carry much weight in the context of the text.

Also, we need to remove stop words from tweets. Stop words are those words in natural language that have very little meaning, such as “is”, “an”, “the”, etc. To remove stop words from a sentence, you can divide your text into words and then remove the word if it exists in the list of stop words provided by NLTK.

Then we need to normalize the tweets by using stemming or lemmatization. “Stemming” is a rule-based process of stripping suffixes (“ing”, “ly”, “es”, “ed”, “s”, etc.) from a word. For example, “play”, “player”, “played”, “plays”, and “playing” are different variations of the word “play”.

Stemming will not always convert words into meaningful words. As you can see, “considered” gets stemmed into “consid”, which has no meaning and looks like a spelling mistake too. The better way is to use lemmatization instead of stemming.

Lemmatization is a more powerful operation, and it takes into consideration the morphological analysis of the words. It returns the lemma which is the base form of all its inflectional forms.

 

Here, in the lemmatization process, we convert the word “raising” to its base form “raise”. We also need to convert all tweets to lower case before we do the normalization process.

We can also include a tokenization step. In tokenization, we convert a group of sentences into tokens. It is also called text segmentation or lexical analysis; it is basically splitting the data into small chunks of words. Tokenization in Python can be done with the NLTK library’s word_tokenize() function.
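
The cleaning, stop-word removal, lemmatization, and tokenization steps described above can be combined into one small sketch, assuming the NLTK data packages stopwords, punkt, and wordnet have been downloaded; the regular expressions and the sample tweet are illustrative, not the project's exact code.

    # Hedged sketch of the tweet preprocessing pipeline.
    import re
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize
    # First run only: nltk.download("stopwords"); nltk.download("punkt"); nltk.download("wordnet")

    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def clean_tweet(tweet):
        tweet = tweet.lower()                              # lower-case before normalization
        tweet = re.sub(r"http\S+|www\.\S+", " ", tweet)    # remove links
        tweet = re.sub(r"@\w+|#", " ", tweet)              # remove handles and the '#' symbol
        tweet = re.sub(r"[^a-z\s]", " ", tweet)            # remove punctuation, numbers, special characters
        tokens = word_tokenize(tweet)                      # tokenization (text segmentation)
        tokens = [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]
        return " ".join(tokens)

    print(clean_tweet("Still shocked by the number of #Toronto supermarket employees working without masks!"))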

Vectorization

We can use a count vectorizer or a TF-IDF vectorizer. Count Vectorizer will create a sparse matrix of all words and the number of times they are present in a document.

TF-IDF, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. (Wikipedia)
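
Both options are available in scikit-learn, as the short sketch below shows; the two-document corpus is only for illustration.

    # Hedged sketch: bag-of-words counts versus TF-IDF weights.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    corpus = ["covid increased demand for food products",
              "shocked by employees working without masks"]

    count_vec = CountVectorizer()
    X_counts = count_vec.fit_transform(corpus)     # sparse matrix of raw term counts

    tfidf_vec = TfidfVectorizer()
    X_tfidf = tfidf_vec.fit_transform(corpus)      # counts re-weighted by rarity across the corpus

    print(X_counts.shape, X_tfidf.shape)
    print(count_vec.get_feature_names_out())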

Building Classification Models

The given problem is an ordinal multiclass classification. There are five types of sentiments, so we have to train our models so that they give the correct label for the test dataset. I am going to build different models: Naive Bayes, Logistic Regression, Random Forest, XGBoost, Support Vector Machines, CatBoost, and Stochastic Gradient Descent.

I treated the given problem as multiclass classification, that is, the dependent variable has the values Positive, Extremely Positive, Neutral, Negative, and Extremely Negative. I also converted the problem into binary classification, i.e. I clubbed all tweets into just two types, Positive and Negative. You can also go for three-class classification (Positive, Negative, and Neutral) in order to achieve greater accuracy. In the evaluation phase, we will compare the results of these algorithms.
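
A minimal sketch of training and comparing a few of these classifiers is shown below. The file name, column names, and hyperparameters are assumptions (scikit-learn defaults), not the tuned settings used in the project.

    # Hedged sketch: fit several classifiers on vectorized tweets and compare test accuracy.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("covid_tweets.csv")                  # assumed CSV with OriginalTweet and Sentiment columns
    vectorizer = TfidfVectorizer(max_features=5000)
    X = vectorizer.fit_transform(df["OriginalTweet"])
    y = df["Sentiment"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    models = {
        "Naive Bayes": MultinomialNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
        "Stochastic Gradient Descent": SGDClassifier(random_state=42),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))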

Feature Importance

Feature importance (variable importance) describes which features are relevant. It can help with a better understanding of the solved problem and sometimes lead to model improvements by employing feature selection. The top three important feature words are panic, crisis, and scam, as we can see from the following graph.
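
Continuing the sketch above, one way to map importances back to words is shown below: the Random Forest's feature_importances_ is paired with the vectorizer's vocabulary. The project's actual graph may have been produced with a different model or tool.

    # Hedged sketch (continuing the previous code): top words by Random Forest feature importance.
    import numpy as np

    importances = models["Random Forest"].feature_importances_
    words = vectorizer.get_feature_names_out()            # vocabulary from the fitted vectorizer
    for i in np.argsort(importances)[::-1][:10]:
        print(words[i], round(float(importances[i]), 4))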

Conclusion

In this way, we can explore much more from various textual data and tweets. Our models try to predict the various sentiments correctly. I have used various models for training on our dataset, but some models show greater accuracy than others. For multiclass classification, the best model for this dataset is CatBoost. For binary classification, the best model for this dataset is Stochastic Gradient Descent.

More Details: Covid Tracker on Twitter using Data Science and AI

Submitted By


Design And Analysis Of Automobile Chassis

Last Updated on May 3, 2021

About

Completed under the guidance of Dr. Shailesh Ganpule, Department of Mechanical and Industrial Engineering, from August 2019 to November 2019. The objective of this design analysis is to find the best material and the most suitable cross-section for a common “goods carrier truck” ladder chassis, with constraints on the maximum shear stress, equivalent stress, and deflection of the chassis under the maximum load condition. At present, the ladder chassis used for making buses and trucks have C- and I-type cross-sections, but here we also analysed box-type and tube-type sections. Trucks generally carry heavy loads, so there is always a possibility of failure or fracture in the chassis/frame. Therefore, a chassis with a high-strength cross-section is needed to minimize failures, including a factor of safety in the design.

The different vehicle chassis were modeled by considering four different cross-sections, namely C, I, rectangular box (hollow), and tubular. The task for this dissertation work was to design and analyze the ladder chassis using suitable CAD software and Ansys 19.2. The report describes the work performed towards the optimization of the truck chassis with constraints of stiffness and strength. The modeling was done using SolidWorks, and the analysis was done using Ansys 19.2. The overhangs of the chassis were calculated analytically for stresses and deflections, and the results were compared with those obtained from the analysis software.

The work involved designing a heavy-loaded vehicle chassis in SolidWorks with stress simulation and strain analysis in Ansys, carrying out failure analysis using the von Mises criterion to assess sustainability, performing a convergence analysis to select the most optimized model with the desired factor of safety, and comparing the software (practical) values with the theoretical values obtained.
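
As a hedged illustration of the analytical check mentioned above (not the actual dissertation calculations), the short sketch below compares the maximum bending stress sigma = M*c/I of an idealized hollow box and tube cross-section under the same bending moment; all dimensions and the load are assumed values.

    # Hedged sketch: section properties and max bending stress for two idealized cross-sections.
    import math

    M = 10e3   # bending moment in N*m (assumed value)

    def box_section(b_o, h_o, t):
        """Hollow rectangular box: outer width b_o, outer height h_o, wall thickness t (metres)."""
        b_i, h_i = b_o - 2 * t, h_o - 2 * t
        I = (b_o * h_o**3 - b_i * h_i**3) / 12.0      # second moment of area about the neutral axis
        return I, h_o / 2.0                            # c = distance to the extreme fibre

    def tube_section(d_o, t):
        """Circular tube: outer diameter d_o, wall thickness t (metres)."""
        d_i = d_o - 2 * t
        I = math.pi * (d_o**4 - d_i**4) / 64.0
        return I, d_o / 2.0

    sections = {
        "Box 100 x 150 x 5 mm": box_section(0.100, 0.150, 0.005),
        "Tube d = 150 mm, t = 5 mm": tube_section(0.150, 0.005),
    }
    for name, (I, c) in sections.items():
        sigma = M * c / I                              # maximum bending stress in Pa
        print(f"{name}: I = {I:.3e} m^4, sigma_max = {sigma / 1e6:.1f} MPa")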

More Details: DESIGN AND ANALYSIS OF AUTOMOBILE CHASSIS

Submitted By


Salary Predictor

Last Updated on May 3, 2021

About

This is a web app created using an open-source Python library called Streamlit. This library is mainly used to create web apps for machine learning and data science. In this project, I collected the required data from Kaggle. I used the Sklearn library to get the model required for the data and fitted the data using its built-in methods. The web app contains two pages, named Home and Prediction. On the Home page I display the data collected from Kaggle and a scatter graph plotted using the matplotlib library. On the Prediction page there is a text field where we can enter the experience of the employee and click a button, which then shows the predicted salary for that employee. A Streamlit web app serves on a local host URL, so to make it available globally I deployed the web app on the Heroku platform. In this project I used a small dataset just to test how it works. A larger dataset could also be used, but the process of training the model would be different: for large datasets, the data should be split into training and testing sets so that we can train the model accurately, and more advanced training algorithms may also be used. Based on our convenience and requirements, we can build machine learning models, save them to a file, and use that file while creating a web app.
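
A minimal sketch of such a two-page Streamlit app is shown below, assuming a Kaggle CSV with YearsExperience and Salary columns; the file name, column names, and the use of a number input in place of the text field are assumptions.

    # Hedged sketch: Home / Prediction pages for a salary-vs-experience regressor.
    import streamlit as st
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("salary_data.csv")                      # assumed Kaggle dataset
    model = LinearRegression().fit(df[["YearsExperience"]], df["Salary"])

    page = st.sidebar.selectbox("Page", ["Home", "Prediction"])

    if page == "Home":
        st.title("Salary Data")
        st.dataframe(df)                                      # show the collected data
        fig, ax = plt.subplots()
        ax.scatter(df["YearsExperience"], df["Salary"])       # scatter plot of experience vs. salary
        ax.set_xlabel("Years of experience")
        ax.set_ylabel("Salary")
        st.pyplot(fig)
    else:
        st.title("Salary Prediction")
        years = st.number_input("Years of experience", min_value=0.0, step=0.5)
        if st.button("Predict"):
            st.write(f"Predicted salary: {model.predict([[years]])[0]:,.0f}")

Running the script with streamlit run serves the app on a local host URL, which can then be deployed to Heroku as described above.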

More Details: Salary Predictor

Submitted By