Determine the numeric and categorical attributes in the dataset
An attribute in a dataset is mainly one of two types: Numeric or Categorical.
A Numeric attribute has an integer or float data type, and its values span a range. To understand this, take a look at the image below.
The dataset we are currently studying is the Heart failure clinical records Data Set, provided by the UCI Machine Learning Repository.
As we can see, the values of the attribute “creatinine_phosphokinase” are integers, and they span a range rather than a small fixed set. Hence, “creatinine_phosphokinase” is a Numeric Attribute.
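The same check can be sketched in code. This is a minimal example assuming pandas is installed; the rows below are made-up illustrative values, not records from the actual dataset, and only the column names come from the text.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up illustrative rows, NOT values taken from the real dataset.
df = pd.DataFrame({
    "creatinine_phosphokinase": [120, 450, 980, 75, 2300],
    "anaemia": [0, 0, 1, 1, 0],
})

col = df["creatinine_phosphokinase"]

# A numeric attribute has an integer or float dtype...
is_numeric = is_numeric_dtype(col)

# ...and its values spread over a range instead of a small fixed set.
value_range = (col.min(), col.max())
n_distinct = col.nunique()

print(col.dtype, is_numeric, value_range, n_distinct)
```

Note that the dtype alone is not enough to settle the question: “anaemia” is also stored as integers here, so the number of distinct values matters as well.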
In contrast, let us take a look at some other variables of the dataset. For example, take a look at the image(s) below.
The attributes shown in the images above (anaemia, diabetes and high_blood_pressure) take only a small, fixed set of values: 0 and 1 in this case. Such attributes are known as Categorical Attributes.
In most data science use cases, a categorical attribute records which of a fixed set of states applies to a data instance. In the examples above, every attribute carries a value of either 0 or 1, indicating whether or not the patient has anaemia, diabetes, or high blood pressure, respectively.
Technically, 0 means False and 1 means True.
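A minimal sketch of how such 0/1 attributes could be picked out and read as True/False, assuming pandas. The rows are made-up illustrative values, not records from the actual dataset, and `binary_columns` and `flags` are names chosen only for this example.

```python
import pandas as pd

# Made-up illustrative rows, NOT values taken from the real dataset.
df = pd.DataFrame({
    "anaemia": [0, 1, 1, 0, 0],
    "diabetes": [1, 0, 0, 1, 0],
    "high_blood_pressure": [0, 0, 1, 1, 1],
    "creatinine_phosphokinase": [120, 450, 980, 75, 2300],
})

# Keep the columns whose observed values are all drawn from {0, 1}.
binary_columns = [
    name for name in df.columns
    if set(df[name].dropna().unique()) <= {0, 1}
]

# Interpret 0 as False and 1 as True.
flags = df[binary_columns].astype(bool)

print(binary_columns)
print(flags)
```

With a very small sample, a numeric column could by chance contain only 0s and 1s, so a check like this is a starting point and not a substitute for reading the dataset's documentation.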
Categorical variables are not limited to two values: many comprise a set of more than two discrete values, each carrying a meaning that can be inferred from the data.
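One rough way to separate the two kinds of attribute in code is to count distinct values per column. The sketch below assumes pandas; `split_attributes` and its `max_categories` threshold are hypothetical helpers written for this illustration, not a standard API. The “severity_grade” column is an invented example of a categorical attribute with more than two values, and all rows are made up.

```python
import pandas as pd


def split_attributes(df, max_categories=5):
    """Hypothetical helper: label a column categorical when it has at most
    `max_categories` distinct values, and numeric otherwise."""
    numeric, categorical = [], []
    for name in df.columns:
        if df[name].nunique() <= max_categories:
            categorical.append(name)
        else:
            numeric.append(name)
    return numeric, categorical


# Made-up illustrative rows, NOT values taken from the real dataset.
df = pd.DataFrame({
    "creatinine_phosphokinase": [120, 450, 980, 75, 2300, 610, 333, 1500],
    "anaemia": [0, 1, 1, 0, 0, 1, 0, 1],
    "severity_grade": [1, 3, 2, 2, 1, 3, 3, 2],  # invented three-valued column
})

numeric, categorical = split_attributes(df, max_categories=5)

# Converting to the pandas "category" dtype lists the discrete levels.
grade_levels = list(df["severity_grade"].astype("category").cat.categories)

print(numeric, categorical, grade_levels)
```

The threshold is a judgment call: a numeric column with few observed values would be mislabelled, so the result is best treated as a first guess to confirm against the dataset description.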