## [1] 1599 12
## tibble [1,599 x 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ fixed.acidity : num [1:1599] 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num [1:1599] 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num [1:1599] 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num [1:1599] 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num [1:1599] 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num [1:1599] 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num [1:1599] 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num [1:1599] 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num [1:1599] 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num [1:1599] 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num [1:1599] 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
## - attr(*, "spec")=
## .. cols(
## .. fixed.acidity = col_double(),
## .. volatile.acidity = col_double(),
## .. citric.acid = col_double(),
## .. residual.sugar = col_double(),
## .. chlorides = col_double(),
## .. free.sulfur.dioxide = col_double(),
## .. total.sulfur.dioxide = col_double(),
## .. density = col_double(),
## .. pH = col_double(),
## .. sulphates = col_double(),
## .. alcohol = col_double(),
## .. quality = col_double()
## .. )
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00 1st Qu.:0.9956
## Median :0.07900 Median :14.00 Median : 38.00 Median :0.9968
## Mean :0.08747 Mean :15.87 Mean : 46.47 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :72.00 Max. :289.00 Max. :1.0037
## pH sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 3: 10
## 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 4: 53
## Median :3.310 Median :0.6200 Median :10.20 5:681
## Mean :3.311 Mean :0.6581 Mean :10.42 6:638
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 7:199
## Max. :4.010 Max. :2.0000 Max. :14.90 8: 18
Our dataset has 12 variables with 1599 observations.
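The structure output above shows `quality` read in as a double and then stored as an ordered factor. A minimal sketch of that conversion follows, assuming only the first ten rows of three columns as printed by `str()`. The name `wine_head` is hypothetical, and the quality scores are decoded from the factor codes shown in the output (code 3 is level "5", and so on); the original loading code is not shown in this report.

```r
# First ten rows of three columns, copied from the str() output above.
# `wine_head` is a hypothetical name used only for this illustration.
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  pH      = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  quality = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)  # decoded from factor codes
)

# Convert the numeric score into an ordered factor with levels 3 < 4 < ... < 8.
wine_head$quality <- factor(wine_head$quality, levels = 3:8, ordered = TRUE)

dim(wine_head)      # rows and columns of the sample
str(wine_head)      # column types, including the ordered factor
summary(wine_head)  # numeric summaries plus counts per quality level
```

Treating quality as an ordered factor keeps the ranking of the levels while making `summary()` report counts rather than a mean.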
Most wines have an alcohol percentage of between 9 and 11%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
Most wines have a pH of between 3.0 and 3.5.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
Most wines had a quality of 5 or 6.
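To put a number on this, the share of wines rated 5 or 6 can be worked out from the quality counts in the summary output. This is a small sketch that re-enters those counts by hand; the name `quality_counts` is hypothetical.

```r
# Counts per quality level, copied from the summary() output above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Total should match the number of observations in the dataset.
total <- sum(quality_counts)

# Proportion of wines at each level, and the combined share of levels 5 and 6.
quality_share <- quality_counts / total
share_5_or_6 <- sum(quality_share[c("5", "6")])

round(quality_share, 3)
round(share_5_or_6, 3)
```

Levels 5 and 6 together account for 1319 of the 1599 wines, roughly 82%.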
There are 1599 observations in our dataset with 12 variables (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulphur dioxide, total sulphur dioxide, density, pH, sulphates, alcohol, quality). The only categorical variable is quality, an ordered factor with six levels (3 to 8).
Other observations:
The majority of wines have a density of between 0.995 and 1.000.
Most wines have a volatile acidity of less than 0.8 g/dm^3, which suggests they avoid the unpleasant, vinegar-like taste that high volatile acidity can cause.
Most wines have a low chloride (salt) content of less than 0.2 g/dm^3.
The mean alcohol percentage of the wines is about 10.4%.
The main feature of interest in this dataset is quality which is the quality level of the wine. I want to know which features of wine influence quality and can be used to predict the quality level of a wine.
Alcohol, pH, residual sugar, chlorides, and fixed acidity are features that may influence the quality of a wine.
## fixed.acidity volatile.acidity citric.acid residual.sugar
## fixed.acidity 1.00 -0.26 0.67 0.11
## volatile.acidity -0.26 1.00 -0.55 0.00
## citric.acid 0.67 -0.55 1.00 0.14
## residual.sugar 0.11 0.00 0.14 1.00
## chlorides 0.09 0.06 0.20 0.06
## free.sulfur.dioxide -0.15 -0.01 -0.06 0.19
## total.sulfur.dioxide -0.11 0.08 0.04 0.20
## density 0.67 0.02 0.36 0.36
## pH -0.68 0.23 -0.54 -0.09
## sulphates 0.18 -0.26 0.31 0.01
## alcohol -0.06 -0.20 0.11 0.04
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## fixed.acidity 0.09 -0.15 -0.11 0.67
## volatile.acidity 0.06 -0.01 0.08 0.02
## citric.acid 0.20 -0.06 0.04 0.36
## residual.sugar 0.06 0.19 0.20 0.36
## chlorides 1.00 0.01 0.05 0.20
## free.sulfur.dioxide 0.01 1.00 0.67 -0.02
## total.sulfur.dioxide 0.05 0.67 1.00 0.07
## density 0.20 -0.02 0.07 1.00
## pH -0.27 0.07 -0.07 -0.34
## sulphates 0.37 0.05 0.04 0.15
## alcohol -0.22 -0.07 -0.21 -0.50
## pH sulphates alcohol
## fixed.acidity -0.68 0.18 -0.06
## volatile.acidity 0.23 -0.26 -0.20
## citric.acid -0.54 0.31 0.11
## residual.sugar -0.09 0.01 0.04
## chlorides -0.27 0.37 -0.22
## free.sulfur.dioxide 0.07 0.05 -0.07
## total.sulfur.dioxide -0.07 0.04 -0.21
## density -0.34 0.15 -0.50
## pH 1.00 -0.20 0.21
## sulphates -0.20 1.00 0.09
## alcohol 0.21 0.09 1.00
## corrplot 0.84 loaded
The negative relationship between pH and citric acid (r = -0.54) was interesting. Another interesting relationship was the negative relationship between citric acid and volatile acidity (r = -0.55).
There were strong positive relationships (r = 0.67 in each case) between fixed acidity and citric acid, fixed acidity and density, and free sulphur dioxide and total sulphur dioxide.
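One way to pick such pairs out of a correlation matrix, rather than reading them off by eye, is to filter its upper triangle by absolute value. The sketch below assumes a five-variable subset of the matrix printed above, re-entered by hand, and a cut-off of 0.5 that I chose for illustration; `r` and `strong_pairs` are hypothetical names.

```r
# Subset of the correlation matrix printed above (values copied by hand).
vars <- c("fixed.acidity", "volatile.acidity", "citric.acid", "density", "pH")
r <- matrix(c(
   1.00, -0.26,  0.67,  0.67, -0.68,
  -0.26,  1.00, -0.55,  0.02,  0.23,
   0.67, -0.55,  1.00,  0.36, -0.54,
   0.67,  0.02,  0.36,  1.00, -0.34,
  -0.68,  0.23, -0.54, -0.34,  1.00
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and only where |r| >= 0.5.
# The 0.5 threshold is an assumption, not a value from the analysis.
idx <- which(upper.tri(r) & abs(r) >= 0.5, arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx],
  stringsAsFactors = FALSE
)

# Strongest relationships first, regardless of sign.
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
strong_pairs
```

On the full dataset the same filter could be applied to the output of `cor()` on the numeric columns.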
Wines with a moderate pH have a low citric acid to pH ratio. Wines with a high quality level have a high alcohol level compared to the rest. Wines with quality levels of 3 and 8 have high residual sugar.
The distribution of residual sugar is skewed to the right. Most wines have a residual sugar of below 4 g/dm^3.
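The right skew can also be checked numerically from the five-number summary of residual sugar printed earlier. The sketch below compares the mean with the median and computes a quartile-based (Bowley) skewness coefficient. The coefficient is an addition of mine for illustration and was not part of the original analysis; `rs` is a hypothetical name.

```r
# Summary statistics for residual sugar, copied from the output above.
rs <- c(min = 0.900, q1 = 1.900, median = 2.200, mean = 2.539, q3 = 2.600, max = 15.500)

# A mean above the median is a first sign of a right-skewed distribution.
mean_above_median <- rs[["mean"]] > rs[["median"]]

# Bowley skewness: ((Q3 - median) - (median - Q1)) / (Q3 - Q1).
# Positive values indicate a longer right tail among the middle 50% of values.
bowley <- ((rs[["q3"]] - rs[["median"]]) - (rs[["median"]] - rs[["q1"]])) /
  (rs[["q3"]] - rs[["q1"]])

mean_above_median
round(bowley, 3)
```

Both checks point the same way: the mean exceeds the median and the coefficient is positive.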
Wines with a quality level of 5 had a low median alcohol percentage, although some of them had a high alcohol percentage and appear as outliers. Wines with a quality level of 8 had a high median alcohol percentage.
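The per-level medians behind this comparison can be computed with `tapply()`. The sketch below runs on only the first ten rows shown in the `str()` output, so its numbers illustrate the technique and do not reproduce the medians of the full dataset. The name `wine_head` is hypothetical, and the quality scores are decoded from the factor codes in that output.

```r
# First ten rows of alcohol and quality, taken from the str() output above.
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = factor(c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5), levels = 3:8, ordered = TRUE)
)

# Median alcohol for each quality level; levels with no rows return NA.
median_alcohol <- tapply(wine_head$alcohol, wine_head$quality, median)
median_alcohol

# Drop the empty levels to show only the groups present in this sample.
median_alcohol[!is.na(median_alcohol)]
```

On the full data, the same grouping could feed `boxplot(alcohol ~ quality, data = ...)` to show the medians and outliers together.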
Wines with a quality level of 8 generally had a low pH compared to wines of other quality levels. They also had a wide distribution of pH compared to the other levels.
The wineQualityReds dataset contains information about 1599 wines and their features. I did some exploration to understand the variables in the dataset, and I explored the relationship between the quality of the wines and the other variables.
There was a strong positive relationship between fixed acidity and citric acid (r = 0.67). A relationship of the same strength was observed between fixed acidity and density (r = 0.67). The surprising thing was that pH and citric acid had a negative correlation (r = -0.54) despite being closely related.
A limitation of this dataset is that it contains observations of red wines only; other types of wine, such as white wine, are not included. Given that the dataset contains data from 2009, the analysis may not reflect wines produced from 2010 to 2021.