Data visualization toolbox

5 Hypervariate Data

Hypervariate data consist of associated samples or measurements of more than three quantitative variables. The general task with this type of data is to determine how the variables are related. In some cases the variables are factors (independent) and a response (dependent). In other cases the variables are not functionally related, but their distributions can be related.

As the number of variables in a data set increases, it becomes increasingly likely that some of the factors have no significant effect on the response and that some of the factors are not independent.

Data sets

This chapter includes examples of analyzing several data sets:

Environmental
Measurements of wind speed, temperature, solar radiation and ozone concentration from an air pollution study.  The goal of the analysis is to determine how ozone depends on the other variables.
Iris
Measurements of sepal length and width and petal length and width for a collection of 150 irises. The analysis goal is to determine a classification rule for iris variety based on the sepal and petal lengths and widths. These data have been analyzed many times to illustrate analysis methods. As shown later in this chapter, visualization reveals a simple classification that earlier numerical analyses missed.

Scatterplots  (scattermatrix.m)

The scatterplot matrix used for trivariate data extends directly to more variables. It provides a convenient view of the relationships between pairs of variables.

Figure 5.1 Scatterplot matrix of the environmental data. (book 5.1)


Figure 5.1 indicates that ozone concentration generally increases with temperature, decreases with wind speed, and varies nonmonotonically with radiation.
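The panel arrangement of a scatterplot matrix can be sketched in a few lines. The following is a minimal Python sketch, not the toolbox's scattermatrix.m: the function names `panel_layout` and `draw_scatterplot_matrix` are hypothetical, the data are assumed to be a dictionary mapping each variable name to a list of values, and the drawing step assumes matplotlib is installed. Placing variable names on the diagonal is one common convention and is assumed here.

```python
from itertools import product


def panel_layout(names):
    """Return {(row, col): (x_name, y_name)} for the off-diagonal panels.

    Panel (row, col) plots the column's variable on x and the row's
    variable on y. Diagonal cells are reserved for variable-name labels.
    """
    n = len(names)
    return {
        (r, c): (names[c], names[r])
        for r, c in product(range(n), repeat=2)
        if r != c
    }


def draw_scatterplot_matrix(data, names):
    """Draw the matrix with matplotlib (imported lazily, assumed installed).

    `data` is assumed to map each name in `names` to a sequence of values.
    """
    import matplotlib.pyplot as plt

    n = len(names)
    fig, axes = plt.subplots(n, n, squeeze=False, figsize=(2 * n, 2 * n))
    layout = panel_layout(names)
    for r, c in product(range(n), repeat=2):
        ax = axes[r][c]
        ax.set_xticks([])
        ax.set_yticks([])
        if r == c:
            # Diagonal cell: show the variable name instead of a plot.
            ax.text(0.5, 0.5, names[r], ha="center", va="center",
                    transform=ax.transAxes)
        else:
            x_name, y_name = layout[(r, c)]
            ax.scatter(data[x_name], data[y_name], s=8)
    return fig
```

With four variables this yields twelve off-diagonal panels, so every pair of variables appears twice, once with each variable on the horizontal axis.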

Color scatterplots  (scattermatrixc.m)

The color scatterplot can also be used in a matrix of panels. This presentation allows us to see the effect of combinations of factors. In Figure 5.2 the combination of temperature and wind speed has a strong effect on ozone. The complex relationship of ozone and radiation appears to be due to interaction with the other factors. Figure 5.2 also reveals a probable outlier at Wind Speed 20 and Solar Radiation 280, which has not previously been noted.

Figure 5.2 Color scatterplot matrix of the environmental data. The ozone concentration is encoded in color. Darker colors are higher concentrations.
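One way to encode a response in color is to bin it into a few levels and draw higher levels darker. The sketch below is an illustration only: equal-count binning, the level count, and the names `response_levels` and `draw_color_scatter` are assumptions, not necessarily the encoding used for Figure 5.2 or in scattermatrixc.m. The drawing step assumes matplotlib is installed.

```python
def response_levels(response, n_levels=5):
    """Assign each value an integer level 0..n_levels-1 by rank.

    Bins hold roughly equal numbers of points. Ties are broken by
    position in the input, because Python's sort is stable.
    """
    order = sorted(range(len(response)), key=lambda i: response[i])
    levels = [0] * len(response)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(response)
    return levels


def draw_color_scatter(x, y, response, n_levels=5):
    """Scatter x against y with the response level shown as a gray shade."""
    import matplotlib.pyplot as plt  # assumed installed

    fig, ax = plt.subplots()
    # "Greys" runs from light (low) to dark (high). vmin=-1 keeps the
    # lowest level from being drawn in pure white.
    ax.scatter(x, y, c=response_levels(response, n_levels), cmap="Greys",
               vmin=-1, vmax=n_levels - 1, edgecolors="black", s=30)
    return fig
```

Calling `draw_color_scatter` once per pair of factors, with the same response each time, gives the panels of a color scatterplot matrix.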


Data sets such as this, with three factors and a response, are also nicely visualized using three-axis color scatterplots (Figure 5.3). This presentation shows where data are available in factor space as well as how the response varies with all three factors. A real-time display with interactive rotation is especially helpful.

Figure 5.3 Color three-axis scatterplot of cube root ozone concentration.

Sometimes the dimensionality of a data set can be reduced by combining variables. A scatterplot matrix of the iris data revealed that the varieties are well separated by petal length and width. The relevant panel is shown in Figure 5.4. Note that both color and symbol differ by variety. This redundant encoding makes the distinctions easier to see.

Figure 5.4 Iris variety by petal length and width.



The relations in Figure 5.4 suggest that petal area would be a good basis for variety classification. This observation is used in Figure 5.5. Elongation (length/width) is used as the other axis to separate the points.
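The two derived variables are simple to compute. The sketch below assumes that petal area is approximated by length times width (the chapter does not state the formula), and uses elongation as length divided by width, as given above. The function name `petal_features` and the three sample measurements are illustrative, not taken from the chapter.

```python
def petal_features(length, width):
    """Return (area, elongation) for one flower.

    area       = length * width  (assumed proxy for petal area)
    elongation = length / width  (as defined in the text)
    """
    return length * width, length / width


# Illustrative petal (length, width) pairs in cm, one per variety.
samples = {
    "setosa": (1.4, 0.2),
    "versicolor": (4.7, 1.4),
    "virginica": (6.0, 2.5),
}

for variety, (length, width) in samples.items():
    area, elongation = petal_features(length, width)
    print(f"{variety:<11} area={area:5.2f}  elongation={elongation:4.2f}")
```

For these three samples the areas are ordered setosa, versicolor, virginica, which is the ordering a classification by petal area relies on.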

Figure 5.5 Iris variety by petal area and elongation. (book 5.21)


Surprisingly, this simple classification based on petal area was not found by any of the many previous numerical analyses of the data.

 
