Data visualization toolbox
Trivariate data consist of associated samples or measurements of three quantitative variables. The general task with this type of data is to determine how the variables are related. In some cases the variables are factors (independent variables) and a response (dependent variable). In other cases the variables are not functionally related, but their distributions can be related.
This chapter includes examples of analyzing several data sets: the rubber data, the soil data, and the galaxy data.
(scattermatrix.m)
A scatterplot is an effective display for revealing a relationship between a pair of variables. A scatterplot matrix extends this concept to multiple pairs of variables. The variable names on the diagonal of the matrix label the rows and columns of the display. For example, the upper left graph plots abrasion loss against hardness.
Figure 4.1 Scatterplot matrix of the rubber data. (book 4.1)
In Figure 4.1 abrasion loss decreases somewhat as hardness and tensile strength increase. The relationships appear to be complex.
A color scatterplot is another extension of the basic scatterplot method. Color allows encoding an additional variable in a simple plot. In particular it can be effective to use color for the response on a scatterplot of two factors.
Figure 4.2 Color scatterplot of the rubber data. Abrasion loss is encoded in color.
Figure 4.2 helps us see how abrasion loss depends on combinations of hardness and tensile strength. The three points which have low abrasion loss at high hardness and low tensile strength may be outliers. They certainly suggest that additional data at higher hardness would be useful.
(coplot.m)
A conditioning plot, or coplot, efficiently displays conditional dependence. The top panel is a given plot which displays the intervals of the conditioning variable. The other panels are dependence plots, each showing the relationship of the two other variables for data in the corresponding interval of the conditioning variable. Loess curves are also plotted. The dependence panels are arranged in a pattern similar to the layout of the interval rectangles on the given plot.
Figure 4.3 Coplot of the rubber data. (book 4.3)
In this figure abrasion loss has a hockey stick dependence on tensile strength for all hardness ranges except the highest. The three data points which cause the hook pattern in the curve for the highest hardness range are the possible outliers seen in Figure 4.2.
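The intervals of the conditioning variable can be illustrated with a small sketch. The text does not say how coplot.m chooses its intervals, so the equal-count rule used here (overlapping intervals that each hold about the same number of points) is an assumption, as are the function names, the default `overlap` of 0.5, and the made-up data. It is written in Python for illustration and is not the toolbox's MATLAB code.

```python
def equal_count_intervals(values, number=4, overlap=0.5):
    """Return `number` overlapping (low, high) intervals of the conditioning
    variable, each containing roughly the same number of data points.
    Assumed rule, not taken from coplot.m."""
    x = sorted(values)
    n = len(x)
    # Points per interval so that `number` intervals with the requested
    # fractional overlap together span all n sorted points.
    r = n / (number * (1 - overlap) + overlap)
    intervals = []
    for k in range(number):
        start = k * (1 - overlap) * r
        lo_index = min(n - 1, int(round(start)))
        hi_index = max(0, min(n - 1, int(round(start + r)) - 1))
        intervals.append((x[lo_index], x[hi_index]))
    return intervals


def dependence_panels(cond, a, b, intervals):
    """For each interval, collect the (a, b) pairs whose conditioning value
    falls inside it. Each list is the data for one dependence panel."""
    return [
        [(ai, bi) for ci, ai, bi in zip(cond, a, b) if lo <= ci <= hi]
        for lo, hi in intervals
    ]


if __name__ == "__main__":
    # Hypothetical data: 30 conditioning values and two other variables.
    cond = list(range(1, 31))
    a = [c * 2 for c in cond]
    b = [100 - c for c in cond]
    ivals = equal_count_intervals(cond, number=4, overlap=0.5)
    for (lo, hi), panel in zip(ivals, dependence_panels(cond, a, b, ivals)):
        print(lo, hi, len(panel))
```

Because the intervals overlap, a point can appear in more than one dependence panel, which is consistent with the overlapping rectangles a given plot typically shows.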
Sometimes the distributions of the factors in the data set cover or nearly cover a planar region. The soil data is such an example. (Figure 4.4)
Figure 4.4 Locations for the soil data. (book )
In these cases it can be useful to get a preliminary visualization of the response by viewing it as an image. (Figure 4.5)
Figure 4.5 An image of the spatial distribution of soil resistivity.
Even without smoothing, this image shows the major regions of high and low resistivity.
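One way to build such an unsmoothed image from scattered measurements is to average the response within the cells of a regular grid. The sketch below assumes that approach; the text does not say how the image in Figure 4.5 was produced, and the function name `bin_to_image`, the grid size, and the sample values are all hypothetical. It is plain Python, not the toolbox's MATLAB code.

```python
def bin_to_image(xs, ys, zs, nx, ny):
    """Average the response z over an ny-by-nx grid of cells covering the
    extent of the (x, y) locations. Cells with no data are set to None."""
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    sums = [[0.0] * nx for _ in range(ny)]
    counts = [[0] * nx for _ in range(ny)]
    for x, y, z in zip(xs, ys, zs):
        # Map each coordinate to a cell index; the maximum value is
        # clamped into the last cell.
        col = min(nx - 1, int((x - x0) / (x1 - x0) * nx)) if x1 > x0 else 0
        row = min(ny - 1, int((y - y0) / (y1 - y0) * ny)) if y1 > y0 else 0
        sums[row][col] += z
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(nx)]
        for r in range(ny)
    ]


if __name__ == "__main__":
    # Made-up locations and responses, for illustration only.
    xs = [0.0, 0.0, 1.0, 1.0, 0.1]
    ys = [0.0, 1.0, 0.0, 1.0, 0.1]
    zs = [10.0, 20.0, 30.0, 40.0, 20.0]
    for row in bin_to_image(xs, ys, zs, nx=2, ny=2):
        print(row)
```

Leaving empty cells as None rather than filling them keeps the image honest about where no measurements exist.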
Real data often require smoothing to allow interpretation. Trivariate data can be smoothed by surface fitting, a two-dimensional version of polynomial or loess curve fitting. If the factors do not have the same units and the loess method is used, it is desirable to standardize the factors so that the neighborhood weights are consistent. (standardize.m) This is achieved by dividing the values of each factor by their trimmed sample standard deviation. (trimmedstd.m)
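The standardization step can be sketched as follows. The trimming fraction is not given in the text, so the 10% removed from each end here is an assumption, and the Python functions below only share their purpose with standardize.m and trimmedstd.m; they are not those files' actual code.

```python
import statistics


def trimmed_std(values, trim=0.1):
    """Sample standard deviation after dropping the fraction `trim` of the
    sorted values from each end. The 10% default is an assumption."""
    x = sorted(values)
    k = int(len(x) * trim)  # number of values dropped from each end
    core = x[k:len(x) - k] if k else x
    return statistics.stdev(core)


def standardize(values, trim=0.1):
    """Divide every value by the trimmed standard deviation so that factors
    measured in different units end up on comparable scales."""
    s = trimmed_std(values, trim)
    return [v / s for v in values]


if __name__ == "__main__":
    # Made-up factor values with one extreme point.
    factor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
    print(trimmed_std(factor))
    print(standardize(factor)[:3])
```

Trimming keeps a few extreme values from inflating the scale estimate, which would otherwise shrink the bulk of the data and distort the loess neighborhoods.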
Contour plots are the traditional display for three dimensional data. With black and white contour plots, some time is required to tell ridges from valleys. This difficulty can be eased by using color contours.
Figure 4.6 Contour plot of velocity for the galaxy data. (book 4.50)
This figure reveals low velocities in the upper left grading to high velocities in the lower right. The contour labels can make this type of display a bit cluttered. An alternative is the filled contour plot. (Figure 4.7) In this figure the shading is faceted, which draws black lines along the contours.
Figure 4.7 Filled contour plot of velocity for the galaxy data with data locations.
Surface fitting routines are ambitious and they will happily generate fit points despite a lack of data in the neighborhood. When the data spacing is irregular, it can be helpful to plot the data locations, as in Figure 4.7, to check the evidence for the contours.
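A numerical counterpart to plotting the data locations is to flag fit points that have no data nearby. This is a sketch under assumptions: the text describes only the visual check, and the function name `support_mask`, the `radius`, and the `min_points` threshold are hypothetical choices that would need tuning to the data spacing.

```python
def support_mask(xs, ys, grid_x, grid_y, radius, min_points=1):
    """Return a grid of booleans, one per fit point, that is True where at
    least `min_points` data locations lie within `radius` of the point."""
    mask = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            # Count data locations inside the circle around this fit point.
            near = sum(
                1
                for x, y in zip(xs, ys)
                if (x - gx) ** 2 + (y - gy) ** 2 <= radius ** 2
            )
            row.append(near >= min_points)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Made-up data locations and a small grid of fit points.
    xs, ys = [0.0, 1.0], [0.0, 0.0]
    for row in support_mask(xs, ys, [0, 1, 2, 3], [0, 5], radius=1.0):
        print(row)
```

Fit points where the mask is False could be left out of a contour display so that contours are not drawn where the data offer no evidence for them.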
A filled contour plot with flat shading provides a more subtle display of the contours. (Figure 4.8) The simpler two hue colormap in this figure can make it easier to appreciate the order of the contours.
Figure 4.8 Filled contour plot of the smoothed soil resistivity data. (book 4.68)
This smoothed presentation shows the major trends more clearly than the raw data in Figure 4.5.
Another familiar method for displaying three dimensional data is the mesh or wireframe plot. It is more useful for acquiring a gestalt of the relationships of the variables than for extracting any numerical information. Compare Figure 4.9 with Figure 4.6.
Figure 4.9 Mesh plot of the galaxy data. (book 4.58)
The surface plot is an extension of the mesh plot which can provide a stronger physical sense of the data. Fiddling with it can also sink a substantial amount of time.
Figure 4.10 Surface plot of the soil data. (book 4.69)
This chapter has presented several revealing methods for the display of three dimensional data on a planar two dimensional surface.