The RadialVisGadgets package provides Shiny gadgets for interactive radial visualizations. The gadgets support Exploratory Data Analysis and can be used at any point during an analysis. They allow exploration of the underlying nature of the data in tasks related to cluster analysis and outlier detection, e.g., by investigating the effect of specific dimensions on the separation of the data.
The goal of Star Coordinates (SC) is to generate a configuration of the dimensional vectors that reveals the underlying nature of the data. Let’s look at the well-known Iris dataset [1].
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
The dataset has four numerical attributes and one factor. The traditional Star Coordinates approach is defined for numerical attributes only. Therefore, by default, all factors are converted to numerical attributes. This can be disabled with numericRepresentation = FALSE, as described below.
Following the traditional approach [2], the five attributes are placed at equal angle steps from each other.
Move your mouse towards the end of a dimensional vector and the circle at its end will be highlighted, as shown in the figure below.
You can move these axes to create a configuration that you find suitable, and then brush a selection of points.
Orthographic Star Coordinates (OSC) are supported by passing the approach = "OSC" parameter to StarCoordinates. The axes are reconditioned with every movement so that the dimensional vectors satisfy the constraints described by Lehmann & Theisel [3]. The interaction is the same as before.
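As a rough illustration of what a factor-to-numeric conversion looks like in base R, the sketch below maps the Species factor to integer codes. Whether the package uses exactly this encoding is an assumption; the guarded call at the end only shows how the numericRepresentation argument would be passed.

```r
# Assumed encoding: each factor level is replaced by its integer code.
species_codes <- as.integer(iris$Species)
levels(iris$Species)   # "setosa" "versicolor" "virginica"
table(species_codes)   # 50 samples for each of the codes 1, 2, 3

# Launch the gadget only in an interactive session with the package installed.
if (interactive() && requireNamespace("RadialVisGadgets", quietly = TRUE)) {
  # Keep Species as a factor instead of converting it to numbers.
  RadialVisGadgets::StarCoordinates(iris, numericRepresentation = FALSE)
}
```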
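The placement at equal angle steps, and the resulting projection, can be sketched in base R. This is a minimal sketch rather than the package's implementation: the min-max scaling, the integer encoding of Species, and the names `normalise`, `A`, and `P` are assumptions made for illustration.

```r
# Convert the factor to integer codes (assumed conversion).
df <- iris
df$Species <- as.numeric(df$Species)

# Min-max normalise each column to [0, 1] (an assumption; the package may scale differently).
normalise <- function(x) (x - min(x)) / (max(x) - min(x))
X <- as.matrix(as.data.frame(lapply(df, normalise)))

# One axis vector per dimension, placed at equal angle steps on the unit circle.
d <- ncol(X)
angles <- 2 * pi * (seq_len(d) - 1) / d
A <- cbind(x = cos(angles), y = sin(angles))   # d x 2 matrix of axis vectors

# Each sample is the sum of the axis vectors weighted by its normalised values.
P <- X %*% A                                   # n x 2 matrix of projected points

# Static plot of the initial configuration.
plot(P, col = iris$Species, asp = 1, xlab = "", ylab = "")
segments(0, 0, A[, 1], A[, 2])
text(A * 1.1, labels = colnames(X))
```

Moving an axis in the gadget corresponds to changing one row of `A` and recomputing `P`.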
The traditional approach [2] was defined for numerical attributes only. However, [4] extended the approach to mixed datasets. The axis of each factor dimension is divided according to the frequency of each categorical value within that dimension. Because the 3 species labels are uniformly distributed, 2 ticks appear, separating the 3 blocks on the categorical axis.
Clicking on an axis activates it. The categorical value blocks then become visible on the selected factor.
Double-clicking on a categorical block highlights the value that the block represents. If another categorical block is then selected by double-clicking, the two blocks swap places, which allows you to reorder the categorical values within a dimension. To cancel a categorical selection, double-click the same block a second time.
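The frequency-based division can be reproduced with a few lines of base R. The sketch assumes an axis of unit length and uses the illustrative names `freq`, `bounds`, and `ticks`; how the gadget scales the blocks internally is not specified here.

```r
# Relative frequency of each categorical value.
freq <- table(iris$Species) / nrow(iris)

# Upper boundary of each block along a unit-length axis.
bounds <- cumsum(as.numeric(freq))

# Interior boundaries are the ticks that separate the blocks.
ticks <- head(bounds, -1)
ticks   # two ticks, at 1/3 and 2/3 of the axis
```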
By passing the name of a factor dimension in colorVar, the analysis can be
performed on labeled data.
The points are then coloured according to the selected dimension. The
“Standard” and “OSC” approaches are available for both kinds of analysis.
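A labeled analysis with the orthographic approach might be started as follows. This is a sketch: the check that the column is a factor is an added safeguard, not something the package is known to require, and `color_var` is an illustrative name.

```r
color_var <- "Species"

# colorVar is described as a factor dimension name, so check the column first.
stopifnot(is.factor(iris[[color_var]]))

# Launch the gadget only in an interactive session with the package installed.
if (interactive() && requireNamespace("RadialVisGadgets", quietly = TRUE)) {
  RadialVisGadgets::StarCoordinates(iris, colorVar = color_var, approach = "OSC")
}
```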
Hints describe possible movements and are available when both a label and a function are provided, in which case a button named Hint appears. An increase in the value of the function indicates an increase in projection quality, i.e., larger values are better. Details on the use of hints are given in [4]. The thickness of a segment represents the size of the increase in quality. In the figure below, the hint implies that moving Petal.Width down will result in an increase in quality. The absolute maximum increase in quality is shown in the Hint button, allowing for early termination. The hints are computed on demand only and are based on the current vector configuration. Once a movement is performed, the hints disappear.
library(RadialVisGadgets)
library(clValid)
func <- function(points, labels) dunn(Data = points, clusters = labels)
StarCoordinates(iris, colorVar = "Species", clusterFunc = func)
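Any function with the same `(points, labels)` signature that returns larger values for better projections should be usable in place of the Dunn index. The following base R alternative is a hypothetical example, not part of the package: `scatter_ratio` divides the between-class scatter by the within-class scatter of the projected points.

```r
# Hypothetical quality function: between-class scatter over within-class scatter.
scatter_ratio <- function(points, labels) {
  points <- as.matrix(points)
  labels <- as.factor(labels)
  centre <- colMeans(points)
  between <- 0
  within <- 0
  for (lv in levels(labels)) {
    grp <- points[labels == lv, , drop = FALSE]
    mu <- colMeans(grp)
    # Squared distance of the class mean to the global mean, weighted by class size.
    between <- between + nrow(grp) * sum((mu - centre)^2)
    # Squared distances of the class members to their class mean.
    within <- within + sum(sweep(grp, 2, mu)^2)
  }
  between / within
}

# Larger values mean better-separated classes.
scatter_ratio(iris[, 3:4], iris$Species)
```

It would be passed in the same way as `func`, i.e. `clusterFunc = scatter_ratio`.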
The goal of RadViz is to generate a configuration that reveals the
underlying nature of the data for cluster analysis, outlier detection,
and exploratory data analysis, e.g., by investigating the effect of
specific dimensions on the separation of the data.
Each dimension is assigned to a point, known as a dimensional anchor,
on the unit circle. Each sample is projected according to its relative
attraction to each of the anchors. We continue with the iris
dataset.
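The anchor-based projection can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the columns are min-max scaled, the anchors start at equal angle steps, and the names `anchors`, `W`, and `P` are made up for the example.

```r
# Numerical dimensions only, each scaled to [0, 1] (assumed scaling).
X <- as.matrix(iris[, 1:4])
X <- apply(X, 2, function(x) (x - min(x)) / (max(x) - min(x)))

# One anchor per dimension on the unit circle, at equal angle steps.
d <- ncol(X)
angles <- 2 * pi * (seq_len(d) - 1) / d
anchors <- cbind(x = cos(angles), y = sin(angles))

# Relative attraction: each row of weights sums to 1.
# (A row of all zeros would need special handling; iris has none after scaling.)
W <- X / rowSums(X)

# Each sample is the weighted average of the anchor positions.
P <- W %*% anchors

plot(P, col = iris$Species, asp = 1, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "", ylab = "")
points(anchors, pch = 1, cex = 2)
text(anchors * 1.15, labels = colnames(X))
```

The division by the row sum is what makes the projection non-linear, in contrast to Star Coordinates.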
RadViz is not defined for non-numerical dimensions, and given the non-linear behavior of its projection, converting factors to numeric would be even more misleading. As with Star Coordinates, we can interact with the gadget to change the projection. The anchors, represented by circles, can be moved around the unit circle.
However, a factor dimension can still be used to color the points according to a label. This is done by supplying the name of the column as the color.
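A colored RadViz gadget might be launched like this. The function name `RadViz` and the argument name `colorVar` are assumed by analogy with the StarCoordinates call shown earlier; check the package documentation for the exact signature.

```r
label_col <- "Species"

# The label column should be a factor so that it can define the point colors.
stopifnot(is.factor(iris[[label_col]]))

# Launch the gadget only in an interactive session with the package installed.
if (interactive() && requireNamespace("RadialVisGadgets", quietly = TRUE)) {
  # Assumed signature: data frame first, then the name of the coloring column.
  RadialVisGadgets::RadViz(iris, colorVar = label_col)
}
```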