The purpose of squish, a function in the R scales package, is, according to its documentation, very simple: "squish values into range." This turns out to be incredibly useful when a single extreme value would otherwise dominate a color scale.
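As a quick illustration, here is squish applied to a plain numeric vector. This is a minimal sketch with made-up input values. With the default range of c(0, 1), anything below 0 is raised to 0 and anything above 1 is lowered to 1. The range argument lets you choose other bounds.

```r
library(scales)

# Default range = c(0, 1): out-of-range values are clamped to the nearest bound
squish(c(-1, 0.5, 2))
# returns 0.0 0.5 1.0

# Custom range: values above 12 become 12, values inside the range are untouched
squish(c(1, 5, 168), range = c(1, 12))
# returns 1 5 12
```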
Suppose that you are looking at data from an online course that records the number of page views for each module for each student (for our illustration, we'll assume 10 students and four modules: A, B, C, and D).
Further imagine that there is one student who, for whatever reason, happened to have a huge number of page views for one of the modules. In other words, there is an outlier in one of the cells.
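The original data isn't reproduced here, so the sketch below simulates a hypothetical data set in the same long format that the plotting code expects (columns Student, Module, and Views). The typical counts, the random seed, and the choice of student 3 and module B for the outlier are all assumptions for illustration. Only the maximum of 168 views matches the range discussed later.

```r
# Hypothetical data: 10 students x 4 modules, one row per student/module pair
set.seed(42)
data <- expand.grid(Student = 1:10, Module = c("A", "B", "C", "D"))

# Typical page-view counts between 1 and 12 (assumed, not from the original post)
data$Views <- sample(1:12, nrow(data), replace = TRUE)

# A single outlier: one student with an unusually large number of views
data$Views[data$Student == 3 & data$Module == "B"] <- 168

head(data)
```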
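The challenge comes when you want to visualize the relative differences between students across the four modules. Because the color scale has to stretch all the way from the smallest count to the outlier, almost every other cell ends up in nearly the same shade, which makes a heatmap-type visualization almost useless: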
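library(ggplot2)
library(dplyr)  # provides the %>% pipe used below

# One tile per student/module pair, colored by number of page views
data %>% ggplot(aes(x = Module, y = as.factor(Student))) +
  ylab("Student") +
  xlab("Module") +
  geom_tile(aes(fill = Views)) +
  scale_fill_gradient(low = "green", high = "blue") +
  ggtitle("Page views per module by student")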
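Why does one outlier wash out the whole plot? Roughly speaking, a continuous fill scale in ggplot2 maps each value linearly onto [0, 1] (by default with scales::rescale) and then picks a color at that position along the gradient. The sketch below applies rescale to a few hypothetical counts plus the 168-view outlier.

```r
library(scales)

# Hypothetical typical counts (1, 6, 12) plus the outlier (168)
views <- c(1, 6, 12, 168)

# Linear mapping onto [0, 1] using the full data range (1 to 168)
round(rescale(views), 3)
# returns 0.000 0.030 0.066 1.000
```

All of the typical values land in roughly the bottom 7% of the gradient, so they come out in nearly identical shades of green, while only the outlier reaches blue.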
This is where squish comes in. Instead of using the actual range of 1 view to 168 views to determine the colors, we can "squish" the range to be, say, 1 to 12. Anything over 12 will be "squished" into the range (i.e., treated as 12). By loading the scales package, setting our desired scale limits, and passing the squish function as the "out of bounds" (oob) argument, we get a much better visualization.
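A minimal sketch of that approach follows. So that it runs on its own, it regenerates the same kind of hypothetical data (an assumption, since the original table isn't included), then sets limits = c(1, 12) and oob = squish on the fill scale. Without oob = squish, ggplot2's default out-of-bounds handler (scales::censor) would turn the 168 into NA, and that tile would be drawn in the scale's na.value color (grey by default) instead of at the high end of the gradient.

```r
library(ggplot2)
library(scales)

# Hypothetical data: 10 students x 4 modules with one 168-view outlier
set.seed(42)
data <- expand.grid(Student = 1:10, Module = c("A", "B", "C", "D"))
data$Views <- sample(1:12, nrow(data), replace = TRUE)
data$Views[data$Student == 3 & data$Module == "B"] <- 168

p <- ggplot(data, aes(x = Module, y = as.factor(Student))) +
  ylab("Student") +
  xlab("Module") +
  geom_tile(aes(fill = Views)) +
  # Limit the color scale to 1..12; oob = squish clamps the outlier to 12
  # so it takes the top color instead of being censored to NA
  scale_fill_gradient(low = "green", high = "blue",
                      limits = c(1, 12), oob = squish) +
  ggtitle("Page views per module by student")

print(p)
```

The trade-off is that every count above 12 now shares the top color, so the plot no longer shows how extreme the outlier is. The cap of 12 is a choice you would tune to your own data.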