D3.js is a famous JavaScript library that allows one to create extremely flexible SVG graphics however D3 has (at least according to me) a pretty steep learning curve. Further, in order to understand some core concepts, one need to have some basics in HTML, CSS and JavaScript. ddplot aims to simply the process using a set of functions that render several graphics using a simple R API. Finally, ddplot is built upon the amazing r2d3 package which makes it a breeze to interface D3.js with R, so a big thanks to the developers.

scatterPlot()

Let’s work with the mpg data frame from the ggplot2 package.

library(ggplot2) # needed for the mpg data frame

scatterPlot(
  data = mpg,
  x = "hwy",
  y = "cty",
  xtitle = "hwy variable",
  ytitle = "cty variable",
  title = "cty and hwy relationship",
  titleFontSize = 20
)

In comparison to ggplot2, graphics’ customization in ddplot is limited nonetheless you get a fully vectorized SVG which is cool.

scatterPlot(
  data = mpg,
  x = "displ",
  y = "cty",
  col = "tomato",
  bgcol = "pink",
  size = 3,
  stroke = "royalblue",
  strokeWidth = 1,
  xtitle = "displ variable",
  ytitle = "cty variable",
  xticks = 3,
  yticks = 3)

histogram()

The histogram() function allows you to visualize the distribution of a vector of data:

histogram(
  x = mpg$hwy,
  bins = 20,
  fill = "crimson",
  stroke = "white",
  strokeWidth = 1,
  title = "Distribution of the hwy variable",
  width = "20",
  height = "10"
)

animatedHistogram()

This function allows you to create a one-click histogram animation. Useful for presentation purposes. Click on the following empty plot and see what happens:

animatedHistogram(
  x = mpg$hwy,
  duration = 2000,
  delay = 100,
  fill = "lime",
  stroke = "white",
  bgcol = "white"
  )

Note that you can customize the animation using the two parameters duration and delay.

barChart()

The barChat() function allows you to create bar charts however you need to make the aggregation beforehand. In the following example, we will plot the average cty for each manufacturer using the dplyr package.

library(dplyr)

mpg %>% group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  barChart(
    x = "manufacturer",
    y = "mean_cty",
    xFontSize = 10,
    yFontSize = 10,
    fill = "orange",
    strokeWidth = 2,
    ytitle = "average cty value",
    title = "Average City Miles per Gallon by manufacturer"
  )

The bars can be easily sorted in ascending or descending order using the sort parameter:

mpg %>% group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  barChart(
    x = "manufacturer",
    y = "mean_cty",
    sort = "ascending",
    xFontSize = 10,
    yFontSize = 10,
    fill = "orange",
    strokeWidth = 1,
    ytitle = "average cty value",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )

horzBarChart()

If you’ve many categories, it might be a good idea to go for a horizontal bar chart. It has the same parameters as the barChart() function except that the x-axis parameter is named value and the y-axis parameter named label, this naming convention aims to mitigate some confusion that can arise.

If we want to replicate the above graphic in a horizontal way, we can do:

mpg %>% group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  horzBarChart(
    label = "manufacturer",
    value = "mean_cty",
    sort = "ascending",
    labelFontSize  = 10,
    valueFontSize = 10,
    fill = "orange",
    stroke = "crimson",
    strokeWidth = 1,
    valueTitle  = "average cty value",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )

As in barChart(), we can aslo sort in descending order:

mpg %>% group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  horzBarChart(
    label = "manufacturer",
    value = "mean_cty",
    sort = "descending",
    labelFontSize  = 10,
    valueFontSize = 10,
    bgcol = "black",
    axisCol = "white",
    fill = "white",
    stroke = "white",
    strokeWidth = 1,
    valueTitle  = "average cty value",
    labelTitle = "Manufacturers",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )

lollipopChart()

lollipop chart follows the same behavior as bar charts but instead of bars you get lollipops, hence the name. Below an example of a lollipop chart with ddplot:

mpg %>% group_by(drv) %>%
  summarise(median_cty = median(cty)) %>%
  lollipopChart(
    x = "drv",
    y = "median_cty",
    sort = "ascending",
    xtitle = "drv variable",
    ytitle = "median cty",
    title = "Median cty per drv",
    xFontSize = 20
  )

It’s possible to grasp the distribution of some variable according to a specific categorical variable using the same function:


mpg %>% filter(year == 2008) %>%
lollipopChart(
    x = "manufacturer",
    y = "hwy",
    circleFill = 'red',
    circleStroke = 'orange',
    circleRadius = 5,
    sort = "none",
    xFontSize = 10
  )

From above, it’s quite easy to notice that although Toyota has two cars with high highway miles per galon (hwy), it also produces many other vehicles with poor hwy.

horzLollipop()

Same with bar charts, if you have a variable that has many categorical values, you can work with the reversed version of lollipopChart() which is horzLollipop():

mpg %>% group_by(manufacturer) %>%
  summarise(median_cty = median(cty)) %>%
  horzLollipop(
    label = "manufacturer",
    value = "median_cty",
    sort = "descending")

You can also do:

mpg %>% filter(year == 2008) %>%
horzLollipop(
    label = "manufacturer",
    value = "hwy",
    circleFill = 'red',
    circleStroke = 'orange',
    circleRadius = 5,
    sort = "none"
  )

pieChart()

Pie charts and donut charts are pretty straightforward to set up. We’ll use a sample from the starwars data frame to plot a simple pie chart.

# starwars is part of the dplyr data frame
mini_starwars <- starwars %>% tidyr::drop_na(mass) %>%
  sample_n(size = 5) # getting 5 random values

pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name"
)

Using the padRadius, padAngle and cornerRadius parameters, one can get fanciers pie charts:

pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name",
  padRadius = 200,
  padAngle = 0.1,
  cornerRadius = 50,
  innerRadius = 10
)

If you need a donut chart, you just need to play with the innerRadius parameter:

pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name",
  innerRadius = 120,
  cornerRadius = 20,
  title = "5 Starwars characters ranked by their mass",
  titleFontSize = 16,
  bgcol = "yellow"
)

lineChart()

The lineChart() function is used to plot time series data. The use must provide a date variable that has the yyyy-mm-dd format. In the following example, we’ll use the Air Passenger built-in ts data and convert it to a classical data frame:

# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
  passengers = as.matrix(AirPassengers),
  date= zoo::as.Date(time(AirPassengers))
)

# 2. plotting the line chart
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers"
)

You can modify the line interpolation using the curve parameter:

lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveStep"
)
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveCardinal"
)
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveBasis"
)

animLineChart()

Heavily inspired from Jure Stabuc’s example, the animLineChart() function create an empty SVG but when each time you click on it a line chart animation starts. Note that the line lasts after the end of the animation. Go ahead, click on the empty graphic below:

animLineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  duration = 10000, # in milliseconds (10 seconds)
  curve = "curveCardinal"
  )

areaChart()

areaChart() works similarly except that instead of a line you get an area.

# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
  passengers = as.matrix(AirPassengers),
  date= zoo::as.Date(time(AirPassengers))
)

# 2. plotting the area chart
areaChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  fill = "purple",
  bgcol = "white"
)

areaBand()

areaBand() lets you plot a filled area between two y-values. For the sake of the example, let’s create an additional column passengers_upper that has an additional 40 passengers for each observation:

airpassengers <- data.frame(
  passengers_lower = as.matrix(AirPassengers),
  passengers_upper = as.matrix(AirPassengers) + 40,
  date= zoo::as.Date(time(AirPassengers))
)

areaBand(
  data = airpassengers,
  x = "date",
  yLower = "passengers_lower",
  yUpper = "passengers_upper",
  fill = "yellow",
  stroke = "black"
)

stackedAreaChart()

This function allows you to create a stacked area chart. You need two components:

  • A data frame in wide format (see an example below). If it’s in wide format, you can still use pivot_wider() from the tidyr package to make wider.
  • A date variable in yyyy-mm-dd format that will plotted in the x-axis.

Let’s work with the following data frame (shortened) provided by Mike Bostock in his stacked area chart example:

data <- data.frame(
  date = c(
    "2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
    "2000-05-01", "2000-06-01", "2000-07-01",
    "2000-08-01", "2000-09-01", "2000-10-01"
  ),
  Trade = c(
    2000,1023, 983, 2793, 1821, 1837, 1792, 1853, 791, 739
  ),
  Manufacturing = c(
    734, 694, 739, 736, 685, 621, 708, 685, 667, 693
  ),
  Leisure = c(
    1782, 1779, 1789, 658, 675, 833, 786, 675, 636, 691
  ),
  Agriculture = c(
    655, 587,623, 517, 561, 2545, 636, 584, 559, 2504
  )
)

data
#>          date Trade Manufacturing Leisure Agriculture
#> 1  2000-01-01  2000           734    1782         655
#> 2  2000-02-01  1023           694    1779         587
#> 3  2000-03-01   983           739    1789         623
#> 4  2000-04-01  2793           736     658         517
#> 5  2000-05-01  1821           685     675         561
#> 6  2000-06-01  1837           621     833        2545
#> 7  2000-07-01  1792           708     786         636
#> 8  2000-08-01  1853           685     675         584
#> 9  2000-09-01   791           667     636         559
#> 10 2000-10-01   739           693     691        2504

Note that when running stackedAreaChart() all the variables available within the considered data frame will be plotted. If you want to restrict the plotting to only specific variables, just drop the unneeded columns:

stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14
  )

You can modify the color scheme using the colorCategory parameter:

stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14,
  curve = "curveCardinal",
  colorCategory = "Accent",
  bgcol = "white",
  stroke = "black",
  strokeWidth = 1
  )
stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14,
  curve = "curveBasis",
  colorCategory = "Set3",
  bgcol = "black",
  axisCol = "white",
  xticks = 4,
  stroke = "black"
  )

You can find list of D3 categorical color schemes here

Finally, if you hover over the chart you’ll notice a tooltip that identified the different area categories.

barChartRace()

This function allows you to create an animated bar chart race. barChartRace() is similar to barChart() but takes a third variable mapped to the time dimension, with options for styling transitions.

Let’s make a bar chart race of population growth among various countries using a subset of the gapminder dataset from the {gapminder} package:

<<<<<<< HEAD
gapminder_subset <- gapminder::gapminder %>%
  select(country, year, pop) %>% 
  filter(country %in% c("Japan", "Mexico", "Germany", "Brazil", "Philippines", "Vietnam")) %>%
  mutate(pop = pop/1e6)
=======
gapminder_subset <- gapminder::gapminder %>% select(country, year, pop) %>%
    filter(country %in% c("Japan", "Mexico", "Germany", "Brazil", "Mexico", "Philippines", "Vietnam")) %>%
    mutate(pop = pop/1e6)
>>>>>>> 6bab1415a132b17bda7192e7e2e63758614d5161

gapminder_subset %>%
  slice_sample(n = 10)

#>    year       pop     country
#> 1  2007  91.07729 Philippines
#> 2  1997  76.04900     Vietnam
#> 3  1972 107.18827       Japan
#> 4  1967  39.46391     Vietnam
#> 5  1952  30.14432      Mexico
#> 6  1987 142.93808      Brazil
#> 7  1997 168.54672      Brazil
#> 8  1962  41.12148      Mexico
#> 9  1952  69.14595     Germany
#> 10 1957  91.56301       Japan

In this example, we simply pass call barChartRace() like barChart(), but with an additional variable mapped to the time dimension specified with time = year:

gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations"
  )

You can also stylize transitions with the frameDur, transitionDur, and ease arguments. For example, setting the time spent pausing on each frame to zero with frameDur = 0 will create a smooth animation:

gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    transitionDur = 1000,
    frameDur = 0,
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations"
  )

As you might have noticed, the value of the column passed to the time argument is automatically labelled at the bottom-right corner of the plot panel. We can stylize this with a list of options passed to the timeLabelOpts argument (or turn it off with timeLabel = FALSE). We also give the bars a little bounce here with ease = "BackInOut" for fun.

gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    ease = "BackInOut",
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations",
    timeLabelOpts = list(
      size = 40,
      prefix = "Year: ",
      xOffset = 0.2
    )
  )

More to Come …