
Tidy Tuesday Independence Days

Intro

This code was written live on twitch.tv @ https://twitch.tv/theeatgamelove. This post mostly consists of questions from Lexi, me (Kyle), and our chat. Follow us on our socials to join us for the next session!

Dataset - Independence Days

The data this week comes from Wikipedia. Thank you to Isabella Velasquez for prepping this week’s dataset.

An independence day is an annual event commemorating the anniversary of a nation’s independence or statehood, usually after ceasing to be a group or part of another nation or state, or more rarely after the end of a military occupation. Many countries commemorate their independence from a colonial empire. American political commentator Walter Russell Mead notes that “World-wide, British Leaving Day is never out of season.”

Packages

library(tidytuesdayR)
library(tidyverse)
library(lubridate)
library(gghighlight)
library(hrbrthemes)
library(DataExplorer)
library(ghibli)
## Zodiac functions + palette (kowr is on GitHub: remotes::install_github("koderkow/kowr"))
library(kowr)

Set our ggplot theme

theme_set(theme_ipsum())

Load in the data

tt <- tt_load("2021-07-06")  # assumed date for the Independence Days week
## 
## 	Downloading file 1 of 1: `holidays.csv`

Transform data

We will look at the decade and the zodiac sign of each date of independence.

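d <-
  tt$holidays %>% 
  mutate(
    # Floor each year to its decade, e.g. 1969 -> 1960
    decade = year(date_parsed) %/% 10 * 10,
    # Zodiac sign of the independence date (zodiac_sign() comes from kowr)
    zodiac = zodiac_sign(date_parsed)
  )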
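For readers who don't have kowr installed, here is a base-R sketch of the two derived columns. The decade step mirrors year(date_parsed) %/% 10 * 10: integer division drops the last digit of the year and multiplying by 10 puts it back as a zero, so 1969 becomes 1960. zodiac_sign_sketch() is a hypothetical stand-in, not kowr's zodiac_sign() implementation; it assumes the common tropical-zodiac cutoffs (Aries from 21 March, Cancer from 21 June, and so on). Published cutoffs vary by a day, so dates right on a boundary may not match kowr.

```r
# Decade: integer-divide the year by 10, then scale back up (1969 -> 1960)
decade_of <- function(dates) {
  as.integer(format(dates, "%Y")) %/% 10L * 10L
}

# Hypothetical stand-in for kowr::zodiac_sign(); assumes common tropical cutoffs
zodiac_sign_sketch <- function(dates) {
  # Encode month and day as one sortable number, e.g. 4 July -> 704
  md <- as.integer(format(dates, "%m")) * 100L + as.integer(format(dates, "%d"))
  # First MMDD of each sign in calendar order; Capricorn wraps around the new year
  starts <- c(101L, 120L, 219L, 321L, 420L, 521L, 621L,
              723L, 823L, 923L, 1023L, 1122L, 1222L)
  signs <- c("Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
             "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn")
  out <- signs[findInterval(md, starts)]
  out[is.na(dates)] <- NA_character_  # missing dates stay missing
  out
}

dates <- as.Date(c("1291-08-01", "1969-03-21", "2011-07-09", NA))
decade_of(dates)           # 1290 1960 2010 NA
zodiac_sign_sketch(dates)  # "Leo" "Aries" "Cancer" NA
```

Using findInterval() on a month-day number avoids twelve nested if/else branches; Capricorn appears twice because it spans the turn of the year.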

Do year and year_of_event ever differ across rows?

all(d$year == d$year_of_event, na.rm = TRUE)
## [1] TRUE

Nope! Wherever both columns have a value, they match.
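Because na.rm = TRUE drops every comparison that comes out NA, the check above only covers rows where both columns are present. That matters here: the missing-data profile reports 27 missing values for year but 42 for year_of_event. A sketch of a stricter comparison, using a hypothetical helper same_or_both_na() and made-up vectors, treats "both missing" as a match and "only one missing" as a mismatch:

```r
# TRUE where a and b are equal or both NA; FALSE where they differ or only one is NA
same_or_both_na <- function(a, b) {
  (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
}

a <- c(1776, 1960, NA, NA)
b <- c(1776, 1960, NA, 1991)

all(a == b, na.rm = TRUE)   # TRUE: the NA comparisons are silently dropped
all(same_or_both_na(a, b))  # FALSE: the last pair has only one value present
```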

Peeking at missing data

profile_missing(d)
## # A tibble: 14 x 3
##    feature                      num_missing pct_missing
##    <fct>                              <int>       <dbl>
##  1 country                                0       0    
##  2 date_parsed                           27       0.125
##  3 weekday                               27       0.125
##  4 day                                   27       0.125
##  5 month                                 27       0.125
##  6 name_of_holiday                       39       0.181
##  7 date_of_holiday                       27       0.125
##  8 year_of_event                         42       0.194
##  9 independence_from                     35       0.162
## 10 event_commemorated_and_notes          71       0.329
## 11 year                                  27       0.125
## 12 date_mdy                              27       0.125
## 13 decade                                27       0.125
## 14 zodiac                                27       0.125
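profile_missing() returns, for each column, the number of NA values and that number as a proportion of rows, so pct_missing is a fraction (0.125 means 12.5% of rows) rather than a percentage. A rough base-R equivalent, written as a hypothetical helper rather than DataExplorer's actual code, run on a made-up toy table:

```r
# Hypothetical helper: count and proportion of NAs in every column of a data frame
missing_profile <- function(df) {
  num_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  data.frame(
    feature = names(df),
    num_missing = unname(num_missing),
    pct_missing = unname(num_missing) / nrow(df),  # proportion, not percent
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(
  country = c("A", "B", "C", "D"),
  year = c(1960, NA, 1991, NA),
  stringsAsFactors = FALSE
)
missing_profile(toy)
```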

Which decade is the most common for independence?

Check out decades

d %>% 
  count(decade, sort = TRUE)
## # A tibble: 27 x 2
##    decade     n
##     <dbl> <int>
##  1   1960    43
##  2     NA    27
##  3   1970    25
##  4   1990    22
##  5   1910    15
##  6   1940    15
##  7   1950    15
##  8   1820    13
##  9   1980     8
## 10   1810     7
## # … with 17 more rows

The 1960s are the most common decade, with 43 entries.

d %>% 
  count(decade, sort = TRUE) %>% 
  filter(!is.na(decade)) %>% 
  # top_n() keeps ties (1910, 1940 and 1950 all have 15), so 6 decades are plotted
  top_n(5, wt = n) %>% 
  mutate(decade = as.character(decade)) %>% 
  ggplot() +
  aes(
    x = decade,
    y = n,
    fill = decade
  ) +
  geom_col(
    color = "black",
    show.legend = FALSE
  ) +
  scale_fill_ghibli_d("MononokeMedium") +
  labs(
    x = "Decade",
    y = "Number of Countries",
    title = "Top 6 decades when countries gained independence"
  )
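Why does top_n(5) give six bars? top_n() keeps ties (it filters on min_rank()), and the 1910s, 1940s and 1950s all have 15 entries, so they share fourth place. dplyr's newer slice_max(n, n = 5) does the same by default (with_ties = TRUE). The same min-rank logic in base R, using the top counts from the table above:

```r
# Top decade counts, copied from the count() output above
counts <- data.frame(
  decade = c(1960, 1970, 1990, 1910, 1940, 1950, 1820),
  n      = c(43, 25, 22, 15, 15, 15, 13)
)

# Ties share the lowest rank, as dplyr::min_rank() does: 43 -> 1, ..., each 15 -> 4
rk <- rank(-counts$n, ties.method = "min")
top5 <- counts[rk <= 5, ]
nrow(top5)  # 6, because three decades tie for fourth place
```

If exactly five bars are wanted, slice_max(n, n = 5, with_ties = FALSE) drops the extra tied row instead.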

Which country did the most countries gain independence from?

d %>% 
  count(independence_from, sort = TRUE)
## # A tibble: 51 x 2
##    independence_from                                n
##    <chr>                                        <int>
##  1 United Kingdom                                  51
##  2 <NA>                                            35
##  3 France                                          26
##  4 Spanish Empire                                  17
##  5 Soviet Union                                    11
##  6 Ottoman Empire                                   7
##  7 Portugal                                         7
##  8 Russian Soviet Federative Socialist Republic     6
##  9 Spain                                            4
## 10 Belgium                                          3
## # … with 41 more rows

Which combinations of decade and former ruler are the most common?

d %>% 
  count(decade, independence_from, sort = TRUE)
## # A tibble: 84 x 3
##    decade independence_from                                n
##     <dbl> <chr>                                        <int>
##  1     NA <NA>                                            27
##  2   1960 United Kingdom                                  21
##  3   1960 France                                          14
##  4   1970 United Kingdom                                  14
##  5   1990 Soviet Union                                    11
##  6   1820 Spanish Empire                                   7
##  7   1950 France                                           7
##  8   1910 Russian Soviet Federative Socialist Republic     6
##  9   1810 Spanish Empire                                   5
## 10   1940 United Kingdom                                   5
## # … with 74 more rows

What's the newest country?

d %>% 
  filter(!is.na(date_parsed)) %>% 
  filter(date_parsed == max(date_parsed)) %>% 
  select(country, date_parsed, independence_from)
## # A tibble: 1 x 3
##   country     date_parsed independence_from
##   <chr>       <date>      <chr>            
## 1 South Sudan 2011-07-09  Sudan

Wow! Happy [soon-to-be] Independence Day, South Sudan!

What's the oldest country?

d %>% 
  filter(!is.na(date_parsed)) %>% 
  filter(date_parsed == min(date_parsed)) %>% 
  select(country, date_parsed, independence_from)
## # A tibble: 1 x 3
##   country     date_parsed independence_from
##   <chr>       <date>      <chr>            
## 1 Switzerland 1291-08-01  <NA>
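Both queries filter on max()/min() instead of picking a single row with which.max(), so every row tied for the extreme date is kept, and the !is.na() filter comes first because max() and min() return NA when any date is missing. A base-R sketch of the same pattern with a hypothetical helper extreme_rows(); the toy rows reuse the two dates above, add the United States (4 July 1776), and include a placeholder row with a missing date:

```r
# Keep every row whose value equals the extreme (ties kept), ignoring missing values,
# like filter(!is.na(x)) %>% filter(x == max(x))
extreme_rows <- function(df, col, fun) {
  x <- df[[col]]
  keep <- !is.na(x) & x == fun(x, na.rm = TRUE)
  df[keep, , drop = FALSE]
}

toy <- data.frame(
  country = c("Switzerland", "United States", "South Sudan", "Placeholder"),
  date_parsed = as.Date(c("1291-08-01", "1776-07-04", "2011-07-09", NA)),
  stringsAsFactors = FALSE
)

extreme_rows(toy, "date_parsed", max)$country  # "South Sudan"
extreme_rows(toy, "date_parsed", min)$country  # "Switzerland"
```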

What's the range of decades?

summary(d$decade)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1290    1910    1960    1924    1970    2010      27
d %>% 
  filter(!is.na(decade)) %>% 
  ggplot() +
  aes(x = decade) +
  geom_histogram(binwidth = 10, color = "black")  # one bar per decade

Are there any countries that gained independence more than once?

multiples <-
  d %>% 
  count(country, sort = TRUE) %>% 
  filter(n > 1)

multiples
## # A tibble: 21 x 2
##    country                      n
##    <chr>                    <int>
##  1 Armenia                      2
##  2 Azerbaijan                   2
##  3 Bulgaria                     2
##  4 Burkina Faso                 2
##  5 Central African Republic     2
##  6 Chad                         2
##  7 Congo, Republic of the       2
##  8 Cyprus                       2
##  9 Czech Republic               2
## 10 Ecuador                      2
## # … with 11 more rows
d_multiples <-
  d %>% 
  filter(country %in% multiples$country)

multiples_summarized <-
  d_multiples %>% 
  group_by(country) %>% 
  summarize(
    min_year = min(year),
    max_year = max(year),
    year_diff = max_year - min_year
  ) %>% 
  arrange(desc(year_diff))

multiples_summarized
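Note that min(year) and max(year) are called without na.rm, so a country with a missing year gets NA in all three summary columns. For comparison, here is the same count, filter and summarize pipeline in base R on a toy table; the Norway, Qatar and Panama years come from the summary output, and South Sudan (2011) is included with a single entry so the filter has something to drop:

```r
events <- data.frame(
  country = c("Norway", "Qatar", "Panama", "Norway", "Qatar", "Panama", "South Sudan"),
  year = c(1814, 1878, 1821, 1905, 1971, 1903, 2011),
  stringsAsFactors = FALSE
)

# count(country) %>% filter(n > 1): keep countries listed more than once
n_per_country <- table(events$country)
repeat_names <- names(n_per_country)[n_per_country > 1]
multi <- events[events$country %in% repeat_names, ]

# group_by(country) %>% summarize(...): earliest year, latest year, and the gap
min_year <- tapply(multi$year, multi$country, min)
max_year <- tapply(multi$year, multi$country, max)  # same group order as min_year
summary_df <- data.frame(
  country = names(min_year),
  min_year = as.vector(min_year),
  max_year = as.vector(max_year),
  stringsAsFactors = FALSE
)
summary_df$year_diff <- summary_df$max_year - summary_df$min_year

# arrange(desc(year_diff)): largest gap first
summary_df <- summary_df[order(-summary_df$year_diff), ]
rownames(summary_df) <- NULL
summary_df
```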
## # A tibble: 21 x 4
##    country        min_year max_year year_diff
##    <chr>             <dbl>    <dbl>     <dbl>
##  1 Qatar              1878     1971        93
##  2 Norway             1814     1905        91
##  3 Panama             1821     1903        82
##  4 Czech Republic     1918     1993        75
##  5 Armenia            1918     1991        73
##  6 Azerbaijan         1918     1991        73
##  7 Estonia            1918     1991        73
##  8 Georgia            1918     1991        73
##  9 Latvia             1918     1990        72
## 10 Lithuania          1918     1990        72
## # … with 11 more rows
multiples_summarized %>% 
  ggplot() +
  aes(
    x = min_year,
    y = max_year,
    size = year_diff,
    color = country
  ) +
  geom_point(
    show.legend = FALSE,
    alpha = 0.8
  )

Lexi says, small circles = HUGE independence. Nice. (Point size maps to year_diff, so larger circles mean a longer gap between the two independence dates.) We can also make a plot showing the year ranges for these countries.

multiples_summarized %>%
  mutate(
    country = fct_reorder(country, year_diff),
    display = year_diff %in% c(3, 11, 13, 26, 30, 75, 91, 93),
    row_id = as.character(row_number()),
    row_id = ifelse(display, row_id, 30)
  ) %>%
  ggplot() +
  aes(
    x = country,
    color = row_id
  ) +
  geom_linerange(
    mapping = aes(ymin = min_year, ymax = max_year),
    linetype = "dashed",
    size = 1.2
  ) +
  geom_point(aes(y = min_year)) +
  geom_point(aes(y = max_year)) +
  gghighlight(
    year_diff %in% c(3, 11, 13, 26, 30, 75, 91, 93),
    unhighlighted_params = list(colour = "black", alpha = 0.5),
    label_key = year_diff
  ) +
  coord_flip() +
  labs(
    x = "Country",
    y = "Year",
    title = "Countries that have gained \nindependence more than once",
    subtitle = "Total years between the two occurrences"
  )

Which zodiac signs are most common among countries that have gained independence?

d %>% 
  count(zodiac, sort = TRUE) %>% 
  mutate(
    # Calendar order, reversed so Aries sits at the top after coord_flip()
    zodiac = factor(
      x = zodiac,
      levels = rev(c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"))
    )
  ) %>% 
  filter(!is.na(zodiac)) %>% 
  ggplot() +
  aes(
    x = zodiac,
    y = n,
    fill = zodiac
  ) +
  geom_col(
    color = "black",
    show.legend = FALSE
  ) +
  coord_flip() +
  labs(
    x = "",
    y = "",
    title = "Most common zodiac signs of independent countries",
    subtitle = "Based on when they gained their independence"
  ) +
  scale_fill_zodiac(use_factor_order = TRUE)

Exploring the United Kingdom

uk <-
  d %>% 
  filter(independence_from == "United Kingdom")

uk %>% 
  count(decade) %>% 
  ggplot() +
  aes(
    x = decade,
    y = n
  ) +
  geom_point() +
  geom_smooth(se = FALSE)

Top 5 zodiac signs

d_top_zodiac <-
  d %>% 
  count(zodiac, sort = TRUE) %>% 
  filter(!is.na(zodiac)) %>% 
  # top_n() keeps ties, so more than 4 signs can come back
  top_n(4, wt = n)
d %>% 
  count(decade, zodiac, sort = TRUE) %>% 
  filter(
    zodiac %in% d_top_zodiac$zodiac,
    decade >= 1800
  ) %>% 
  ggplot() +
  aes(
    x = decade,
    y = n,
    color = zodiac,
    group = zodiac
  ) +
  geom_line() 

Is there a country that gained its own independence and also had other countries gain independence from it?

woah <-
  d %>% 
  filter(
    independence_from %in% country
  )

woah %>% 
  count(independence_from, sort = TRUE)
## # A tibble: 16 x 2
##    independence_from     n
##    <chr>             <int>
##  1 France               26
##  2 Portugal              7
##  3 Spain                 4
##  4 Belgium               3
##  5 Denmark               2
##  6 Colombia              1
##  7 Ethiopia              1
##  8 Israel                1
##  9 Italy                 1
## 10 Malaysia              1
## 11 New Zealand           1
## 12 Pakistan              1
## 13 South Africa          1
## 14 Sudan                 1
## 15 Sweden                1
## 16 United States         1
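Inside filter(), country refers to the whole country column, so independence_from %in% country keeps every row whose former ruler also has its own entry in the dataset. That is why the United Kingdom, the Spanish Empire and the Soviet Union, although common former rulers, don't appear in this list: none of them is listed as a country. A base-R sketch of the same filter on a made-up toy table (NA marks rows where the former ruler doesn't matter for the example):

```r
# Made-up toy table for illustration only
holidays <- data.frame(
  country = c("France", "Algeria", "Portugal", "Brazil", "Canada"),
  independence_from = c(NA, "France", NA, "Portugal", "United Kingdom"),
  stringsAsFactors = FALSE
)

# Keep rows whose former ruler also appears somewhere in the country column;
# NA %in% x is FALSE here because the country column has no NAs
woah_toy <- holidays[holidays$independence_from %in% holidays$country, ]
woah_toy$country                   # "Algeria" "Brazil"
table(woah_toy$independence_from)  # one row each for France and Portugal
```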
d_france <-
  d %>% 
  filter(
    country == "France" | independence_from == "France"
  )
d_france %>% 
  count(decade) %>% 
  ggplot() +
  aes(x = decade, y = n) +
  geom_col()

France is the first observation. The rest of the observations are countries gaining their independence from France.

Most common day of the week

d %>% 
  count(weekday, sort = TRUE)
## # A tibble: 8 x 2
##   weekday       n
##   <chr>     <int>
## 1 Saturday     31
## 2 Friday       30
## 3 Monday       29
## 4 <NA>         27
## 5 Thursday     26
## 6 Tuesday      25
## 7 Sunday       24
## 8 Wednesday    24
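The dataset already ships a weekday column. If you need to derive one yourself, weekdays() and format(x, "%A") return names in the current locale; a sketch that always gives English names uses the numeric wday field of as.POSIXlt() instead. weekday_of() is a hypothetical helper, and R treats dates as proleptic Gregorian, so weekdays for very old dates (such as 1291) follow that convention rather than the calendar in use at the time.

```r
# Locale-independent weekday names: POSIXlt$wday counts from Sunday (0) to Saturday (6)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

weekday_of <- function(dates) {
  day_names[as.POSIXlt(dates)$wday + 1L]  # NA dates give NA
}

dates <- as.Date(c("1776-07-04", "2011-07-09", NA))
weekday_of(dates)                          # "Thursday" "Saturday" NA
table(weekday_of(dates), useNA = "ifany")  # counts per weekday, NA included
```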

Reproducibility Receipt

Time Info

[1] "2021-07-17 19:38:09 CDT"


Repo Info

Local:    master /Users/Kow/projects/eatGameLove
Remote:   master @ origin (https://github.com/alexismeskowski/eatGameLove.git)
Head:     [04a4a32] 2021-07-12: updates


Session Info
Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.3 (2020-10-10)
 os       macOS Big Sur 10.16         
 system   x86_64, darwin17.0          
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Chicago             
 date     2021-07-17
Packages ───────────────────────────────────────────────────────────────────
 package      * version    date       lib source                        
 assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.0.2)                
 backports      1.1.10     2020-09-15 [1] CRAN (R 4.0.2)                
 blob           1.2.1      2020-01-20 [1] CRAN (R 4.0.2)                
 blogdown       0.21       2020-10-11 [1] CRAN (R 4.0.3)                
 bookdown       0.21       2020-10-13 [1] CRAN (R 4.0.3)                
 broom          0.7.1      2020-10-02 [1] CRAN (R 4.0.2)                
 bslib          0.2.4      2021-01-25 [1] CRAN (R 4.0.2)                
 cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.0.2)                
 cli            2.3.1      2021-02-23 [1] CRAN (R 4.0.2)                
 clipr          0.7.1      2020-10-08 [1] CRAN (R 4.0.2)                
 colorspace     1.4-1      2019-03-18 [1] CRAN (R 4.0.2)                
 crayon         1.4.1      2021-02-08 [1] CRAN (R 4.0.2)                
 curl           4.3        2019-12-02 [1] CRAN (R 4.0.1)                
 data.table     1.14.0     2021-02-21 [1] CRAN (R 4.0.2)                
 DataExplorer * 0.8.2      2020-12-15 [1] CRAN (R 4.0.2)                
 DBI            1.1.0      2019-12-15 [1] CRAN (R 4.0.2)                
 dbplyr         1.4.4      2020-05-27 [1] CRAN (R 4.0.2)                
 desc           1.3.0      2021-03-05 [1] CRAN (R 4.0.3)                
 details      * 0.2.1      2020-01-12 [1] CRAN (R 4.0.2)                
 digest         0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                
 dplyr        * 1.0.7      2021-06-18 [1] CRAN (R 4.0.2)                
 ellipsis       0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                
 evaluate       0.14       2019-05-28 [1] CRAN (R 4.0.1)                
 extrafont      0.17       2014-12-08 [1] CRAN (R 4.0.2)                
 extrafontdb    1.0        2012-06-11 [1] CRAN (R 4.0.2)                
 fansi          0.4.2      2021-01-15 [1] CRAN (R 4.0.2)                
 farver         2.0.3      2020-01-16 [1] CRAN (R 4.0.2)                
 forcats      * 0.5.0      2020-03-01 [1] CRAN (R 4.0.2)                
 fs             1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                
 gdtools        0.2.3      2021-01-06 [1] CRAN (R 4.0.2)                
 generics       0.0.2      2018-11-29 [1] CRAN (R 4.0.2)                
 gghighlight  * 0.3.2      2021-06-05 [1] CRAN (R 4.0.2)                
 ggplot2      * 3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                
 ggrepel        0.9.1      2021-01-15 [1] CRAN (R 4.0.2)                
 ghibli       * 0.3.2      2020-04-16 [1] CRAN (R 4.0.2)                
 git2r          0.27.1     2020-05-03 [1] CRAN (R 4.0.2)                
 glue           1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                
 gridExtra      2.3        2017-09-09 [1] CRAN (R 4.0.2)                
 gtable         0.3.0      2019-03-25 [1] CRAN (R 4.0.2)                
 haven          2.3.1      2020-06-01 [1] CRAN (R 4.0.2)                
 highr          0.8        2019-03-20 [1] CRAN (R 4.0.2)                
 hms            0.5.3      2020-01-08 [1] CRAN (R 4.0.2)                
 hrbrthemes   * 0.8.0      2020-03-06 [1] CRAN (R 4.0.2)                
 htmltools      0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)                
 htmlwidgets    1.5.2      2020-10-03 [1] CRAN (R 4.0.2)                
 httr           1.4.2      2020-07-20 [1] CRAN (R 4.0.2)                
 igraph         1.2.6      2020-10-06 [1] CRAN (R 4.0.2)                
 jquerylib      0.1.3      2020-12-17 [1] CRAN (R 4.0.2)                
 jsonlite       1.7.2      2020-12-09 [1] CRAN (R 4.0.2)                
 knitr          1.31       2021-01-27 [1] CRAN (R 4.0.2)                
 kowr         * 0.0.0.9000 2021-07-18 [1] Github (koderkow/kowr@945c2d5)
 labeling       0.3        2014-08-23 [1] CRAN (R 4.0.2)                
 lattice        0.20-41    2020-04-02 [1] CRAN (R 4.0.3)                
 lifecycle      1.0.0      2021-02-15 [1] CRAN (R 4.0.2)                
 lubridate    * 1.7.9      2020-06-08 [1] CRAN (R 4.0.2)                
 magrittr       2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                
 Matrix         1.2-18     2019-11-27 [1] CRAN (R 4.0.3)                
 mgcv           1.8-33     2020-08-27 [1] CRAN (R 4.0.3)                
 modelr         0.1.8      2020-05-19 [1] CRAN (R 4.0.2)                
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.0.2)                
 networkD3      0.4        2017-03-18 [1] CRAN (R 4.0.2)                
 nlme           3.1-149    2020-08-23 [1] CRAN (R 4.0.3)                
 pillar         1.5.1      2021-03-05 [1] CRAN (R 4.0.3)                
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.0.2)                
 png            0.1-7      2013-12-03 [1] CRAN (R 4.0.2)                
 prismatic      1.0.0      2021-01-05 [1] CRAN (R 4.0.2)                
 purrr        * 0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                
 R6             2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                
 Rcpp           1.0.6      2021-01-15 [1] CRAN (R 4.0.2)                
 readr        * 1.4.0      2020-10-05 [1] CRAN (R 4.0.2)                
 readxl         1.3.1      2019-03-13 [1] CRAN (R 4.0.2)                
 reprex         0.3.0      2019-05-16 [1] CRAN (R 4.0.2)                
 rlang          0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                
 rmarkdown      2.9        2021-06-15 [1] CRAN (R 4.0.2)                
 rprojroot      2.0.2      2020-11-15 [1] CRAN (R 4.0.2)                
 rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.0.2)                
 Rttf2pt1       1.3.8      2020-01-10 [1] CRAN (R 4.0.2)                
 rvest          0.3.6      2020-07-25 [1] CRAN (R 4.0.2)                
 sass           0.3.1      2021-01-24 [1] CRAN (R 4.0.2)                
 scales         1.1.1      2020-05-11 [1] CRAN (R 4.0.2)                
 selectr        0.4-2      2019-11-20 [1] CRAN (R 4.0.2)                
 sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                
 stringi        1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                
 stringr      * 1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                
 systemfonts    1.0.2      2021-05-11 [1] CRAN (R 4.0.2)                
 tibble       * 3.1.0      2021-02-25 [1] CRAN (R 4.0.2)                
 tidyr        * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)                
 tidyselect     1.1.0      2020-05-11 [1] CRAN (R 4.0.2)                
 tidytuesdayR * 1.0.1      2020-07-10 [1] CRAN (R 4.0.2)                
 tidyverse    * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)                
 usethis        2.0.1      2021-02-10 [1] CRAN (R 4.0.2)                
 utf8           1.1.4      2018-05-24 [1] CRAN (R 4.0.2)                
 vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.0.2)                
 withr          2.4.1      2021-01-26 [1] CRAN (R 4.0.2)                
 xfun           0.24       2021-06-15 [1] CRAN (R 4.0.2)                
 xml2           1.3.2      2020-04-23 [1] CRAN (R 4.0.2)                
 yaml           2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library