In this page, we are going to discuss the UFO sightings trend over time. First we will look into the pattern of sightings by states over years, months, weekdays and hours. To get rid off the population effect, we also calculate the sightings per 1 million of each states and then examine the trend over year and month. Then we will explore the sightings by shape over years.
library(tidyverse)
library(cowplot)
theme_set(theme_minimal() + theme(legend.position = "bottom"))
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete = scale_colour_viridis_d
scale_fill_discrete = scale_fill_viridis_d
Sys.setlocale("LC_TIME", "en_US.UTF-8")
## [1] "en_US.UTF-8"
## read in census data for state and population
census <- read_csv("./data/us_census.csv")|>
rename(state = abbrv)|>
mutate(
state_pop = (census_2000+census_2010+census_2020)/3
)|>
select(state,state_pop)
## get state for 51 states
state_us <- census|>pull(state)|>unique()
## generate date information separately for latter trend analysis
ufo_trend <- read_csv("data/ufo_clean.csv")|>
filter(state %in% state_us)|>
mutate(
year = year(date_time),
month = month(date_time),
hour = hour(date_time),
weekday = factor(weekdays(date_time),
levels = c("Sunday",
"Monday",
"Tuesday" ,
"Wednesday",
"Thursday",
"Friday",
"Saturday"))
)
To look into the UFO sightings trend over years by states, here we take spaghetti plots to view the total sightings of each state throughout years.
## function to generate data for trend analysis
ufo_trend_var <- function(df,var1,var2){
tidy_data <- df |>
group_by(!!sym(var1), !!sym(var2))|>
summarise(
obs = n()
)|>
ungroup()
return(tidy_data)
}
ufo_trend_plot <- function(df,var_time,var_group){
trendplot <- df|>
ggplot(aes(x = !!sym(var_time), y = obs, color = !!sym(var_group)))+
geom_line()+
theme(legend.position = "none")+
geom_smooth(color = "red")
return(trendplot)
}
ufo_shape_plot <- function(df,var_time){
trendplot <- df|>
ggplot(aes(x = !!sym(var_time), y = obs, color = shape))+
geom_line()
return(trendplot)
}
ufo_per_trend <- function(df){
df_per_trend <- df|>
left_join(census,by = "state")|>
mutate(
obs = obs/state_pop*1000000
)
return(df_per_trend)
}
ufo_year_trend <- ufo_trend_var(ufo_trend,var1="state",var2="year")
ufo_year_trend_plot <- ufo_trend_plot(ufo_year_trend,"year","state")+
labs(title = "Trend over years by states")
ufo_year_trend_plot_rm <-
ufo_year_trend |>
filter(!(state %in% c("CA")) )|>
ufo_trend_plot("year","state")+
labs(title = "Trend over years by states(CA Removed)")
plot_grid(ufo_year_trend_plot,ufo_year_trend_plot_rm)
From the plot of each state’s UFO sightings over year, we can see that:
To avoid the influence of the extreme large sightings of CA on the overall trend, we may filter out this state and see the overall trend then.
The overall sightings trend changed little compared to the previous one.
Then we group the original data by state and month to see if there if any trend throughout months.
ufo_month_trend <- ufo_trend_var(ufo_trend,var1="state",var2="month")
ufo_month_trend_plot <-
ufo_month_trend |>
ufo_trend_plot("month","state")+
scale_x_continuous(breaks = 1:12, labels = month.abb)+
scale_color_manual(name = "state", values = c("FL" = "orange","CA" = "blue","WA"="yellow")) +
theme(legend.position = "bottom")+
labs(title = "Trend over month by states")
ufo_month_trend_plot_rm <-
ufo_month_trend |>
filter(!(state %in% c("CA")))|>
ufo_trend_plot("month","state")+
scale_x_continuous(breaks = 1:12, labels = month.abb)+
scale_color_manual(name = "state", values = c("FL" = "orange","WA"="yellow")) +
theme(legend.position = "bottom")+
labs(title = "Trend over month by states(CA Removed)")
plot_grid(ufo_month_trend_plot,ufo_month_trend_plot_rm)
Like what we did in the year trend, we want to reduce the impact of outlier. Therefore, we remove sightings of California. The trend is more clear that sightings in most states peaked at July and then declined and reached the lowest at around February.
ufo_hour_trend <- ufo_trend_var(ufo_trend,var1="state",var2="hour")
ufo_hour_trend_plot <-
ufo_hour_trend |>
ufo_trend_plot("hour","state")+
labs(title = "Trend over hours by states")
ufo_hour_trend_plot_rm <-
ufo_hour_trend |>
filter(!(state %in% c("CA")) )|>
ufo_trend_plot("hour","state")+
labs(title = "Trend over hours by states(CA Removed)")
plot_grid(ufo_hour_trend_plot,ufo_hour_trend_plot_rm)
After removing the outlier of California state, the trend is more clear.
ufo_weekday_trend <- ufo_trend_var(ufo_trend,var1="state",var2="weekday")
ufo_weekday_plot <- ufo_weekday_trend |>
mutate(
weekday = as.numeric(weekday)
)|>
ufo_trend_plot("weekday","state")+
scale_x_continuous(breaks = 1:7, labels = c("Sun","Mon","Tue","Wed","Thur","Fri","Sat"))+
labs(
title = "Trend over weekdays by states"
)
ufo_weekday_plot_rm<-
ufo_weekday_trend |>
filter(!(state %in% c("CA")) )|>
mutate(
weekday = as.numeric(weekday)
)|>
ufo_trend_plot("weekday","state")+
scale_x_continuous(breaks = 1:7, labels = c("Sun","Mon","Tue","Wed","Thur","Fri","Sat"))+
labs(title = "Trend over weekdays by states (CA Removed)")
plot_grid(ufo_weekday_plot,ufo_weekday_plot_rm)
The overall trend for all states is not clear. For California,the sightings declined from Sunday to Monday, then increased during weekdays and peaked at Saturday.
We may remove state California to check the overall trend again.
Now we may notice a trend showed by the red smooth line that the average sightings went through an slow increase started Monday through Friday and peaked at Saturday, then reduced during Sunday.
To make this statement clearer, we can view the bar plot of the sightings over weekdays.
ufo_weekday_trend |>
mutate(
weekday = as.numeric(weekday)
)|>
ggplot(aes(x = weekday, y = obs))+
geom_col()+
scale_x_continuous(breaks = 1:7, labels = c("Sun","Mon","Tue","Wed","Thur","Fri","Sat"))+
labs(
title = "Bar plot of sightings trend of states over weekdays"
)
The bar plot visualized the trend more clear. The overall sightings decreased from Sunday to Monday, then increased day by day and reached the highest values at Saturday.
In the previous part of this page, we discuss about the population effect on the overall trend over year. Here, we take the average of population of 2000,2010,2020 of each state, and use this average population to calculate the sightings per one million from 1995 to 2022 to get rid of the population effect, then we examine the average trend over time. We mainly discuss the trend of sightings per 1 million population over year and month here.
ufo_per <- ufo_trend|>
filter(
year >= 1995
)
ufo_year_per <- ufo_trend_var(ufo_per,"state","year")|>
ufo_per_trend()
ufo_year_per_plot <- ufo_year_per|>
ufo_trend_plot("year","state")+
labs(
title = "sightings by states over years(per 1M)",
y = "obs per 1M" )
ufo_year_per_plot_spec <- ufo_year_per|>
filter(state %in% c("VT","CA"))|>
ggplot(aes(y = obs,x = year, color = state))+
geom_line()+
labs(title = "sightings of VT and CA over years(per 1M)",
y = "obs per 1M")
plot_grid(ufo_year_per_plot,ufo_year_per_plot_spec)
ufo_month_per <- ufo_trend_var(ufo_per,"state","month")|>
ufo_per_trend()
ufo_month_per_plot <- ufo_month_per|>
ufo_trend_plot("month","state")+
scale_x_continuous(breaks = 1:12, labels = month.abb)+
labs(
title = "sightings over months(per 1M)",
y = "obs per 1M" )
ufo_month_per_plot_spec <- ufo_month_per|>
filter(state %in% c("CA","FL","AK"))|>
ggplot(aes(y = obs,x = month, color = state))+
geom_line()+
scale_x_continuous(breaks = 1:12, labels = month.abb)+
labs(title = "sightings over months (3 states, per 1M)",
y = "obs per 1M")
plot_grid(ufo_month_per_plot,ufo_month_per_plot_spec)
obs_shape <- ufo_trend |>
group_by(shape)|>
summarise(
obs = n()
)|>
arrange(desc(obs))|>
ungroup()
shape10 <- obs_shape |>
head(10)|>
pull(shape)
ufo_shape <- ufo_trend|>
filter(shape %in% shape10)
In total there are 23 shape categories in the original dataset. Here we extract the most observed 10 shapes to see the trend over year.
ufo_year_shape <- ufo_trend_var(ufo_shape,var1="shape",var2="year")
ufo_year_shape |>
mutate(
shape = factor(shape,levels = shape10)
)|>
ggplot(aes(x = year, y = obs))+
geom_line()+
facet_grid(cols = vars(shape))+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(title = "Spaghetti plot of sightings trend over years by shapes")
For most observed shapes, they have similar trend over year: the sightings started to increase rapidly in 1990s and, then peaked at around 2014, followed by an sharp decrease in 2018 and another peak in 2020.
Light, circle and triangle were the most observed shape of UFO. The light shape had the most sharp change rate of all shapes.
However, trends of some shapes had different characteristics.
As for fireball
, there was a peak before 2000 and a
relatively slow increase rate after that compared to others
As for disk
, this is a traditional shape that comes to
people’s mind when talking about UFO. The sightings of disk went through
an increase in 1970s right after the foundation of NUFORC. And the
increase rate of disk sightings after 1990s was relatively low. The
highest value of sightings in one year of was never over 250. (We might
consider not taking disk as a classic shape of UFO.)