On this page, we explore where UFO sightings are reported in the United States. First, we visualize where across the U.S. these reports are made; then we compare reports made in urban/suburban areas (i.e., close to major cities) with those made in rural areas (i.e., far from major cities). We separate the two groups using a cutoff criterion: a report counts as urban/suburban if it was made within 5 miles of a city with more than 200,000 residents.
## Default packages & settings
# Load packages
library(tidyverse)
library(knitr)
library(leaflet)
library(usmap)
# Set seed for reproducibility
set.seed(1)
# Set default figure options
knitr::opts_chunk$set(
fig.width = 6,
out.width = "90%"
)
theme_set(theme_bw() + theme(legend.position = "bottom"))
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete = scale_colour_viridis_d
scale_fill_discrete = scale_fill_viridis_d
## Data import
# Import UFO data
df_ufo = read_csv("data/ufo_clean.csv")
# Add population data
df_pop =
read_csv("data/us_census.csv") |>
rename(state = abbrv) |>
select(state, census_2010)
To begin, we simply visualize a random 2% of these reports to get a sense of where these sightings are throughout the country.
df_ufo |>
sample_frac(0.02) |>
leaflet() |>
addProviderTiles(providers$CartoDB.Positron) |>
addCircleMarkers(~city_longitude,~city_latitude, radius = 1)
At first glance, these reports seem to cluster along the coasts and in other regions of high population density. So, we will adjust for population at the state level and re-visualize the results.
df_ufo |>
group_by(state) |>
summarize(n_obs = n()) |>
left_join(df_pop, by = join_by(state)) |>
mutate(obs_per = n_obs/census_2010*100000) |>
plot_usmap(data = _, values = "obs_per", color = "#333333") +
labs(
title = "UFO Reports per 100,000 Population"
) +
scale_fill_continuous(name = "Reports per 100k") +
theme(legend.position = "bottom")
This map reveals three clusters with particularly high rates of UFO reports. The first is the Northwest: Washington, Montana, Oregon, Idaho, and Alaska are the five states with the highest population-adjusted rates, ranging from 66.6 (AK) to 80.9 (WA) reports per 100,000 residents. Next is northern New England, with New Hampshire, Vermont, and Maine taking positions 6-8 on the list; these states have between 61.6 (ME) and 66.2 (NH) reports per 100,000. Rounding out the top 10 are Arizona and New Mexico, with 60.9 and 59.4 reports per 100,000 residents, respectively.
## Table form
df_ufo |>
group_by(state) |>
summarize(n_obs = n()) |>
left_join(df_pop, by = join_by(state)) |>
mutate(obs_per = n_obs/census_2010*100000) |>
arrange(desc(obs_per)) |>
head(10) |>
kable(
col.names = c("State", "Number of UFO Sightings",
"Population (2010 census)", "UFO sightings per 100k"),
digits = 1)
State | Number of UFO Sightings | Population (2010 census) | UFO sightings per 100k |
---|---|---|---|
WA | 5443 | 6724540 | 80.9 |
MT | 768 | 989415 | 77.6 |
OR | 2818 | 3831074 | 73.6 |
ID | 1062 | 1567582 | 67.7 |
AK | 473 | 710231 | 66.6 |
NH | 871 | 1316470 | 66.2 |
VT | 405 | 625741 | 64.7 |
ME | 818 | 1328361 | 61.6 |
AZ | 3894 | 6392017 | 60.9 |
NM | 1223 | 2059179 | 59.4 |
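As a quick sanity check, the rate column can be reproduced by hand from the counts and populations in the table. The sketch below uses only the first three rows, copied from the table; the vector names mirror the column names in the pipeline but are otherwise just illustrative.

```r
# Sighting counts and 2010 census populations, copied from the table above
n_obs       <- c(WA = 5443,    MT = 768,    OR = 2818)
census_2010 <- c(WA = 6724540, MT = 989415, OR = 3831074)

# Reports per 100,000 residents: count / population * 100,000
obs_per <- n_obs / census_2010 * 100000

round(obs_per, 1)
#>   WA   MT   OR
#> 80.9 77.6 73.6
```

Dividing by population before scaling keeps the rates comparable between a large state such as Washington and a small one such as Montana.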
If we group the reports by nearest large city, we see a similar result. The two cities with the most UFO sightings nearby are Portland, OR and Seattle, WA. These are large cities, but not the largest in the U.S. The West Coast seems to have more reports than other areas, though St. Louis is a bit of an outlier in this regard.
df_ufo |>
group_by(closest_city) |>
summarize(n_obs = n()) |>
arrange(desc(n_obs)) |>
head(10) |>
kable()
closest_city | n_obs |
---|---|
Portland | 2646 |
Seattle | 2472 |
Chicago | 1842 |
Los Angeles | 1378 |
St. Louis | 1361 |
Atlanta | 1332 |
Sacramento | 1254 |
Arlington | 1211 |
Philadelphia | 1191 |
Spokane | 1174 |
We also plotted the distribution of the `dist` variable, the distance to the nearest city of at least 200,000 residents. This histogram shows that most reports occur close to large cities, but there is a long tail of reports made farther away. There are also 1065 reports made more than 200 miles from a large city, and 90 made more than 400 miles away.
df_ufo |>
filter(dist <= 200) |>
ggplot(aes(x = dist)) +
geom_histogram(bins = 21, alpha = 0.7, col = I("black")) +
labs(
title = "Distribution of reports by distance",
y = "Number of reports",
x = "Distance to large city (miles)"
)
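The tail counts quoted above could be obtained with a one-line summary. The following is a minimal sketch on made-up distances, not the real data; `toy` and the output column names are hypothetical, and only the `dist` column name comes from the analysis.

```r
library(dplyr)

# Toy stand-in for df_ufo: distances in miles (values are invented)
toy <- tibble(dist = c(2, 15, 180, 250, 410, 640))

# Count reports beyond each distance threshold
tail_counts <- toy |>
  summarize(
    n_over_200 = sum(dist > 200),
    n_over_400 = sum(dist > 400)
  )

tail_counts
```

Running the same `summarize()` call on `df_ufo` would give the counts for the full data set, assuming `dist` has no missing values.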
With this preliminary understanding of the geographic distribution of UFO reports, we wanted to explore whether distance from large cities is related to characteristics of the report. To do this, we separate reports into urban/suburban and rural categories, using a cutoff of 5 miles from a large city. This is not a perfect division, since the distinction should really be made by population density, but it should be serviceable for identifying major differences, as will be seen below.
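The `location` column arrives precomputed in the cleaned data, so the exact rule used to build it is not shown here. A minimal sketch of how such a split could be derived from `dist` follows. It assumes the 5-mile boundary is inclusive and that the labels are `"urban"` and `"rural"` (the labels that appear in the test output further down); the toy distances are invented.

```r
library(dplyr)

# Toy distances (miles) to the nearest large city; values are made up
toy <- tibble(dist = c(0.8, 4.9, 5.0, 12.3, 240))

# Assumed rule: within 5 miles (inclusive) counts as urban/suburban
toy_labelled <- toy |>
  mutate(location = if_else(dist <= 5, "urban", "rural"))

toy_labelled
```

A single hard threshold is easy to explain and reproduce, at the cost of treating a report 5.1 miles from a city the same as one 200 miles away.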
Our first question was whether the type of object reported, as determined by the shape of the UFO, differed between these groups.
df_loc_shape =
df_ufo |>
group_by(location, shape) |>
summarize(n_obs = n()) |>
arrange(desc(n_obs)) |>
pivot_wider(
names_from = "location",
values_from = "n_obs"
)
df_loc_shape |>
head(10) |>
kable(col.names = c("Shape", "Rural", "Urban/Suburban"))
Shape | Rural | Urban/Suburban |
---|---|---|
light | 16728 | 5065 |
circle | 8559 | 2632 |
triangle | 7531 | 2210 |
fireball | 5714 | 1864 |
unknown | 5482 | 1648 |
other | 5298 | 1749 |
sphere | 5130 | 1722 |
disk | 4098 | 1381 |
oval | 3360 | 1116 |
formation | 2677 | 848 |
Rural and urban/suburban reports look similar, with nearly the same ranking of reported shapes: among the ten most common shapes, the top four are identical, and only `unknown`, `other`, and `sphere` trade places. The most common descriptor was simply `light`, followed by `circle` and `triangle`. To analyze whether descriptions of UFOs varied between these groups, we conducted a chi-squared test of homogeneity. This revealed that the distribution of shapes did differ significantly between the urban/suburban and rural categories.
df_loc_shape |>
select(-shape) |>
as.matrix() |>
chisq.test()
##
## Pearson's Chi-squared test
##
## data: as.matrix(select(df_loc_shape, -shape))
## X-squared = 72.097, df = 22, p-value = 3.082e-07
This result is not driven by the comparatively few observations for some shapes. If the data set is restricted to only the top 10 or top 15 shapes, the p-value is still far below 0.05.
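The top-10 restriction can be checked directly from the counts in the shape table above. The sketch below rebuilds that table as a matrix and reruns the test; `shape_counts` and `top10_test` are illustrative names, and the numbers are copied from the table rather than recomputed from `df_ufo`.

```r
# Top-10 shape counts copied from the table above.
# matrix() fills column by column: rural first, then urban/suburban.
shape_counts <- matrix(
  c(16728, 8559, 7531, 5714, 5482, 5298, 5130, 4098, 3360, 2677,
     5065, 2632, 2210, 1864, 1648, 1749, 1722, 1381, 1116,  848),
  ncol = 2,
  dimnames = list(
    c("light", "circle", "triangle", "fireball", "unknown",
      "other", "sphere", "disk", "oval", "formation"),
    c("rural", "urban")
  )
)

# Chi-squared test of homogeneity on the restricted table
top10_test <- chisq.test(shape_counts)

top10_test$parameter  # degrees of freedom: (10 - 1) * (2 - 1) = 9
top10_test$p.value
```

With ten shapes and two groups, the test has 9 degrees of freedom instead of the 22 in the full analysis.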
Next, we consider whether the length of UFO encounters differs between the urban/suburban and rural groups. First, we look at the data, which are right-skewed (more observations on the scale of seconds than of minutes, and more on the scale of minutes than of hours). To understand the distribution, we therefore plot the data on a log scale.
df_ufo |>
drop_na(duration_clean) |>
filter(duration_clean != 0) |>
ggplot(aes(x = duration_clean, fill = location)) +
geom_histogram(alpha = 0.5, bins = 41, col = I("black")) +
scale_x_continuous(
trans = "log10",
breaks = scales::trans_breaks("log10", function(x) 10^x),
labels = scales::trans_format("log10", scales::math_format(10^.x))
) +
labs(
title = "Distribution of encounter duration",
y = "Number of reports",
x = "Encounter Duration (seconds)",
fill = "Location"
)
Even though the data are highly right-skewed, the sample is large enough that comparing group means with a t-test is still reasonable. There are 1266 reports with a duration over 2 hours, comprising about 1% of the sample; these range from several hours to several months. We restrict the comparison to encounters lasting at most 2 hours (7,200 seconds).
df_ufo |>
drop_na(duration_clean) |>
filter(duration_clean <= 7200) |>
t.test(duration_clean ~ location, data = _)
##
## Welch Two Sample t-test
##
## data: duration_clean by location
## t = 6.2378, df = 38429, p-value = 4.485e-10
## alternative hypothesis: true difference in means between group rural and group urban is not equal to 0
## 95 percent confidence interval:
## 38.34605 73.48570
## sample estimates:
## mean in group rural mean in group urban
## 625.4033 569.4874
The results of this test suggest that there is a significant difference between the average duration of rural and urban UFO sightings. We estimate that sightings in rural areas last approximately one minute longer than sightings in urban areas (10.4 minutes vs. 9.5 minutes), with a 95% confidence interval for the difference of roughly 38 to 73 seconds.
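The test output is in seconds, so reading it in minutes takes a small conversion. The sketch below copies the group means and confidence limits from the output above; the variable names are illustrative only.

```r
# Group means and 95% CI limits (seconds), copied from the t-test output
mean_rural <- 625.4033
mean_urban <- 569.4874
ci_seconds <- c(lower = 38.34605, upper = 73.48570)

# Estimated difference in means, in seconds
diff_seconds <- mean_rural - mean_urban

# Convert to minutes for easier reading
means_minutes <- round(c(rural = mean_rural, urban = mean_urban) / 60, 1)
ci_minutes    <- round(ci_seconds / 60, 2)

means_minutes        # 10.4 and 9.5 minutes
diff_seconds         # about 56 seconds
ci_minutes           # 0.64 to 1.22 minutes
```

The interval excludes zero, which is consistent with the small p-value, but the estimated difference of under a minute is modest relative to typical encounter lengths of around ten minutes.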