
Sample points from position probability surfaces
Source: R/sample_points_from_probabilities.R
Generates random points based on probability surfaces from fish positioning results. Points are sampled with probability proportional to the specified probability column, allowing for Monte Carlo sampling from position estimates. Can process multiple fish and time periods simultaneously.
Usage
sample_points_from_probabilities(
positioning_results,
prob_column = "weighted_mean_DE_normalized_scaled",
n_points = 100,
fish_select = NULL,
time_select = NULL,
min_prob_threshold = 0.001,
max_prob_threshold = 1,
seed = NULL,
by_group = TRUE,
crs = NULL
)
Arguments
- positioning_results
A list returned by calculate_fish_positions containing position probabilities and summary statistics.
- prob_column
Character. Name of the probability column to use for sampling. Default is "weighted_mean_DE_normalized_scaled". Available options include:
weighted_mean_DE_normalized - Raw detection probability
weighted_mean_DE_normalized_scaled - Scaled detection probability (0-1)
non_det_DE_normalized - Non-detection probability
non_det_DE_normalized_scaled - Scaled non-detection probability
integrated_prob - Integrated position probability
- n_points
Integer. Number of points to sample per fish-time combination. Default is 100.
- fish_select
Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.
- time_select
Numeric vector, character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.
- min_prob_threshold
Numeric. Minimum probability threshold (0-1). Only cells with probability above this threshold are eligible for sampling. Default is 0.001 to exclude zero-probability cells.
- max_prob_threshold
Numeric. Maximum probability threshold (0-1). Only cells with probability below this threshold are eligible for sampling. Default is 1.0 (no upper limit). Set to lower values (e.g., 0.05) to exclude high-probability cells and focus on uncertainty regions.
- seed
Integer. Random seed for reproducible sampling. Default is NULL.
- by_group
Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.
- crs
Coordinate reference system for the output sf object. Can be:
NULL (default) - uses CRS from positioning_results
Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)
Character proj4 string
An sf/sfc object from which to extract CRS
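The accepted crs forms can be illustrated with sf::st_crs(), which takes EPSG numbers, proj4 strings, and sf/sfc objects alike. The sketch below is only an assumption about how such inputs might be resolved; resolve_crs and the point pt are hypothetical names and not part of this package.

```r
library(sf)

# Hypothetical helper (not part of the package): turn any of the accepted
# `crs` forms into an sf "crs" object, falling back to another object's CRS.
resolve_crs <- function(crs, fallback) {
  if (is.null(crs)) {
    return(sf::st_crs(fallback))
  }
  # st_crs() has methods for numeric EPSG codes, character strings,
  # and sf/sfc objects
  sf::st_crs(crs)
}

# Illustrative point in WGS84, standing in for the positioning results
pt <- sf::st_sfc(sf::st_point(c(-75.5, 39.2)), crs = 4326)

crs_default <- resolve_crs(NULL, fallback = pt)    # CRS taken from `pt`
crs_epsg    <- resolve_crs(32618, fallback = pt)   # numeric EPSG code
crs_proj4   <- resolve_crs("+proj=longlat +datum=WGS84 +no_defs", fallback = pt)
crs_object  <- resolve_crs(pt, fallback = pt)      # CRS extracted from an sfc
```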
Value
An sf object containing the sampled points with columns:
- fish_id
Fish identifier
- time_period
Time period identifier
- time_period_posix
POSIXct datetime for the time period
- time_period_label
Human-readable time period label
- x
X coordinates
- y
Y coordinates
- probability
The probability value used for sampling
- sample_id
Sequential sample identifier
- group_id
Unique identifier for each fish-time combination
- geometry
sf point geometry
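Because points are drawn in proportion to probability, simple per-group summaries of the samples approximate probability-weighted summaries of the surface. The sketch below uses a mock data frame with a subset of the columns listed above; all values are invented for illustration, and the group_id format shown is an assumption rather than the format the function actually produces.

```r
# Mock table standing in for the returned object (geometry omitted);
# values are invented and the "fish_time" group_id format is an assumption.
samples <- data.frame(
  fish_id     = rep(c(1, 2), each = 4),
  time_period = rep(1, 8),
  x           = c(10, 12, 11, 13, 20, 22, 21, 23),
  y           = c(5, 6, 5, 6, 8, 9, 8, 9),
  probability = c(0.2, 0.4, 0.3, 0.1, 0.25, 0.25, 0.3, 0.2)
)
samples$group_id <- paste(samples$fish_id, samples$time_period, sep = "_")

# Mean sampled position per fish-time group (a Monte Carlo centroid)
centroids <- aggregate(cbind(x, y) ~ group_id, data = samples, FUN = mean)

# Spread of the sampled positions per group, as a rough uncertainty measure
spread <- aggregate(cbind(x, y) ~ group_id, data = samples, FUN = sd)
names(spread)[-1] <- c("sd_x", "sd_y")

summary_tbl <- merge(centroids, spread, by = "group_id")
```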
Details
This function performs weighted random sampling where each spatial cell has a probability of being selected proportional to its probability value in the specified column. This is useful for:
Monte Carlo analysis of position uncertainty
Generating representative sample locations for further analysis
Creating random tracks based on position probability surfaces
Uncertainty propagation in downstream analyses
The sampling process:
Filters data by fish_select and time_select if specified
Removes cells below min_prob_threshold or above max_prob_threshold
Normalizes probabilities to sum to 1 within each time period
Performs weighted sampling without replacement (or with replacement if n_points exceeds the number of available cells)
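The steps above can be sketched in base R for a single fish-time surface. This is a minimal illustration under stated assumptions, not the package's implementation: the cells table and the sample_cells helper are hypothetical, and treating the thresholds as inclusive bounds is an assumption.

```r
# Hypothetical probability surface for one fish-time combination
# (invented values, not package data)
cells <- data.frame(
  x = rep(1:5, times = 4),
  y = rep(1:4, each = 5),
  prob = c(0,    0.0005, 0.01, 0.02, 0.05,
           0.01, 0.05,   0.10, 0.20, 0.05,
           0.02, 0.10,   0.15, 0.10, 0.02,
           0,    0.01,   0.05, 0.03, 0.0295)
)

# Hypothetical helper: threshold, normalise, then draw weighted samples
sample_cells <- function(cells, n_points, min_prob = 0.001, max_prob = 1) {
  # Keep only cells inside the threshold window (inclusive bounds assumed)
  keep <- cells[cells$prob >= min_prob & cells$prob <= max_prob, , drop = FALSE]
  if (nrow(keep) == 0) {
    return(keep)
  }
  # Normalise the remaining probabilities so they sum to 1
  w <- keep$prob / sum(keep$prob)
  # Sample with replacement only when more points are requested than cells exist
  use_replacement <- n_points > nrow(keep)
  idx <- sample.int(nrow(keep), size = n_points, replace = use_replacement, prob = w)
  out <- keep[idx, , drop = FALSE]
  out$sample_id <- seq_len(nrow(out))
  rownames(out) <- NULL
  out
}

set.seed(42)
pts <- sample_cells(cells, n_points = 10)       # fewer points than cells: no repeats
pts_many <- sample_cells(cells, n_points = 50)  # more points than cells: repeats allowed
```

Setting the seed before the call plays the same role as the seed argument: repeated runs return the same cells.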
Examples
if (FALSE) {
  # Calculate fish positions first
  positioning_results <- calculate_fish_positions(station_detections, distances, stations)

  # Sample 100 points per fish-time combination for all data
  sampled_points <- sample_points_from_probabilities(
    positioning_results,
    prob_column = "weighted_mean_DE_normalized_scaled",
    n_points = 100,
    seed = 123
  )

  # Sample from multiple specific fish and time periods
  multi_samples <- sample_points_from_probabilities(
    positioning_results,
    prob_column = "integrated_prob",
    n_points = 50,
    fish_select = c(1, 2, 3),
    time_select = c("2025-07-15", "2025-07-16", "2025-07-17"),
    by_group = TRUE # 50 points per fish-time combination
  )

  # Sample 1000 points total distributed across all groups
  distributed_samples <- sample_points_from_probabilities(
    positioning_results,
    n_points = 1000,
    by_group = FALSE # Distribute 1000 points across all combinations
  )

  # Sample only from low-probability uncertainty regions
  uncertainty_samples <- sample_points_from_probabilities(
    positioning_results,
    prob_column = "integrated_prob",
    n_points = 200,
    min_prob_threshold = 0.001,
    max_prob_threshold = 0.05, # Exclude high-probability areas
    seed = 456
  )

  # Sample with specific CRS (UTM Zone 18N)
  utm_samples <- sample_points_from_probabilities(
    positioning_results,
    n_points = 100,
    crs = 32618 # EPSG code for UTM Zone 18N
  )

  # Plot sampled points colored by time
  library(ggplot2)
  ggplot() +
    geom_sf(data = multi_samples, aes(color = time_period_posix)) +
    scale_color_viridis_c() +
    facet_wrap(~fish_id) +
    theme_minimal() +
    labs(title = "Sampled Position Points by Fish and Time")

  # Temporal centroid analysis: one centroid per fish and time period
  library(dplyr)
  temporal_centroids <- multi_samples %>%
    group_by(fish_id, time_period_posix, time_period_label) %>%
    summarise(
      n_samples = n(),
      mean_prob = mean(probability),
      centroid = sf::st_union(geometry) %>% sf::st_centroid(),
      .groups = "drop"
    ) %>%
    sf::st_as_sf()

  # Build one track line per fish from the time-ordered sampled points.
  # st_combine() keeps the point order; st_union() may merge or reorder points.
  track_samples <- multi_samples %>%
    arrange(fish_id, time_period_posix) %>%
    group_by(fish_id) %>%
    summarise(
      track = sf::st_cast(sf::st_combine(geometry), "LINESTRING")
    )
}