Sample points from position probability surfaces

Generates random points based on probability surfaces from fish positioning results. Points are sampled with probability proportional to the specified probability column, allowing for Monte Carlo sampling from position estimates. Can process multiple fish and time periods simultaneously.

Usage

sample_points_from_probabilities(
  positioning_results,
  prob_column = "weighted_mean_DE_normalized_scaled",
  n_points = 100,
  fish_select = NULL,
  time_select = NULL,
  min_prob_threshold = 0.001,
  max_prob_threshold = 1,
  seed = NULL,
  by_group = TRUE,
  crs = NULL
)

Arguments

positioning_results

A list returned by calculate_fish_positions containing position probabilities and summary statistics.

prob_column

Character. Name of the probability column to use for sampling. Default is "weighted_mean_DE_normalized_scaled". Available options include:

weighted_mean_DE_normalized - Raw detection probability
weighted_mean_DE_normalized_scaled - Scaled detection probability (0-1)
non_det_DE_normalized - Non-detection probability
non_det_DE_normalized_scaled - Scaled non-detection probability
integrated_prob - Integrated position probability

n_points

Integer. Number of points to sample per fish-time combination. Default is 100.

fish_select

Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.

time_select

Numeric vector, character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.

min_prob_threshold

Numeric. Minimum probability threshold (0-1). Only cells with probability above this threshold are eligible for sampling. Default is 0.001 to exclude zero-probability cells.

max_prob_threshold

Numeric. Maximum probability threshold (0-1). Only cells with probability below this threshold are eligible for sampling. Default is 1.0 (no upper limit). Set to lower values (e.g., 0.05) to exclude high-probability cells and focus on uncertainty regions.

seed

Integer. Random seed for reproducible sampling. Default is NULL.

by_group

Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.

crs

Coordinate reference system for the output sf object. Can be:

NULL (default) - uses CRS from positioning_results
Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)
Character proj4 string
An sf/sfc object from which to extract CRS
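
Each of the accepted forms is an input that sf::st_crs() understands. The sketch below assumes, without confirmation from the source, that the function resolves crs through sf::st_crs(). It only illustrates what each form looks like; the proj4 string and point coordinates are made-up examples.

```r
library(sf)

# Numeric EPSG code (UTM Zone 18N)
crs_epsg <- st_crs(32618)

# Character proj4 string describing the same projection
crs_proj4 <- st_crs("+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs")

# CRS extracted from an existing sf/sfc object
pt <- st_sfc(st_point(c(500000, 4500000)), crs = 32618)
crs_from_obj <- st_crs(pt)
```

Any of these three values could then be passed as the crs argument.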

Value

An sf object containing the sampled points with columns:

fish_id: Fish identifier
time_period: Time period identifier
time_period_posix: POSIXct datetime for the time period
time_period_label: Human-readable time period label
x: X coordinates
y: Y coordinates
probability: The probability value used for sampling
sample_id: Sequential sample identifier
group_id: Unique identifier for each fish-time combination
geometry: sf point geometry

Details

This function performs weighted random sampling where each spatial cell has a probability of being selected proportional to its probability value in the specified column. This is useful for:

Monte Carlo analysis of position uncertainty
Generating representative sample locations for further analysis
Creating random tracks based on position probability surfaces
Uncertainty propagation in downstream analyses
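
As an illustration of uncertainty propagation, the sketch below evaluates a derived quantity (distance to a reference location) for every sampled point and summarizes its spread. The samples data frame is simulated stand-in data with the documented x and y columns, not real output, and the receiver coordinates are hypothetical.

```r
set.seed(42)

# Stand-in for the sampled points of one fish-time group (simulated)
samples <- data.frame(
  x = rnorm(200, mean = 500100, sd = 25),
  y = rnorm(200, mean = 4500050, sd = 40)
)

# Hypothetical reference location in the same projected CRS (metres)
receiver <- c(x = 500000, y = 4500000)

# Derived quantity evaluated for every sampled point
dist_m <- sqrt((samples$x - receiver[["x"]])^2 +
               (samples$y - receiver[["y"]])^2)

# Monte Carlo summary: median plus a 95% interval
dist_summary <- c(
  median = median(dist_m),
  lower  = unname(quantile(dist_m, 0.025)),
  upper  = unname(quantile(dist_m, 0.975))
)
```

The width of the interval reflects how diffuse the underlying probability surface is for that fish and time period.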

The sampling process:

Filters data by fish_select and time_select if specified
Removes cells with probability below min_prob_threshold or above max_prob_threshold
Normalizes probabilities to sum to 1 within each time period
Performs weighted sampling without replacement (or with replacement if n_points exceeds the number of available cells)
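
The steps above can be sketched in base R for a single fish-time group. This is a minimal illustration, not the package's implementation: the helper name sample_cells, the toy cells data frame, and the inclusive threshold comparisons are assumptions made for the example.

```r
# Hypothetical helper illustrating the documented steps for ONE fish-time group.
# Not the package's code; names and the >= / <= comparisons are assumptions.
sample_cells <- function(cells, prob_column, n_points,
                         min_prob_threshold = 0.001,
                         max_prob_threshold = 1,
                         seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Keep only cells inside the probability band
  p <- cells[[prob_column]]
  keep <- !is.na(p) & p >= min_prob_threshold & p <= max_prob_threshold
  eligible <- cells[keep, , drop = FALSE]
  if (nrow(eligible) == 0) return(eligible)

  # Normalize the remaining probabilities so they sum to 1
  w <- eligible[[prob_column]] / sum(eligible[[prob_column]])

  # Sample without replacement, falling back to replacement when
  # more points are requested than cells are available
  with_replacement <- n_points > nrow(eligible)
  idx <- sample.int(nrow(eligible), size = n_points,
                    replace = with_replacement, prob = w)
  eligible[idx, , drop = FALSE]
}

# Toy 3 x 3 grid with made-up probabilities
cells <- data.frame(
  x = rep(1:3, times = 3),
  y = rep(1:3, each = 3),
  integrated_prob = c(0, 0.05, 0.10, 0.05, 0.40, 0.20, 0.10, 0.10, 0)
)

few  <- sample_cells(cells, "integrated_prob", n_points = 5,  seed = 123)
many <- sample_cells(cells, "integrated_prob", n_points = 50, seed = 123)
```

Here few contains five distinct cells, while many requests more points than the seven eligible cells and is therefore drawn with replacement. When sampling without replacement, base R applies the weights sequentially, so inclusion probabilities are only approximately proportional to the cell probabilities.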

Examples

if (FALSE) {
# Calculate fish positions first
positioning_results <- calculate_fish_positions(station_detections, distances, stations)

# Sample 100 points per fish-time combination for all data
sampled_points <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "weighted_mean_DE_normalized_scaled",
  n_points = 100,
  seed = 123
)

# Sample from multiple specific fish and time periods
multi_samples <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "integrated_prob",
  n_points = 50,
  fish_select = c(1, 2, 3),
  time_select = c("2025-07-15", "2025-07-16", "2025-07-17"),
  by_group = TRUE  # 50 points per fish-time combination
)

# Sample 1000 points total distributed across all groups
distributed_samples <- sample_points_from_probabilities(
  positioning_results,
  n_points = 1000,
  by_group = FALSE  # Distribute 1000 points across all combinations
)

# Sample only from low-probability uncertainty regions
uncertainty_samples <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "integrated_prob",
  n_points = 200,
  min_prob_threshold = 0.001,
  max_prob_threshold = 0.05,  # Exclude high-probability areas
  seed = 456
)

# Sample with specific CRS (UTM Zone 18N)
utm_samples <- sample_points_from_probabilities(
  positioning_results,
  n_points = 100,
  crs = 32618  # EPSG code for UTM Zone 18N
)

# Plot sampled points colored by time
library(ggplot2)
ggplot() +
  geom_sf(data = multi_samples, aes(color = time_period_posix)) +
  scale_color_viridis_c() +
  facet_wrap(~fish_id) +
  theme_minimal() +
  labs(title = "Sampled Position Points by Fish and Time")

# Temporal centroid analysis
library(dplyr)
temporal_centroids <- multi_samples %>%
  group_by(fish_id, time_period_posix, time_period_label) %>%
  summarise(
    n_samples = n(),
    mean_prob = mean(probability),
    centroid = sf::st_union(geometry) %>% sf::st_centroid(),
    .groups = 'drop'
  ) %>%
  sf::st_as_sf()

# Build one track line per fish from the time-ordered sampled points.
# do_union = FALSE keeps the point order; st_union() may reorder points.
track_samples <- multi_samples %>%
  arrange(fish_id, time_period_posix) %>%
  group_by(fish_id) %>%
  summarise(do_union = FALSE) %>%
  sf::st_cast("LINESTRING")
}
