Generates random points based on probability surfaces from fish positioning results. Points are sampled with probability proportional to the specified probability column, allowing for Monte Carlo sampling from position estimates. Can process multiple fish and time periods simultaneously.

Usage

sample_points_from_probabilities(
  positioning_results,
  prob_column = "weighted_mean_DE_normalized_scaled",
  n_points = 100,
  fish_select = NULL,
  time_select = NULL,
  min_prob_threshold = 0.001,
  max_prob_threshold = 1,
  seed = NULL,
  by_group = TRUE,
  crs = NULL
)

Arguments

positioning_results

A list returned by calculate_fish_positions containing position probabilities and summary statistics.

prob_column

Character. Name of the probability column to use for sampling. Default is "weighted_mean_DE_normalized_scaled". Available options include:

  • weighted_mean_DE_normalized - Raw detection probability

  • weighted_mean_DE_normalized_scaled - Scaled detection probability (0-1)

  • non_det_DE_normalized - Non-detection probability

  • non_det_DE_normalized_scaled - Scaled non-detection probability

  • integrated_prob - Integrated position probability

n_points

Integer. Number of points to sample per fish-time combination. Default is 100.

fish_select

Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.

time_select

Numeric vector, character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.

min_prob_threshold

Numeric. Minimum probability threshold (0-1). Only cells with probability above this threshold are eligible for sampling. Default is 0.001 to exclude zero-probability cells.

max_prob_threshold

Numeric. Maximum probability threshold (0-1). Only cells with probability below this threshold are eligible for sampling. Default is 1.0 (no upper limit). Set to lower values (e.g., 0.05) to exclude high-probability cells and focus on uncertainty regions.

seed

Integer. Random seed for reproducible sampling. Default is NULL.

by_group

Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.

crs

Coordinate reference system for the output sf object. Can be:

  • NULL (default) - uses CRS from positioning_results

  • Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)

  • Character proj4 string

  • An sf/sfc object from which to extract CRS
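How the function resolves these inputs internally is not documented here. As a hedged sketch, each of the non-NULL forms above can be turned into a CRS with sf::st_crs(); the coordinates and the proj4 string below are invented for illustration.

```r
library(sf)

# Numeric EPSG code
crs_from_epsg <- st_crs(32618)

# Character proj4 string (illustrative UTM Zone 18N definition)
crs_from_proj4 <- st_crs("+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs")

# CRS extracted from an existing sfc object
pts <- st_sfc(st_point(c(500000, 4500000)), crs = 32618)
crs_from_obj <- st_crs(pts)

# Reproject the points to WGS84 (EPSG:4326)
pts_wgs84 <- st_transform(pts, 4326)
```

Already sampled points can be reprojected the same way with sf::st_transform() if a different CRS is needed later.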

Value

An sf object containing the sampled points with columns:

fish_id

Fish identifier

time_period

Time period identifier

time_period_posix

POSIXct datetime for the time period

time_period_label

Human-readable time period label

x

X coordinates

y

Y coordinates

probability

The probability value used for sampling

sample_id

Sequential sample identifier

group_id

Unique identifier for each fish-time combination

geometry

sf point geometry
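The sketch below builds a small mock object with a subset of these columns to show how such a result can be handled with sf. All values, and the group_id format, are invented for illustration; the time columns are omitted for brevity.

```r
library(sf)

# Mock data standing in for real output (values are made up)
mock <- data.frame(
  fish_id = c(1, 1, 2),
  time_period = c(1, 1, 1),
  x = c(500100, 500150, 500400),
  y = c(4500100, 4500120, 4500300),
  probability = c(0.40, 0.25, 0.60),
  sample_id = c(1, 2, 1),
  group_id = c("1_1", "1_1", "2_1")  # hypothetical format
)

# remove = FALSE keeps x and y as columns alongside the geometry
mock_sf <- st_as_sf(mock, coords = c("x", "y"), crs = 32618, remove = FALSE)

# Drop the geometry to work with a plain data frame
mock_df <- st_drop_geometry(mock_sf)

# Recover the coordinates from the geometry as a matrix with columns X and Y
xy <- st_coordinates(mock_sf)
```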

Details

This function performs weighted random sampling where each spatial cell has a probability of being selected proportional to its probability value in the specified column. This is useful for:

  • Monte Carlo analysis of position uncertainty

  • Generating representative sample locations for further analysis

  • Creating random tracks based on position probability surfaces

  • Uncertainty propagation in downstream analyses
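As a minimal sketch of uncertainty propagation, a derived quantity can be computed once per sampled point and then summarised as a distribution. The coordinates and the receiver location below are simulated stand-ins, not output from this function.

```r
set.seed(42)

# Simulated stand-in for sampled coordinates of one fish-time group (metres)
samples <- data.frame(
  x = rnorm(200, mean = 500, sd = 30),
  y = rnorm(200, mean = 200, sd = 30)
)

# Hypothetical receiver location
receiver <- c(x = 450, y = 180)

# Derived quantity computed once per sampled point
dist_m <- sqrt((samples$x - receiver[["x"]])^2 + (samples$y - receiver[["y"]])^2)

# Summarise the distribution instead of reporting a single point estimate
dist_summary <- quantile(dist_m, probs = c(0.025, 0.5, 0.975))
```

The spread between the lower and upper quantiles reflects how much the position uncertainty carries through to the derived distance.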

The sampling process:

  1. Filters data by fish_select and time_select if specified

  2. Removes cells below min_prob_threshold or above max_prob_threshold

  3. Normalizes probabilities to sum to 1 within each fish-time combination

  4. Performs weighted sampling without replacement, switching to sampling with replacement when n_points exceeds the number of available cells
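The steps can be sketched in base R for a single fish-time group. This is an illustration under stated assumptions, not the package's implementation: sample_cells() and the cells data frame are hypothetical, and treating the thresholds as inclusive bounds is a guess.

```r
# Hypothetical helper: weighted sampling of cells for ONE fish-time group.
# `cells` is an assumed data frame holding coordinates and a probability column.
sample_cells <- function(cells, prob_column, n_points,
                         min_prob = 0.001, max_prob = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- cells[[prob_column]]
  # Keep cells inside the probability band (inclusive bounds are an assumption)
  keep <- !is.na(p) & p >= min_prob & p <= max_prob
  eligible <- cells[keep, , drop = FALSE]
  if (nrow(eligible) == 0) return(eligible)
  # Normalize the weights so they sum to 1
  w <- eligible[[prob_column]] / sum(eligible[[prob_column]])
  # Sample with replacement only when more points are requested than cells exist
  use_replace <- n_points > nrow(eligible)
  idx <- sample.int(nrow(eligible), size = n_points,
                    replace = use_replace, prob = w)
  eligible[idx, , drop = FALSE]
}

# Invented example surface with four cells
cells <- data.frame(
  x = c(0, 1, 2, 3),
  y = c(0, 0, 1, 1),
  integrated_prob = c(0.0005, 0.10, 0.30, 0.60)
)
draws <- sample_cells(cells, "integrated_prob", n_points = 10, seed = 123)
```

Here the first cell falls below the minimum threshold, so only three cells are eligible and the ten requested points are drawn with replacement.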

Examples

if (FALSE) {
# Calculate fish positions first
positioning_results <- calculate_fish_positions(station_detections, distances, stations)

# Sample 100 points per fish-time combination for all data
sampled_points <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "weighted_mean_DE_normalized_scaled",
  n_points = 100,
  seed = 123
)

# Sample from multiple specific fish and time periods
multi_samples <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "integrated_prob",
  n_points = 50,
  fish_select = c(1, 2, 3),
  time_select = c("2025-07-15", "2025-07-16", "2025-07-17"),
  by_group = TRUE  # 50 points per fish-time combination
)

# Sample 1000 points total distributed across all groups
distributed_samples <- sample_points_from_probabilities(
  positioning_results,
  n_points = 1000,
  by_group = FALSE  # Distribute 1000 points across all combinations
)

# Sample only from low-probability uncertainty regions
uncertainty_samples <- sample_points_from_probabilities(
  positioning_results,
  prob_column = "integrated_prob",
  n_points = 200,
  min_prob_threshold = 0.001,
  max_prob_threshold = 0.05,  # Exclude high-probability areas
  seed = 456
)

# Sample with specific CRS (UTM Zone 18N)
utm_samples <- sample_points_from_probabilities(
  positioning_results,
  n_points = 100,
  crs = 32618  # EPSG code for UTM Zone 18N
)

# Plot sampled points colored by time
library(ggplot2)
ggplot() +
  geom_sf(data = multi_samples, aes(color = time_period_posix)) +
  scale_color_viridis_c() +
  facet_wrap(~fish_id) +
  theme_minimal() +
  labs(title = "Sampled Position Points by Fish and Time")

# Temporal centroid analysis
library(dplyr)
temporal_centroids <- multi_samples %>%
  group_by(fish_id, time_period_posix, time_period_label) %>%
  summarise(
    n_samples = n(),
    mean_prob = mean(probability),
    centroid = sf::st_union(geometry) %>% sf::st_centroid(),
    .groups = 'drop'
  ) %>%
  sf::st_as_sf()

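# Build a track line per fish from the sampled points
# (st_combine keeps the time-sorted point order; st_union may reorder points)
track_samples <- multi_samples %>%
  arrange(fish_id, time_period_posix) %>%
  group_by(fish_id) %>%
  summarise(
    track = sf::st_cast(sf::st_combine(geometry), "LINESTRING"),
    .groups = "drop"
  )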
}