Sample points from track data grid cell usage

Generates random points based on grid cell usage from track data. Points are sampled with probability proportional to the number of track points in each grid cell, providing a usage-weighted spatial sampling approach.

Usage

sample_points_from_track_grid(
  track_data,
  grid_resolution = 100,
  n_points = 100,
  by_fish = TRUE,
  by_time_period = TRUE,
  time_aggregation = "day",
  fish_select = NULL,
  time_select = NULL,
  min_count_threshold = 1,
  max_count_threshold = Inf,
  seed = NULL,
  by_group = TRUE,
  crs = NULL,
  reference_raster = NULL
)

Arguments

track_data

Data frame with fish tracks containing columns: fish_id/path_id, datetime, x, y. Can be from fish_simulation$tracks or similar track data.

grid_resolution

Numeric. Grid cell size in map units (typically meters). Default is 100.

n_points

Integer. Number of points to sample: per fish-time combination when by_group = TRUE, or in total across all combinations when by_group = FALSE. Default is 100.

by_fish

Logical. Whether to group by fish_id. Default is TRUE.

by_time_period

Logical. Whether to group by time periods. Default is TRUE.

time_aggregation

Character. How to aggregate time periods: "hour", "day", "month", or "none". Default is "day".
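
One way to picture the time_aggregation options is as truncating each timestamp to a period label. The base R sketch below is an illustration only, not the function's internal code; the helper name floor_time_period and the label formats are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): label timestamps by period
floor_time_period <- function(datetime, time_aggregation = "day") {
  fmt <- switch(
    time_aggregation,
    hour  = "%Y-%m-%d %H:00",
    day   = "%Y-%m-%d",
    month = "%Y-%m",
    none  = NA_character_,
    stop("time_aggregation must be 'hour', 'day', 'month', or 'none'")
  )
  # Assumption: "none" places every timestamp in one shared period
  if (is.na(fmt)) return(rep("all", length(datetime)))
  format(datetime, fmt)
}

times <- as.POSIXct(
  c("2025-07-15 08:30:00", "2025-07-15 21:10:00", "2025-07-16 02:05:00"),
  tz = "UTC"
)
floor_time_period(times, "day")
floor_time_period(times, "hour")
```

Labels in this style resemble the strings passed to time_select in the examples, though the exact label format the function produces is not documented here.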

fish_select

Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.

time_select

Character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.

min_count_threshold

Numeric. Minimum count threshold. Only cells with count above this threshold are eligible for sampling. Default is 1 to exclude empty cells.

max_count_threshold

Numeric. Maximum count threshold. Only cells with count below this threshold are eligible for sampling. Default is Inf (no upper limit).
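
The two thresholds together define a band of eligible cells. The following base R sketch shows the idea on made-up counts. It assumes inclusive bounds (count >= min and count <= max); the descriptions above say "above" and "below", so the function itself may treat counts equal to a threshold differently.

```r
# Toy per-cell counts (made-up values for illustration)
cell_counts <- data.frame(
  cell_id = 1:6,
  count   = c(0, 1, 2, 5, 12, 40)
)

min_count_threshold <- 3
max_count_threshold <- 20

# Assumption: bounds are inclusive; the real function may use strict inequalities
eligible <- cell_counts[
  cell_counts$count >= min_count_threshold &
    cell_counts$count <= max_count_threshold,
]
eligible$cell_id
```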

seed

Integer. Random seed for reproducible sampling. Default is NULL.

by_group

Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.
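
The difference between the two by_group modes can be sketched in base R as follows. The data are made up, the helper draw is hypothetical, and drawing once from the pooled cells is only one plausible reading of "distributed across all combinations"; the function may allocate points differently.

```r
set.seed(42)

# Toy counts for two fish-time groups (made-up values)
cells <- data.frame(
  group_id = c("fish1_day1", "fish1_day1", "fish2_day1", "fish2_day1"),
  cell_id  = c(1, 2, 1, 3),
  count    = c(10, 30, 5, 5)
)
n_points <- 8

# Hypothetical helper: weighted draw of n rows, with replacement
draw <- function(d, n) {
  d[sample(nrow(d), n, replace = TRUE, prob = d$count), ]
}

# by_group = TRUE: n_points drawn within each group
per_group <- do.call(
  rbind,
  lapply(split(cells, cells$group_id), draw, n = n_points)
)

# by_group = FALSE (assumed behaviour): n_points drawn once from the pooled cells
pooled <- draw(cells, n_points)

table(per_group$group_id)  # 8 rows per group
nrow(pooled)               # 8 rows in total
```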

crs

Coordinate reference system for the output sf object. Can be:

NULL (default) - attempts to detect from input data or uses WGS84
Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)
Character proj4 string
An sf/sfc object from which to extract CRS

reference_raster

Optional raster object to use for defining grid cells. If provided, the raster's actual cell boundaries are used instead of an arbitrary grid.

Value

An sf object containing the sampled points with columns:

fish_id: Fish identifier
time_period_label: Human-readable time period label (if by_time_period = TRUE)
time_period_posix: POSIXct datetime for the time period (if available)
x: X coordinates (grid cell centers)
y: Y coordinates (grid cell centers)
count: The count value used for sampling weights
sample_id: Sequential sample identifier
group_id: Unique identifier for each fish-time combination
geometry: sf point geometry

Details

This function creates a grid overlay on track data, counts the number of track points in each grid cell, then samples points with probability proportional to usage intensity. The process:

Aggregates track data by time periods if requested
Creates regular grid cells based on grid_resolution
Counts track points in each cell for each fish-time combination
Samples points weighted by track point counts
Returns the sampled points as an sf object
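
The steps above can be sketched in base R for a single fish and a single time period. This is a minimal illustration under stated assumptions, not the function's implementation: the track coordinates are made up, sampling is assumed to be with replacement, and the conversion to an sf object is left out.

```r
set.seed(123)

# Toy track for one fish (made-up coordinates in metres)
track <- data.frame(
  fish_id = 1,
  x = c(12, 48, 55, 61, 140, 152, 158, 161, 166, 230),
  y = c(10, 20, 35, 40, 110, 120, 125, 130, 135, 210)
)
grid_resolution <- 100
n_points <- 5

# 1. Snap each track point to the centre of its grid cell
track$cell_x <- (floor(track$x / grid_resolution) + 0.5) * grid_resolution
track$cell_y <- (floor(track$y / grid_resolution) + 0.5) * grid_resolution

# 2. Count track points per cell
cells <- aggregate(
  list(count = rep(1, nrow(track))),
  by = track[c("fish_id", "cell_x", "cell_y")],
  FUN = sum
)

# 3. Sample cells with probability proportional to count
#    (assumption: with replacement, so a busy cell can be drawn repeatedly)
idx <- sample(nrow(cells), n_points, replace = TRUE, prob = cells$count)
sampled <- cells[idx, ]
sampled$sample_id <- seq_len(n_points)
sampled
```

Because the returned x and y are grid cell centers, repeated draws of the same cell yield identical coordinates; the count column records the weight each draw was based on.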

This is useful for:

Creating habitat models weighted by usage intensity
Sampling environmental variables proportionate to space use
Generating representative locations for resource selection analysis
Monte Carlo analysis of habitat preferences
Comparing used vs available habitat

Examples

if (FALSE) {
# Basic usage-weighted sampling
usage_points <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 100,
  n_points = 500,
  seed = 123
)

# Sample from specific fish and time periods
selected_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 50,  # Finer resolution
  n_points = 200,
  fish_select = c(1, 2, 3),
  time_select = c("2025-07-15", "2025-07-16"),
  min_count_threshold = 3  # Only heavily used cells
)

# Sample without time grouping
overall_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  by_time_period = FALSE,
  n_points = 1000
)

# Use with reference raster for consistent grid
# (depth_raster is assumed to be a raster object already loaded in the session)
raster_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  reference_raster = depth_raster,
  n_points = 300,
  crs = 32617
)

# Plot usage-weighted sampling
library(ggplot2)
ggplot() +
  geom_sf(data = usage_points, aes(color = count, size = count), 
          alpha = 0.6) +
  scale_color_viridis_c(name = "Track\nPoints", trans = "sqrt") +
  scale_size_continuous(name = "Track\nPoints", range = c(0.5, 3), trans = "sqrt") +
  facet_wrap(~fish_id + time_period_label) +
  theme_minimal() +
  labs(title = "Usage-Weighted Spatial Sampling")

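# Compare usage intensity across fish
library(dplyr)  # provides %>%, group_by(), summarise(), n()
library(sf)     # provides st_drop_geometry()
usage_summary <- usage_points %>%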
  st_drop_geometry() %>%
  group_by(fish_id) %>%
  summarise(
    mean_usage = mean(count),
    max_usage = max(count),
    total_samples = n(),
    .groups = 'drop'
  )
}
