Generates random points based on grid cell usage from track data. Points are sampled with probability proportional to the number of track points in each grid cell, providing a usage-weighted spatial sampling approach.

Usage

sample_points_from_track_grid(
  track_data,
  grid_resolution = 100,
  n_points = 100,
  by_fish = TRUE,
  by_time_period = TRUE,
  time_aggregation = "day",
  fish_select = NULL,
  time_select = NULL,
  min_count_threshold = 1,
  max_count_threshold = Inf,
  seed = NULL,
  by_group = TRUE,
  crs = NULL,
  reference_raster = NULL
)

Arguments

track_data

Data frame with fish tracks containing columns: fish_id/path_id, datetime, x, y. Can be from fish_simulation$tracks or similar track data.

grid_resolution

Numeric. Grid cell size in map units (typically meters). Default is 100 meters.

n_points

Integer. Number of points to sample per fish-time combination, or in total across all combinations when by_group = FALSE. Default is 100.

by_fish

Logical. Whether to group by fish_id. Default is TRUE.

by_time_period

Logical. Whether to group by time periods. Default is TRUE.

time_aggregation

Character. How to aggregate time periods: "hour", "day", "month", or "none". Default is "day".

fish_select

Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.

time_select

Character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.

min_count_threshold

Numeric. Minimum count threshold. Only cells with count above this threshold are eligible for sampling. Default is 1 to exclude empty cells.

max_count_threshold

Numeric. Maximum count threshold. Only cells with count below this threshold are eligible for sampling. Default is Inf (no upper limit).

seed

Integer. Random seed for reproducible sampling. Default is NULL.

by_group

Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.

crs

Coordinate reference system for the output sf object. Can be:

  • NULL (default) - attempts to detect the CRS from the input data and falls back to WGS84 if none is found

  • Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)

  • Character proj4 string

  • An sf/sfc object from which to extract CRS

reference_raster

Optional raster object used to define the grid cells. If provided, the raster's own cell boundaries are used instead of a regular grid built from grid_resolution.
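As a rough illustration of what time_aggregation implies, the base R sketch below assigns each track timestamp to an "hour", "day", or "month" period. The helper label_time_period() and its label formats are hypothetical and are not taken from the package; the actual values of time_period_label may differ.

```r
# Hypothetical helper, for illustration only: it is not part of the package.
# It maps POSIXct timestamps to period labels for each time_aggregation value.
label_time_period <- function(datetime, time_aggregation = "day") {
  switch(
    time_aggregation,
    hour  = format(datetime, "%Y-%m-%d %H:00"),
    day   = format(datetime, "%Y-%m-%d"),
    month = format(datetime, "%Y-%m"),
    none  = rep("all", length(datetime)),  # assumed label for "no aggregation"
    stop("Unknown time_aggregation: ", time_aggregation)
  )
}

# Made-up timestamps standing in for the datetime column of track_data
datetimes <- as.POSIXct(
  c("2025-07-15 08:10:00", "2025-07-15 08:50:00",
    "2025-07-15 21:05:00", "2025-07-16 02:30:00"),
  tz = "UTC"
)

day_labels   <- label_time_period(datetimes, "day")
hour_labels  <- label_time_period(datetimes, "hour")
month_labels <- label_time_period(datetimes, "month")

# Track points per day: these groups would each receive their own grid counts
points_per_day <- table(day_labels)
```

Under this assumed labeling, a time_select value such as "2025-07-15" would match the first three timestamps when time_aggregation = "day".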

Value

An sf object containing the sampled points with columns:

fish_id

Fish identifier

time_period_label

Human-readable time period label (if by_time_period = TRUE)

time_period_posix

POSIXct datetime for the time period (if available)

x

X coordinates (grid cell centers)

y

Y coordinates (grid cell centers)

count

The count value used for sampling weights

sample_id

Sequential sample identifier

group_id

Unique identifier for each fish-time combination

geometry

sf point geometry

Details

This function creates a grid overlay on track data, counts the number of track points in each grid cell, then samples points with probability proportional to usage intensity. The process:

  1. Aggregates track data by time periods if requested

  2. Creates regular grid cells based on grid_resolution

  3. Counts track points in each cell for each fish-time combination

  4. Samples points weighted by track point counts

  5. Returns the sampled points as an sf object
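Steps 2 to 4 can be sketched in base R for a single fish-time group. This is a minimal illustration under stated assumptions, not the package's internal code: the track coordinates are made up, cells are snapped with floor(), and cells are drawn with replacement, none of which is confirmed by this page.

```r
# Illustrative sketch only; the real function may bin and sample differently.
set.seed(42)

# Made-up track for one fish in one time period (map units: meters)
track <- data.frame(
  x = c(12, 40, 55, 130, 160, 170, 180, 420),
  y = c(10, 35, 80,  20,  60,  75,  90, 310)
)
grid_resolution <- 100
n_points <- 20

# Step 2: snap each track point to the lower-left corner of its grid cell
track$cell_x <- floor(track$x / grid_resolution) * grid_resolution
track$cell_y <- floor(track$y / grid_resolution) * grid_resolution

# Step 3: count track points in each occupied cell
track$count <- 1L
cells <- aggregate(count ~ cell_x + cell_y, data = track, FUN = sum)

# Cell centers, matching the x/y columns described under Value
cells$x <- cells$cell_x + grid_resolution / 2
cells$y <- cells$cell_y + grid_resolution / 2

# Step 4: draw cells with probability proportional to their counts
# (sampling with replacement is an assumption made for this sketch)
idx <- sample(nrow(cells), size = n_points, replace = TRUE, prob = cells$count)
sampled <- cells[idx, c("x", "y", "count")]
sampled$sample_id <- seq_len(nrow(sampled))
```

Because every sampled point sits on a cell center, heavily used cells appear many times in the result, while a cell visited once appears rarely or not at all.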

This is useful for:

  • Creating habitat models weighted by usage intensity

  • Sampling environmental variables proportionate to space use

  • Generating representative locations for resource selection analysis

  • Monte Carlo analysis of habitat preferences

  • Comparing used vs available habitat

Examples

if (FALSE) {
# Basic usage-weighted sampling
usage_points <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 100,
  n_points = 500,
  seed = 123
)

# Sample from specific fish and time periods
selected_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 50,  # Finer resolution
  n_points = 200,
  fish_select = c(1, 2, 3),
  time_select = c("2025-07-15", "2025-07-16"),
  min_count_threshold = 3  # Only heavily used cells
)

# Sample without time grouping
overall_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  by_time_period = FALSE,
  n_points = 1000
)

# Use with reference raster for consistent grid
# (depth_raster is assumed to be an existing raster object covering the tracks)
raster_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  reference_raster = depth_raster,
  n_points = 300,
  crs = 32617
)

# Plot usage-weighted sampling
library(ggplot2)
ggplot() +
  geom_sf(data = usage_points, aes(color = count, size = count), 
          alpha = 0.6) +
  scale_color_viridis_c(name = "Track\nPoints", trans = "sqrt") +
  scale_size_continuous(name = "Track\nPoints", range = c(0.5, 3), trans = "sqrt") +
  facet_wrap(~fish_id + time_period_label) +
  theme_minimal() +
  labs(title = "Usage-Weighted Spatial Sampling")

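# Compare usage intensity across fish
library(dplyr)  # provides %>%, group_by(), summarise(), n()
library(sf)     # provides st_drop_geometry()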
usage_summary <- usage_points %>%
  st_drop_geometry() %>%
  group_by(fish_id) %>%
  summarise(
    mean_usage = mean(count),
    max_usage = max(count),
    total_samples = n(),
    .groups = 'drop'
  )
}