
Sample points from track data grid cell usage
Source: R/sample_points_from_track_grid.R
sample_points_from_track_grid.Rd
Generates random points based on grid cell usage from track data. Points are sampled with probability proportional to the number of track points in each grid cell, providing a usage-weighted spatial sampling approach.
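As a minimal illustration of "probability proportional to the number of track points", the base R sketch below turns per-cell counts into selection probabilities and draws cells accordingly. The cell names and counts are made up for illustration, and this is not necessarily how the package performs the draw internally.

```r
# Hypothetical counts of track points in three grid cells.
counts <- c(cell_a = 1, cell_b = 3, cell_c = 6)

# The selection probability of each cell is its share of all track points.
probs <- counts / sum(counts)

# Draw 1000 cells with replacement. sample() rescales `prob` internally,
# so the raw counts can be passed directly as weights.
set.seed(123)
draws <- sample(names(counts), size = 1000, replace = TRUE, prob = counts)

# Observed shares of each cell; these should sit close to `probs`.
observed <- table(factor(draws, levels = names(counts))) / length(draws)
```

With these counts, cell_c is expected to be drawn about six times as often as cell_a.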
Usage
sample_points_from_track_grid(
track_data,
grid_resolution = 100,
n_points = 100,
by_fish = TRUE,
by_time_period = TRUE,
time_aggregation = "day",
fish_select = NULL,
time_select = NULL,
min_count_threshold = 1,
max_count_threshold = Inf,
seed = NULL,
by_group = TRUE,
crs = NULL,
reference_raster = NULL
)
Arguments
- track_data
Data frame of fish tracks containing the columns fish_id (or path_id), datetime, x, and y. Can be fish_simulation$tracks or similar track data.
- grid_resolution
Numeric. Grid cell size in map units (typically meters). Default is 100.
- n_points
Integer. Number of points to sample per fish-time combination when by_group = TRUE, or in total across all combinations when by_group = FALSE. Default is 100.
- by_fish
Logical. Whether to group by fish_id. Default is TRUE.
- by_time_period
Logical. Whether to group by time periods. Default is TRUE.
- time_aggregation
Character. How to aggregate time periods: "hour", "day", "month", or "none". Default is "day".
- fish_select
Integer, character vector, or NULL. Fish ID(s) to sample from. If NULL, samples from all fish. Default is NULL.
- time_select
Character vector, POSIXct vector, or NULL. Time period(s) to sample from. If NULL, samples from all time periods. Default is NULL.
- min_count_threshold
Numeric. Minimum number of track points a cell must contain to be eligible for sampling. The default of 1 excludes empty cells.
- max_count_threshold
Numeric. Maximum count threshold. Only cells with count below this threshold are eligible for sampling. Default is Inf (no upper limit).
- seed
Integer. Random seed for reproducible sampling. Default is NULL.
- by_group
Logical. If TRUE, samples n_points for each fish-time combination. If FALSE, samples n_points total distributed across all combinations. Default is TRUE.
- crs
Coordinate reference system for the output sf object. Can be:
  - NULL (default): attempts to detect the CRS from the input data, otherwise uses WGS84
  - Numeric EPSG code (e.g., 4326 for WGS84, 32618 for UTM Zone 18N)
  - Character proj4 string
  - An sf/sfc object from which to extract the CRS
- reference_raster
Optional raster object used to define the grid cells. If provided, the actual raster cell boundaries are used instead of an arbitrary grid.
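To make the time_aggregation options concrete, the sketch below shows one way timestamps could be collapsed into "hour", "day", or "month" periods in base R. The helper aggregate_time() is hypothetical and exists only for this illustration; the package may derive its time periods differently.

```r
# Hypothetical helper (not part of the package): floor each timestamp to
# the start of its hour, day, or month; "none" returns the input unchanged.
aggregate_time <- function(datetime,
                           time_aggregation = c("day", "hour", "month", "none")) {
  time_aggregation <- match.arg(time_aggregation)
  if (time_aggregation == "none") {
    return(datetime)
  }
  # Map the argument value onto the unit names understood by trunc().
  units <- c(hour = "hours", day = "days", month = "months")[[time_aggregation]]
  as.POSIXct(trunc(datetime, units = units))
}

x <- as.POSIXct(
  c("2025-07-15 13:45:00", "2025-07-15 23:59:59", "2025-07-16 00:10:00"),
  tz = "UTC"
)

# The first two timestamps fall in the same daily period, the third in the next.
periods <- aggregate_time(x, "day")
labels <- format(periods, "%Y-%m-%d")
```

Labels of this form match the style of the time_select values used in the examples, such as "2025-07-15".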
Value
An sf object containing the sampled points with columns:
- fish_id
Fish identifier
- time_period_label
Human-readable time period label (if by_time_period = TRUE)
- time_period_posix
POSIXct datetime for the time period (if available)
- x
X coordinates (grid cell centers)
- y
Y coordinates (grid cell centers)
- count
The count value used for sampling weights
- sample_id
Sequential sample identifier
- group_id
Unique identifier for each fish-time combination
- geometry
sf point geometry
Details
This function overlays a grid on the track data, counts the track points in each grid cell, then samples points with probability proportional to usage intensity. The process:
1. Aggregates track data by time period, if requested
2. Creates regular grid cells based on grid_resolution
3. Counts track points in each cell for each fish-time combination
4. Samples points weighted by the track point counts
5. Returns the usage-weighted sample as an sf object
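The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: sketch_usage_sample() is a hypothetical helper, time periods are fixed to days, cells at or above min_count are treated as eligible, sampling is done with replacement, and a plain data frame is returned instead of an sf object.

```r
# Hypothetical helper illustrating the process; not the package's code.
sketch_usage_sample <- function(tracks, grid_resolution = 100, n_points = 100,
                                min_count = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Step 1: label each track point with its (daily) time period.
  tracks$period <- format(tracks$datetime, "%Y-%m-%d")

  # Step 2: snap coordinates to the centre of a regular grid cell.
  tracks$cell_x <- (floor(tracks$x / grid_resolution) + 0.5) * grid_resolution
  tracks$cell_y <- (floor(tracks$y / grid_resolution) + 0.5) * grid_resolution

  # Step 3: count track points per cell for each fish-period combination.
  tracks$count <- 1L
  counts <- aggregate(count ~ fish_id + period + cell_x + cell_y,
                      data = tracks, FUN = sum)
  counts <- counts[counts$count >= min_count, ]  # assumed inclusive threshold

  # Step 4: within each group, draw cells with probability proportional to count.
  groups <- split(counts, list(counts$fish_id, counts$period), drop = TRUE)
  sampled <- lapply(groups, function(g) {
    idx <- sample.int(nrow(g), size = n_points, replace = TRUE, prob = g$count)
    g[idx, ]
  })

  # Step 5: combine the groups into one table of sampled cell centres.
  out <- do.call(rbind, sampled)
  rownames(out) <- NULL
  out$sample_id <- seq_len(nrow(out))
  out
}

# Made-up tracks: two fish, 50 positions each at 30-minute intervals.
set.seed(1)
tracks <- data.frame(
  fish_id = rep(1:2, each = 50),
  datetime = as.POSIXct("2025-07-15 00:00:00", tz = "UTC") +
    rep(seq(0, by = 1800, length.out = 50), 2),
  x = runif(100, 0, 500),
  y = runif(100, 0, 500)
)
pts <- sketch_usage_sample(tracks, grid_resolution = 100, n_points = 20, seed = 42)
```

Because the made-up tracks span two calendar days, the sketch produces four fish-day groups with 20 sampled cell centres each.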
This is useful for:
- Creating habitat models weighted by usage intensity
- Sampling environmental variables in proportion to space use
- Generating representative locations for resource selection analysis
- Monte Carlo analysis of habitat preferences
- Comparing used vs available habitat
Examples
if (FALSE) {
# Basic usage-weighted sampling
usage_points <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 100,
  n_points = 500,
  seed = 123
)
# Sample from specific fish and time periods
selected_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  grid_resolution = 50,     # Finer resolution
  n_points = 200,
  fish_select = c(1, 2, 3),
  time_select = c("2025-07-15", "2025-07-16"),
  min_count_threshold = 3   # Only heavily used cells
)
# Sample without time grouping
overall_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  by_time_period = FALSE,
  n_points = 1000
)
# Use with reference raster for consistent grid
raster_usage <- sample_points_from_track_grid(
  track_data = fish_simulation$tracks,
  reference_raster = depth_raster,  # An existing raster covering the study area
  n_points = 300,
  crs = 32617
)
# Plot usage-weighted sampling
library(ggplot2)
ggplot() +
  geom_sf(data = usage_points, aes(color = count, size = count),
          alpha = 0.6) +
  scale_color_viridis_c(name = "Track\nPoints", trans = "sqrt") +
  scale_size_continuous(name = "Track\nPoints", range = c(0.5, 3), trans = "sqrt") +
  facet_wrap(~fish_id + time_period_label) +
  theme_minimal() +
  labs(title = "Usage-Weighted Spatial Sampling")
# Compare usage intensity across fish
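library(dplyr)  # Provides %>%, group_by(), summarise(), n()
library(sf)     # Provides st_drop_geometry()
usage_summary <- usage_points %>%
  st_drop_geometry() %>%
  group_by(fish_id) %>%
  summarise(
    mean_usage = mean(count),
    max_usage = max(count),
    total_samples = n(),
    .groups = "drop"
  )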
}