This file provides context for AI coding assistants (GitHub Copilot, Gemini, Claude, Cursor, etc.) about admiral ecosystem standards and best practices.
Auto-generated – see commit history for the last update date.
Sources: programming_strategy.Rmd, git_usage.Rmd, rcmd_issues.Rmd
Update workflow: create-agents-md.yml (callable from any admiral ecosystem repo)
Verification: If you have read this file, include the comment
`# admiral guidelines loaded` on the first line of every new R file
you create in this repository. This confirms you are following admiral-specific conventions.
When working inside `tests/testthat/`, also read `tests/testthat/AGENTS.md` and
include the comment `# admiral test guidelines loaded` at the top of every new test file.
Source: https://pharmaverse.github.io/admiraldev/articles/programming_strategy.html
title: "Programming Strategy"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Programming Strategy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
As {admiral} is intended to be contributed to by the user community, this
article is meant for developers who want to either expand {admiral} functionalities or build on top of {admiral}.
These contributions could include, for example, company-specific derivations of ADaM datasets.
In order to keep the framework robust across the whole community,
we have defined a programming strategy that should be followed in such cases.
- Overall programming will follow a functional approach.
- We mandate the use of the tidyverse (e.g. dplyr) over similar functionality existing in base R.
- Each ADaM dataset is built with a set of functions and not with free flow code.
- Each ADaM dataset has a specific programming workflow.
- Each function has a specific purpose that supports the ADaM dataset programming workflow. It could be an {admiral} function or a company-specific function.
- Admiral functions can be re-used for company-specific functions.
- Each function belongs to one category defined in keywords/family.
- Each function that is used to derive one or multiple variable(s) is required to be unit tested.
- Functions have a standard naming convention.
- Double coding is not used as a QC method (except where absolutely necessary).
- ADaMs are created with readable, submission-ready code.
Firstly, it is important to explain how we decide on the need for new derivation functions.
If a derivation rule or algorithm is common and highly similar across different variables/parameters (e.g. study day or duration) then we would provide a generic function that can be used to satisfy all the times this may be needed across different ADaMs. Similarly, if we feel that a certain derivation could be useful beyond a single purpose we also would provide a generic function (e.g. instead of a last known alive date function, we have an extreme date function where a user could find the last date from a selection, or for example the first).
Otherwise, if we feel that a derivation rule is a unique need or sufficiently complex to justify then we opt for a dedicated function for that specific variable/parameter (e.g. treatment-emergent flag for AEs).
If certain variables are closely connected (e.g. an imputed date and the corresponding imputation flag) then a single function would provide both variables.
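A minimal base-R sketch of this pattern (the function and variable names here are illustrative, not actual admiral functions; admiral's real `derive_vars_dt()` works along these lines):

```r
# Hypothetical sketch: one derivation returns both the imputed date and its
# imputation flag, so the two closely connected variables stay in sync.
impute_start_date <- function(dtc) {
  # Complete "YYYY-MM" partial dates to the first day of the month
  partial <- grepl("^\\d{4}-\\d{2}$", dtc)
  imputed <- ifelse(partial, paste0(dtc, "-01"), dtc)
  data.frame(
    ASTDT  = as.Date(imputed),
    ASTDTF = ifelse(partial, "D", NA_character_)  # flag records what was imputed
  )
}

impute_start_date(c("2024-03-15", "2024-03"))
```

Returning both columns from one call avoids the date and its flag drifting apart when the imputation rule changes.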
If something needed for ADaM could be achieved simply via an existing tidyverse function, then we do not wrap this into an admiral function, as that would add an unnecessary extra layer for users.
The following principles are key when designing a new function:
- Modularity: All code follows a modular approach, i.e. the steps must be clearly separated and have a dedicated purpose. This applies to scripts creating a dataset, where each module should create a single variable or parameter, but also to complex derivations with several steps. Commenting on these steps is key for readability.
- Avoid copy and paste: If the same or very similar code is used multiple times, it should be put into a separate function. This improves readability and maintainability and makes unit testing easier. This should not be done for every simple programming step where tidyverse can be used, but rather for computational functions or data checks. However, also take care not to nest too many functions.
- Checks: Whenever a function fails, a meaningful error message must be provided with a clear reference to the input which caused the failure. A user should not have to dig into detailed code if they only want to apply a function. A meaningful error message supports usability.
- Flexibility: Functions should be as flexible as possible, as long as this does not reduce usability. For example:
  - The source variables or newly created variables and conditions for selecting observations should not be hard-coded.
  - It is useful if an argument triggers optional steps, e.g. if the `filter` argument is specified, the input dataset is restricted; otherwise this step is skipped.
  - However, arguments should not trigger completely different algorithms. For example, `BNRIND` could be derived based on `BASE` or based on `ANRIND`. It should not be implemented within one function, as the algorithms are completely different: if `BASE` is used, the values are categorized, while if `ANRIND` is used, the values are merged from the baseline observation.
- The behavior of the function is only determined by its input, not by any global object, i.e. all input like datasets, variable names, options, etc. must be provided to the function by arguments.
- It is expected that the input datasets are not grouped. If any are grouped, the function must issue an error.
- If a function requires grouping, the function must provide the `by_vars` argument.
- The output dataset must be ungrouped.
- The functions should not sort (arrange) the output dataset at the end.
- If the function needs to create temporary variables in an input dataset, names for these variables must be generated by `get_new_tmp_var()` to avoid accidentally overwriting variables of the input dataset. The temporary variables must be removed from the output dataset by calling `remove_tmp_vars()`.
- If developers find the need to use or create environment objects to achieve flexibility, use the `admiral_environment` environment object created in `admiral_environment.R`. All objects which are stored in this environment must be documented in `admiral_environment.R`. An equivalent environment object and `.R` file exist for admiraldev as well. For more details on how environments work, see the relevant sections on environments in the R Packages and Advanced R textbooks.
- In general, the function must not have any side effects like creating or modifying global objects, printing, writing files, etc.
- An exception is made for admiral options, see `get_admiral_option()` and `set_admiral_options()`, where we have certain pre-defined defaults with added flexibility to allow for user-defined defaults on commonly used function arguments. For example, `subject_keys` is currently pre-defined as `exprs(STUDYID, USUBJID)` but can be modified using `set_admiral_options(subject_keys = exprs(...))` at the top of a script. The reasoning behind this was to relieve the user of repeatedly changing these commonly used function arguments in a script, where they may be passed to many admiral functions.
- If this additional flexibility needs to be added for another commonly used function argument, e.g. `future_input` to be set as `exprs(...)`, it can be added as an admiral option. In the function formals, define `future_input = get_admiral_option("future_input")`, then proceed to modify the body and roxygen documentation of `set_admiral_options()`.
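The option mechanism described above can be sketched in base R as follows (a simplified illustration with hypothetical names, not the actual admiral internals; character vectors are used instead of expressions for brevity):

```r
# A package-local environment holds the options (cf. admiral_environment.R).
pkg_env <- new.env(parent = emptyenv())
pkg_env$subject_keys <- c("STUDYID", "USUBJID")  # pre-defined default

get_pkg_option <- function(name) get(name, envir = pkg_env)

set_pkg_options <- function(...) {
  opts <- list(...)
  for (nm in names(opts)) assign(nm, opts[[nm]], envir = pkg_env)
  invisible(NULL)
}

# Function formals default to the current option value:
count_subjects <- function(dataset,
                           subject_keys = get_pkg_option("subject_keys")) {
  nrow(unique(dataset[subject_keys]))
}

# Overriding once at the top of a script affects every later call:
set_pkg_options(subject_keys = c("USUBJID"))
```

Because the default is looked up at call time, a single `set_pkg_options()` call changes the behavior of every subsequent function call in the script.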
- Function names should start with a verb and use snake case, e.g.
derive_var_base().
| Function name prefix | Description |
|---|---|
| `assert_` / `warn_` / `is_` | Functions that check other functions' inputs |
| `derive_` | Functions that take a dataset as input and return a new dataset with additional rows and/or columns |
| `derive_var_` (e.g. `derive_var_trtdurd`) | Functions which add a single variable |
| `derive_vars_` (e.g. `derive_vars_dt`) | Functions which add multiple variables |
| `derive_param_` (e.g. `derive_param_os`) | Functions which add a single parameter |
| `compute_` / `calculate_` / ... | Functions that take vectors as input and return a vector |
| `create_` / `consolidate_` | Functions that create datasets without keeping the original observations |
| `get_` | Usually utility functions that return very specific objects that get passed through other functions |
| `filter_` | Functions that filter observations based on conditions associated with common clinical trial syntax |
| Function name suffix | Description |
|---|---|
| `_derivation` | Higher order functions that call a user-specified derivation |
| `_date` / `_time` / `_dt` / `_dtc` / `_dtm` | Functions associated with dates, times, datetimes, and their character equivalents |
| `_source` | Functions that create source datasets which are usually passed to other `derive_` functions |
| Other common function name terms | Description |
|---|---|
| `_merged_` / `_joined_` / `_extreme_` | Functions that follow the generic function user guide |
Please note that the appropriate `var`/`vars` prefix should be used in all cases in which the function creates any variable(s), regardless of the presence of a `new_var` argument in the function call.
Oftentimes when creating a new `derive_var_` or `derive_param_` function, there may be some sort of non-trivial calculation involved that you may want to write a customized function for. This is when creating a `compute_` function becomes appropriate, such that the calculation portion is contained in one step as part of the overall `derive_` function, reducing clutter in the main function body and assisting in debugging. In addition, a `compute_` function should be implemented if the calculation could be used for more than one derivation. For example, `compute_bmi()` could be used to derive a baseline BMI variable in ADSL (based on baseline weight and baseline height variables) and could also be used to derive a BMI parameter in ADVS (based on weight and height parameters). Please see `compute_age_years()` and `derive_var_age_years()` as another example.
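As a hedged illustration of this split (the `_sketch` functions below are hypothetical; admiral's actual `compute_bmi()` has its own signature):

```r
# compute_: vector in, vector out -- holds only the calculation.
compute_bmi_sketch <- function(height, weight) {
  # height in cm, weight in kg
  weight / (height / 100)^2
}

# derive_var_: handles the dataset plumbing and delegates the maths.
derive_var_bmi_sketch <- function(dataset) {
  dataset$BMIBL <- compute_bmi_sketch(dataset$HGTBL, dataset$WGTBL)
  dataset
}

adsl <- data.frame(HGTBL = c(170, 180), WGTBL = c(70, 81))
derive_var_bmi_sketch(adsl)
```

Because the calculation lives in `compute_bmi_sketch()`, the same logic could back both a baseline variable in ADSL and a BMI parameter in ADVS.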
The default value of optional arguments should be NULL.
There is a recommended argument order that all contributors are asked to adhere to (in order to keep consistency across functions):

1. `dataset` (and any additional datasets denoted by `dataset_*`)
2. `by_vars`
3. `order`
4. `new_var` (and any related `new_var_*` arguments)
5. `filter` (and any additional filters denoted by `filter_*`)
6. all additional arguments:
   - Make sure to always mention `start_date` before `end_date` (or related).
Names of variables inside a dataset should be passed as symbols rather than
strings, i.e. `AVAL` rather than `"AVAL"`. If an argument accepts one or more
variables or expressions as input, then the variables and expressions should be
wrapped inside `exprs()`.
For example:

- `new_var = TEMPBL`
- `by_vars = exprs(PARAMCD, AVISIT)`
- `filter = PARAMCD == "TEMP"`
- `order = exprs(AVISIT, desc(AESEV))`
- `new_vars = exprs(LDOSE = EXDOSE, LDOSEDT = convert_dtc_to_dt(EXSTDTC))`
Each function argument needs to be checked with an `assert_`-type function.
Each expression needs to be tested for the following
(there are many utility functions in {admiral} available to the contributor):

- whether it is an expression (or a list of expressions, depending on the function)
- whether it is a valid expression (i.e. whether it evaluates without error)
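A minimal sketch of what such an `assert_`-type check can look like (illustrative only; admiral/admiraldev ship real assertions such as `assert_data_frame()`):

```r
# Fails fast with a message that names the offending argument.
assert_logical_scalar_sketch <- function(arg) {
  arg_name <- deparse(substitute(arg))
  if (!is.logical(arg) || length(arg) != 1L || is.na(arg)) {
    stop(
      "Argument `", arg_name, "` must be `TRUE` or `FALSE`.",
      call. = FALSE
    )
  }
  invisible(arg)  # returns its input invisibly on success
}

keep_flag <- TRUE
assert_logical_scalar_sketch(keep_flag)   # passes silently
try(assert_logical_scalar_sketch("yes"))  # errors, naming the input
```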
The first argument of `derive_` functions should be the input dataset and it
should be named `dataset`. If more than one input dataset is required, the names of the other
input datasets should start with `dataset_`, e.g., `dataset_ex`.

Arguments for specifying items to add should start with `new_`. If a variable is
added, the second part of the argument name should be `var`; if a parameter is
added, it should be `param`. For example: `new_var`, `new_var_unit`, `new_param`.

Arguments which expect a boolean or boolean vector must start with a verb, e.g.,
`is_imputed` or `impute_date`.

Arguments which only expect one value or variable name must be a singular version of the word(s), e.g., `missing_value` or `new_var`. Arguments which expect several values or variable names (as a list, expressions, etc.) must be a plural version of the word(s), e.g., `missing_values` or `new_vars`.
| Argument name | Description |
|---|---|
| `dataset` | The input dataset. Expects a data.frame or a tibble. |
| `dataset_ref` | The reference dataset, e.g. ADSL. Typically includes just one observation per subject. |
| `dataset_add` | An additional dataset. Used in some `derive_xx` and `filter_xx` functions to access variables from an additional dataset. |
| `by_vars` | Variables to group by. |
| `order` | List of expressions for sorting a dataset, e.g., `exprs(PARAMCD, AVISITN, desc(AVAL))`. |
| `new_var` | Name of a single variable to be added to the dataset. |
| `new_vars` | List of variables to be added to the dataset. |
| `new_var_unit` | Name of the unit variable to be added. It should be the unit of the variable specified for the `new_var` argument. |
| `filter` | Expression to filter a dataset, e.g., `PARAMCD == "TEMP"`. |
| `start_date` | The start date of an event/interval. Expects a date object. |
| `end_date` | The end date of an event/interval. Expects a date object. |
| `start_dtc` | (Partial) start date/datetime in ISO 8601 format. |
| `dtc` | (Partial) date/datetime in ISO 8601 format. |
| `date` | Date of an event/interval. Expects a date object. |
| `subject_keys` | Variables to uniquely identify a subject, defaults to `exprs(STUDYID, USUBJID)`. In function formals, use `subject_keys = get_admiral_option("subject_keys")`. |
| `set_values_to` | List of variable name-value pairs. Use `process_set_values_to()` for processing the value and providing user-friendly error messages. |
| `keep_source_vars` | Specifies which variables from the selected observations should be kept. The default of the argument should be `exprs(everything())`. The primary difference between `set_values_to` and `keep_source_vars` is that `keep_source_vars` only selects and retains the variables from a source dataset, so e.g. `keep_source_vars = exprs(DOMAIN)` would join and keep the `DOMAIN` variable, whereas `set_values_to` can make renaming and inline function changes such as `set_values_to = exprs(LALVDOM = DOMAIN)`. |
| `missing_value` | A singular value to be entered if the data is missing. |
| `missing_values` | A named list of expressions where the names are variables in the dataset and the values are values to be entered if the data is missing, e.g., `exprs(BASEC = "MISSING", BASE = -1)`. |
All source code should be formatted according to the tidyverse style guide. The lintr and styler packages are used to check and enforce this.
With regards to lintr, {admiral} and all
related packages should maintain consistent linting standards by ensuring that their
.lintr.R configuration files use the admiral_linters() function. This contains
agreed-upon preferences and conventions such as avoiding the use of stop() and
warning() in favor of cli::abort() and cli::warn().
The admiral_linters() function is stored under inst/lintr/linters.R in {admiraldev}
(so as not to expose it to users) and can be loaded within the .lintr.R configuration
file with source(system.file("lintr/linters.R", package = "admiraldev")). An example
.lintr.R configuration file is shown below:
library(lintr)
source(system.file("lintr/linters.R", package = "admiraldev"))
linters <- admiral_linters()
exclusions <- list(
"R/data.R" = Inf,
"inst" = list(undesirable_function_linter = Inf),
"vignettes" = list(undesirable_function_linter = Inf)
)
If there is a good case to be made for altering any of the default configurations,
this can be done by passing arguments to admiral_linters(), for instance:
admiral_linters(
line_length = line_length_linter(80), # (dropped down to 80 from the default 100)
object_name_linter = object_name_linter( # (activated the object name linter)
styles = c("snake_case", "symbols", "SNAKE_CASE"),
regexes = c("^pl__.*", "^_.*")
)
)
Comments should be added to help readers other than the author understand the code. There are two main cases:

- If the intention of a chunk of code is not clear, a comment should be added. The comment should not rephrase the code but provide additional information.

  Bad:

      # If AVAL equals zero, set it to 0.0001. Otherwise, do not change it
      mutate(dataset, AVAL = if_else(AVAL == 0, 0.0001, AVAL))

  Good:

      # AVAL is to be displayed on a logarithmic scale.
      # Thus replace zeros by a small value to avoid gaps.
      mutate(dataset, AVAL = if_else(AVAL == 0, 0.0001, AVAL))

- For long functions (>100 lines), comments can be added to structure the code and simplify navigation. In this case the comment should end with `----` to add an entry to the document outline in RStudio. For example: `# Check arguments ----`
The formatting of the comments must follow the
tidyverse style guide. I.e.,
the comment should start with a single # and a space. No decoration (except
for outline entries) must be added.
Bad
# This is a comment #
###########################
# This is another comment #
###########################
#+++++++++++++++++++++++++++++++
# This is a section comment ----
#+++++++++++++++++++++++++++++++
Good
# This is a comment
# This is another comment
# This is a section comment ----
In line with the fail-fast design principle, function inputs should be checked for validity and, if there's an invalid input, the function should stop immediately with an error. An exception is the case where a variable to be added by a function already exists in the input dataset: here only a warning should be displayed and the function should continue executing.
Inputs should be checked using custom assertion functions defined in R/assertions.R.
These custom assertion functions should either return an error in case of an invalid input or return nothing.
For the most common types of input arguments like a single variable, a list of variables, a dataset, ... functions for checking are available (see assertions).
Arguments which expect keywords should handle them in a case-insensitive manner,
e.g., both date_imputation = "FIRST" and date_imputation = "first" should be
accepted. The assert_character_scalar() function helps with handling arguments
in a case-insensitive manner.
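A base-R sketch of case-insensitive keyword handling (`match_keyword()` is a hypothetical helper; in admiral this is handled by `assert_character_scalar()`):

```r
match_keyword <- function(arg, choices) {
  # Compare in lower case so "FIRST", "First", and "first" all match
  idx <- match(tolower(arg), tolower(choices))
  if (is.na(idx)) {
    stop("`", arg, "` must be one of: ", paste(choices, collapse = ", "),
         call. = FALSE)
  }
  choices[idx]  # normalize to the canonical spelling
}

match_keyword("FIRST", c("first", "last"))  # "first"
match_keyword("Last",  c("first", "last"))  # "last"
```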
An argument should not be checked in an outer function if the argument name is the same as in the inner function.
This rule applies only if both functions are part of {admiral}.
Every function that is exported from the package must have an accompanying
header that should be formatted according to the
roxygen2 convention. We have also implemented a
custom roclet to enhance our documentation and examples for more complex
functions - see rdx_roclet(), the example function demo_fun(), and further
down for more details.
In addition to the standard roxygen2 tags, the @family and @keywords tags are also used.
The family/keywords are used to categorize the function, which is used both on our website and the internal package help pages. Please see section Categorization of functions.
An example is given below:
#' Derive Relative Day Variables
#'
#' Adds relative day variables (`--DY`) to the dataset, e.g., `ASTDY` and
#' `AENDY`.
#'
#' @param dataset Input dataset
#'
#' The columns specified by the `reference_date` and the `source_vars`
#' argument are expected.
#'
#' @permitted A dataset
#'
#' @param reference_date The start date column, e.g., date of first treatment
#'
#' A date or date-time object column is expected.
#'
#' Refer to `derive_var_dt()` to impute and derive a date from a date
#' character vector to a date object.
#'
#' @permitted An unquoted variable name of the input dataset
#'
#' @param source_vars A list of datetime or date variables created using
#' `exprs()` from which dates are to be extracted. This can either be a list of
#' date(time) variables or named `--DY` variables and corresponding --DT(M)
#' variables e.g. `exprs(TRTSDTM, ASTDTM, AENDT)` or `exprs(TRTSDT, ASTDTM,
#' AENDT, DEATHDY = DTHDT)`. If the source variable does not end in --DT(M), a
#' name for the resulting `--DY` variable must be provided.
#'
#' @permitted [var_list]
#'
#' @details The relative day is derived as number of days from the reference
#' date to the end date. If it is nonnegative, one is added. I.e., the
#' relative day of the reference date is 1. Unless a name is explicitly
#' specified, the name of the resulting relative day variable is generated
#' from the source variable name by replacing DT (or DTM as appropriate) with
#' DY.
#'
#' @returns The input dataset with `--DY` corresponding to the `--DTM` or `--DT`
#' source variable(s) added
#'
#' @keywords der_date_time
#' @family der_date_time
#'
#' @export
#'
#' @examples
#' library(lubridate)
#' library(dplyr, warn.conflicts = FALSE)
#'
#' datain <- tribble(
#' ~TRTSDTM, ~ASTDTM, ~AENDT,
#'   "2014-01-17T23:59:59", "2014-01-18T13:09:09", "2014-01-20"
#' ) %>%
#' mutate(
#' TRTSDTM = as_datetime(TRTSDTM),
#' ASTDTM = as_datetime(ASTDTM),
#' AENDT = ymd(AENDT)
#' )
#'
#' derive_vars_dy(
#' datain,
#' reference_date = TRTSDTM,
#' source_vars = exprs(TRTSDTM, ASTDTM, AENDT)
#' )
The following fields are mandatory:

- `@param`: One entry per function argument. The following attributes should be described: expected data type (e.g. `data.frame`, `logical`, `numeric`, etc.), permitted values (if applicable), and optionality (i.e. is this a required argument?). If the expected input is a dataset, then the required variables should be clearly stated. Describing the default value becomes difficult to maintain and subject to manual error when it is already declared in the function arguments. For the description of the permitted values, the (custom) `@permitted` tag should be used (see `rdx_roclet()` for more details).
- `@details`: A natural-language description of the derivation used inside the function.
- `@keywords`: One applicable tag for the function, identical to the family.
- `@family`: One applicable tag for the function, identical to the keyword.
- `@returns`: A description of the return value of the function. Any newly added variable(s) should be mentioned here.
- `@examples` or `@caption`, `@info`, `@code`: Fully self-contained examples of how to use the function. Self-contained means that, if this code is executed in a new R session, it will run without errors. That means any packages need to be loaded with `library()` and any datasets need either to be created directly inside the example code or loaded using `pkg_name::dataset_name`, e.g., `adsl <- admiral::admiral_adsl`. If a dataset is created in the example, it should be done using the function `tribble()` (specify `library(dplyr)` before calling this function). Make sure to align columns, as this ensures quick code readability. If other functions are called in the example, please specify `library(pkg_name)` and then refer to the respective function as `fun()`, as opposed to the `pkg_name::fun()` notation that is specified as preferred in the Unit Test Guidance.

  The `@examples` tag should be used for simple functions which require only a few examples and no explanation. For more complex functions, the (custom) `@caption`, `@info`, and `@code` tags should be used. Please see the separate vignette on Writing Custom Examples for detailed guidance on how these are constructed, and `derive_extreme_records.R` in admiral for an example of this in action.
Copying descriptions should be avoided as it makes the documentation hard to
maintain. For example, if the same argument with the same description is used by
more than one function, the argument should be described for one function and
the other functions should use `@inheritParams <function name where the argument is described>`.

Please note that if `@inheritParams func_first` is used in the header of the
`func_second()` function, those argument descriptions of `func_first()` are
included in the documentation of `func_second()` for which

- the argument is offered by `func_second()`, and
- no `@param` tag for the argument is included in the header of `func_second()`.
The order of the `@param` tags should be the same as in the function definition.
The `@inheritParams` tags should come after the `@param` tags. This does not affect the
order of the argument descriptions in the rendered documentation but makes it
easier to maintain the headers.
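For instance, a header along these lines (hypothetical functions, reusing `func_first` from the example above) inherits the shared argument descriptions while documenting only the new argument:

```r
#' Derive a Second Variable
#'
#' @inheritParams func_first
#' @param new_var Name of the variable to add (documented here because it is
#'   not an argument of `func_first()`)
#'
#' @export
func_second <- function(dataset, by_vars, new_var) {
  # `dataset` and `by_vars` pick up their descriptions from func_first()
  dataset
}
```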
Variable names, expressions, functions, and any other code must be enclosed in backticks. This will render it as code.
For functions which derive a specific CDISC variable, the title must state the label of the variable without the variable name. The variable should be stated in the description.
To avoid confusion the term "parameter" should be used for CDISC parameters only. For function arguments the term "argument" should be used.
The functions are categorized by keywords and families within the roxygen header. Categorization is important
because admiral has over 125 user-facing functions and the number is growing! However, to ease the burden for developers, we have decided that
the keywords and families should be identical in the roxygen header; they are specified via the `@keywords` and `@family` fields.
To reiterate, each function must use the same keyword and family. Also, please note that the keywords and families are case-sensitive.
The keywords allow the reference page to be easily organized when using certain
pkgdown functions. For example, using the function `has_keyword(der_bds_gen)` in the `_pkgdown.yml` file while building
the website will collect all the BDS general derivation functions and display them in alphabetical order on the reference page in a section called
BDS-Specific.

The families allow similar functions to be displayed in the See Also section of a function's documentation. For example, a user looking at
the `derive_vars_dy()` function documentation might be interested in other date/time functions. Using the `@family` tag `der_date_time` will display
all the date/time functions available in admiral in the See Also section of the `derive_vars_dy()` function documentation. Please take a look at the
function documentation for `derive_vars_dy()` to see the family tag in action.
Below is the list of available keyword/family tags to be used in admiral functions. If you think an additional keyword/family tag should be added, then please
open an issue in GitHub for discussion.
| Keyword/family | Description |
|---|---|
| `com_date_time` | Date/time computation functions that return a vector |
| `com_bds_findings` | BDS-findings functions that return a vector |
| `create_aux` | Functions for creating auxiliary datasets |
| `datasets` | Example datasets used within admiral |
| `der_gen` | General derivation functions that can be used for any ADaM |
| `der_date_time` | Date/time derivation functions |
| `der_bds_gen` | Basic Data Structure (BDS) functions that can be used across different BDS ADaMs (adex, advs, adlb, etc.) |
| `der_bds_findings` | Basic Data Structure (BDS) functions specific to the BDS-findings ADaMs |
| `der_prm_bds_findings` | BDS-findings functions for adding parameters |
| `der_adsl` | Functions that can only be used for creating ADSL |
| `der_tte` | Functions used only for creating a time-to-event (TTE) dataset |
| `der_occds` | OCCDS-specific derivation and helper functions |
| `der_prm_tte` | TTE functions for adding parameters to a TTE dataset |
| `deprecated` | Functions which will be removed from admiral after the next release. See Deprecation Guidance. |
| `metadata` | Auxiliary datasets providing definitions as input for derivations, e.g. grading criteria or dose frequencies |
| `utils_ds_chk` | Utilities for dataset checking |
| `utils_fil` | Utilities for filtering observations |
| `utils_fmt` | Utilities for formatting observations |
| `utils_print` | Utilities for printing objects in the console |
| `utils_help` | Utilities used within derivation functions |
| `utils_examples` | Utilities used for examples and template scripts |
| `source_specifications` | Source objects |
| `other_advanced` | Other advanced functions |
| `high_order_function` | Higher order functions |
| `internal` | Internal functions only available to admiral developers |
| `assertion*` | Asserts a certain type and gives a warning or error to the user |
| `warning` | Provides custom warnings to the user |
| `what` | A function that ... |
| `is` | A function that ... |
| `get` | A function that ... |
NOTE: It is strongly encouraged that the `@keywords` and `@family` tags be identical. This eases the burden of development and maintenance for admiral functions. If you need to use multiple keywords or families, please reach out to the core development team for discussion.
Missing values (`NA`s) need to be explicitly shown.

Regarding character vectors converted from SAS files: SAS treats missing character values as blank.
These are imported into R as empty strings (`""`), although in nature they are missing values (`NA`).
All empty strings that originate like this need to be converted to proper R missing values (`NA`).
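A minimal base-R sketch of that conversion (`blank_to_na()` is a hypothetical helper; import packages may offer equivalents):

```r
# Convert SAS-style blank strings to proper R missing values.
blank_to_na <- function(x) {
  if (is.character(x)) {
    # guard against NA indices before comparing to ""
    x[!is.na(x) & x == ""] <- NA_character_
  }
  x
}

dm <- data.frame(
  USUBJID = c("01", "02"),
  RACE    = c("ASIAN", ""),  # blank imported from SAS
  stringsAsFactors = FALSE
)
dm[] <- lapply(dm, blank_to_na)  # apply to every column, keep data.frame shape
dm$RACE  # "ASIAN" NA
```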
The table below describes the key directories and files in the repository. Understanding this layout helps contributors know where to find and where to place code.
| Directory or file | Purpose |
|---|---|
| `R/` | R source files containing package functions. File names reflect their contents (see File Structuring below). |
| `man/` | Auto-generated Rd documentation files. Do not edit manually; run `devtools::document()` to regenerate. |
| `tests/testthat/` | Unit test scripts. Each file follows the naming convention `test-<source_file>.R`. |
| `vignettes/` | Developer-facing guidance vignettes (and any user-facing articles). |
| `inst/lintr/` | Linting helpers/configuration used by `.lintr.R` (e.g., sourced via `system.file(...)`). |
| `inst/templates/` | ADaM R script templates made available to users. |
| `NAMESPACE` | Auto-generated export/import declarations. Do not edit manually; run `devtools::document()`. |
| `NEWS.md` | Package changelog. Updated with every user-facing change per PR. |
| `DESCRIPTION` | Package metadata and dependency declarations (Imports, Suggests). |
| `_pkgdown.yml` | Configuration for the package website built by {pkgdown}. |
Organizing functions into files is more of an art than a science. Thus, there are no hard rules but just recommendations. First and foremost, there are two extremes that should be avoided: putting each function into its own file and putting all functions into a single file. Apart from that the following recommendations should be taken into consideration when deciding upon file structuring:
- If a function is very long (together with its documentation), store it in a separate file.
- If some functions are documented together, put them into one file.
- If some functions have some sort of commonality or relevance with one another (like `dplyr::bind_rows()` and `dplyr::bind_cols()`), put them into one file.
- Store functions together with their helpers and methods.
- Have no more than 1000 lines in a single file, unless necessary (exceptions are, for example, classes with methods).
It is the responsibility of both the author of a new function and the reviewer to ensure that these recommendations are put into practice.
Package dependencies have to be documented in the DESCRIPTION file.
If a package is used only in examples and/or unit tests then it should be listed in Suggests, otherwise in Imports.
Functions from other packages have to be explicitly imported by using the `@importFrom` tag in the `R/admiral-package.R` file.
To import the `if_else()` and `mutate()` functions from dplyr, the following line would have to be included in that file:
`#' @importFrom dplyr if_else mutate`.
By using the `@importFrom` tag, it is easier to track all of our dependencies in one place, and it improves code readability.
Some of these functions become critically important when using admiral and
should be re-exported. This applies to functions which are frequently
used within {admiral} function calls, like `rlang::exprs()`, `dplyr::desc()`,
or the pipe operator `%>%`. To re-export these functions, the following R
code should be included in the `R/reexports.R` file using the format:
#' @export
pkg_name::fun
Functions should only perform the derivation logic and not add any kind of metadata, e.g. labels.
A function requires a set of unit tests to verify it produces the expected result. See Writing Unit Tests in {admiral} for details.
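As a sketch of what such a unit test might look like (`derive_var_one()` is a hypothetical function that adds a constant variable; real admiral tests typically compare the actual output against an expected dataset):

```r
library(testthat)
library(dplyr)

# Hypothetical function under test: adds the constant variable ONE
derive_var_one <- function(dataset) {
  mutate(dataset, ONE = 1)
}

test_that("derive_var_one() Test 1: ONE is added to the dataset", {
  input <- tibble::tibble(USUBJID = c("01", "02"))
  expected <- tibble::tibble(USUBJID = c("01", "02"), ONE = c(1, 1))
  expect_identical(derive_var_one(input), expected)
})
```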
The below deprecation strategy provides stability to users while allowing admiral developers the ability to update and prune the code base over time.
- Phase 1: In the release where the identified function or argument is to be deprecated, a message is issued when the function or argument is used, via `deprecate_inform()`. This message will appear to the user for at least one year. Templates, vignettes, and any internal calls should be updated to use the new recommended function/argument.
- Phase 2: After at least one year and in the next closest release, a warning will be issued when the function or argument is used, via `deprecate_warn()`. This warning message will appear for at least one year.
- Phase 3: After at least one year and in the next closest release, an error will be thrown when the function or argument is used, via `deprecate_stop()`, following a process similar to Phases 1 and 2.
- Phase 4: Finally, after three years from the time of being identified for deprecation, the function or argument will be completely removed from {admiral}.
NB: Major/minor releases make the most sense for deprecation updates. However, if a release cycle stretches to multiple years, then patch releases should be considered to help keep {admiral} neat and tidy!
NB: Take care with the NEWS.md entries around deprecation as the person continuing this
process might not be you!
If a function or argument is removed, the documentation must be updated to indicate the function or the argument is now deprecated and which new function/argument should be used instead.
The documentation will be updated at Phase 1:
- the description level for a function will have a lifecycle badge added,
- the `@keywords` and `@family` roxygen tags will be replaced with `deprecated`

```{r, eval = FALSE}
#' Title of the function
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function is deprecated, please use `new_fun()` instead.
#'
#' @family deprecated
#' @keywords deprecated
```
Example for documentation at the argument level, i.e. at the `@param` level for an argument:

```{r, eval = FALSE}
#' @param old_param `r lifecycle::badge("deprecated")` Please use `new_param` instead.
```
The documentation will be further updated at Phase 3:
- the `@examples` section should be removed.
When a function or argument is deprecated, the function must be updated to issue
a message, warning or error using deprecate_inform(), deprecate_warn() or
deprecate_stop(), respectively, as described above.
There should be a test case added in the test file of the function that checks whether this message/warning/error is issued as appropriate when using the deprecated function or argument.
Phase 1: At the start of this phase the call to `deprecate_inform()` will appear as:

```{r, eval = FALSE}
fun_xxx <- function(dataset, some_param, other_param) {
  deprecate_inform(
    when = "x.y.z",
    what = "fun_xxx()",
    with = "new_fun_xxx()",
    details = c(
      x = "This message will turn into a warning {at the beginning of 20XX}.",
      i = "See admiral's deprecation guidance:
      https://pharmaverse.github.io/admiraldev/dev/articles/programming_strategy.html#deprecation"
    )
  )
  new_fun_xxx(
    dataset = dataset,
    some_param = some_param,
    other_param = other_param
  )
}
```

NB: Please adjust the phrase {at the beginning of 20XX} to the relevant timeline.
The code of the deprecated function should be replaced with a call to the new function which should be used instead.
Phase 2: At the start of this phase the call to `deprecate_warn()` will appear as:

```{r, eval = FALSE}
fun_xxx <- function(dataset, some_param, other_param) {
  deprecate_warn(
    when = "x.y.z",
    what = "fun_xxx()",
    with = "new_fun_xxx()",
    details = c(
      x = "This warning will turn into an error {at the beginning of 20XX}.",
      i = "See admiral's deprecation guidance:
      https://pharmaverse.github.io/admiraldev/dev/articles/programming_strategy.html#deprecation"
    )
  )
  new_fun_xxx(
    dataset = dataset,
    some_param = some_param,
    other_param = other_param
  )
}
```

NB: Please adjust the phrase {at the beginning of 20XX} to the relevant timeline.
Phase 3: At the start of this phase the call to `deprecate_stop()` will appear as:

```{r, eval = FALSE}
fun_xxx <- function(dataset, some_param, other_param) {
  deprecate_stop(
    when = "x.y.z",
    what = "fun_xxx()",
    with = "new_fun_xxx()"
  )
  new_fun_xxx(
    dataset = dataset,
    some_param = some_param,
    other_param = other_param
  )
}
```

Phase 4: Function should be removed from the package.
Phase 1: If the argument is renamed or replaced, a message must be issued and the new argument takes the value of the old argument until the next phase. Note: arguments which are not passed via `exprs()` (e.g. `new_var = VAR1` or `filter = AVAL > 10`) will need to be quoted.

```{r, eval = FALSE}
if (!missing(old_param)) {
  deprecate_inform("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )")
  # if old_param is given using exprs():
  new_param <- old_param
  # if old_param is NOT given using exprs():
  new_param <- enexpr(old_param)
}
```
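Putting this together, a deprecated-argument sketch might look as follows (`fun_xxx()` and its arguments are hypothetical; only one of the two assignment styles shown above would be used, depending on how the argument is passed):

```{r, eval = FALSE}
fun_xxx <- function(dataset, new_param = NULL, old_param = NULL) {
  if (!missing(old_param)) {
    deprecate_inform("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )")
    # old_param is not passed via exprs(), so it must be quoted:
    new_param <- enexpr(old_param)
  }
  # ... derivation logic using new_param ...
}
```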
Phase 2: If the argument is renamed or replaced, a warning must be issued and the new argument takes the value of the old argument until the next phase. Note: arguments which are not passed via `exprs()` (e.g. `new_var = VAR1` or `filter = AVAL > 10`) will need to be quoted.

```{r, eval = FALSE}
if (!missing(old_param)) {
  deprecate_warn("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )")
  # if old_param is given using exprs():
  new_param <- old_param
  # if old_param is NOT given using exprs():
  new_param <- enexpr(old_param)
}
```
Phase 3: If an argument is removed and is not replaced, an error must be generated:

```{r, eval = FALSE}
if (!missing(old_param)) {
  deprecate_stop("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )")
}
```
Phase 4: All mentions of the argument are completely removed from admiral.
Unit tests for deprecated functions and arguments must be added to the test file of the function¹ to ensure that a message, warning, or error is issued.
The unit test should follow the corresponding format, per the unit test guidance.

- Please put tests for deprecation at the top of the test file to make finding the specific test easier for the next phase of deprecation.
- Tests that call multiple functions with deprecation messages can be wrapped using parentheses and curly brackets, e.g. `expect_snapshot({})`.
- You can use `withr::local_options(list(lifecycle_verbosity = "quiet"))` to suppress the deprecation messages in already created tests.

```{r, eval = FALSE}
## Test 1: deprecation message if function is called ----
test_that("derive_var_example() Test 1: deprecation message if function is called", {
  expect_snapshot({
    ae <- date_source(...)
    ...
    derive_var_example(...)
  })
})

## Test 2: Test of function argument 1 ----
test_that("derive_var_example() Test 2: Test of function argument 1", {
  withr::local_options(list(lifecycle_verbosity = "quiet"))
  ...
})
```
### For Deprecated Functions that Issue a Warning (Phase 2)
The snapshot of the deprecation message test must be updated because a warning is now issued instead of a message.
### For Deprecated Functions that Issue an Error (Phase 3)
A unit test like the following must be added:

```{r, eval = FALSE}
test_that("derive_var_example() Test #: deprecation error if function is called", {
  expect_error(
    derive_var_example(),
    class = "lifecycle_error_deprecated"
  )
})
```
When writing the unit test, check that the error has the right class, i.e.,
`"lifecycle_error_deprecated"`.
Other unit tests of the deprecated function must be removed.
# Experimental Functions
admiral is stable with its core functions. New functions added to admiral must
be labelled with the lifecycle badge **experimental**.
```{r, eval=FALSE}
#' Title of the function
#'
#' @description
#' `r lifecycle::badge("experimental")`
#'
```
Experimental functions will be given two releases before we remove the badge. No
deprecation messages will be given to the user if breaking changes are implemented
within those two release cycles. However, admiral will document the breaking change
in the NEWS.md. Once the two release cycles are complete, admiral will remove the
experimental badge and we will proceed with the normal deprecation cycle if needed.
This experimental time period allows for us to test out the function and receive feedback but doesn't burden us with a deprecation cycle.
Please take the following list as a recommendation and try to adhere to its rules if possible.
- Arguments in function calls should be named except for the first parameter (e.g. `assert_data_frame(dataset, required_vars = exprs(var1, var2), optional = TRUE)`).
- `dplyr::if_else()` should be used when there are only two conditions. Try to always set the `missing` argument whenever appropriate.
- Some admiral arguments require selecting one particular option like `mode`, e.g. `mode = "last"`. Use quotation marks to capture these. The expected assertion function corresponding to these arguments is `assert_character_scalar()`/`assert_character_vector()`.
- Many admiral arguments require capturing an expression, typically encased in an `exprs()` statement, which is to be evaluated later inside the function body; see arguments like `new_vars`, e.g. `new_vars = exprs(TRTSDTM = EXSTDTM)`. Oftentimes, the assertion functions corresponding to these are `assert_expr()`/`assert_expr_list()`. These arguments are unquoted by using `!!!`.
- Some admiral arguments like `new_var` or `filter`, which expect a single variable or expression, are not quoted in the call. In the function body they have to be quoted by using `enexpr()`. Usually this is combined with the assertion, e.g., `new_var <- assert_symbol(enexpr(new_var))`. These arguments are unquoted by using `!!`.
- Keep in mind `!!` is a one-to-one replacement and `!!!` is a one-to-many replacement. Please see this chapter in the Advanced R textbook for more details.
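To illustrate the difference between `!!` and `!!!`, here is a standalone sketch using {rlang} directly (`some_fun` and `capture_var` are hypothetical names used only for this example):

```r
library(rlang)

# !! injects a single expression (one-to-one replacement)
capture_var <- function(new_var) {
  new_var <- enexpr(new_var)
  expr(some_fun(!!new_var))
}
capture_var(VAR1)
#> some_fun(VAR1)

# !!! splices a list of expressions (one-to-many replacement)
new_vars <- exprs(TRTSDTM = EXSTDTM, TRTEDTM = EXENDTM)
expr(mutate(dataset, !!!new_vars))
#> mutate(dataset, TRTSDTM = EXSTDTM, TRTEDTM = EXENDTM)
```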
In the following PR, you will find an example of how the function argument dataset was able to be standardized such that the Label and Description of said function argument was aligned across the codebase. Please see the changes to the file derive_adeg_params.R for further details.
The benefit of having a programmatic way to write documentation is that if any changes need to be made, making the modification in the corresponding function (in this case, roxygen_param_dataset()) scales across the codebase, can be tested, and is less prone to user error such as typos or grammar mistakes.
These functions are implemented in {admiraldev} (in roxygen2.R), and the naming convention for each argument is as follows: roxygen_param_xxx(), where "xxx" is replaced with the argument name. The available helper functions are roxygen_param_dataset(), roxygen_param_by_vars(), roxygen_order_na_handling(), and roxygen_save_memory().
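A function's `dataset` argument can then be documented with inline R code in its roxygen header, for example (a sketch; the exact wording of the description is generated by the helper):

```{r, eval = FALSE}
#' @param dataset `r roxygen_param_dataset()`
```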
- The choice of R version and package versions is not set in stone. However, a common development environment is important to establish when working across multiple companies and multiple developers. We currently recommend developers work with the latest R version and latest available packages. However, this will deviate over time as developers come and go from {admiral}. We actually see this as a positive, i.e. the deviations between developers, as this introduces a bit of random stress testing to our code base.
- GitHub allows us, through Actions/Workflows, to test {admiral} under several versions of R as well as several versions of dependent R packages needed for {admiral}. Currently we test {admiral} against the two latest R versions and the closest snapshots of packages to those R versions. You can view this workflow and others in our admiralci GitHub repository.
The following R commands cover the most common development tasks. All commands should be run with the package project open (i.e., from the package root directory).
- `devtools::install_deps(dependencies = TRUE)`: installs all packages declared in DESCRIPTION (both Imports and Suggests).
- `devtools::load_all()`: simulates installing and loading the package. Use this frequently during development to ensure your changes are available in the R session.
- `devtools::document()`: runs {roxygen2} to rebuild all man/*.Rd files and regenerate NAMESPACE. Must be run after any change to roxygen headers.
- `devtools::test()`: runs the full {testthat} test suite. All tests must pass before opening a pull request.
- `devtools::test_file("tests/testthat/test-<file>.R")`: runs a single test file. Useful for rapid iteration while developing or fixing a specific function.
- `lintr::lint_package()`: applies the linting rules defined in .lintr.R (which uses admiral_linters() from {admiraldev}). The CI workflow will fail if linting errors are present.
- `styler::style_pkg()`: reformats source files to comply with the tidyverse style guide. Run this before committing to avoid whitespace-related lintr failures.
- `devtools::check()`: runs the full R CMD check suite locally. The PR CI will fail if check produces any errors, warnings, or notes. See the R CMD Issues vignette for guidance on resolving common failures.
Source: https://pharmaverse.github.io/admiraldev/articles/git_usage.html
title: "Guidance for git and GitHub Usage" output: rmarkdown::html_vignette: toc: true toc_depth: 6 vignette: > %\VignetteIndexEntry{Guidance for git and GitHub Usage} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
This article will give you an overview of how the {admiral} project is utilizing the version-control software git and the website GitHub while working with RStudio. We will go over the primary branches that house the source code for the {admiral} project as well as how we use Feature branches to address Issues. Issues can range from bugs to enhancements that have been identified or requested by developers, users or testers. We also provide the bare minimum of git commands needed to get up and running. Please refer to the Resource section for more in-depth guidance on using git and GitHub.
- The `main` branch contains the latest development version of the package. You can find the released versions here
- The `gh-pages` branch contains the code used to render R package websites - you are looking at it right now!
- The `patch` branch is reserved for special hot fixes to address bugs and should rarely be used. More info in Hot Fix Release
- The `main`, `gh-pages`, and `patch` branches are under protection. If you try to push changes to these branches you will get an error unless you are an administrator.
- Feature branches are where actual development related to a specific issue happens. Feature branches are merged into `main` once a pull request is merged. Check out the Pull Request Review Guidance for more guidance on merging into `main`.
Feature Branches are where most developers will work when addressing Issues.
Each feature branch must be related to an issue. We encourage new developers to only work on one issue at a time.
The name of the branch must be prefixed with the issue number, followed by a short but meaningful description. As an example, given an issue #94 "Program function to derive LSTALVDT", the branch name would be 94-derive-var-lstalvdt.
- Checkout the main branch: `git checkout main`
- Pull the latest changes from GitHub: `git pull`
- Create a new branch off the main branch and switch to it: `git checkout -b <new_branch_name>`
You can also create a feature branch in GitHub.
- Switch to the `main` branch
- Type in your new feature branch name
- Click Create branch: `<your_branch_name>` from `main`
- Be sure to pull the newly created branch down into RStudio
knitr::include_graphics("github_feature_branch.png", dpi = 144)
To start the commit process, you will need to tell git to move your changes to the staging area. Use git add <your_file> to move all changes of <your_file> in the staging area to wait for the next commit. You can use git add . to move all files you have worked on to the staging area. Next you can commit, which takes a snapshot of your staged changes. When committing, prefix the message with the issue number and add a meaningful message git commit -m '#94 last alive date implementation'.
Lastly, you should push your changes up to GitHub using `git push origin <branch_name>`.
You can also make use of the Git Tab within RStudio to commit your changes. A benefit of using this Tab is being able to see your changes to the file with red and green highlighting. Just like in the terminal, start the message with the issue number and add a meaningful and succinct sentence. Hit the Commit button and then Push up to GitHub.
knitr::include_graphics("github_committ.png", dpi = 144)
We require developers to insert the issue number into each commit message. Placing the issue number in your commit message allows reviewers to quickly find discussion surrounding your issue. When pushed to GitHub the issue number will be hyperlinked to the issue tracker, a powerful tool for discussion and traceability, which we think is valuable in a highly regulated industry like Pharma.
Below are styles of commit messaging permitted:
- `feat: #94 skeleton of function developed`, `chore: #94 styler and lintr update`, `docs: #94 parameters and details sections completed`
- `#94 skeleton of function developed`, `#94 styler and lintr update`, `#94 parameters and details sections completed`
- `skeleton of function developed (#94)`, `styler and lintr update (#94)`, `parameters and details sections completed (#94)`
We recommend a thorough read through of the articles, Pull Request Review Guidance and the Programming Strategy for in-depth discussions on doing a proper Pull Request. Pull Request authors will benefit from shorter review times by closely following the guidance provided in those two articles. Below we discuss some simple git commands in the terminal and on GitHub for doing a Pull Request. We recommend doing the Pull Request in GitHub only and not through the terminal.
Once all changes are committed, push the updated branch to GitHub:
git push -u origin <branch_name>
In GitHub, under Pull requests, the user will either have a "Compare and pull request" button and/or a "Create Pull Request". The first button will be created for you if GitHub detects recent changes you have made. The branch to merge with must be the main branch (base = main) and the compare branch is the new branch to merge - as shown in the below picture. Please pay close attention to the branch you are merging into!
knitr::include_graphics("github_create_pr.png", dpi = 144)
The issue must be linked to the pull request in the "Development" field of the Pull Request. In most cases, this linkage will automatically close the issue and move to the Done column on our project board.
knitr::include_graphics("github_linked_issues_dark.png", dpi = 144)
Once you have completed the Pull Request you will see all committed changes are then available for the reviewer. A reviewer must be specified in the Pull Request. It is recommended to write a brief summary to your reviewers so they can quickly come up to speed on your Pull Request. Images of your updates are nice too, which are easy to do in GitHub! Use any Screen Capture software and Copy and Paste into your summary.
- At least one reviewer must approve the Pull Request. Please review the Pull Request Review Guidance, which provides in depth guidance on doing a proper Pull Request.
- The reviewer must ensure that the function follows the programming strategy recommendations.
- Any comment/question/discussion must be addressed and documented in GitHub before the Pull Request is merged
Once the review is completed, the reviewer will merge the Pull Request and the feature branch will automatically be deleted.
After merging the Pull Request please check that the corresponding issue has been moved to the done column on the Project Board. Also, please make sure that the issue has closed.
knitr::include_graphics("github_done.png", dpi = 144)
Merge conflict is a situation where git cannot decide which changes to apply since there were multiple updates in the same part of a file. This typically happens when multiple people update the same part of code. Those conflicts always need to be handled manually (as some further code updates may be required):
git checkout main
git pull
git checkout <feature_branch>
git merge main
This provides a list of all files with conflicts. In the files with conflicts, the conflicting sections are marked with `<<<<<<<`, `=======`, and `>>>>>>>`. The code between these markers must be updated and the markers removed. Source files need to be updated manually. Generated files like NAMESPACE or the generated documentation files should not be updated manually but recreated after the source files have been updated.
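For example, a conflicting section in a file might look like this (the branch name and code lines are illustrative):

```
<<<<<<< HEAD
lstalvdt <- derive_var_lstalvdt(adsl)
=======
lstalvdt <- derive_var_example(adsl)
>>>>>>> 94-derive-var-lstalvdt
```

To resolve it, keep the desired line and delete the other one together with all three marker lines.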
To make the changes available call:
git add <file with conflict>
git commit -m "<insert_message>"
git push
For simple merge conflicts, developers can make use of the GitHub interface to solve them. GitHub will show the number of conflicts between the two branches. In the below image, GitHub has found 3 conflicts, but we only display the first one. Just like in the terminal, GitHub will make use of the <<<<<<<, =======, and >>>>>>> to highlight the conflicting sections. You will need to make the decision on whether to keep the code from the base or the feature branch. Once you have decided, go into the code and remove the section you no longer wish to keep. Be sure to remove the <<<<<<<, =======, and >>>>>>> as well! Once you work through the conflicts you will mark as Resolved and Commit your changes. It is recommended to pull your branch back down to RStudio to make sure no untoward effects have happened to your branch.
knitr::include_graphics("github_conflicts.png", dpi = 144)
- Merging: `git merge <my_branch>` merges my_branch into the current branch
- The stashing commands are useful when you want to return to a clean directory: `git stash` stashes (stores) current changes and restores a clean directory; `git stash pop` puts back (restores) the stashed changes
- `git revert` is also helpful for undoing committed changes without rewriting history
Using code from unmerged branches:

- Checkout the unmerged branch you want to use: `git checkout <unmerged_branch>`
- Pull the latest committed changes from the unmerged branch: `git pull`
- Check out your feature branch: `git checkout <my_branch>`
- Merge the unmerged branch into `<my_branch>`: `git merge <unmerged_branch>`
Source: https://pharmaverse.github.io/admiraldev/articles/rcmd_issues.html
title: "R CMD Issues" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{R CMD Issues} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
R CMD check is a command line tool that checks R packages against a standard set of criteria. For a pull request to pass, the check must not issue any notes, warnings, or errors. Below is a list of common issues and how to resolve them.
If the R CMD check workflow fails only on one or two R versions it can be helpful to reproduce the testing environment locally.
To reproduce a particular R version environment open the {admiral} project in the corresponding R version, comment the line source("renv/activate.R") in the .Rprofile file, restart the R session and then run the following commands in the R console.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
if (!dir.exists(".library")) {
dir.create(".library")
}
base_recommended_pkgs <- row.names(installed.packages(priority = "high"))
for (pkg in base_recommended_pkgs) {
path <- file.path(.Library, pkg)
cmd <- sprintf("cp -r %s .library", path)
system(cmd)
}
assign(".lib.loc", ".library", envir = environment(.libPaths))
r_version <- getRversion()
if (grepl("^4.1", r_version)) {
options(repos = "https://packagemanager.posit.co/cran/2021-05-03/")
} else if (grepl("^4.2", r_version)) {
options(repos = "https://packagemanager.posit.co/cran/2022-01-03/")
} else if (grepl("^4.3", r_version)) {
options(repos = "https://packagemanager.posit.co/cran/2023-04-20/")
} else {
options(repos = "https://cran.rstudio.com")
}
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
remotes::install_deps(dependencies = TRUE)
remotes::install_github("pharmaverse/pharmaversesdtm", ref = "devel")
remotes::install_github("pharmaverse/admiraldev", ref = "devel")
rcmdcheck::rcmdcheck()

This will ensure that the exact package versions we use in the workflow are installed into the hidden folder .library. That way your existing R packages are not overwritten.
> checking package dependencies ... ERROR
Namespace dependency not required: 'pkg'
Add pkg to the Imports or Suggests field in the DESCRIPTION file. In general, dependencies should be listed in the Imports field. However, if a package is only used inside vignettes or unit tests it should be listed in Suggests because all {admiral} functions would work without these "soft" dependencies being installed.
❯ checking R code for possible problems ... NOTE
function_xyz: no visible binding for global variable 'some_var'
Add some_var to the list of "global" variables in R/globals.R.
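For example, the fix might look as follows (a sketch, assuming R/globals.R uses `utils::globalVariables()`; the actual file will already contain a longer vector of names):

```{r, eval = FALSE}
# R/globals.R: declare variables used via non-standard evaluation
# so R CMD check does not flag them as undefined globals
utils::globalVariables(c("some_var"))
```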
❯ checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'function_xyz'
'some_param'
Add an @param some_param section in the header of function_xyz() and run devtools::document() afterwards.
❯ checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'function_xyz':
...
Argument names in code not in docs:
new_param_name
Argument names in docs not in code:
old_param_name
Mismatches in argument names:
Position: 6 Code: new_param_name Docs: old_param_name
The name of a parameter has been changed in the function code but not yet in the header. Change @param old_param_name to @param new_param_name and run devtools::document().
For further reading we recommend the R-pkg manual r-cmd chapter
For unit testing context see tests/testthat/AGENTS.md (generated from https://pharmaverse.github.io/admiraldev/articles/unit_test_guidance.html).
After adding or modifying any roxygen2 comments (#') in R source files,
regenerate the documentation before committing:
devtools::document()

This updates all .Rd files in man/ and the NAMESPACE file. Always run it when you:
- Add or rename a `@param`, `@return`, `@export`, or `@importFrom` tag
- Add a new exported function
- Change a function signature
R CMD check will issue a WARNING for undocumented arguments or a mismatch
between the code and docs if devtools::document() has not been run.
- Programming Strategy
- Git and GitHub Usage
- Common R CMD Check Issues
- Unit Test Guidance
- Admiral Website
- admiraldev Website
Auto-generated by pharmaverse/admiralci – create-agents-md.yml
Footnotes

1. For example, if `derive_var_example()` is going to be deprecated and it is defined in `examples.R`, the unit tests are in `tests/testthat/test-examples.R`.