| Title: | Prepare Electronic Prescription Record Data to Estimate Drug Exposure | 
| Version: | 0.0.4 | 
| Maintainer: | David Selby <David.Selby@manchester.ac.uk> | 
| BugReports: | https://github.com/belayb/drugprepr/issues | 
| Description: | Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions. Based on Pye et al. (2018) <doi:10.1002/pds.4440>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| Language: | en-GB | 
| LazyData: | true | 
| RoxygenNote: | 7.1.2 | 
| Imports: | dplyr, doseminer, rlang, tidyr, sqldf, stringr, purrr, DescTools | 
| Depends: | R (≥ 2.10) | 
| Suggests: | knitr, rmarkdown, testthat, kableExtra | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2021-11-09 11:50:09 UTC; j10204ds | 
| Author: | Belay Birlie Yimer | 
| Repository: | CRAN | 
| Date/Publication: | 2021-11-09 18:50:05 UTC | 
Clean implausibly-long prescription durations
Description
Given a prescription length limit, truncate any prescriptions that appear to be longer than this, or mark them as missing.
Usage
clean_duration(data, max_months = Inf, method = c("truncate", "remove"))
Arguments
| data | A data frame containing a column called duration | 
| max_months | The maximum plausible prescription length in months | 
| method | Either 'truncate' or 'remove'. See details | 
Details
The method 'truncate' causes any duration longer than max_months to
be replaced with the value of max_months (albeit converted to days).
The method 'remove' causes such durations to be replaced with NA.
There is no explicit 'ignore' method, but if you want to 'do nothing', simply
set max_months to an arbitrarily high number.
By default, the maximum is infinite, so nothing should happen.
(Of course, you could also just not run the function...)
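The two methods can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name clean_duration_sketch is hypothetical, and the conversion of one month to 365.25 / 12 days is an assumption, since the exact conversion is not stated here.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: one month is taken to be 365.25 / 12 days.
clean_duration_sketch <- function(data, max_months = Inf,
                                  method = c("truncate", "remove")) {
  method <- match.arg(method)
  max_days <- max_months * 365.25 / 12
  # Flag durations that exceed the limit (missing values are left alone)
  too_long <- !is.na(data$duration) & data$duration > max_days
  data$duration[too_long] <- if (method == "truncate") max_days else NA_real_
  data
}

long_presc <- data.frame(duration = c(100, 300, 400, 800))
truncated <- clean_duration_sketch(long_presc, 6)             # long durations capped
removed   <- clean_duration_sketch(long_presc, 12, "remove")  # long durations set to NA
unchanged <- clean_duration_sketch(long_presc)                # infinite limit: no change
```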
Value
A data frame of the same structure as the input, possibly with some elements of the duration column changed
Note
Currently the variable name is hard-coded as 'duration', but in principle this could be parametrised for datasets where the column has a different name.
Examples
long_presc <- data.frame(duration = c(100, 300, 400, 800))
clean_duration(long_presc, 6)
clean_duration(long_presc, 12, 'remove')
Close small gaps between successive prescriptions
Description
Given a series of prescriptions in data, if one prescription
(for the same patient and drug) starts ≤ min_gap days
after the previous one finishes, we extend the length of the previous
prescription to cover the gap.
Usage
close_small_gaps(data, min_gap = 0L)
Arguments
| data | A data frame containing columns  | 
| min_gap | Size of largest gaps to close. Default is zero, i.e. do nothing | 
Value
The input data frame data, possibly with some of the
stop_dates changed.
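The gap-closing rule might be sketched in base R as follows. The helper close_gaps_sketch is hypothetical and assumes that all rows belong to a single patient and drug; the package itself may handle grouping and edge cases differently.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: all rows belong to one patient and one drug.
close_gaps_sketch <- function(data, min_gap = 0L) {
  data <- data[order(data$start_date), ]
  n <- nrow(data)
  if (n < 2) return(data)
  next_start <- data$start_date[-1]
  # Days between the end of each prescription and the start of the next
  gap <- as.numeric(next_start - data$stop_date[-n])
  idx <- which(gap > 0 & gap <= min_gap)
  # Extend the earlier prescription to cover the gap
  data$stop_date[idx] <- next_start[idx]
  data
}

gappy <- data.frame(
  patid = 1,
  prodcode = "a",
  start_date = as.Date("2021-01-01") + (0:6) * 7,
  stop_date = as.Date("2021-01-01") + (0:6) * 7 + 4
)
closed <- close_gaps_sketch(gappy, 7)  # every 3-day gap is closed
```

With the default min_gap of zero, no gap satisfies the condition, so the data are returned unchanged.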
Examples
gappy_data <- data.frame(
  patid = 1,
  prodcode = 'a',
  start_date = Sys.Date() + (0:6) * 7,
  stop_date = Sys.Date() + (0:6) * 7 + 4
)
close_small_gaps(gappy_data)
close_small_gaps(gappy_data, 7)
Compute numerical daily dose from free-text prescribing instructions
Description
The function calls the R package doseminer to extract dose information from free-text prescribing instructions, then computes the average numerical daily dose according to a given decision rule.
Usage
compute_ndd(data, dose_fn = mean, freq_fn = mean, interval_fn = mean)
Arguments
| data | a data frame containing free-text prescribing instructions in a
column called  | 
| dose_fn | function to summarise range of numbers by a single value | 
| freq_fn | function to summarise range of frequencies by a single value | 
| interval_fn | function to summarise range of intervals by a single value | 
Details
The general formula for computing numerical daily dose (ndd) is given by
ndd = DF × DN / DI,
where
- DF is dose frequency, the number of dose 'events' per day
- DN is dose number, or the number of units of drug taken during each dose 'event'
- DI is dose interval, or the number of days between 'dose days', where an interval of 1 means every day
Prescriptions can have a variable dose frequency or dose number, such as '2-4 tablets up to 3 times per day'. In this case, the user can choose to reduce these ranges to single values by taking the minimum, maximum or average of these endpoints.
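As a worked example of the formula, the sketch below reduces each range to a single value and then combines them. The helper ndd_from_ranges is hypothetical, and reading '2-4 tablets up to 3 times per day' as a dose number of 2 to 4, a dose frequency of 1 to 3 and a dose interval of 1 is an assumption; doseminer may parse the text differently.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: '2-4 tablets up to 3 times per day' is read as
# dose number 2 to 4, dose frequency 1 to 3, dose interval 1.
ndd_from_ranges <- function(freq, dose, interval,
                            dose_fn = mean, freq_fn = mean, interval_fn = mean) {
  # ndd = DF * DN / DI, after summarising each range by a single value
  freq_fn(freq) * dose_fn(dose) / interval_fn(interval)
}

freq <- c(1, 3)
dose <- c(2, 4)
interval <- c(1, 1)

ndd_min  <- ndd_from_ranges(freq, dose, interval, min, min, min)  # 1 * 2 / 1
ndd_mean <- ndd_from_ranges(freq, dose, interval)                 # 2 * 3 / 1
ndd_max  <- ndd_from_ranges(freq, dose, interval, max, max, max)  # 3 * 4 / 1
```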
Value
A data frame mapping the raw text to structured dosage information.
Examples
compute_ndd(dataset1, min, min, mean)
Example data from the Clinical Practice Research Datalink (CPRD).
Description
A dataset containing prescription information for two individuals. The dataset is a hypothetical dataset resembling the real CPRD data.
Usage
dataset1
Format
A data frame with 18 rows and 9 variables:
- patid: unique identifier given to a patient in CPRD GOLD
- pracid: unique identifier given to a practice in CPRD GOLD
- start_date: beginning of the prescription period
- prodcode: CPRD unique code for the treatment selected by the GP
- dossageid: identifier that allows dosage information on the event to be retrieved from the Common Dosages Lookup table
- text: prescription instruction for the prescribed product, as entered by the GP
- qty: total quantity entered by the GP for the prescribed product
- numdays: number of treatment days prescribed for a specific therapy event
- dose_duration: an estimated prescription duration, as entered by CPRD
...
Source
https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf
Decision 1: impute implausible total quantities
Description
A light wrapper around impute_qty.
Usage
decision_1(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Note
Decisions f and g are not yet implemented.
See Also
Other decision functions: 
decision_10(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 10: close small gaps between successive prescriptions
Description
Where one prescription (for the same drug and patient) starts only a short time after the previous finishes, this function can close the gap, as if the prescription was continuous over the entire period.
Usage
decision_10(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Details
The underlying function is called close_small_gaps
See Also
Other decision functions: 
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 2: impute missing total quantities
Description
A light wrapper around impute_qty.
Usage
decision_2(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Note
Decisions e and f are not yet implemented.
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 3: impute implausible daily doses
Description
A light wrapper around impute_ndd.
Usage
decision_3(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Note
Decisions f and g are not yet implemented.
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 4: impute missing daily doses
Description
A light wrapper around impute_ndd.
Usage
decision_4(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Note
Decisions e and f are not yet implemented.
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 5: impute implausible prescription durations
Description
A light wrapper around clean_duration.
Usage
decision_5(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_6(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 6: choose method of calculating prescription duration
Description
This is just shorthand for defining a column equal to one of the specified
formulae. If the column(s) corresponding to decision are missing, an
error will be thrown.
If you have already calculated or obtained the column duration from
elsewhere, this step is not necessary.
Usage
decision_6(data, decision = "c")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Note
This step actually takes place before decision_5.
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_7(),
decision_8(),
decision_9(),
drug_prep()
Decision 7: impute missing prescription durations
Description
A light wrapper around impute_duration.
Usage
decision_7(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_8(),
decision_9(),
drug_prep()
Decision 8: disambiguate prescriptions with the same start date
Description
A light wrapper around impute_duration, followed by removing
duplicate rows with the same combination of prodcode, patid
and start_date.
Usage
decision_8(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings 
 | 
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_9(),
drug_prep()
Decision 9: handle overlapping prescription periods
Description
In situations where one prescription starts before another (for the same patient and drug) finishes, this function will either implicitly sum the doses (i.e. do nothing) or it will divide the intervals into non-overlapping subsets, shifting these sub-intervals forward in time until there is no overlap.
Usage
decision_9(data, decision = "a")
Arguments
| data | a data frame | 
| decision | one of the following strings: 
 | 
Details
The underlying algorithm for shifting overlapping intervals is implemented
by the internal function shift_interval.
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
drug_prep()
Run drug preparation algorithm
Description
Run drug preparation algorithm
Usage
drug_prep(data, plausible_values, decisions = rep("a", 10))
Arguments
| data | data frame containing prescription data | 
| plausible_values | data frame containing variables  | 
| decisions | character vector of length 10 | 
Value
A data frame including estimated stop_date for each prescription
See Also
Other decision functions: 
decision_10(),
decision_1(),
decision_2(),
decision_3(),
decision_4(),
decision_5(),
decision_6(),
decision_7(),
decision_8(),
decision_9()
Examples
plausible_values <- data.frame(
  prodcode = c('a', 'b', 'c'),
  min_qty = 0,
  max_qty = c(50, 100, 200),
  min_ndd = 0,
  max_ndd = c(10, 20, 30)
)
drug_prep(example_therapy,
          plausible_values,
          decisions = c('a', 'a', 'a', 'a', 'a',
                        'c', 'a', 'a', 'a', 'a'))
Example electronic prescription dataset
Description
Based on a hypothetical 'therapy' file from the Clinical Practice Research Datalink (CPRD), a UK database of primary care records.
Usage
example_therapy
Format
An object of class data.frame with 30 rows and 6 columns.
Note
This dataset is now generated deterministically, so it will not vary between sessions.
Get the mode (most common value) of a vector
Description
Get the mode (most common value) of a vector
Usage
get_mode(v, na.rm = TRUE)
Arguments
| v | a vector | 
| na.rm | Logical. If  | 
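One common way to compute a mode in base R is sketched below. The helper get_mode_sketch is hypothetical; it assumes that na.rm = TRUE drops missing values before counting and that ties are broken by first appearance, neither of which is confirmed for get_mode itself.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTIONS: na.rm = TRUE drops NAs first; ties go to the value seen first.
get_mode_sketch <- function(v, na.rm = TRUE) {
  if (na.rm) v <- v[!is.na(v)]
  values <- unique(v)
  # Count occurrences of each distinct value and pick the most frequent
  values[which.max(tabulate(match(v, values)))]
}

mode_qty  <- get_mode_sketch(c(28, 56, 28, NA, 56, 28))
mode_text <- get_mode_sketch(c("a", "b", "b"))
```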
Impute missing or implausible values
Description
This is a workhorse function used by impute_ndd,
impute_qty and others.
Usage
impute(
  data,
  variable,
  method = c("ignore", "mean", "median", "mode", "replace", "min", "max", "sum"),
  where = is.na,
  group,
  ...,
  replace_with = NA_real_
)
Arguments
| data | A data frame containing columns  | 
| variable | Unquoted name of the column in  | 
| method | Method for imputing the values. See details. | 
| where | Logical vector, or function applied to  | 
| group | Level of structure for imputation. Defaults to whole study population. | 
| ... | Extra arguments, currently ignored | 
| replace_with | if the method 'replace' is selected, which value should be inserted? 
 | 
Details
The argument where indicates which values are to be imputed.
It can be specified as either a vector or as a function. Thus you can
specify, for example, is.na to impute all missing values, or
you can pass in a vector, if it depends on something else rather than just
the current values of the variable to be imputed.
This design may change in future. In particular, if we want to impute
implausible values and impute missing values separately, it's important that
these steps are independent.
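The two ways of specifying where could be sketched as follows. The helper impute_sketch and its fn argument are hypothetical simplifications that operate on a plain vector; they are not the signature of impute.

```r
# Illustrative sketch only; not the package's code.
impute_sketch <- function(x, where = is.na, fn = mean) {
  # 'where' is either a function of x or a ready-made logical vector
  to_impute <- if (is.function(where)) where(x) else where
  # Replace flagged values with a summary of the values that were not flagged
  x[to_impute] <- fn(x[!to_impute], na.rm = TRUE)
  x
}

qty <- c(10, NA, 30, 500)

# Function form: impute missing values with the mean of the rest
filled_missing <- impute_sketch(qty)

# Vector form: flag implausible (but not missing) values using outside knowledge
implausible <- !is.na(qty) & qty > 100
filled_implausible <- impute_sketch(qty, where = implausible, fn = median)
```

Keeping the two calls separate means the missing value is left untouched when only implausible values are being imputed, which is the kind of independence the paragraph above describes.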
Value
A data frame of the same structure as data, with values imputed
Replace missing or implausible prescription durations
Description
Instead of replacing missing stop dates, we impute the durations and then infer the stop dates from there.
Usage
impute_duration(
  data,
  method,
  where = is.na,
  group = c("patid", "start_date"),
  ...
)
Arguments
| data | A data frame containing columns  | 
| method | Method for imputing the values. See details. | 
| where | Logical vector, or function applied to  | 
| group | Level of structure for imputation. Defaults to whole study population. | 
| ... | Extra arguments, currently ignored | 
Details
We can fix clashing start dates by setting group to start_date
and patid, i.e. average over groups with more than one member;
any metric should return the original values if the group size is one.
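A minimal base R sketch of this grouping idea, assuming the 'mean' method and a small made-up data frame, is shown below. It is an illustration rather than the package's code.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: durations are averaged within each patid/start_date group.
therapy <- data.frame(
  patid = c(1, 1, 1, 2),
  start_date = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01", "2021-01-01")),
  duration = c(10, 30, 28, 14)
)

# Build one grouping key per patient and start date
key <- paste(therapy$patid, therapy$start_date)

# Groups of size one keep their original value; clashing rows share the mean
therapy$duration <- ave(therapy$duration, key, FUN = mean)

# Stop dates are then inferred from the imputed durations
therapy$stop_date <- therapy$start_date + therapy$duration
```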
Value
A data frame of the same structure as data, with values imputed
Examples
example_duration <- transform(example_therapy, duration = qty / ndd)
impute_duration(example_duration, method = 'mean', group = 'patid')
Replace implausible or missing numerical daily doses (NDD)
Description
Replace implausible or missing numerical daily doses (NDD)
Usage
impute_ndd(data, method, where = is.na, group = "population", ...)
Arguments
| data | A data frame containing columns  | 
| method | Method for imputing the values. See details. | 
| where | Logical vector, or function applied to  | 
| group | Level of structure for imputation. Defaults to whole study population. | 
| ... | Extra arguments, currently ignored | 
Value
A data frame of the same structure as data, with values imputed
Examples
impute_ndd(example_therapy, 'mean')
Replace implausible or missing prescription quantities
Description
Replace implausible or missing prescription quantities
Usage
impute_qty(data, method, where = is.na, group = "population", ...)
Arguments
| data | A data frame containing columns  | 
| method | Method for imputing the values. See details. | 
| where | Logical vector, or function applied to  | 
| group | Level of structure for imputation. Defaults to whole study population. | 
| ... | Extra arguments, currently ignored | 
Value
A data frame of the same structure as data, with values imputed
Examples
impute_qty(example_therapy, 'mean')
Separating overlapping prescription periods
Description
Run this function and then you can either simply discard overlapping intervals or shift them around using an appropriate algorithm.
Usage
isolate_overlaps(data)
Arguments
| data | A data frame including variables  | 
Details
The older implementation used isolateoverlaps from the
intervalaverage package and Overlap from the DescTools
package. Here we refactor it using functions from tidyverse instead.
Value
A data frame of patid, prodcode, start_date and
stop_date, where intervals are either exactly overlapping or mutually
non-overlapping (but not partially overlapping), such that the union of such
intervals is equivalent to those originally provided in data
Note
This function currently doesn't use any keys except patid and
prodcode. It may be desirable to add a row ID, for matching each
partial interval back to the original interval from which it was derived.
This may be relevant to models using weighted dosages.
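The idea of splitting intervals so that they are either identical or disjoint can be sketched in base R. The helper isolate_sketch is hypothetical, handles a single patient and drug, and uses plain day numbers instead of dates for brevity; the package's tidyverse implementation will differ.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: one patient, one drug; dates are given as numeric day numbers.
isolate_sketch <- function(start, stop) {
  # Every start or stop is a potential cut point
  cuts <- sort(unique(c(start, stop)))
  pieces <- lapply(seq_along(start), function(i) {
    inside <- cuts[cuts >= start[i] & cuts <= stop[i]]
    # Consecutive cut points within the interval form its sub-intervals
    data.frame(start = inside[-length(inside)], stop = inside[-1])
  })
  do.call(rbind, pieces)
}

# [0, 10] and [5, 15] partially overlap on [5, 10]
isolated <- isolate_sketch(start = c(0, 5), stop = c(10, 15))
```

Here the shared stretch from day 5 to day 10 appears twice (once per original interval), while the remaining pieces do not overlap with anything.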
See Also
intervalaverage::isolateoverlaps,
foverlaps
Examples
set.seed(1)
overlapping_data <- data.frame(
  rowid = 1:20,
  patid = 1:2,
  prodcode = 'a',
  start_date = Sys.Date() + c(round(rexp(19, 1/7)), -20),
  qty = rpois(20, 64),
  ndd = sample(seq(.5, 12, by = .5), 20, replace = TRUE),
  stringsAsFactors = FALSE
)
overlapping_data <- transform(overlapping_data,
  stop_date = start_date + qty / ndd
)
isolate_overlaps(overlapping_data)
Human-friendly interface to the drug prep algorithm
Description
A helper function that allows specifying decision rules using English
words rather than alphanumeric codes. Translates the rules into the
corresponding codes and then passes them to drug_prep functions.
Usage
make_decisions(
  implausible_qty,
  missing_qty,
  implausible_ndd,
  missing_ndd,
  implausible_duration,
  calculate_duration,
  missing_duration,
  clash_start,
  overlapping,
  small_gaps
)
Arguments
| implausible_qty | implausible total drug quantities | 
| missing_qty | missing total drug quantities | 
| implausible_ndd | implausible daily dosage | 
| missing_ndd | missing daily dosage | 
| implausible_duration | overly-long prescription durations | 
| calculate_duration | formula or variable to compute prescription duration | 
| missing_duration | missing prescription duration | 
| clash_start | how to disambiguate prescriptions that start on the same date | 
| overlapping | how to handle prescription periods that overlap with one another | 
| small_gaps | how to handle short gaps between successive prescriptions The argument  
 | 
Value
A character vector suitable for passing to the decisions argument of
the drug_prep function.
Examples
make_decisions('ignore',
               'mean population',
               'missing',
               'mean practice',
               'truncate 6',
               'qty / ndd',
               'mean individual',
               'mean',
               'allow',
               'close 15')
Example min-max data.
Description
A dataset containing minimum and maximum possible values for the quantity and numerical daily dose of a given prescription. The dataset is hypothetical.
Usage
min_max_dat
Format
A data frame with 2 rows and 5 variables:
- prodcode: CPRD unique code for the treatment selected by the GP
- max_qty: maximum possible quantity to be prescribed for the product
- min_qty: minimum possible quantity to be prescribed for the product
- max_ndd: maximum possible numerical daily dose to be prescribed for the product
- min_ndd: minimum possible numerical daily dose to be prescribed for the product
...
Do values fall outside a specified 'plausible' range?
Description
A utility function for indicating if elements of a vector are implausible.
Usage
outside_range(x, lower, upper, open = TRUE)
Arguments
| x | numeric vector | 
| lower | minimum plausible value | 
| upper | maximum plausible value | 
| open | logical. If  | 
Details
Though the function between already exists, it is not vectorised over the bounds.
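Vectorising over the bounds simply means that each element of x is compared with its own limits. The sketch below shows this with a hypothetical helper, outside_sketch, which ignores the open argument and treats values equal to a bound as plausible; that treatment of the endpoints is an assumption.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: values exactly equal to a bound count as plausible.
outside_sketch <- function(x, lower, upper) {
  # Element-wise comparison: x[i] is checked against lower[i] and upper[i]
  x < lower | x > upper
}

qty     <- c(10, 120, 150)
min_qty <- c(0, 0, 0)
max_qty <- c(50, 100, 200)  # a different limit per product

flagged <- outside_sketch(qty, min_qty, max_qty)
```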
Shift time intervals until they no longer overlap
Description
This is a function used by decision_9.
Usage
shift_interval(x)
Arguments
| x | a data frame containing variables  | 
Value
A data frame with time intervals moved such that they no longer overlap
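One simple way to shift intervals forward is sketched below. The helper shift_sketch is hypothetical and assumes that each overlapping interval keeps its length and is moved to begin when the previous one ends; the internal algorithm used by shift_interval may differ.

```r
# Illustrative sketch only; not the package's code.
# ASSUMPTION: an overlapping interval keeps its length and is moved forward
# so that it starts when the previous interval stops.
shift_sketch <- function(x) {
  x <- x[order(x$start_date), ]
  for (i in seq_len(nrow(x))[-1]) {
    overlap <- as.numeric(x$stop_date[i - 1] - x$start_date[i])
    if (overlap > 0) {
      x$start_date[i] <- x$start_date[i] + overlap
      x$stop_date[i]  <- x$stop_date[i] + overlap
    }
  }
  x
}

overlapping <- data.frame(
  start_date = as.Date(c("2021-01-01", "2021-01-05", "2021-01-08")),
  stop_date  = as.Date(c("2021-01-10", "2021-01-12", "2021-01-20"))
)
shifted <- shift_sketch(overlapping)
```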