The `kdps` function identifies subjects to remove from a study based on kinship and phenotype information. It reads kinship and phenotype data, then prioritizes subjects according to their phenotype while taking their relatedness into account. Subjects can be prioritized by high or low phenotype values or by a ranking of categorical phenotypes, and a kinship threshold determines which individuals count as related. The result is a list of subjects whose removal reduces relatedness in the study population, which helps make genetic association studies more robust.

Usage

```r
kdps(
  phenotype_file = system.file("extdata", "simple_pheno.txt", package = "kdps"),
  kinship_file = system.file("extdata", "simple_kinship.txt", package = "kdps"),
  fuzziness = 0,
  phenotype_name = "pheno2",
  prioritize_high = FALSE,
  prioritize_low = FALSE,
  phenotype_rank = c("DISEASED1", "DISEASED2", "HEALTHY"),
  fid_name = "FID",
  iid_name = "IID",
  fid1_name = "FID1",
  iid1_name = "IID1",
  fid2_name = "FID2",
  iid2_name = "IID2",
  kinship_name = "KINSHIP",
  kinship_threshold = 0.0442,
  phenotypic_naive = FALSE
)
```

Arguments

phenotype_file

A string specifying the path to the phenotype data file.

kinship_file

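A string specifying the path to the kinship file, which holds one kinship score per pair of individuals (see `fid1_name`, `iid1_name`, `fid2_name`, `iid2_name`, and `kinship_name`).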

fuzziness

An integer representing the level of fuzziness allowed in removing related subjects, with a default of 0 (no fuzziness).

phenotype_name

The name of the phenotype column in the phenotype file.

prioritize_high

A logical indicating whether to prioritize subjects with high phenotype values for removal.

prioritize_low

A logical indicating whether to prioritize subjects with low phenotype values for removal.

phenotype_rank

A character vector specifying the ranking of phenotypes from highest priority (first) to lowest.

fid_name

The column name for family IDs in the phenotype file.

iid_name

The column name for individual IDs in the phenotype file.

fid1_name

The column name for the first individual's family ID in the kinship file.

iid1_name

The column name for the first individual's ID in the kinship file.

fid2_name

The column name for the second individual's family ID in the kinship file.

iid2_name

The column name for the second individual's ID in the kinship file.

kinship_name

The name of the kinship score column in the kinship file.

kinship_threshold

A numeric threshold for the kinship score, above which individuals are considered related.

phenotypic_naive

A logical indicating whether to ignore phenotype information when resolving conflicts between related individuals.
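
To make the argument descriptions concrete, the following sketch builds toy inputs that follow the default column names and applies the two settings that are easiest to check by hand: the kinship threshold and the phenotype ranking. All data values are invented for illustration. Whether `kdps` treats a score exactly equal to the threshold as related is not stated on this page, so the strict comparison below is an assumption.

```r
# Toy phenotype table using the default column names (values are invented).
pheno <- data.frame(
  FID = c("F1", "F1", "F2", "F3"),
  IID = c("A", "B", "C", "D"),
  pheno2 = c("HEALTHY", "DISEASED1", "DISEASED2", "HEALTHY"),
  stringsAsFactors = FALSE
)

# Toy pairwise kinship table using the default column names.
kin <- data.frame(
  FID1 = c("F1", "F1", "F2"),
  IID1 = c("A", "A", "C"),
  FID2 = c("F1", "F2", "F3"),
  IID2 = c("B", "C", "D"),
  KINSHIP = c(0.25, 0.01, 0.06),
  stringsAsFactors = FALSE
)

kinship_threshold <- 0.0442

# Pairs above the threshold count as related (strict ">" is an assumption).
related_pairs <- kin[kin$KINSHIP > kinship_threshold, ]

# Position of each subject's phenotype in the ranking: 1 = first listed.
phenotype_rank <- c("DISEASED1", "DISEASED2", "HEALTHY")
pheno$rank_position <- match(pheno$pheno2, phenotype_rank)

print(related_pairs)
print(pheno)
```

In this toy data the pair with a score of 0.01 falls below the threshold and is ignored, while the other two pairs would need to be resolved.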

Value

A data frame with two columns, `FID` and `IID`, representing the family and individual IDs of subjects suggested for removal. This output can be used to refine the study population by excluding these subjects in subsequent analyses.
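
As a minimal sketch of how the returned data frame might be used, the example below drops the suggested subjects from a phenotype table using base R. The tables are invented stand-ins (`to_remove` only imitates the documented `FID`/`IID` layout), and the helper `key()` is hypothetical.

```r
# Toy phenotype table and a toy removal list shaped like the documented
# return value (columns FID and IID). All values are invented.
pheno <- data.frame(
  FID = c("F1", "F1", "F2", "F3"),
  IID = c("A", "B", "C", "D"),
  pheno2 = c(1.2, 0.7, 3.4, 2.1),
  stringsAsFactors = FALSE
)
to_remove <- data.frame(
  FID = c("F1", "F3"),
  IID = c("A", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: composite key so subjects match on FID and IID together.
key <- function(df) paste(df$FID, df$IID, sep = "\t")

# Keep only the subjects that are not on the removal list.
kept <- pheno[!(key(pheno) %in% key(to_remove)), ]
print(kept)
```

Matching on both identifiers avoids dropping an unrelated subject who happens to share an individual ID with someone in a different family.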

Details

The function first processes phenotype and kinship data from the specified files, then evaluates subjects based on the provided parameters. It calculates weights for each subject based on their phenotype and uses these weights along with the kinship information to identify subjects that should be removed to minimize relatedness in the study population. The function offers flexibility in handling phenotypes through ranking and prioritization options and can adjust the stringency of relatedness filtering through the kinship threshold and fuzziness parameter.
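
The general idea of weight-guided pruning can be illustrated with a simplified greedy loop. This is a sketch under stated assumptions, not the package's implementation: the weights, the tie-breaking rule, and the helper `prune_related()` are hypothetical, and `fuzziness` is not modelled. Here a higher weight marks a subject that is more desirable to keep. At each step the subject involved in the most related pairs is removed, with ties going to the subject with the lower weight.

```r
# Hypothetical helper: greedy pruning of related pairs, guided by weights.
# pairs:   data frame with character columns id1 and id2 (related pairs only)
# weights: named numeric vector; higher = more desirable to keep (assumption)
prune_related <- function(pairs, weights) {
  removed <- character(0)
  while (nrow(pairs) > 0) {
    # Count how many remaining related pairs each subject appears in.
    degree <- table(c(pairs$id1, pairs$id2))
    ids <- names(degree)
    # Most connected first; among ties, the lowest weight goes first.
    ord <- order(-as.integer(degree), weights[ids])
    target <- ids[ord[1]]
    removed <- c(removed, target)
    # Discard every pair that involved the removed subject.
    pairs <- pairs[pairs$id1 != target & pairs$id2 != target, , drop = FALSE]
  }
  removed
}

# Toy related pairs: A-B, A-C, and C-D (invented).
pairs <- data.frame(
  id1 = c("A", "A", "C"),
  id2 = c("B", "C", "D"),
  stringsAsFactors = FALSE
)
weights <- c(A = 1, B = 3, C = 2, D = 3)

removed <- prune_related(pairs, weights)
print(removed)
```

With these toy inputs, removing two subjects leaves no related pair among the remaining ones, whereas removing every member of every related pair would have discarded all four.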

Examples

```r
kdps(
  phenotype_file = system.file("extdata", "simple_pheno.txt", package = "kdps"),
  kinship_file = system.file("extdata", "simple_kinship.txt", package = "kdps"),
  fuzziness = 0,
  phenotype_name = "pheno2",
  prioritize_high = FALSE,
  prioritize_low = FALSE,
  phenotype_rank = c("DISEASED1", "DISEASED2", "HEALTHY"),
  fid_name = "FID",
  iid_name = "IID",
  fid1_name = "FID1",
  iid1_name = "IID1",
  fid2_name = "FID2",
  iid2_name = "IID2",
  kinship_name = "KINSHIP",
  kinship_threshold = 0.0442,
  phenotypic_naive = FALSE
)
#> Error in data.table::fread(phenotype_file): Input is empty or only contains BOM or terminal control characters
```
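
The error above is consistent with `phenotype_file` being an empty string, which is what `system.file()` returns when it cannot locate the requested file. This page does not confirm that cause, so treat it as a likely explanation rather than a diagnosis. A small check before calling `kdps` makes this kind of failure easier to spot. The helper `check_input()` is hypothetical and not part of the package, and the call to `kdps` is guarded so that it only runs when the package and both example files are available.

```r
# Hypothetical helper: stop early with a clear message if a path is unusable.
check_input <- function(path, label) {
  if (!nzchar(path) || !file.exists(path)) {
    stop(sprintf("%s not found (got '%s')", label, path), call. = FALSE)
  }
  path
}

# system.file() returns "" when the package or file cannot be found.
pheno_path <- system.file("extdata", "simple_pheno.txt", package = "kdps")
kin_path <- system.file("extdata", "simple_kinship.txt", package = "kdps")

# An empty path is reported clearly instead of failing inside the file reader.
msg <- tryCatch(check_input("", "phenotype_file"), error = conditionMessage)
print(msg)

# Only call kdps() when the package and both example files are available.
if (requireNamespace("kdps", quietly = TRUE) &&
    nzchar(pheno_path) && nzchar(kin_path)) {
  result <- kdps::kdps(
    phenotype_file = check_input(pheno_path, "phenotype_file"),
    kinship_file = check_input(kin_path, "kinship_file"),
    phenotype_name = "pheno2",
    phenotype_rank = c("DISEASED1", "DISEASED2", "HEALTHY")
  )
  print(head(result))
} else {
  message("kdps example files not found; skipping the example call.")
}
```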