Generate a PGV instance — pgv • gGnome

Takes a table with paths to gGraphs, coverage files, and (optionally) gWalks, and generates a PGV directory instance that is ready to be visualized with PGV.

pgv(
  data,
  name.col = "sample",
  patient.id = "participant",
  outdir = "./pgv",
  cov.col = "coverage",
  gg.col = "graph",
  gw.col = "walks",
  descriptors = NA,
  append = TRUE,
  cov.field = "ratio",
  cov.field.col = NA,
  cov.bin.width = 10000,
  cov.color.field = NULL,
  ref = NA,
  overwrite = FALSE,
  annotation = c("simple", "bfb", "chromoplexy", "chromothripsis", "del", "dm", "dup",
    "pyrgo", "qrdel", "qrdup", "qrp", "rigma", "tic", "tyfonas"),
  tree = NA,
  cid.field = "sedge.id",
  connections.associations = FALSE,
  kag.col = "kag",
  ncn.gr = NA,
  mc.cores = 1
)

Arguments

data: either a path to a TSV/CSV or a data.table
name.col: column name in the input data table containing the sample names (default: "sample")
patient.id: column name in the input data table containing the patient IDs/names (default: 'participant'). If your table includes more than one dataset (e.g. samples from multiple patients), you can specify the column from which to read the dataset names. This column is used to group together samples that belong to the same dataset. If no values are passed, we take the pair name as patientID.
outdir: the path where the files are saved. This path should not exist, unless you want to add more files to an existing directory, in which case you must use append = TRUE
cov.col: column name in the input data table containing the paths to coverage files
gg.col: column name in the input data table containing the paths to RDS files containing the gGnome objects
gw.col: column name in the input data table containing the paths to RDS files containing the gWalk objects (optional)
descriptors: list of columns in the data table that provide description tags for the patient IDs. These are IDs and tags that can be used in PGV to subset the data. Expects a list of character column names. (default: NA)
append: if set to FALSE then the directory is expected not to exist yet (default: TRUE). By default, samples are appended to a PGV instance if the directory already exists
cov.field: the name of the field in the coverage GRanges that should be used (default: "ratio", use "foreground" if dryclean output)
cov.field.col: column name in the input data table containing the name of the field in the coverage GRanges that should be used. If this is supplied then it overrides the value in "cov.field". Use this if some of your coverage files differ in the field used.
cov.bin.width: bin width to use when rebinning the coverage data (default: 1e4). If you don't want rebinning to be performed then set to NA.
cov.color.field: field in the coverage GRanges to use in order to set the color of coverage data points. If nothing is supplied then default colors are used for each seqname (namely chromosome) by reading the colors that are defined in the settings.json file for the specific reference that is being used for this dataset.
ref: the genome reference name used for this dataset. This reference name must be defined in the settings.json file. By default PGV accepts one of the following: hg19, hg38, covid19. If you are using a different reference then you must first add it to the settings.json file.
overwrite: by default only files that are missing will be created. If set to TRUE then existing coverage arrow files and gGraph JSON files will be overwritten
annotation: which node/edge annotation fields to add to the gGraph JSON file. By default we assume that gGnome::events has been executed and we add the following SV annotations: 'simple', 'bfb', 'chromoplexy', 'chromothripsis', 'del', 'dm', 'dup', 'pyrgo', 'qrdel', 'qrdup', 'qrp', 'rigma', 'tic', 'tyfonas'
tree: path to a Newick file containing a tree to incorporate with the dataset. If provided, the tree is added to datafiles.json and will be visualized by PGV. If the names of the leaves of the tree match the names defined in name.col, PGV will automatically associate these leaves with the samples, so clicking a leaf of the tree scrolls the browser down to the corresponding genome graph track
cid.field: field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example "merged.ix" field, which is generated by merge.Junction())
connections.associations: whether to produce a connections.associations table (default: FALSE)
kag.col: (default: 'kag') name of column in input table that includes the paths to JaBbA karyographs
ncn.gr: GRanges object or path to GRanges object containing normal copy number (ncn) values. The ncn values must be contained in a field named "ncn"
mc.cores: how many cores to use
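
The arguments above can be illustrated with a minimal sketch of a sample sheet and a matching call. This is not taken from the package documentation: the sample names, the file paths, the assumption that coverage files are stored as RDS, and the column name cov_field are all hypothetical placeholders. The sketch only calls pgv() when gGnome is installed and the placeholder inputs actually exist, and it assumes the optional gWalk column may simply be left out of the table.

```r
# Hypothetical sample sheet; every name and path below is a placeholder.
samples <- data.frame(
  sample      = c("tumor_A", "tumor_B"),
  participant = c("patient_1", "patient_1"),
  coverage    = c("cov/tumor_A.rds", "cov/tumor_B.rds"),  # assumed RDS format
  graph       = c("gg/tumor_A.rds", "gg/tumor_B.rds"),
  cov_field   = c("ratio", "foreground"),  # per-sample coverage field
  stringsAsFactors = FALSE
)

# The data argument accepts a path to a TSV/CSV, so write a temporary TSV.
sheet <- tempfile(fileext = ".tsv")
write.table(samples, sheet, sep = "\t", quote = FALSE, row.names = FALSE)

# Guard the call: run it only if gGnome is installed and the inputs exist.
inputs <- c(samples$coverage, samples$graph)
if (requireNamespace("gGnome", quietly = TRUE) && all(file.exists(inputs))) {
  gGnome::pgv(
    data          = sheet,
    name.col      = "sample",
    patient.id    = "participant",
    cov.col       = "coverage",
    gg.col        = "graph",
    cov.field.col = "cov_field",  # overrides cov.field for each sample
    outdir        = file.path(tempdir(), "pgv"),
    ref           = "hg19"
  )
}
```

Here both samples share one participant value, so they would be grouped into the same dataset, and cov_field lets the second sample use "foreground" (the dryclean field) while the first uses "ratio".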

Value

a generated PGV formatted json ready for visualization.