Takes a table with paths to gGraphs, coverage files, and (optionally) gWalks, and generates a PGV directory that is ready to be visualized with PGV.

pgv(
  data,
  name.col = "sample",
  patient.id = "participant",
  outdir = "./pgv",
  cov.col = "coverage",
  gg.col = "graph",
  gw.col = "walks",
  descriptors = NA,
  append = TRUE,
  cov.field = "ratio",
  cov.field.col = NA,
  cov.bin.width = 10000,
  cov.color.field = NULL,
  ref = NA,
  overwrite = FALSE,
  annotation = c("simple", "bfb", "chromoplexy", "chromothripsis", "del", "dm", "dup",
    "pyrgo", "qrdel", "qrdup", "qrp", "rigma", "tic", "tyfonas"),
  tree = NA,
  cid.field = "sedge.id",
  connections.associations = FALSE,
  kag.col = "kag",
  ncn.gr = NA,
  mc.cores = 1
)

Arguments

data

either a path to a TSV/CSV or a data.table

name.col

column name in the input data table containing the sample names (default: "sample")

patient.id

column name in the input data table containing the patient IDs/names (default: "participant"). If your table includes more than one dataset (e.g. samples from multiple patients), you can specify the column from which to read the dataset names. This column is used to group together samples that belong to the same dataset. If no values are provided, the pair name is used as the patient ID.

outdir

the path where the files are saved (default: "./pgv"). This path should not already exist unless you want to add more files to an existing directory, in which case you must use append = TRUE

cov.col

column name in the input data table containing the paths to coverage files

gg.col

column name in the input data table containing the paths to RDS files containing the gGnome objects

gw.col

column name in the input data table containing the paths to RDS files containing the gWalk objects (optional)
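A minimal sketch of an input table that uses the default column names from the usage above (sample, participant, coverage, graph, walks). The sample names, patient names, and every file path are hypothetical placeholders, and the file extensions are assumptions rather than requirements stated here. The table is written to a TSV so that its path could be passed as data.

```r
# Hypothetical sample sheet; all names and paths are placeholders, not real files.
samples <- data.frame(
  sample      = c("tumor_A", "tumor_B"),
  participant = c("patient_1", "patient_1"),   # groups both samples into one dataset
  coverage    = c("/path/to/tumor_A.cov.rds", "/path/to/tumor_B.cov.rds"),
  graph       = c("/path/to/tumor_A.gg.rds", "/path/to/tumor_B.gg.rds"),
  walks       = c("/path/to/tumor_A.walks.rds", "/path/to/tumor_B.walks.rds"),  # optional column
  stringsAsFactors = FALSE
)

# Write the table as a TSV so that its path can be supplied as `data`.
tsv_path <- file.path(tempdir(), "pgv_samples.tsv")
write.table(samples, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Sanity check: the default column names are all present in the file.
required <- c("sample", "participant", "coverage", "graph")
missing_cols <- setdiff(required, names(read.delim(tsv_path)))

# Not run here; requires the package that provides pgv() and real input files:
# pgv(data = tsv_path, outdir = "./pgv")
```

If your table uses different column names, they would be passed through name.col, patient.id, cov.col, gg.col, and gw.col instead of renaming the columns.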

descriptors

list of columns in the data table that provide descriptive tags for the patient IDs. These are IDs and tags that can be used in PGV to subset the data. Expects a list of character column names. (default: NA)

append

if set to FALSE, the directory is expected not to exist yet (default: TRUE). By default, samples are appended to a PGV instance if the directory already exists
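The interaction between outdir and append can be checked before calling pgv(). The helper below, check_outdir(), is hypothetical and not part of the package; it only mirrors the expectation described above and makes no claim about how pgv() itself reacts when the expectation is violated.

```r
# Hypothetical pre-flight helper (not part of the package): stops when the
# output directory already exists but appending was not requested.
check_outdir <- function(outdir, append = TRUE) {
  if (dir.exists(outdir) && !append) {
    stop("'", outdir, "' already exists; use append = TRUE to add samples to it")
  }
  invisible(TRUE)
}

# tempfile() returns a path that does not exist yet, so this check passes.
outdir <- tempfile("pgv_demo_")
check_outdir(outdir, append = FALSE)
```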

cov.field

the name of the field in the coverage GRanges that should be used (default: "ratio"; use "foreground" for dryclean output)

cov.field.col

column name in the input data table containing the name of the field in the coverage GRanges that should be used. If this is supplied then it overrides the value in "cov.field". Use this if some of your coverage files differ in the field used.
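As an illustration of mixed coverage fields, the sketch below adds a per-sample column naming the field to read from each coverage file. The column name cov_field, the sample names, and the paths are hypothetical; only the field values "ratio" and "foreground" come from the cov.field description.

```r
# Hypothetical table in which the two coverage files store their signal
# under different field names.
samples <- data.frame(
  sample   = c("tumor_A", "tumor_B"),
  coverage = c("/path/to/tumor_A.cov.rds", "/path/to/tumor_B.drycleaned.rds"),
  stringsAsFactors = FALSE
)

# "cov_field" is an arbitrary column name chosen for this example.
samples$cov_field <- c("ratio", "foreground")

# Not run here; requires the package that provides pgv() and real input files.
# The conversion to a data.table assumes the data.table package is installed:
# pgv(data = data.table::as.data.table(samples), cov.field.col = "cov_field")
```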

cov.bin.width

bin width to use when rebinning the coverage data (default: 1e4). If you don't want rebinning to be performed then set to NA.

cov.color.field

field in the coverage GRanges used to set the color of coverage data points. If nothing is supplied, default colors are used for each seqname (i.e. chromosome), read from the colors defined in the settings.json file for the reference used for this dataset.

ref

the genome reference name used for this dataset. This reference name must be defined in the settings.json file. By default PGV accepts one of the following: hg19, hg38, covid19. If you are using a different reference then you must first add it to the settings.json file.

overwrite

by default only files that are missing will be created. If set to TRUE then existing coverage arrow files and gGraph JSON files will be overwritten

annotation

which node/edge annotation fields to add to the gGraph JSON file. By default we assume that gGnome::events has been executed and we add the following SV annotations: 'simple', 'bfb', 'chromoplexy', 'chromothripsis', 'del', 'dm', 'dup', 'pyrgo', 'qrdel', 'qrdup', 'qrp', 'rigma', 'tic', 'tyfonas'

tree

path to a Newick file containing a tree to incorporate with the dataset. If provided, the tree is added to datafiles.json and will be visualized by PGV. If the names of the leaves of the tree match the names defined in name.col, PGV will automatically associate these leaves with the samples, so that clicking a leaf of the tree scrolls the browser down to the corresponding genome graph track
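Because the leaf-to-sample association depends on matching names, it can help to confirm the match before passing the tree. The sketch below writes a small Newick file with hypothetical sample names and compares its leaves to those names. The leaf extraction is deliberately naive and assumes a tree without branch lengths or internal node labels.

```r
# Hypothetical sample names, as they would appear in the name.col column.
sample_names <- c("tumor_A", "tumor_B", "tumor_C")

# A three-leaf tree in Newick format: ((tumor_A,tumor_B),tumor_C);
newick <- sprintf("((%s,%s),%s);", sample_names[1], sample_names[2], sample_names[3])
tree_path <- file.path(tempdir(), "samples.newick")
writeLines(newick, tree_path)

# Naive leaf extraction: drop parentheses and the semicolon, split on commas.
# This only works for trees without branch lengths or internal labels.
leaves <- strsplit(gsub("[();]", "", readLines(tree_path)), ",")[[1]]

# Leaves that have no matching sample name (empty when everything matches).
unmatched <- setdiff(leaves, sample_names)
```

For trees with branch lengths, a dedicated Newick parser would be a safer way to obtain the leaf labels.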

cid.field

field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example "merged.ix" field, which is generated by merge.Junction())

connections.associations

whether to produce a connections.associations table (default: FALSE)

kag.col

column name in the input data table containing the paths to JaBbA karyographs (default: "kag")

ncn.gr

GRanges object or path to a GRanges object containing normal copy number (ncn) values. The ncn values must be contained in a field named "ncn"
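A sketch of how a normal copy number object with an "ncn" field might be built. The intervals are illustrative and do not represent real chromosome lengths, the seqname style ("1", "X", "Y") is an assumption that should match your reference, and the conversion to GRanges only runs if the GenomicRanges package is installed.

```r
# Illustrative intervals only; use the real extents of your reference.
ncn_df <- data.frame(
  seqnames = c("1", "X", "Y"),
  start    = c(1, 1, 1),
  end      = c(1000000, 1000000, 1000000),
  ncn      = c(2, 1, 1),   # e.g. a male sample: two autosomal copies, one X, one Y
  stringsAsFactors = FALSE
)

ncn_gr <- NULL
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  # keep.extra.columns = TRUE carries the "ncn" column over as a metadata field.
  ncn_gr <- GenomicRanges::makeGRangesFromDataFrame(ncn_df, keep.extra.columns = TRUE)

  # The object can be passed directly, or saved so that its path is passed instead.
  ncn_path <- file.path(tempdir(), "ncn.rds")
  saveRDS(ncn_gr, ncn_path)
}
```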

mc.cores

number of cores to use (default: 1)

Value

a generated PGV-formatted JSON ready for visualization.