
Author

Joe DeRose

Marcin Imielinski

Xiaotong Yao

Joe DeRose, Marcin Imielinski

Methods

Method `new()`

All-purpose constructor of gGraphs from nodes, edges, junctions, or various input formats (JaBbA, Weaver, etc.).

Usage

gGraph$new(
  genome = NULL,
  breaks = NULL,
  juncs = NULL,
  alignments = NULL,
  prego = NULL,
  jabba = NULL,
  cougar = NULL,
  weaver = NULL,
  remixt = NULL,
  rck = NULL,
  walks = NULL,
  nodes = NULL,
  edges = NULL,
  nodeObj = NULL,
  edgeObj = NULL,
  meta = NULL,
  verbose = FALSE
)

Arguments

genome: Seqinfo or object coercible to Seqinfo
breaks: GRanges whose endpoints specify breakpoints in the genome
juncs: Junction object or GRangesList coercible to Junction
prego: PREGO output directory path
jabba: JaBbA graph rds file
cougar: CouGaR output directory path
weaver: Weaver output directory path
remixt: ReMixT output directory path
rck: RCK output directory path

Method `walks()`

Exhaustively generates walks (if greedy = FALSE) or applies a greedy heuristic (if greedy = TRUE), and returns a gWalk object tied to this graph. Warning: greedy = FALSE will not scale to large graphs (i.e. it may hang on as few as a couple of hundred nodes, depending on the topology).
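To see why greedy = FALSE can hang, consider a toy graph made of a chain of "bubbles", where each stage offers two alternative nodes. The base-R sketch below is an illustration only, not gGnome code; the graph and the helpers make_bubble_chain(), enumerate_walks() and count_walks() are hypothetical. Because it enumerates every walk explicitly, the number of walks (and the work) doubles with each added stage.

```r
# Hypothetical toy graph: k stages, each with two alternative nodes (a_i, b_i);
# both nodes of stage i connect to both nodes of stage i + 1.
make_bubble_chain <- function(k) {
  adj <- list()
  for (i in seq_len(k)) {
    succ <- if (i < k) c(paste0("a", i + 1), paste0("b", i + 1)) else character(0)
    adj[[paste0("a", i)]] <- succ
    adj[[paste0("b", i)]] <- succ
  }
  adj
}

# Explicitly enumerate every walk from `node` to a sink (exhaustive search).
enumerate_walks <- function(adj, node) {
  succ <- adj[[node]]
  if (length(succ) == 0) return(list(node))
  walks <- list()
  for (nxt in succ) {
    for (w in enumerate_walks(adj, nxt)) walks[[length(walks) + 1]] <- c(node, w)
  }
  walks
}

# Count all walks starting at either node of the first stage.
count_walks <- function(k) {
  adj <- make_bubble_chain(k)
  sum(vapply(c("a1", "b1"), function(s) length(enumerate_walks(adj, s)), integer(1)))
}

count_walks(10)  # 2^10 = 1024 walks; each extra stage doubles the count
```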

Usage

gGraph$walks(field = NULL, greedy = FALSE, verbose = FALSE)

Arguments

greedy: logical scalar specifying whether to generate greedy walks
verbose: logical scalar

Method `set()`

Sets graph-level metadata of the gGraph. This is currently mainly useful for gTrack defaults such as "name" or "colormaps", and it is also used to set the default "by" field for simplify(), but it can be used to configure other settings in the future (note that this is graph-level and not gNode- or gEdge-level metadata).

Usage

gGraph$set(...)

Arguments

...: name-value pairs

Method `queryLookup()`

Returns a data.table of the provided snode.ids, their indices, and the indices of their reverse complements in the graph. The data.table is keyed on snode.id.

Usage

gGraph$queryLookup(id)

Arguments

id: signed node ids (snode.ids) in the graph to look up

Returns

data.table of snode.ids, indices and reverse complement indices

Method `disjoin()`

Disjoins (i.e. collapses) all overlapping nodes in the graph (subject to the "by" argument) and aggregates node and edge metadata among them using FUN. This modifies the current graph in place. If the optional input gr is provided, the graph is first concatenated with a reference graph built from GRanges gr prior to disjoining.

If collapse = TRUE, the output graph will have a single node per reference interval. If collapse = FALSE, all nodes in the graph are disjoined but overlapping nodes are kept separate, i.e. overlapping graphs are composed of a common set of disjoint intervals, but several instances of a given interval may exist among the different graphs.
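A minimal base-R sketch of what disjoining with collapse = TRUE and avg = FALSE means for one numeric field on a single chromosome and strand: overlapping intervals are cut at every breakpoint, and each resulting piece gets the sum of the values of the intervals covering it. disjoin_sum() is hypothetical; it ignores edges, strand and the by argument, and it is not how gGnome implements disjoin().

```r
# Hypothetical helper: disjoin closed 1-based intervals [start, end] (as in GRanges)
# and sum `value` over all input intervals covering each disjoint piece.
disjoin_sum <- function(start, end, value) {
  # every start and every (end + 1) is a potential breakpoint
  bps <- sort(unique(c(start, end + 1)))
  piece_start <- head(bps, -1)
  piece_end <- tail(bps, -1) - 1
  covers <- function(i) start <= piece_start[i] & end >= piece_end[i]
  covered <- vapply(seq_along(piece_start), function(i) any(covers(i)), logical(1))
  total <- vapply(seq_along(piece_start), function(i) sum(value[covers(i)]), numeric(1))
  # drop pieces that fall in gaps between the input intervals
  data.frame(start = piece_start[covered], end = piece_end[covered], value = total[covered])
}

# pieces: [1, 5] -> 2, [6, 10] -> 2 + 3 = 5, [11, 15] -> 3
disjoin_sum(start = c(1, 6), end = c(10, 15), value = c(2, 3))
```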

Usage

gGraph$disjoin(
  gr = NULL,
  by = NULL,
  collapse = TRUE,
  na.rm = TRUE,
  avg = FALSE,
  sep = ",",
  FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg)
)

Arguments

gr: GRanges around which to disjoin the graph
by: metadata field of current graph around which to limit disjoining
collapse: logical scalar specifying whether to collapse graph nodes after disjoining
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value

Method `simplify()`

Simplifies the gGraph by collapsing reference-adjacent nodes that have no junction, or (if ignore.loose = FALSE) no loose end, between them.

Takes an optional "by" column. If by is not NULL, simplify() will only collapse adjacent nodes if they share metadata in the columns specified by "by".
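A minimal base-R sketch of the merging rule on a single chromosome, assuming the segments are already sorted and non-overlapping. simplify_segments() and its break_after flags are hypothetical stand-ins for the graph's junctions and loose ends; the sketch keeps only coordinates and does not aggregate metadata with FUN, so it is not gGnome's implementation.

```r
# Hypothetical helper: `seg` holds sorted, non-overlapping segments on one chromosome;
# break_after[i] is TRUE if a junction (or loose end) sits between seg i and seg i + 1.
simplify_segments <- function(seg, break_after, by = NULL) {
  n <- nrow(seg)
  # a segment joins the previous one only if they are reference-adjacent with no break
  join_prev <- c(FALSE, seg$start[-1] == seg$end[-n] + 1 & !break_after[-n])
  # with `by`, they must also share the value of that metadata column
  if (!is.null(by)) join_prev <- join_prev & c(FALSE, seg[[by]][-1] == seg[[by]][-n])
  group <- cumsum(!join_prev)
  data.frame(start = as.vector(tapply(seg$start, group, min)),
             end = as.vector(tapply(seg$end, group, max)))
}

seg <- data.frame(start = c(1, 11, 21, 31), end = c(10, 20, 30, 40), cn = c(2, 2, 3, 3))
brk <- c(FALSE, FALSE, TRUE, FALSE)     # a junction between segments 3 and 4
simplify_segments(seg, brk)             # two segments: [1, 30] and [31, 40]
simplify_segments(seg, brk, by = "cn")  # three: [1, 20], [21, 30], [31, 40]
```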

Usage

gGraph$simplify(
  by = private$pmeta$by,
  na.rm = TRUE,
  avg = TRUE,
  sep = ",",
  FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg),
  ignore.loose = FALSE
)

Arguments

by: metadata field of current graph around which to limit simplification
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = TRUE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value

Method `reduce()`

Reduces the graph, i.e. performs a disjoin() followed by a simplify(): overlapping nodes are collapsed, then adjacent ones are merged subject to (optional) matching on some metadata field.

Usage

gGraph$reduce(...)

Arguments

by: metadata field of current graph around which to limit reduction (i.e. disjoining and simplification)
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value

Method `subgraph()`

Computes the subgraph within a certain distance (d, in bp) or degree of separation (k, in number of edges) of all nodes intersecting the GRanges seed window.
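The k argument limits the subgraph by degree of separation. As a rough illustration, the base-R sketch below collects the nodes within k edges of a set of seed nodes by breadth-first search. It assumes edges can be traversed in either direction and does not model the bp distance d; k_neighborhood() and the toy adjacency list are hypothetical.

```r
# Hypothetical helper: node ids reachable from `seeds` within k edges,
# treating edges as undirected (adj: named list of neighbor ids).
k_neighborhood <- function(adj, seeds, k) {
  visited <- seeds
  frontier <- seeds
  for (step in seq_len(k)) {
    # neighbors of the current frontier that have not been seen yet
    frontier <- setdiff(unique(unlist(adj[frontier], use.names = FALSE)), visited)
    if (length(frontier) == 0) break
    visited <- c(visited, frontier)
  }
  sort(visited)
}

# toy chain 1 - 2 - 3 - 4 - 5
adj <- list("1" = "2", "2" = c("1", "3"), "3" = c("2", "4"), "4" = c("3", "5"), "5" = "4")
k_neighborhood(adj, seeds = "3", k = 1)  # "2" "3" "4"
```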

Usage

gGraph$subgraph(
  seed = si2gr(self),
  d = NULL,
  k = 0,
  bagel = FALSE,
  mod = FALSE,
  ignore.strand = T,
  verbose = FALSE
)

Arguments

seed: GRanges around which to subgraph
d: distance in bp around which to subgraph
k: order (number of edges) around which to subgraph
ignore.strand: logical scalar specifying whether to ignore strand
verbose: logical scalar
pad: positive integer scalar padding to add to seed

Method `clusters()`

Marks nodes in the graph with the metadata field cluster based on one of several algorithms, selected by mode. If i and j are specified, the graph is first subsetted, then clusters are computed, and then the cluster ids are lifted back to mark the original graph.
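For intuition about mode = "weak": weakly connected components group nodes that are linked when edge direction is ignored (strongly connected components additionally require directed paths in both directions). The base-R union-find sketch below computes weak components only; weak_clusters() is hypothetical and is not the code gGnome runs.

```r
# Hypothetical helper: weakly connected components of a directed graph with nodes
# 1..n and edges from[i] -> to[i]; edge direction is ignored (union-find).
weak_clusters <- function(n, from, to) {
  parent <- seq_len(n)
  find_root <- function(x) {
    while (parent[x] != x) x <- parent[x]
    x
  }
  for (i in seq_along(from)) {
    a <- find_root(from[i])
    b <- find_root(to[i])
    if (a != b) parent[max(a, b)] <- min(a, b)  # merge the two components
  }
  roots <- vapply(seq_len(n), find_root, numeric(1))
  match(roots, unique(roots))  # relabel clusters as 1, 2, ... in node order
}

weak_clusters(5, from = c(1, 2, 4), to = c(2, 3, 5))  # 1 1 1 2 2
```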

Usage

gGraph$clusters(i = NULL, j = NULL, mode = "weak")

Arguments

i: node filter to apply to graph prior to clustering
j: edge filter to apply to graph prior to clustering
mode: character scalar specifying the clustering algorithm: "weak" or "strong" for weakly or strongly connected components, or "walktrap" for cluster_walktrap community detection

Method `eclusters()`

Usage

gGraph$eclusters(
  thresh = 1000,
  range = 1e+06,
  weak = TRUE,
  paths = !weak,
  mc.cores = 1,
  verbose = FALSE,
  chunksize = 1e+30,
  method = "single"
)

Arguments

thresh: the distance threshold with which to group nearby quasi-reciprocal junctions - i.e. if thresh=0 then we only consider clusters of exactly reciprocal junctions.
weak: logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the graph [TRUE]
mc.cores: number of cores to use for parallel processing

Returns

numeric vector of the same length; Inf means they are not facing each other
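The thresh argument and the method = "single" default suggest single-linkage grouping of nearby breakends. Purely as an illustration of that idea (we do not claim that eclusters() calls hclust internally), the sketch below groups made-up breakend positions with stats::hclust() and cuts the tree at thresh. It ignores junction orientation (whether breakends face each other).

```r
# Toy breakend positions (bp) on one chromosome; values are made up for illustration.
pos <- c(1000, 1500, 50000, 50200, 900000)
thresh <- 1000

# Single linkage: breakends end up in the same group if they are chained together
# by gaps of at most `thresh` bp.
hc <- hclust(dist(pos), method = "single")
groups <- cutree(hc, h = thresh)
groups  # breakends 1-2 and 3-4 pair up; breakend 5 stays alone
```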

Method `eclusters2()`

Marks ALT edges belonging to (quasi-)reciprocal cycles.

Usage

gGraph$eclusters2(
  thresh = 1000,
  range = 1e+06,
  weak = TRUE,
  paths = !weak,
  mc.cores = 1,
  verbose = FALSE,
  chunksize = 1e+30,
  method = "single",
  return_pairs = FALSE,
  ignore.small = TRUE,
  max.small = 10000,
  ignore.isolated = TRUE,
  strict = c("strict", "one_to_one", "loose"),
  min.isolated = max.small,
  only_chains = FALSE
)

Arguments

weak: logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the junction graph [TRUE]
mc.cores: number of cores to use for parallel processing
max.small: size below which simple duplications and deletions are excluded
only_chains: if TRUE, will only pair a breakend to its nearest neighbor if and only if the nearest neighbor is reciprocal
juncs: GRangesList of junctions
ignore.strand: usually TRUE

Returns

numeric vector of the same length; Inf means they are not facing each other

Method `paths()`

Returns shortest paths from query gNode to subject gNode in graph in the form of gWalks (note: the gNodes must exist in the graph, unlike in the related but more general proximity function)

Each output path is a gWalk that connects query-subject on the genome described by gGraph gg. Each gWalk is annotated by the metadata of the corresponding query-subject GRanges pair as well as fields "altdist" and "refdist" specifying the "alternate" and "reference" gGraph distance of the query-subject pair. The gWalk metadata field "reldist" specifies the relative distance (i.e. ratio of altdist to refdist) for that walk.

NOTE: this operation can be quite expensive for large combinations of query and subject, so by default the max.dist parameter will only compute paths for query-subject pairs that are less than max.dist apart (default 1 Mb). That default is chosen for large queries (e.g. >10K on each side); for smaller queries (e.g. length < 100), the user may want to set max.dist = Inf.

By default, performs a "cartesian" search, i.e. all pairs of query and subject, but if cartesian = FALSE, will only search the specified pairs of query and subject (in which case query and subject must be of the same length).
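The difference between the cartesian and paired modes, and the reldist field (ratio of altdist to refdist), can be illustrated without a graph. pairs_to_search() below is a hypothetical helper and the distances are made-up numbers; it only shows which query-subject pairs each mode considers.

```r
# Hypothetical helper: which query-subject pairs would be searched.
pairs_to_search <- function(query, subject, cartesian = TRUE) {
  if (cartesian) {
    # all combinations of query and subject
    expand.grid(query = query, subject = subject, stringsAsFactors = FALSE)
  } else {
    # only the i-th query with the i-th subject
    stopifnot(length(query) == length(subject))
    data.frame(query = query, subject = subject, stringsAsFactors = FALSE)
  }
}

nrow(pairs_to_search(c("q1", "q2", "q3"), c("s1", "s2")))              # 6 pairs
nrow(pairs_to_search(c("q1", "q2"), c("s1", "s2"), cartesian = FALSE))  # 2 pairs

# reldist: alternate (graph) distance relative to reference distance
altdist <- c(5e3, 2e6)
refdist <- c(1e6, 2e6)
altdist / refdist  # 0.005 1.000
```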

Usage

gGraph$paths(
  query,
  subject = query,
  mc.cores = 1,
  weight = NULL,
  meta = NULL,
  ignore.strand = TRUE,
  cartesian = TRUE
)

Arguments

query: gNode object or snode.id of gNode in this graph
subject: gNode object or snode.id of gNode in this graph
mc.cores: how many cores (default 1)
weight: edge column to use as the weight (instead of standard weight column)

Returns

gWalk object each representing a query-subject shortest path (if any exist)

Method `dist()`

Computes a distance matrix (in base pairs) on the gGraph between arbitrary pairs of GRanges query and subject intervals.
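The weight argument implies that, by default, traversing an edge costs the length of the node it leads into. Under that reading, a base-R sketch of an all-pairs distance matrix on a toy graph is shown below, using the textbook Floyd-Warshall algorithm (a swapped-in technique; gGnome's own implementation is not described here). graph_dist() is hypothetical and ignores strand, include.internal and offsets within partially overlapping intervals.

```r
# Hypothetical helper: all-pairs shortest distances for nodes 1..n with directed
# edges from[i] -> to[i]; by default an edge costs the width (bp) of its target node.
graph_dist <- function(n, from, to, width, weight = width[to]) {
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  for (i in seq_along(from)) D[from[i], to[i]] <- min(D[from[i], to[i]], weight[i])
  # Floyd-Warshall: allow paths through intermediate node k, one k at a time
  for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))
  D
}

width <- c(100, 200, 300)  # toy node widths in bp
D <- graph_dist(3, from = c(1, 2), to = c(2, 3), width = width)
D[1, 3]  # 500: enter node 2 (200 bp) then node 3 (300 bp)
D[3, 1]  # Inf: no path back
```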

Usage

gGraph$dist(
  query,
  subject,
  weight = NULL,
  ignore.strand = TRUE,
  include.internal = TRUE,
  verbose = FALSE
)

Arguments

weight: metadata field of gEdges to use as weight (instead of distance of target node)
include.internal: logical flag whether to allow paths that begin or end inside the query or subject
query: GRanges query
subject: GRanges subject (if NULL, will be set to query)
dt: returns data.table if TRUE, excluding all Inf distances

Method `rep()`

Creates "bubbles" in the graph by replicating the nodes or gWalks in the argument. Node replication replicates edges going in and out of all replicated nodes. If an edge connects a pair of replicated nodes, that edge will be replicated across all pairs of those replicated nodes. Walk replication will create "longer bubbles" with fewer edges getting replicated, i.e. it will only replicate intra-walk edges within each walk replicate (but not between separate walk replicates).

(Note that this changes the current gGraph in place, and thus the input gNode or gWalk will no longer apply to the new, altered gGraph.)

The new graph keeps track of the parent node and edge ids in the original graph using node metadata parent.node.id and edge metadata parent.edge.id. The replicated nodes will be connected to the sources of the original nodes, and if replicated nodes connect to each other, then there will be an edge connecting all of their instances to each other.
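The edge-copying rule for node replication can be sketched on a plain edge list. rep_nodes() below is a hypothetical base-R helper, not gGnome code: it gives each replicate a new id, records parent.node.id, and copies every edge between all replicates of its two endpoints (walk replication is not modeled).

```r
# Hypothetical helper: replicate nodes 1..n `times` times and rebuild edges so that
# every edge is copied between all replicates of its endpoints.
rep_nodes <- function(n, from, to, times) {
  parent.node.id <- rep(seq_len(n), times)
  # new node ids belonging to each original node
  new_ids <- split(seq_along(parent.node.id), factor(parent.node.id, levels = seq_len(n)))
  edges <- do.call(rbind, lapply(seq_along(from), function(i) {
    expand.grid(from = new_ids[[from[i]]], to = new_ids[[to[i]]], parent.edge.id = i)
  }))
  list(parent.node.id = parent.node.id, edges = edges)
}

# chain 1 -> 2 -> 3; replicate node 2 twice to create a bubble
res <- rep_nodes(3, from = c(1, 2), to = c(2, 3), times = c(1, 2, 1))
res$parent.node.id  # 1 2 2 3
nrow(res$edges)     # 4: two copies of edge 1 -> 2 and two copies of edge 2 -> 3
```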

Usage

gGraph$rep(nodes = NULL, times)

Arguments

nodes: gNode object that must point to a node in the graph; can also be an index of a node (but not a metadata expression) or a gWalk object
times: scalar or vector of length self$length specifying how many times to replicate each of the nodes.

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep

Method `swap()`

Swaps nodes with GRanges, GRangesList, or gWalks. The provided replacement vector must be the same length as the input nodes, so that each node is "swapped" for the provided interval, node, GRangesList element (representing a walk), or gWalk. The replacement will inherit the left and right edges of the removed node. If the replacement is a walk, then the left side of the first node in the walk will inherit the edges that were previously to the left of the node being replaced, and the right side of the last node of the walk will inherit the edges that were previously to the right of the node being replaced.

Note: these replacements obey the orientation of the arguments. So if the node to be replaced is flipped (i.e. in - orientation with respect to the reference), then its "left" is to the right on the reference. Similarly, for walks whose first interval is flipped with respect to the reference, the left edges will be attached to the right of the node on the reference.
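The orientation rule in the note above reduces to a small lookup: a node-relative side is the same reference side on the + strand and the opposite side on the - strand. reference_side() is a hypothetical helper for illustration only.

```r
# Hypothetical helper: translate a node-relative side ("left"/"right") into the
# side on the reference, given the strand of the node as provided.
reference_side <- function(side, strand) {
  stopifnot(all(side %in% c("left", "right")), all(strand %in% c("+", "-")))
  flipped <- ifelse(side == "left", "right", "left")
  ifelse(strand == "-", flipped, side)
}

reference_side(c("left", "left", "right"), c("+", "-", "-"))  # "left" "right" "left"
```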

Usage

gGraph$swap(nodes, replacement)

Arguments

nodes: gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
replacement: GRanges, GRangesList, or gWalk object whose length is the length(nodes)

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep

Method `connect()`

Connects node pairs in the gGraph, optionally adding edge metadata and optionally inserting nodes or GRangesList / walks between each pair. Note: the connections are made with respect to the provided node orientation, so if a node is provided in a "flipped" orientation, then its right side will point left on the reference.

Usage

gGraph$connect(
  n1,
  n2,
  n1.side = "right",
  n2.side = "left",
  type = "ALT",
  meta = NULL,
  insert = NULL
)

Arguments

n1: gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n2: gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n1.side: character vector of length length(n1) whose value is either "left" or "right" (default 'right')
n2.side: character vector of length length(n1) whose value is either "left" or "right" (default 'left')

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep

Method `toposort()`

Usage

gGraph$toposort()

Returns

current graph, modified with nodes marked according to topological sort
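toposort() does not describe its algorithm here. For reference, a textbook way to compute a topological order is Kahn's algorithm, sketched below in base R on a plain directed graph with nodes 1..n. It ignores the strand-symmetric structure of a gGraph, and kahn_toposort() is hypothetical, not gGnome's implementation.

```r
# Kahn's algorithm: repeatedly emit a node with no remaining incoming edges.
kahn_toposort <- function(n, from, to) {
  indegree <- tabulate(to, nbins = n)
  queue <- which(indegree == 0)  # nodes with no incoming edges
  ord <- integer(0)
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    ord <- c(ord, v)
    # removing v lowers the in-degree of each of its successors
    for (w in to[from == v]) {
      indegree[w] <- indegree[w] - 1
      if (indegree[w] == 0) queue <- c(queue, w)
    }
  }
  if (length(ord) < n) stop("graph contains a cycle: no topological order exists")
  ord
}

kahn_toposort(4, from = c(1, 1, 2, 3), to = c(2, 3, 4, 4))  # 1 2 3 4
```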

Method `print()`

Prints out this gGraph. Prints number of nodes and edges, the gNode associated with this gGraph and the gEdge associated with this gGraph

Usage

gGraph$print()

Method `annotate()`

Used by the mark() functions in gNode, gEdge and gWalk to alter the metadata associated with the nodes and edges in this gGraph. Using this function directly is not recommended; it is much safer to use mark().

Note that id for nodes is the node.id (not snode.id), and id for edges is the edge.id (not sedge.id).

Usage

gGraph$annotate(colName, data, id, class)

Method `maxflow()`

Computes the "max flow" between every node pair in self for some metadata field.

The "max flow" for a node pair i, j is the maximum value m of node and/or edge metadata for which there exists a path p between i and j whose nodes n and/or edges e obey field(n) >= m and/or field(e) >= m for all n, e in p (i.e. m is the maximum lower bound of the value of nodes / edges across all paths connecting i and j).

The "min version" of this problem (max = FALSE) will determine the minimum value m for which there exists a path p whose nodes n and edges e obey field(n) >= m and/or field(e) >= m for all n, e in p.

The user can also solve the problem with lower.bound = FALSE, i.e. where m is the (maximum or minimum) upper bound value in each path.

By default, maxflow() will try to solve the problem across both node and edge metadata if the field is present in either. If the field is only present in one, it will solve for that one. This behavior can be toggled using the edges.only and nodes.only parameters.
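Restricted to edge metadata, with max = TRUE and lower.bound = TRUE, the definition above is the classic bottleneck (widest-path) problem. The base-R sketch below solves it on a toy graph with a max-min variant of Floyd-Warshall; this is a swapped-in textbook method, not necessarily how maxflow() works, and widest_paths() is hypothetical. It does not cover node fields, the walk = TRUE optimization, or the other flags.

```r
# Bottleneck ("widest path") values for nodes 1..n with directed edges from[i] -> to[i]
# carrying metadata value field[i]; B[i, j] is the largest m such that some path
# from i to j uses only edges with field >= m (-Inf if j is unreachable).
widest_paths <- function(n, from, to, field) {
  B <- matrix(-Inf, n, n)
  for (i in seq_along(from)) B[from[i], to[i]] <- max(B[from[i], to[i]], field[i])
  for (k in seq_len(n)) {
    # bottleneck of going i -> k -> j is the smaller of the two legs
    via_k <- pmin(matrix(B[, k], n, n), matrix(B[k, ], n, n, byrow = TRUE))
    B <- pmax(B, via_k)  # keep the better of the current path and the path through k
  }
  B
}

# two routes from 1 to 4: via node 2 (bottleneck min(5, 3) = 3) or via node 3 (min(2, 10) = 2)
B <- widest_paths(4, from = c(1, 2, 1, 3), to = c(2, 4, 3, 4), field = c(5, 3, 2, 10))
B[1, 4]  # 3
```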

Usage

gGraph$maxflow(
  field = NA,
  walk = FALSE,
  max = TRUE,
  lower.bound = TRUE,
  nfield = NA,
  efield = NA,
  cfield = NA,
  path.only = TRUE,
  require.nodes = NULL,
  multi = FALSE,
  ncopies = 1,
  reverse.complement = FALSE,
  verbose = FALSE
)

Arguments

field: metadata field to run maxflow on
walk: if TRUE, will return the single walk that maximizes the sum of metadata fields
max: logical flag whether to find maximum path or (if max = FALSE) minimum path
nfield: field to specify a node field to maximize across paths
efield: field to specify an edge field to maximize across paths
cfield: field to specify a node / edge field that limits / caps the dosage at nodes / edges
path.only: logical flag relevant only if walk = TRUE; if TRUE, will only allow path-based maxflows, i.e. will not return a solution when the graph contains only cycles
multi: logical flag (FALSE); if TRUE, will allow the optimization to compute a solution that outputs multiple disjoint paths
ncopies: positive integer representing the number of copies of the flow that we want the graph to support
reverse.complement: if TRUE, will compute maximum flow between each node i and the reverse complement of node j in a strand-specific way

Method `window()`

Returns the region this gGraph spans as a GRanges

Usage

gGraph$window(pad = 0)

Arguments

pad: A positive amount to pad the window by

Returns

GRanges of the region this gGraph covers

Method `gtrack()`

Usage

gGraph$gtrack(
  y.field = NULL,
  lwd.loose = 3,
  col.loose = alpha("blue", 0.6),
  col.alt = alpha("red", 0.4),
  ...
)

Method `trim()`

Trims the current gGraph to the provided GRanges and returns this as a new gGraph.

Usage

gGraph$trim(tile)

Arguments

tile: GRanges interval around which to trim the gGraph
mod: defaults to FALSE; set to TRUE to modify this gGraph in place

Details

```
library(GenomicRanges)  # provides GRanges() and IRanges(); gg is an existing gGraph
gr = c(GRanges("1", IRanges(10000, 100000), "+"), GRanges("2", IRanges(10000, 100000), "+"))
new.gg = gg$trim(gr)
```

Returns

new gGraph trimmed to tile, unless mod is set to TRUE

Method `fix()`

Modifies (in place) the current seqlevels of the gGraph, including keeping only certain seqlevels, dropping certain seqlevels, and replacing seqlevels.

Warning: this may modify the graph including getting rid of nodes and edges (i.e. those outside the retained seqlevels) and also change coordinates (ie move ranges that were previously on different chromosomes to the same chromosome etc.). Use with caution!

Default behavior is to replace 'chr' with ''.

Usage

gGraph$fix(pattern = NULL, replacement = NULL, drop = TRUE, seqlengths = NULL)

Arguments

pattern: character pattern to replace in seqlevels (used in a gsub, can have backreferences)
replacement: character to replace pattern with (used in a gsub, can have backreferences)
drop: character vector of seqlevels to drop or logical TRUE to drop all seqlevels that are unused (TRUE)
seqlengths: new seqlengths i.e. named integer vector of seqlevels to drop or embed graph into
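Since pattern and replacement are passed to gsub, their semantics are those of base R's gsub(). The snippet below only exercises gsub() itself on a made-up vector of seqlevel names; it does not call fix(). It shows the default prefix removal and an anchored pattern with a backreference.

```r
seqlevels_in <- c("chr1", "chr2", "chrX", "chrM")

# default behavior described above: remove 'chr'
gsub("chr", "", seqlevels_in)  # "1" "2" "X" "M"

# anchored pattern with a backreference: keep the captured chromosome name
renamed <- gsub("^chr([0-9XY]+)$", "\\1", seqlevels_in)
renamed  # "1" "2" "X" "chrM" (chrM does not match the pattern, so it is left as is)
```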

Returns

current graph modified in place, as specified by user

Method `add()`

Adds GRanges nodes, edges (data.table), or junctions to the graph. Only one of the parameters below can be specified at a time (since the graph is modified in place, order matters).

Usage

gGraph$add(nodes = NULL, edges = NULL, junctions = NULL)

Arguments

nodes: GRanges, strand is ignored
edges: data.table specifying edges between existing nodes, with fields n1, n2, n1.side, n2.side
junctions: Junction object or GRangesList coercible to junction object
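A sketch of an edges table with the fields listed above, built as a base-R data.frame for portability (the argument is documented as a data.table, so in practice it could be converted with data.table::as.data.table()). The assumption that n1 and n2 are node ids of the existing graph, the example edges, and the check_edge_table() helper are all hypothetical; this does not call add().

```r
# Hypothetical edge table: right side of node 1 to left side of node 2 (reference-like),
# and right side of node 2 to right side of node 3 (an inversion-like junction).
edges <- data.frame(n1 = c(1, 2), n2 = c(2, 3),
                    n1.side = c("right", "right"), n2.side = c("left", "right"),
                    stringsAsFactors = FALSE)

# Hypothetical check of the required columns and side values before calling add().
check_edge_table <- function(edges, n_nodes) {
  required <- c("n1", "n2", "n1.side", "n2.side")
  absent <- setdiff(required, names(edges))
  if (length(absent) > 0) stop("missing columns: ", paste(absent, collapse = ", "))
  sides <- c(edges$n1.side, edges$n2.side)
  if (!all(sides %in% c("left", "right"))) stop("sides must be 'left' or 'right'")
  if (!all(c(edges$n1, edges$n2) %in% seq_len(n_nodes))) stop("edges must refer to existing nodes")
  invisible(TRUE)
}

check_edge_table(edges, n_nodes = 3)
```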

Returns

current graph modified in place with additional nodes and edges, as specified by user

Method `json()`

Creates a JSON file for interactive visualization using gGnome.js. Annotations are node / edge features that will be dumped to the JSON.

Usage

gGraph$json(
  filename = ".",
  maxcn = 100,
  maxweight = 100,
  save = TRUE,
  verbose = FALSE,
  annotations = NULL,
  nfields = NULL,
  efields = NULL,
  settings = list(y_axis = list(title = "copy number", visible = TRUE)),
  cid.field = NULL,
  no.y = FALSE
)

Arguments

filename: character path to save to
save: whether to save or return list object representing json contents
annotations: which graph annotations to dump to json
nfields: which node fields to dump to json (NULL)
efields: which edge fields to dump to json (NULL)
settings: gGnome.js settings values to add to the output JSON files (list)
cid.field: field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example "merged.ix" field, which is generated by merge.Junction())

Method `get.diameter()`

Usage

gGraph$get.diameter(weights = NULL)

Method `circos()`

Usage

gGraph$circos(...)

Method `split()`

Usage

gGraph$split(by = "parent.graph")

Method `clone()`

The objects of this class are cloneable with this method.

Usage

gGraph$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
