simplify

simplify

reduce

Author

Joe DeRose

Marcin Imielinski

Xiaotong Yao

Joe DeRose, Marcin Imielinski

Methods


Method new()

All-purpose constructor of gGraphs from nodes, edges, junctions, or various input formats (JaBbA, Weaver, etc.)

Usage

gGraph$new(
  genome = NULL,
  breaks = NULL,
  juncs = NULL,
  alignments = NULL,
  prego = NULL,
  jabba = NULL,
  cougar = NULL,
  weaver = NULL,
  remixt = NULL,
  rck = NULL,
  walks = NULL,
  nodes = NULL,
  edges = NULL,
  nodeObj = NULL,
  edgeObj = NULL,
  meta = NULL,
  verbose = FALSE
)

Arguments

genome

Seqinfo or object coercible to seqinfo

breaks

GRanges whose endpoints specify breakpoints in the genome

juncs

Junction object or GRangesList coercible to Junction

prego

PREGO output directory path

jabba

JaBbA graph rds file

cougar

CouGar output directory path

weaver

Weaver output directory path

remixt

ReMixT output directory path

rck

RCK output directory path


Method walks()

Exhaustively generates walks (if greedy = FALSE) or otherwise applies a greedy heuristic (greedy = TRUE). Returns a gWalk object tied to this graph. Warning: greedy = FALSE will not scale to large graphs (i.e. it may even hang on a couple of hundred nodes, depending on the topology).

Usage

gGraph$walks(field = NULL, greedy = FALSE, verbose = FALSE)

Arguments

greedy

logical scalar specifying whether to generate greedy walks

verbose

logical scalar


Method set()

Sets metadata of the gGraph. Right now this is mainly useful for gTrack defaults such as "name" or "colormaps", and for setting the default "by" field for simplify, but it can be used for configuring other settings in the future. Note that this is graph-level metadata, not gNode- or gEdge-level metadata.

Usage

gGraph$set(...)

Arguments

...

name value pairs


Method queryLookup()

Returns a data.table of the provided snode.ids, their indices, and the indices of their reverse complements in the graph. The data.table is keyed on snode.id.

Usage

gGraph$queryLookup(id)

Arguments

id

signed node ids (snode.ids) in the graph to look up

Returns

data.table of snode.ids, indices, and reverse complement indices


Method disjoin()

Disjoins (i.e. collapses) all overlapping nodes in the graph (subject to the "by" argument) and aggregates node and edge metadata among them using FUN. Modifies the current graph.

If the optional input gr is provided, a reference graph with GRanges gr is first concatenated to the graph prior to disjoining.

If collapse = TRUE, the output is a graph with a single node per reference interval. If collapse = FALSE, the nodes in the graph are only disjoined and all overlapping nodes are kept separate, i.e. overlapping graphs are composed of a common set of disjoint intervals, but there may be several instances of a given interval among the different graphs.

Usage

gGraph$disjoin(
  gr = NULL,
  by = NULL,
  collapse = TRUE,
  na.rm = TRUE,
  avg = FALSE,
  sep = ",",
  FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg)
)

Arguments

gr

GRanges around which to disjoin the graph

by

metadata field of current graph around which to limit disjoining

collapse

logical scalar specifying whether to collapse graph nodes after disjoining

na.rm

logical scalar specifying whether to remove NA's when aggregating metadata after collapsing

avg

logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)

FUN

function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
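The collapse = TRUE behavior can be illustrated on plain integer intervals. The following is a minimal base R sketch, not gGnome's implementation: the helper name `disjoin_intervals` and its arguments are hypothetical, it handles a single chromosome only, and it aggregates one numeric field by sum (avg = FALSE) or mean (avg = TRUE).

```r
# Hypothetical helper, not part of gGnome: disjoin closed integer intervals
# on one chromosome and aggregate a numeric field across overlapping pieces.
disjoin_intervals <- function(start, end, value, avg = FALSE) {
  # Breakpoints: every interval start, and the position after every end.
  bp <- sort(unique(c(start, end + 1)))
  out <- data.frame(start = head(bp, -1), end = tail(bp, -1) - 1)
  agg <- if (avg) mean else sum
  # Each disjoint piece lies either fully inside an input interval or outside it.
  out$value <- vapply(seq_len(nrow(out)), function(i) {
    hit <- start <= out$start[i] & end >= out$end[i]
    if (any(hit)) agg(value[hit]) else NA_real_
  }, numeric(1))
  # Drop pieces that fall in gaps between the input intervals.
  out[!is.na(out$value), ]
}

# Two overlapping intervals, [1, 10] with value 2 and [5, 15] with value 3.
res <- disjoin_intervals(start = c(1, 5), end = c(10, 15), value = c(2, 3))
print(res)
```

The overlapping stretch [5, 10] becomes a single piece whose value combines both inputs, which mirrors the "single node per reference interval" outcome described above.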


Method simplify()

Simplifies the gGraph by collapsing reference-adjacent nodes that lack a junction or loose end (ignore.loose = FALSE) between them.

Takes an optional "by" column. If by is not NULL, simplify will only collapse adjacent nodes if they share metadata in the columns specified by "by".

Usage

gGraph$simplify(
  by = private$pmeta$by,
  na.rm = TRUE,
  avg = TRUE,
  sep = ",",
  FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg),
  ignore.loose = FALSE
)

Arguments

by

metadata field of current graph around which to limit simplification

na.rm

logical scalar specifying whether to remove NA's when aggregating metadata after collapsing

avg

logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = TRUE)

FUN

function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
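As an illustration of the "by" matching, here is a minimal base R sketch that merges runs of adjacent intervals sharing a metadata value. It is not gGnome code: the helper `simplify_intervals` and the `cn` column are hypothetical, and it assumes the rows are sorted and that no junction or loose end sits between adjacent rows.

```r
# Hypothetical helper, not part of gGnome: merge runs of reference-adjacent
# intervals that share the same `by` value. Assumes sorted rows and no
# junction or loose end between adjacent rows.
simplify_intervals <- function(df, by = "cn") {
  n <- nrow(df)
  # A new run starts when there is a gap or when the `by` value changes.
  adjacent <- df$start[-1] == df$end[-n] + 1
  same <- df[[by]][-1] == df[[by]][-n]
  run <- cumsum(c(TRUE, !(adjacent & same)))
  data.frame(
    start = as.vector(tapply(df$start, run, min)),
    end = as.vector(tapply(df$end, run, max)),
    value = as.vector(tapply(df[[by]], run, function(x) x[1]))
  )
}

segs <- data.frame(
  start = c(1, 11, 21, 41),
  end = c(10, 20, 30, 50),
  cn = c(2, 2, 3, 3)
)
merged <- simplify_intervals(segs)
print(merged)
```

In this example the first two segments merge because they are adjacent and share cn = 2. The last two share cn = 3 but are separated by a gap, so they stay separate.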


Method reduce()

Reduces the graph, which is $disjoin() followed by $simplify(), i.e. collapsing overlapping nodes, then merging adjacent ones subject to (optional) matching on some metadata field.

Usage

gGraph$reduce(...)

Arguments

by

metadata field of current graph around which to limit reduction (i.e. disjoining and simplification)

na.rm

logical scalar specifying whether to remove NA's when aggregating metadata after collapsing

avg

logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)

FUN

function which should take (numeric or character) x and na.rm = TRUE and return a scalar value


Method subgraph()

Computes the subgraph within a certain distance or degree of separation of (all nodes) intersecting the given GRanges "seed" window.

Usage

gGraph$subgraph(
  seed = si2gr(self),
  d = NULL,
  k = 0,
  bagel = FALSE,
  mod = FALSE,
  ignore.strand = T,
  verbose = FALSE
)

Arguments

seed

GRanges around which to subgraph

d

distance in bp around which to subgraph

k

order (in number of edges) around which to subgraph

ignore.strand

logical scalar specifying whether to ignore strand

verbose

logical scalar

pad

positive integer scalar padding to add to seed
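The k argument expresses a neighborhood in number of edges. A minimal base R sketch of that idea follows, assuming an undirected edge list over integer node ids; the helper `k_neighborhood` is hypothetical and is not how gGnome computes subgraphs.

```r
# Hypothetical helper, not part of gGnome: nodes within k edges of a seed set,
# given an undirected edge list (from[i] -- to[i]).
k_neighborhood <- function(seed, from, to, k = 0) {
  keep <- unique(seed)
  for (step in seq_len(k)) {
    # Add every node that shares an edge with the current set.
    nbrs <- c(to[from %in% keep], from[to %in% keep])
    keep <- union(keep, nbrs)
  }
  sort(keep)
}

# A simple chain 1 - 2 - 3 - 4 - 5, seeded at node 3.
from <- c(1, 2, 3, 4)
to <- c(2, 3, 4, 5)
k0 <- k_neighborhood(3, from, to, k = 0)
k1 <- k_neighborhood(3, from, to, k = 1)
k2 <- k_neighborhood(3, from, to, k = 2)
print(list(k0 = k0, k1 = k1, k2 = k2))
```

With k = 0 only the seed node is kept, and each additional order extends the selection by one edge in every direction.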


Method clusters()

Marks nodes in the graph with metadata field $cluster based on one of several algorithms, selected by mode. If i and j are specified, the graph is first subsetted, then clusters are computed, and then cluster ids are lifted back to mark the original graph.

Usage

gGraph$clusters(i = NULL, j = NULL, mode = "weak")

Arguments

i

node filter to apply to graph prior to clustering

j

edge filter to apply to graph prior to clustering

mode

character scalar that can take one of the following values: "weak" or "strong", specifying weakly or strongly connected components, or "walktrap", specifying cluster_walktrap community detection
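For intuition about weakly connected components, the following base R sketch labels components of a small edge list while ignoring edge direction. It is an illustrative assumption only: the helper `weak_components` is hypothetical, and the actual method may rely on a graph library rather than this label propagation.

```r
# Hypothetical helper, not part of gGnome: label weakly connected components
# of a graph given as an edge list over nodes 1..n (edge direction ignored).
weak_components <- function(n, from, to) {
  label <- seq_len(n)
  changed <- TRUE
  # Propagate the smallest label across edges until nothing changes.
  while (changed) {
    changed <- FALSE
    for (e in seq_along(from)) {
      lo <- min(label[from[e]], label[to[e]])
      if (label[from[e]] != lo || label[to[e]] != lo) {
        label[c(from[e], to[e])] <- lo
        changed <- TRUE
      }
    }
  }
  # Renumber labels as consecutive cluster ids.
  match(label, unique(label))
}

# Edges 1 -> 2, 3 -> 2 and 5 -> 4 give two components: {1, 2, 3} and {4, 5}.
cl <- weak_components(5, from = c(1, 3, 5), to = c(2, 2, 4))
print(cl)
```

Nodes 1 and 3 are not reachable from each other along directed edges, yet they share a cluster here because direction is ignored.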


Method eclusters()

Usage

gGraph$eclusters(
  thresh = 1000,
  range = 1e+06,
  weak = TRUE,
  paths = !weak,
  mc.cores = 1,
  verbose = FALSE,
  chunksize = 1e+30,
  method = "single"
)

Arguments

thresh

the distance threshold with which to group nearby quasi-reciprocal junctions - i.e. if thresh=0 then we only consider clusters of exactly reciprocal junctions.

weak

logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the graph (default TRUE)

mc.cores

number of cores to use for parallel computation

Returns

numerical vector of the same length; Inf means they are not facing each other


Method eclusters2()

Marks ALT edges belonging to (quasi) reciprocal cycles

Usage

gGraph$eclusters2(
  thresh = 1000,
  range = 1e+06,
  weak = TRUE,
  paths = !weak,
  mc.cores = 1,
  verbose = FALSE,
  chunksize = 1e+30,
  method = "single",
  return_pairs = FALSE,
  ignore.small = TRUE,
  max.small = 10000,
  ignore.isolated = TRUE,
  strict = c("strict", "one_to_one", "loose"),
  min.isolated = max.small,
  only_chains = FALSE
)

Arguments

weak

logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the junction graph (default TRUE)

mc.cores

number of cores to use for parallel computation

max.small

size below which simple dups and dels are excluded

only_chains

if TRUE, will only pair a breakend to its nearest neighbor if and only if the nearest neighbor is reciprocal

juncs

GRangesList of junctions

ignore.strand

usually TRUE

Returns

numerical vector of the same length; Inf means they are not facing each other


Method paths()

Returns shortest paths from query gNode to subject gNode in graph in the form of gWalks (note: the gNodes must exist in the graph, unlike in the related but more general proximity function)

Each output path is a gWalk that connects query-subject on the genome described by gGraph gg. Each gWalk is annotated by the metadata of the corresponding query-subject GRanges pair as well as fields "altdist" and "refdist" specifying the "alternate" and "reference" gGraph distance of the query-subject pair. The gWalk metadata field "reldist" specifies the relative distance (i.e. ratio of altdist to refdist) for that walk.

NOTE: this operation can be quite expensive for large combinations of query and subject, so the max.dist parameter will by default only compute paths for query-subject pairs that are less than max.dist apart (default 1MB). That default is chosen for large queries (e.g. >10K on each side); for smaller queries (e.g. length <100) the user may want to set max.dist = Inf.

By default performs a "cartesian" search, i.e. all pairs of query and subject, but if cartesian is set to FALSE, will only search for the specified pairs of query and subject (query and subject must then be of the same length).

Usage

gGraph$paths(
  query,
  subject = query,
  mc.cores = 1,
  weight = NULL,
  meta = NULL,
  ignore.strand = TRUE,
  cartesian = TRUE
)

Arguments

query

gNode object or snode.id of gNode in this graph

subject

gNode object or snode.id of gNode in this graph

mc.cores

how many cores (default 1)

weight

edge column to use as the weight (instead of standard weight column)

Returns

gWalk object, with each walk representing a query-subject shortest path (if any exist)
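The difference between cartesian and paired searches, and the reldist ratio, can be shown with plain data frames. This base R sketch uses made-up node labels and made-up distances. It does not call gGnome, and the column names simply follow the metadata fields named in the description.

```r
# Hypothetical illustration, not gGnome code: enumerate query-subject pairs
# and compute the relative distance for each pair from made-up distances.
query <- c("a", "b")
subject <- c("c", "d")

# cartesian = TRUE: every query is paired with every subject.
all_pairs <- expand.grid(query = query, subject = subject,
                         stringsAsFactors = FALSE)

# cartesian = FALSE: element-wise pairs, so the lengths must match.
stopifnot(length(query) == length(subject))
one_to_one <- data.frame(query = query, subject = subject)

# Made-up distances in bp for the element-wise pairs.
one_to_one$altdist <- c(2e4, 5e5)
one_to_one$refdist <- c(1e6, 5e5)
# reldist is the ratio of altdist to refdist.
one_to_one$reldist <- one_to_one$altdist / one_to_one$refdist
print(all_pairs)
print(one_to_one)
```

A reldist well below 1 (the first pair here) indicates that the alternate graph brings the two loci much closer together than the reference does.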


Method dist()

Computes a distance matrix of query and subject intervals (in base pairs) on the gGraph between any arbitrary pairs of granges gr1 and gr2.

Usage

gGraph$dist(
  query,
  subject,
  weight = NULL,
  ignore.strand = TRUE,
  include.internal = TRUE,
  verbose = FALSE
)

Arguments

weight

metadata field of gEdges to use as weight (instead of distance of target node)

include.internal

logical flag whether to allow paths that begin or end inside the query or subject

gr1

GRanges query

gr2

GRanges subject (if NULL, will be set to gr1)

dt

returns data.table if TRUE, excluding all Inf distances


Method rep()

Creates "bubbles" in the graph by replicating the nodes or gWalks in the argument. Node replication replicates edges going in and out of all replicated nodes. If an edge connects a pair of replicated nodes, that edge will be replicated across all pairs of those replicated nodes. Walk replication will create "longer bubbles" with fewer edges getting replicated, i.e. it will only replicate intra-walk edges within each walk replicate (but not between separate walk replicates).

(Note that this changes the current gGraph in place, and thus the input gNode or gWalk will no longer apply to the new altered gGraph.)

New graph keeps track of the parent node and edge ids in the original graph using node metadata parent.node.id and edge metadata parent.edge.id i.e. the replicated nodes will be connected to the sources of the original nodes and if replicated nodes connect to each other, then there will exist an edge connecting all of their instances to each other.

Usage

gGraph$rep(nodes = NULL, times)

Arguments

nodes

gNode object that must point to a node in the graph; can also be an index of a node (but not a metadata expression) or a gWalk object

times

scalar or vector of length self$length specifying how many times to replicate each of the nodes.

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep


Method swap()

Swaps nodes with GRanges, GRangesList, or gWalks. The provided replacement vector must be the same length as the input nodes, resulting in each node being "swapped" for the provided interval, node, GRangesList (representing a walk), or gWalk. The replacement will inherit the left and right edges of the removed node. If the replacement is a walk, then the left side of the first node in the walk will inherit the edges that were previously to the left of the node being replaced, and the right side of the last node of the walk will inherit the edges that were previously to the right of the node being replaced.

Note: these replacements obey the orientation of the arguments. So if the node to be replaced is flipped (- orientation with respect to the reference), then its "left" is to the right on the reference. Similarly, for walks whose first interval is flipped with respect to the reference, the left edges will be attached to the right of the node on the reference.

Usage

gGraph$swap(nodes, replacement)

Arguments

nodes

gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object

replacement

GRanges, GRangesList, or gWalk object whose length is length(nodes)

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep


Method connect()

Connects node pairs in the gGraph by adding (optional) edge metadata and (optionally) inserting nodes or GRangesLists / walks in between the given edge. Note: the connections are made with respect to the provided node orientation, so if a node is provided in a "flipped" orientation then its right direction will point left on the reference.

Usage

gGraph$connect(
  n1,
  n2,
  n1.side = "right",
  n2.side = "left",
  type = "ALT",
  meta = NULL,
  insert = NULL
)

Arguments

n1

gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object

n2

gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object

n1.side

character vector of length length(n1) whose value is either "left" or "right" (default 'right')

n2.side

character vector of length length(n1) whose value is either "left" or "right" (default 'left')

Returns

gGraph (also modified in place) with nodes annotated with parent.node.id, parent.rep


Method toposort()

Usage

gGraph$toposort()

Returns

current graph, modified with nodes marked according to topological sort


Method print()

Prints out this gGraph. Prints number of nodes and edges, the gNode associated with this gGraph and the gEdge associated with this gGraph

Usage

gGraph$print()


Method annotate()

Used by the mark() functions in gNode, gEdge and gWalks to alter the metadata associated with the nodes and edges in this gGraph. Not recommended to use this function. It is much safer to use mark.

Note: id for nodes is the node.id (not snode.id); id for edges is the edge.id (not sedge.id).

Usage

gGraph$annotate(colName, data, id, class)


Method maxflow()

Computes the "max flow" between every node pair in self for some metadata field.

The "max flow" for a node pair i, j is the maximum value m of node and/or edge metadata for which there exists a path p between i and j whose nodes n and/or edges e obey field(n)>=m and/or field(e)>=m for all n,e \(\in\) p (i.e. m is the maximum lower bound of the value of nodes / edges across all paths connecting i and j).

The "min version" of this problem (max = FALSE) will determine the min value m for which there exists p whose nodes n and edges e obey field(n)>=m and/or field(e)>=m for all n,e \(\in\) p.

The user can also do the problem with lower.bound = FALSE i.e. where m is the (maximum or minimum) <upper> bound value in each path.

By default will try to solve problem across both node and edge metadata if the field is present in either. If the field is only present in one then will solve for that. This property can be toggled using edges.only and nodes.only parameters.

Usage

gGraph$maxflow(
  field = NA,
  walk = FALSE,
  max = TRUE,
  lower.bound = TRUE,
  nfield = NA,
  efield = NA,
  cfield = NA,
  path.only = TRUE,
  require.nodes = NULL,
  multi = FALSE,
  ncopies = 1,
  reverse.complement = FALSE,
  verbose = FALSE
)

Arguments

field

metadata field to run maxflow on

walk

if TRUE will return the single walk that maximizes the sum of metadata fields

max

logical flag whether to find maximum path or (if max = FALSE) minimum path

nfield

field to specify a node field to maximize across paths

efield

field to specify an edge field to maximize across paths

cfield

field to specify a node / edge field that limits / caps the dosage at nodes / edges

path.only

logical flag relevant only if walk = TRUE; if path.only is TRUE, will only allow path-based maxflows, i.e. will not return a solution when the graph contains only cycles (default TRUE)

multi

logical flag (FALSE) if TRUE will allow the optimization to compute a solution that outputs multiple disjoint paths

ncopies

positive integer representing the number of copies of the flow that we want the graph to support

reverse.complement

logical flag; if TRUE, will compute maximum flow between each node i and the reverse complement of node j in a strand-specific way
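The default definition (max = TRUE, lower.bound = TRUE) corresponds to what is often called a widest-path or bottleneck problem. The sketch below is a base R illustration on edge values only. The helper `bottleneck_matrix` is hypothetical and not gGnome's implementation, and it assumes non-negative edge values with 0 meaning "no edge".

```r
# Hypothetical helper, not part of gGnome: widest-path ("bottleneck") values
# between all node pairs, using a Floyd-Warshall style update on edge values.
# w[i, j] is the field value of edge i -> j, or 0 when there is no edge.
bottleneck_matrix <- function(w) {
  n <- nrow(w)
  m <- w
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        # A path through k is limited by the weaker of its two halves.
        m[i, j] <- max(m[i, j], min(m[i, k], m[k, j]))
      }
    }
  }
  m
}

w <- matrix(0, 4, 4)
w[1, 2] <- 5; w[2, 4] <- 2   # path 1 -> 2 -> 4 has lower bound 2
w[1, 3] <- 3; w[3, 4] <- 4   # path 1 -> 3 -> 4 has lower bound 3
m <- bottleneck_matrix(w)
print(m[1, 4])
```

Between nodes 1 and 4 the second path wins because its smallest edge value (3) exceeds the smallest edge value on the first path (2), which is the "maximum lower bound" in the definition above.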


Method window()

Returns the region this gGraph spans as a GRanges

Usage

gGraph$window(pad = 0)

Arguments

pad

A positive amount to pad the window by

Returns

GRanges of the region this gGraph covers


Method gtrack()

Usage

gGraph$gtrack(
  y.field = NULL,
  lwd.loose = 3,
  col.loose = alpha("blue", 0.6),
  col.alt = alpha("red", 0.4),
  ...
)


Method trim()

Trims the current gGraph to the provided GRanges and returns this as a new gGraph.

Usage

gGraph$trim(tile)

Arguments

tile

GRanges interval(s) around which to trim the gGraph

mod

Defaults to FALSE, set to TRUE to modify this gGraph

Details

```
gr = c(GRanges("1", IRanges(10000, 100000), "+"),
       GRanges("2", IRanges(10000, 100000), "+"))
new.gg = gg$trim(gr)
```

Returns

new gGraph trimmed to tile, unless mod is set to TRUE


Method fix()

Modifies (in place) the current seqlevels of the gGraph, including keeping only certain seqlevels, dropping certain seqlevels, and replacing seqlevels.

Warning: this may modify the graph including getting rid of nodes and edges (i.e. those outside the retained seqlevels) and also change coordinates (ie move ranges that were previously on different chromosomes to the same chromosome etc.). Use with caution!

Default behavior is to replace 'chr' with ''.

Usage

gGraph$fix(pattern = NULL, replacement = NULL, drop = TRUE, seqlengths = NULL)

Arguments

pattern

character pattern to replace in seqlevels (used in a gsub, can have backreferences)

replacement

character to replace pattern with (used in a gsub, can have backreferences)

drop

character vector of seqlevels to drop, or logical TRUE to drop all seqlevels that are unused (default TRUE)

seqlengths

new seqlengths i.e. named integer vector of seqlevels to drop or embed graph into

Returns

current graph modified in place with updated seqlevels, as specified by user
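Since pattern and replacement are described as being used in a gsub, their effect on seqlevel names can be previewed in base R before touching a graph. The seqlevel names below are made up for illustration, and the second call only demonstrates backreference syntax.

```r
# Base R preview of pattern / replacement semantics on made-up seqlevels.
seqlevels <- c("chr1", "chr2", "chrX")

# The default behavior described above: replace "chr" with "".
stripped <- gsub("chr", "", seqlevels)

# A backreference example: move the prefix to the end of each name.
swapped <- gsub("^chr(.*)$", "\\1_chr", seqlevels)

print(stripped)
print(swapped)
```

Previewing the renamed seqlevels this way helps catch cases where two distinct names would collapse into one, which is the kind of coordinate change the warning above refers to.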


Method add()

Adds GRanges nodes, edges (data.table), or junctions to the graph. Only one of the parameters below can be specified at a time (since the graph is modified in place, order matters).

Usage

gGraph$add(nodes = NULL, edges = NULL, junctions = NULL)

Arguments

nodes

GRanges, strand is ignored

edges

data.table specifying edges in existing table with field n1, n2, n1.side, n2.side

junctions

Junction object or GRangesList coercible to junction object

Returns

current graph modified in place with additional nodes and edges, as specified by user


Method json()

Creates a JSON file for active visualization using gGnome.js. Annotations are node / edge features that will be dumped to JSON.

Usage

gGraph$json(
  filename = ".",
  maxcn = 100,
  maxweight = 100,
  save = TRUE,
  verbose = FALSE,
  annotations = NULL,
  nfields = NULL,
  efields = NULL,
  settings = list(y_axis = list(title = "copy number", visible = TRUE)),
  cid.field = NULL,
  no.y = FALSE
)

Arguments

filename

character path to save to

save

whether to save or return list object representing json contents

annotations

which graph annotations to dump to json

nfields

which node fields to dump to json (NULL)

efields

which edge fields to dump to json (NULL)

settings

gGnome.js settings values to add to the output JSON files (list)

cid.field

field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example "merged.ix" field, which is generated by merge.Junction())


Method get.diameter()

Usage

gGraph$get.diameter(weights = NULL)


Method circos()

Usage

gGraph$circos(...)


Method split()

Usage

gGraph$split(by = "parent.graph")


Method clone()

The objects of this class are cloneable with this method.

Usage

gGraph$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.