gGraph-constructor.Rd
simplify
reduce
new()
All-purpose constructor of gGraphs from nodes, edges, junctions, or various input formats (JaBbA, Weaver, etc.)
gGraph$new(
genome = NULL,
breaks = NULL,
juncs = NULL,
alignments = NULL,
prego = NULL,
jabba = NULL,
cougar = NULL,
weaver = NULL,
remixt = NULL,
rck = NULL,
walks = NULL,
nodes = NULL,
edges = NULL,
nodeObj = NULL,
edgeObj = NULL,
meta = NULL,
verbose = FALSE
)
genome
Seqinfo or object coercible to seqinfo
breaks
GRanges whose endpoints specify breakpoints in the genome
juncs
Junction object or GRangesList coercible to Junction
prego
PREGO output directory path
jabba
JaBbA graph rds file
cougar
CouGar output directory path
weaver
Weaver output directory path
remixt
RemiXT output directory path
rck
RCK output directory path
walks()
Exhaustively generates walks (if greedy = FALSE) or otherwise applies a greedy heuristic (greedy = TRUE). Returns a gWalk object tied to this graph. Warning: greedy = FALSE will not scale to large graphs (i.e. it may hang even on a couple of hundred nodes, depending on the topology).
set()
Sets metadata of the gGraph. Right now this is mainly useful for gTrack defaults such as "name" or "colormaps", and is also used for setting the default "by" field for simplify, but it can be used for configuring other settings in the future. Note that this is graph-level metadata, not gNode- or gEdge-level metadata.
queryLookup()
Returns a data.table of the provided snode.ids, their indices, and the indices of their reverse complements in the graph. The data.table is keyed on snode.id.
disjoin()
Disjoins (i.e. collapses) all overlapping nodes in the graph (subject to the "by" argument) and aggregates node and edge metadata among them using FUN. Modifies the current graph.
If the optional input gr is provided, a reference graph with GRanges gr is first concatenated to the graph prior to disjoining.
If collapse = TRUE, the output is a graph with a single node per reference interval. If collapse = FALSE, the nodes in the graph are only disjoined and all overlapping nodes are kept separate, i.e. overlapping graphs are composed of a common set of disjoint intervals, but there may be several instances of a given interval among the different graphs.
gGraph$disjoin(
gr = NULL,
by = NULL,
collapse = TRUE,
na.rm = TRUE,
avg = FALSE,
sep = ",",
FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg)
)
gr
GRanges around which to disjoin the graph
by
metadata field of current graph around which to limit disjoining
collapse
logical scalar specifying whether to collapse graph nodes after disjoining
na.rm
logical scalar specifying whether to remove NA's when aggregating metadata after collapsing
avg
logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN
function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
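To make the disjoining step concrete, the following base-R sketch splits overlapping intervals on a single chromosome into the common set of disjoint intervals that cover them. It does not call gGnome: the helper name disjoin_intervals and the toy coordinates are assumptions made for illustration, and the real method additionally handles strands, edges, and metadata.

```r
# Hypothetical helper (not part of gGnome): split closed integer intervals
# [start, end] into the disjoint pieces induced by all interval endpoints.
disjoin_intervals <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  # Every interval start, and every position just past an interval end,
  # opens a new piece.
  cuts <- sort(unique(c(start, end + 1L)))
  pieces <- data.frame(start = head(cuts, -1), end = tail(cuts, -1) - 1L)
  # Keep only the pieces covered by at least one input interval.
  covered <- vapply(seq_len(nrow(pieces)), function(k) {
    any(start <= pieces$start[k] & end >= pieces$end[k])
  }, logical(1))
  pieces <- pieces[covered, , drop = FALSE]
  rownames(pieces) <- NULL
  pieces
}

# Two overlapping nodes (1-100 and 51-150) plus a separate node (201-250).
disjoint <- disjoin_intervals(c(1L, 51L, 201L), c(100L, 150L, 250L))
print(disjoint)
```

In this toy case the shared piece 51-100 appears once, which roughly corresponds to the collapse = TRUE behavior described above; with collapse = FALSE one would expect one instance of that piece per original node.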
simplify()
Simplifies the gGraph by collapsing reference-adjacent nodes that lack a junction or loose end (ignore.loose = FALSE) between them.
Takes an optional "by" column. If by is not NULL, simplify will only collapse adjacent nodes if they share metadata in the columns specified by "by".
gGraph$simplify(
by = private$pmeta$by,
na.rm = TRUE,
avg = TRUE,
sep = ",",
FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg),
ignore.loose = FALSE
)
by
metadata field of current graph around which to limit simplification
na.rm
logical scalar specifying whether to remove NA's when aggregating metadata after collapsing
avg
logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = TRUE)
FUN
function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
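The FUN contract described above (take a numeric or character x plus na.rm, return a scalar) can be sketched in base R as follows. This is a hypothetical stand-in, not the source of default.agg.fun.generator: in particular, the assumption that character values are de-duplicated and pasted together with sep is a guess at a reasonable behavior and is not confirmed by this page.

```r
# Hypothetical aggregation-function generator; gGnome's
# default.agg.fun.generator may behave differently.
make_agg_fun <- function(na.rm = TRUE, avg = FALSE, sep = ",") {
  default_na_rm <- na.rm
  function(x, na.rm = default_na_rm) {
    if (na.rm) x <- x[!is.na(x)]
    if (is.numeric(x)) {
      # Numeric metadata: average or sum, depending on avg.
      if (avg) mean(x) else sum(x)
    } else {
      # Assumption: character metadata is de-duplicated and joined with sep.
      paste(unique(as.character(x)), collapse = sep)
    }
  }
}

sum_fun <- make_agg_fun(avg = FALSE)
avg_fun <- make_agg_fun(avg = TRUE)

cn <- c(2, 4, NA)                    # toy copy-number values of merged nodes
total_cn <- sum_fun(cn)              # sum of the non-NA values
mean_cn <- avg_fun(cn)               # mean of the non-NA values
labels <- sum_fun(c("a", "b", "a"))  # character values joined into one string
```

A function of this shape could be passed as FUN when a custom aggregation rule is needed, for example taking the maximum instead of the sum.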
reduce()
Reduces the graph, i.e. runs $disjoin() followed by $simplify(): collapses overlapping nodes, then merges adjacent ones subject to (optional) matching on some metadata field.
by
metadata field of current graph around which to limit reduction (i.e. disjoining and simplification)
na.rm
logical scalar specifying whether to remove NA's when aggregating metadata after collapsing
avg
logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN
function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
subgraph()
Computes the subgraph within a certain distance or degree of separation of all nodes intersecting the given GRanges "seed" window win.
clusters()
Marks nodes in the graph with metadata field $cluster based on one of several algorithms, selected by mode. If i and j are specified, the graph is first subsetted, then clusters are computed, then cluster ids are lifted back to mark the original graph.
i
node filter to apply to graph prior to clustering
j
edge filter to apply to graph prior to clustering
weak
character scalar that can take one of the following values: "weak" or "strong", specifying weakly or strongly connected components, or "walktrap", specifying cluster_walktrap community detection
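As an illustration of what "weakly connected components" means here, the base-R sketch below labels the nodes of a small directed graph while ignoring edge direction. The helper weak_components and the toy edge list are assumptions made for this example; it is not how gGnome computes clusters internally.

```r
# Hypothetical helper: label weakly connected components of a directed graph
# given as an edge list over nodes 1..n (edge direction is ignored).
weak_components <- function(n, from, to) {
  cluster <- rep(NA_integer_, n)
  current <- 0L
  for (seed in seq_len(n)) {
    if (!is.na(cluster[seed])) next
    current <- current + 1L
    cluster[seed] <- current
    queue <- seed
    # Breadth-first search from the seed node.
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Neighbors reachable along an edge in either direction.
      nbrs <- unique(c(to[from == node], from[to == node]))
      new <- nbrs[is.na(cluster[nbrs])]
      cluster[new] <- current
      queue <- c(queue, new)
    }
  }
  cluster
}

# Edges 1->2, 3->2 and 4->5: nodes 1, 2, 3 form one cluster, 4 and 5 another.
cluster <- weak_components(5, from = c(1, 3, 4), to = c(2, 2, 5))
print(cluster)
```

Strongly connected components would instead require a directed path in both directions between every pair of nodes in a cluster, so nodes 1, 2 and 3 above would each end up in their own cluster.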
eclusters()
gGraph$eclusters(
thresh = 1000,
range = 1e+06,
weak = TRUE,
paths = !weak,
mc.cores = 1,
verbose = FALSE,
chunksize = 1e+30,
method = "single"
)
thresh
the distance threshold with which to group nearby quasi-reciprocal junctions - i.e. if thresh=0 then we only consider clusters of exactly reciprocal junctions.
weak
logical flag if TRUE will not differentiate between cycles and paths and will return all weakly connected clusters in the graph [FALSE]
mc.cores
number of cores to use for parallel processing
eclusters2()
Marks ALT edges belonging to (quasi) reciprocal cycles
gGraph$eclusters2(
thresh = 1000,
range = 1e+06,
weak = TRUE,
paths = !weak,
mc.cores = 1,
verbose = FALSE,
chunksize = 1e+30,
method = "single",
return_pairs = FALSE,
ignore.small = TRUE,
max.small = 10000,
ignore.isolated = TRUE,
strict = c("strict", "one_to_one", "loose"),
min.isolated = max.small,
only_chains = FALSE
)
weak
logical flag if TRUE will not differentiate between cycles and paths and will return all weakly connected clusters in the junction graph [FALSE]
mc.cores
number of cores to use for parallel processing
max.small
size below which simple dups and dels are excluded
only_chains
if TRUE, will only pair a breakend with its nearest neighbor if and only if the nearest neighbor is reciprocal
juncs
GRangesList of junctions
ignore.strand
usually TRUE
paths()
Returns shortest paths from query gNode to subject gNode in graph in the form of gWalks (note: the gNodes must exist in the graph, unlike in the related but more general proximity function)
Each output path is a gWalk that connects query-subject on the genome described by gGraph gg. Each gWalk is annotated by the metadata of the corresponding query-subject GRanges pair as well as fields "altdist" and "refdist" specifying the "alternate" and "reference" gGraph distance of the query-subject pair. The gWalk metadata field "reldist" specifies the relative distance (i.e. ratio of altdist to refdist) for that walk.
NOTE: this operation can be quite expensive for large combinations of query and subject, so the max.dist parameter will by default only compute paths for query-subject pairs that are less than max.dist apart (default 1MB). That default is chosen for large queries (e.g. >10K on each side); for smaller queries (e.g. length <100) the user may want to set max.dist = Inf.
By default performs a "cartesian" search, i.e. all pairs of query and subject, but if cartesian is set to FALSE will only search for the specified pairs of query and subject (query and subject must then be of the same length).
gGraph$paths(
query,
subject = query,
mc.cores = 1,
weight = NULL,
meta = NULL,
ignore.strand = TRUE,
cartesian = TRUE
)
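The difference between a cartesian and a paired search, and the definition of reldist, can be illustrated with a few lines of base R. The query and subject labels and the distances below are made-up toy values, not output from gGnome.

```r
query <- c("q1", "q2")
subject <- c("s1", "s2")

# cartesian = TRUE: every query is paired with every subject.
all_pairs <- expand.grid(query = query, subject = subject,
                         stringsAsFactors = FALSE)

# cartesian = FALSE: query[k] is paired only with subject[k],
# so the two inputs must have the same length.
stopifnot(length(query) == length(subject))
matched_pairs <- data.frame(query = query, subject = subject,
                            stringsAsFactors = FALSE)

# reldist is described as the ratio of altdist to refdist
# (toy distances in base pairs).
altdist <- c(2e3, 5e4)
refdist <- c(1e6, 5e4)
matched_pairs$reldist <- altdist / refdist
print(matched_pairs)
```

A reldist well below 1, as for the first pair, would indicate that the rearranged graph brings the two intervals much closer together than they are on the reference.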
dist()
Computes a matrix of distances (in base pairs) on the gGraph between arbitrary pairs of query and subject intervals, given as GRanges gr1 and gr2.
gGraph$dist(
query,
subject,
weight = NULL,
ignore.strand = TRUE,
include.internal = TRUE,
verbose = FALSE
)
weight
metadata field of gEdges to use as weight (instead of distance of target node)
include.internal
logical flag whether to allow paths that begin or end inside the query or subject
gr1
GRanges query
gr2
GRanges subject (if NULL, will be set to gr1)
dt
returns data.table if TRUE, excluding all Inf distances
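The relationship between the matrix output and the long-format output that excludes Inf distances can be sketched as follows. The distance values are invented for illustration, and a base-R data.frame is used in place of a data.table so that the example has no dependencies.

```r
# Toy distance matrix (bp) between 2 query and 3 subject intervals;
# Inf marks pairs with no connecting path in the graph.
d <- matrix(c(0,   1500, Inf,
              Inf, 200,  30000),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("q1", "q2"), c("s1", "s2", "s3")))

# Long format: one row per query-subject pair (column-major order).
long <- data.frame(
  query = rownames(d)[row(d)],
  subject = colnames(d)[col(d)],
  dist = as.vector(d),
  stringsAsFactors = FALSE
)

# Drop unreachable pairs, mirroring the "excluding all Inf distances" note.
long <- long[is.finite(long$dist), ]
print(long)
```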
rep()
Creates "bubbles" in the graph by replicating the nodes or gWalks in the argument. Node replication replicates edges going in and out of all replicated nodes. If an edge connects a pair of replicated nodes, that edge will be replicated across all pairs of those replicated nodes. Walk replication will create "longer bubbles" with fewer edges getting replicated, i.e. it will only replicate intra-walk edges within each walk replicate (but not between separate walk replicates).
(Note that this changes the current gGraph in place, and thus the input gNode or gWalk will no longer apply to the new altered gGraph.)
The new graph keeps track of the parent node and edge ids in the original graph using node metadata parent.node.id and edge metadata parent.edge.id. The replicated nodes will be connected to the sources of the original nodes, and if replicated nodes connect to each other, there will be an edge connecting all of their instances to each other.
swap()
Swaps nodes with GRanges, grl, or gWalks. The provided replacement vector must be the same length as the input nodes, resulting in each node being "swapped" for the provided interval, node, grl (representing a walk), or gWalk. The replacement will inherit the left and right edges of the removed node. If the replacement is a walk, then the left side of the first node in the walk will inherit the edges that were previously to the left of the node being replaced, and the right side of the last node of the walk will inherit the edges that were previously to the right of the node being replaced.
Note: these replacements obey the orientation of the arguments. So if the node to be replaced is flipped (- orientation with respect to the reference), then its "left" is to the right on the reference. Similarly, for walks whose first interval is flipped with respect to the reference, the left edges will be attached to the right of the node on the reference.
connect()
Connects node pairs in the gGraph by adding (optional) edge metadata and (optionally) inserting nodes or grl / walks in between the given edge. Note: the connections are made with respect to the provided node orientation, so if the node is provided in a "flipped" orientation then its right direction will point left on the reference.
gGraph$connect(
n1,
n2,
n1.side = "right",
n2.side = "left",
type = "ALT",
meta = NULL,
insert = NULL
)
n1
gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n2
gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n1.side
character vector of length length(n1) whose value is either "left" or "right" (default 'right')
n2.side
character vector of length length(n1) whose value is either "left" or "right" (default 'left')
print()
Prints out this gGraph. Prints number of nodes and edges, the gNode associated with this gGraph and the gEdge associated with this gGraph
annotate()
Used by the mark() functions in gNode, gEdge and gWalks to alter the metadata associated with the nodes and edges in this gGraph. Not recommended to use this function. It is much safer to use mark.
FYI: the id for nodes is the node.id (not snode.id); the id for edges is the edge.id (not sedge.id).
maxflow()
Computes the "max flow" between every node pair in self for some metadata field.
The "max flow" for a node pair i, j is the maximum value m of node and/or edge metadata for which there exists a path p between i and j whose nodes n and/or edges e obey field(n)>=m and/or field(e)>=m for all n,e \(\in\) p. (i.e. m is the maximum lower bound of the value of nodes / edges across all paths connecting ij)
The "min version" of this problem (max = FALSE) will determine the min value m for which there exists p whose nodes n and edges e obey field(n)>=m and/or field(e)>=m for all n,e \(\in\) p.
The user can also solve the problem with lower.bound = FALSE, i.e. where m is the (maximum or minimum) upper bound value in each path.
By default will try to solve problem across both node and edge metadata if the field is present in either. If the field is only present in one then will solve for that. This property can be toggled using edges.only and nodes.only parameters.
gGraph$maxflow(
field = NA,
walk = FALSE,
max = TRUE,
lower.bound = TRUE,
nfield = NA,
efield = NA,
cfield = NA,
path.only = TRUE,
require.nodes = NULL,
multi = FALSE,
ncopies = 1,
reverse.complement = FALSE,
verbose = FALSE
)
field
metadata field to run maxflow on
walk
if TRUE will return the single walk that maximizes the sum of metadata fields
max
logical flag whether to find maximum path or (if max = FALSE) minimum path
nfield
field to specify a node field to maximize across paths
efield
field to specify an edge field to maximize across paths
cfield
field to specify a node / edge field that limits / caps the dosage at nodes / edges
path.only
logical flag relevant only if walk = TRUE; if path.only = TRUE, will only allow path-based maxflows, i.e. will not return a solution when the graph contains only cycles
multi
logical flag (FALSE) if TRUE will allow the optimization to compute a solution that outputs multiple disjoint paths
ncopies
positive integer representing the number of copies of the flow that we want the graph to support
reverse.complement
logical flag; if TRUE, will compute maximum flow between each node i and the reverse complement of node j in a strand-specific way
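The "max flow" definition given above for max = TRUE and lower.bound = TRUE, restricted to edge values only, matches the classic maximum-bottleneck ("widest path") problem: for each node pair, find the path whose smallest edge value is as large as possible. The base-R sketch below solves it with a Floyd-Warshall-style recurrence on a toy graph. The helper widest_paths and the edge weights are assumptions for illustration, and gGnome's own implementation may use a different algorithm.

```r
# Hypothetical helper: all-pairs maximum bottleneck ("widest path") values.
# w[i, j] is the value of edge i -> j, or -Inf if there is no such edge.
widest_paths <- function(w) {
  n <- nrow(w)
  best <- w
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        # A path routed through k is only as good as its weaker half.
        via_k <- min(best[i, k], best[k, j])
        if (via_k > best[i, j]) best[i, j] <- via_k
      }
    }
  }
  best
}

w <- matrix(-Inf, 4, 4)
w[1, 2] <- 5; w[2, 4] <- 2   # path 1 -> 2 -> 4 has bottleneck 2
w[1, 3] <- 3; w[3, 4] <- 4   # path 1 -> 3 -> 4 has bottleneck 3
flow <- widest_paths(w)
print(flow[1, 4])
```

Here the value for the pair (1, 4) is the larger of the two bottlenecks, because the second path supports a higher lower bound across all of its edges.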
window()
Returns the region this gGraph spans as a GRanges
trim()
Trims the current gGraph to the provided GRanges and returns this as a new gGraph.
tile
GRanges interval(s) around which to trim the gGraph
mod
Defaults to FALSE, set to TRUE to modify this gGraph
fix()
Modifies (in place) the current seqlevels of the gGraph, including keeping only certain seqlevels, dropping certain seqlevels, and replacing seqlevels.
Warning: this may modify the graph including getting rid of nodes and edges (i.e. those outside the retained seqlevels) and also change coordinates (ie move ranges that were previously on different chromosomes to the same chromosome etc.). Use with caution!
Default behavior is to replace 'chr' with ''.
pattern
character pattern to replace in seqlevels (used in a gsub, can have backreferences)
replacement
character to replace pattern with (used in a gsub, can have backreferences)
drop
character vector of seqlevels to drop or logical TRUE to drop all seqlevels that are unused (TRUE)
seqlengths
new seqlengths i.e. named integer vector of seqlevels to drop or embed graph into
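Since pattern and replacement are described as being passed to gsub, their effect on seqlevels can be previewed with plain character vectors before touching a graph. The seqlevel names, the "_alt" suffix, and the drop vector below are made-up examples, and setdiff only imitates the dropping step.

```r
seqlevels_in <- c("chr1", "chr2", "chrX", "chrUn_gl000220")

# Default-style behavior described above: strip the "chr" prefix.
stripped <- gsub("chr", "", seqlevels_in)

# A pattern with a backreference: rewrite "chrN" as "N_alt"
# (hypothetical naming scheme, for illustration only).
renamed <- gsub("^chr(.+)$", "\\1_alt", seqlevels_in)

# Dropping seqlevels: keep only those not listed in a hypothetical drop vector.
drop <- c("Un_gl000220")
kept <- setdiff(stripped, drop)
print(kept)
```

Previewing the renamed levels this way helps catch cases where two distinct seqlevels would collapse onto the same name, which is the coordinate-mixing hazard the warning above refers to.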
add()
Adds GRanges nodes, edges (data.table), or junctions to the graph. Only one of the below parameters can be specified at a time (since the graph is modified in place, order matters).
json()
Creates a json file for active visualization using gGnome.js. Annotations are node / edge features that will be dumped to json.
filename
character path to save to
save
whether to save or return list object representing json contents
annotations
which graph annotations to dump to json
nfields
which node fields to dump to json (NULL)
efields
which edge fields to dump to json (NULL)
settings
gGnome.js settings values to add to the output JSON files (list)
cid.field
field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example "merged.ix" field, which is generated by merge.Junction())
split()