Here we analyze gGraphs with "cn" (copy number) field to enforce integer cn and junction balance, ie sum of incoming (or outgoing) edge cn should be equal to node copy cn.

The goal is to find a balaned assignment of "cn" to the nodes and edges of the gGraph that maximally resemble the input weights while minimizing the loose end penalty. The similarity / distance function can be weighted by optional node / edge metadata field $weight (when weighted = TRUE).

To output this gGraph, we design a MIP with (1) objective function that minimizes (weighted sum of square) distance of fit node and junction copy number to provided values in $cn field (2) and lambda* the sum of copy number at non terminal loose ends subject to (3) junction balance constraint (4) fixing copy number of a subset of nodes and junctions

Objective weight can be modulated at nodes and edges with $weight metadata field (default node weight is node width, and edge weight is 1). These fields will then set the penalty incurred to a fit of x to that node / edge with copy number c and weight w as (x-c)^2/w.

Lambda can be modulated at nodes with $lambda node metadata field (default 1)

For "haplographs" ie graphs that have more than one node overlapping a given location, it may be important to constrain total copy number using a haploid read depth signal. The marginal parameter enables this through a GRanges annotated with $cn and optionally $weight field that provides a target total copy number that the optimization will attempt to satisfy. This provided copy number c and weight w (default 1) will be evaluated against the sum s of the fit copy numbers of all nodes overlapping that location by adding a penalty of (c-s)^2/w to the corresponding solution. marginal can also have an optional logical field $fix that will actually constrain the marginal copy number to be equal to the provided value (note: that the optimization may be infeasible, and function will error out)

Additional controls can be inputted by changing the graph metadata - e.g. adding fields $lb and $ub to nodes and edges will constrain their fit copy number to those bounds. Adding $reward field to edges will add a reward for each copy of that edge in the solution.

balance(
  gg,
  lambda = 10,
  marginal = NULL,
  emarginal = NULL,
  tight = NULL,
  nfix = NULL,
  efix = NULL,
  nrelax = NULL,
  erelax = NULL,
  L0 = TRUE,
  loose.collapse = FALSE,
  M = 1000,
  phased = FALSE,
  ism = FALSE,
  lp = TRUE,
  verbose = 1,
  tilim = 10,
  trelim = 32000,
  nodefileind = 1,
  epgap = 0.001,
  debug = FALSE,
  use.gurobi = FALSE,
  nonintegral = FALSE
)

Arguments

gg

gGraph with field $cn, can be NA for some nodes and edges, optional field $weight which will adjust the quadratic penalty on the fit to x as (x-$cn)^2/weight

lambda

positive number specifying loose end penalty, note if gg$node metadata contain $lambda field then this lambda will be multiplied by the node level lambda (default 10)

marginal

GRanges with field $cn and optional $weight field will be used to fit the summed values at each base of the genome to optimally fit the marginal value, optional field $fix will actually constrain the marginal to be the provided value

emarginal

Junctions object with marginal CN in the $cn field (and optionally $weight in the weight field). optional field $fix will actually constrain the marginal to be the provided value.

tight

indices or epxression on node metadata specifying at which nodes to disallow loose ensd

nfix

indices or expression on node metadata specifying which node cn to fix

efix

indices or expression on edge metadata specifying which edge cn to fix

nrelax

indices or expression on node metadata specifying which nodes cn to relax

erelax

indices or expression on edge metadata specifying which edges cn to relax

L0

flag whether to apply loose end penalty as L1 (TRUE)

loose.collapse

(parameter only relevant if L0 = TRUE) will count all unique (by coordinate) instances of loose ends in the graph as the loose end penalty, rather than each instance alone ... useful for fitting a metagenome graph (FALSE)

M

(numeric) big M constraint for L0 norm loose end penalty (default 1e3)

phased

(logical) indicates whether to run phased/unphased. default = FALSE

ism

(logical) additional ISM constraints (FALSE)

lp

(logical) solve as linear program using abs value (default TRUE)

verbose

(integer)scalar specifying whether to do verbose output, value 2 will spit out MIP (1)

tilim

(numeric) time limit on MIP in seconds (10)

trelim

(numeric) max size of uncompressed tree in MB (default 32e3)

nodefileind

(numeric) one of 0 (no node file) 1 (in memory compressed) 2 (on disk uncompressed) 3 (on disk compressed) default 1

epgap

(numeric) relative optimality gap threshhold between 0 and 1 (default 1e-3)

debug

(logical) returns list with names gg and sol. sol contains full RCPLEX solution. (default FALSE)

use.gurobi

(logical) use gurobi optimizer? if TRUE uses gurobi instead of cplex. default FALSE.

nonintegral

(logical) run without integer constraints on REF edges and nodes? default FALSE.

Value

balanced gGraph maximally resembling input gg in CN while minimizing loose end penalty lambda.

Details

applications of gGraph

Author

Marcin Imielinski