pcst.md

PCST:

why I work on PCST
- the road map: PCST->MST->Quota->Budget
- PCST->MST
- MST = Quota
- Quota -> Budget: how the 5-approximation comes?
- that's for undirected case
incorrectness of the proof on the greedy primal-dual method
- the chapter
  - p11: minimum violation set
  - p19: general algorithm
  - p40: pcst algorithm
- dtgs
- Def of minimal violation set in Figure 4.3 will be the roots of active components. Then dual constraints might be broken.
- However, if we let the violation set be all the active components and their subset, solution feasibility is not guaranteed. Even the reverse deletion step will not help.
- No idea on how to formulate integer program for PCST-DAG specifically
- No idea on the algorithm if we use the original integer program: specifically, what is the VIOLATION-SET oracle?
  - generate feasible solution
  - does not violate any dual constraints so that approximation guarantee can be proved
Is PCST-DAG NP-hard?
- how is PCST NP-hard derived? derive steiner tree(ST) to PCST as ST is special case of PCST
PCST-dag is NP-hard because Steiner-DAG is NP-hard and Steiner-DAG is special case of PCST-DAG
- (root, set node) with weight 1
- (set node, element node) with weight 0
for directed case: I only found resources for steiner problem
- check out A Compendium on Steiner Tree Problems
  - 1.32 budget problem
- Approximation Algorithms for Directed Steiner Problems: with approximation guarantee
  - density(combines cost and number of terminals) as the heuristic, high density is more likely to lead to near-optimal solution
  - recursive definition of optimal l-level tree based on the result of l-1-level trees(inspired from the worst case example)
- a series of ...
branch-and-cut algorithm by Ivana seems to be more general:
- the paper: how to prove the approximation ratio
  - why transform to directed version?
  - a different IP forumulation
  - some preprocessing:
    1. least-cost test: path from i to j has less cost than edge from i to j, remove the edge
  - branch-and-cut
    - insert the cut constraints(exponential) at the begining of the branching/separation step
    - why use the maxflow to find the violating constraints
  - no approximation guarantee

Sampling:

checked the query paper:
- it focues on modifying the result on the same tree
- however, we want to find trees on different parts of the graph
the root sampling stragegy
- simple upperbound and pruning using threshold
  - determining the threshold: U / average edge cost
- further sample by heuristic by choosing either of the following:
  - exploitation: subtree that has high log|V| * |V| / |C|(or simply density thresholded by subtree size) is more promising, we maintain a fixed-size queue for those roots
  - exploration: select those with high upper bound
  - probability to explore depends on how much we have covered the all nodes. For example, we covered 90% of the nodes, then the proba to explore is 0.1
- a pipline to add sampling preference to roots(maybe overkilling):
  - sample roots that are at the peak of the frequency against time
  - dense subgraph detection to identify candidates
  - preference on people/interaction on the dense interaction subgraph

xiaohan2012/pcst.md