Problem: https://leetcode.com/problems/surrounded-region
We have to find all regions of O's that are surrounded by X's, and convert them to X's.
The most trivial approach here would be taking every cell that contains an O, and checking to see whether or not it is connected to at least one of the borders through other O cells. If this is the case, that cell is not surrounded. Otherwise that cell is part of a (possibly large) surrounded region and has to marked with X:
O O O O O O O Outer ring of O's is not
O X X X X X O surrounded: starting from any
O X O O O X O of the O's in the outer ring
O X O X O X O we can reach a border.
O X O O O X O Inner ring of O's is surrounded:
O X X X X X O none of the O's in the inner
O O O O O O O ring can reach any of the borders.
But this approach is too slow: For every O-cell in the grid we are doing a separate DFS (clearing the visited marks), yielding
at worst case O(nm)
DFS's, totalling to O((nm).(nm))
running time, with n
and m
being the sizes of grid. We can optimize this by "coloring" each connected segment of O's,
causing us to visit every cell at most once, and maintaining an array of booleans colorReachesOutside
, where colorReachesOutside[c]
determines whether or not the O-cells that are colored with c
can reach outside of the grid. For any O-cell whose color cannot reach out to the borders of the grid, we replace it with X
.
But if we try to approach the problem in an inverse manner, we can avoid all of the complications of coloring: If we start one dfs (without clearing the visit markers) one by one from all of the O-cells on the boundary, every O-cell that we reach has a way out of the grid. Every other O-cell, has to be marked with X
because not being reachable from any O-cell on the border means it is locked inside a block enclosed by X's.
Every cell will be visited at most once (we're doing a single dfs i.e. maintaining the
visit markers between subsequent calls), yielding complexity of O(nm)
.
For the visits array, notice that we're only "walking" over the O-cells when doing the DFS. That meas we only need to keep the visit states from the cells that initially contain O's. Therefore one trick we can use to refrain from allocating extra memory for the visit array would be to initially mark all of the O's with N
, as in non-reachable.
Then we change every visited N
back to an O
as we're doing the DFS. We will be left
with some (possibly zero) N
s at the end of the algorithm: those are the original O-cells that are non-reachable. We have to mark them with X
:
class Solution {
// Check if a pair of coordinates falls inside the board. To be used in the dfs
inline int is_inbounds(const vector<vector<char>>& board, int row, int col) {
return 0 <= row && row < board.size() &&
0 <= col && col < board[0].size();
}
// DFS over the 'N' nodes that markes them with 'O's as it visits them. Doesn't
// revisit the cells that have been changed to 'O's, and does not visit 'X's at all.
void try_expand(vector<vector<char>>& board, int row, int col) {
if(!is_inbounds(board, row, col)) {
return;
}
if(board[row][col] != 'N') { // no revisit to 'O's, no visit to 'X's
return;
}
board[row][col] = 'O'; // changing 'N' to 'O' means changing non-reachable
// to reachable
int dr[4] = {-1, 1, 0, 0}; // direction array for doing the dfs. Changing the
int dc[4] = {0, 0, -1, 1}; // current row by dr[0] and current column by dc[0]
for(int i = 0; i < 4; i++) { // will result in going down (south).
// Similar for 1 2 3.
try_expand(board, row + dr[i], col + dc[i]);
}
}
public:
void solve(vector<vector<char>>& board) {
if(board.empty()) {
return; // This is necessary, otherwise board[0].size will fail
}
int numRows = board.size();
int numCols = board[0].size();
for(int i = 0; i < numRows; i++) {
for(int j = 0; j < numCols; j++) {
if(board[i][j] == 'O') { // Initially assume all O's are unreachable
board[i][j] = 'N';
}
}
}
// expand from the top and bottom borders. Notice that the visit markers of the
// dfs (which is the O and N markers in the board vector itself) are not reset
// between the calls.
for(int i = 0; i < numRows; i++) {
try_expand(board, i, 0);
try_expand(board, i, numCols - 1);
}
// expand from the left and right borders. Notice how we're excluding the already
// expanded cells on the border with j = [1, numCols - 1). However it wouldn't be
// incorrect if we hadn't done so: The calls would have just returned immediately
// if they had been expanded previously since the dfs changes 'N's to 'O's as it
// goes on, and only follows 'N's. Changing 'N's back to 'O's is our way of
// maintaining the visit markers. (O means visited, N means unvisited)
for(int j = 1; j < numCols - 1; j++) {
try_expand(board, 0, j);
try_expand(board, numRows - 1, j);
}
// Anything that is unreachable has to be marked with 'X', and reachables have
// been correctly changed back to 'O's by the dfs.
for(int i = 0; i < numRows; i++) {
for(int j = 0; j < numCols; j++) {
if(board[i][j] == 'N') {
board[i][j] = 'X';
}
}
}
}
};
On improving the trivial O(n^2.M^2)
solution above, we might be tempted to start from an O-cell (not necessarily on the border), do the dfs until we reach either the border or
some other cell that can reach the border, and if this happens, mark it with R
(as in reachable), and otherwise mark it with N
as in unreachable. However this is not a
correct approach and may lead to incorrect N
and R
states:
X X X X X X X X X X X X X X X X X X X X X X X X X X X X
X O O O O O X X O O * O O X X O O . O O X X O O . . . X
X O O O O O X -> X O O O O O X -> X O O . O O X -> X O O . . . X ->
O O O O O O X O O O O O O X O O O . O O X O O O . . . X
X O O O O O X X O O O O O X X O O . O O X X O O . . . X
X X X X X X X X X X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X X X X X X X X X X
X O O . . . X X O O . N N X X O O . N N X X R R R N N X
X O O . . . X -> X O O . N N X -> X O O . N N X -> X R R R N N X
O O O . . . X O O O . N N X . . O . N N X R R R R N N X
X O O . . . X X O O . N N X X . . . N N X X R R R N N X
X X X X X X X X X X X X X X X X X X X X X X X X X X X X
In the above map suppose we start the DFS from the O-cell marked with *
. .
Marks the path of not-yet-returned dfs calls, and N
and R
stand for
non-reachable and reachable respectively. The final state of the visits
incorrectly marks a number of the O-cells with N, because their DFS happened
to return before the first dfs call that allows us to "break out". Note that
their state is not fixable as we're not allowed to run more DFSs to correct
more and more states, as that would again lead O(nm)
dfs's at the worst case
and we lose the invariant of visiting each node at most once that would
allow us to assert that the overal complexity is O(nm)
.
In particular, any DFS relation that somehow depends on the order of visiting the nodes (In this case we are interested in returning from the dfs call to a border node, before returning from the DFS call to any non-border cells in the DFS spanning tree) could be problematic and incorrect. In this case, one approach to fix this shortcoming is using the coloring scheme described earlier.