Leetcode 726: Number of Atoms

Description

Given a string formula representing a chemical formula, return the count of each atom.

The atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

One or more digits representing that element's count may follow if the count is greater than 1. If the count is 1, no digits will follow.

For example, "H2O" and "H2O2" are possible, but "H1O2" is impossible.

Two formulas are concatenated together to produce another formula.

For example, "H2O2He3Mg4" is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula.

For example, "(H2O2)" and "(H2O2)3" are formulas.

Return the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

The test cases are generated so that all the values in the output fit in a 32-bit integer.

Link to the original problem

Example 1

Input: formula = "H2O"
Output: "H2O"
Explanation: The count of elements are {'H': 2, 'O': 1}.

Example 2

Input: formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

Example 3

Input: formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

Constraints:

1 <= formula.length <= 1000
formula consists of English letters, digits, '(', and ')'.
formula is always valid.

Idea

I have to say "Stack" is quite a cool data structure. Why? It's a fairly simple logic (first-in-last-out or last-in-first-out), but it can do magical things. For example, since you are probably a software engineer, you definitely heard about the concept of a "Stack Trace", right? Especially when you debug some production errors, you'll rely on some kind of stack traces (i.e., function calls) to find out where the error begins, and it's actually implemented with stack internally.
And you'll find stack in real life too. For example, imagine you are taking an elevator, and it's almost full. You enter and before you reach your floor, another person wants to get out, then you have to step out first to let him out and then you get in again. Oh my, this is so "stack", lol!
Back to business.
Actually this is not an easy question, so it's kind of hard to get to the point right way.
However, we can always start somewhere, and on the way, we might be able to find some pattern. So where?
Let's try to solve the example inputs first. And it's ok to start naively too.
For input "H2O", I think everyone can solve it easily. It's like, we see a letter "H", a number "2", so it's "H2". And we see a "O", and no more inputs, so it's "O". So aftering adding the previous "H2", we have "H2O".
For input "Mg(OH)2", from left to right, we see a "Mg", and then a "(", which means a nested level starts now. And it's a sign that we might need to start counting some temporary result before reaching the ")". Why, though? It's because that after the ")", we might find some number that we need to multiply it with the previous temporary result, and it could be any number, so it does not make sense to add it to the result before the "(", i.e., "Mg". And let's continue, we see a "O" and a "H", and then ")", so we know the temporary result is "OH". And we are expecting some number next. Indeed, we will find a "2", that means, we got "O2H2". And by adding the previous "Mg", we get "MgO2H2". However, it's not the final answer yet. Because the order of the letters are not sorted, so we need to sort it somehow and get "H2MgO2".
Before moving on to the next example input, let's think about how we should store the temporary result efficiently so that we can sort them easily. Since we are sorting by the letters, not by the numbers, we can record them separately, like in a map (or dictionary, depending how you name the data structure) with key-value pairs, where key being the letters and the value being the numbers. In that case, when we get the final map, we can build the answer easily.
For the example input "K4(ON(SO3)2)2". Since this one is a bit more complicated and I'll be very specific about the steps, and it might be very tedious too, unfortunately. (Bear with me though, because it'll make the idea very clear). So same process, first we have { K: 4 }, then "(" means a nested level starts. So we starts new and found { O: 1 } and then { O: 1, N: 1 }. Then, a "(" means another nested level starts. So we start new, and we found { S: 1 }, and then { S: 1, O: 3 }. And then a ")" means current level ends and we should expect a number, and we found "2", meaning we should multiply "2" to the previous result, which makes it { S: 2, O: 6 }. Before moving on, we need one more step, that is, we need to add this temporary result ({ S: 2, O: 6 }) back to the previous temporary result { O: 1, N: 1 } to make a new temporary result. Why though? Because, "ON" and "(SO3)2" are on the same level, and if we merge them as a new temporary result and we can continue the calculation in the same fashion, and no matter how many temporary results we have on the same level, we can just keep merging them to the first one. And it's just easier this way, otherwise, we need to keep record of how many temporary results we have on the same level, and this is just not worth the trouble. OK, after merging we have { S: 2, O: 7, N: 1 }. Moving on, we found a ")", which means the previous level is about to close, and we found a "2", so we have { S: 4, O: 14, N: 2 }. Same logic, we will merge it with previous { K: 4}, then we got { K: 4, S: 4, O: 14, N: 2 }. Now, it should be easy to get the final answer "K4N2O14S4".
So I've mentioned "temporary results" so many times, I'm pretty sure everyone is getting tired of it, so am I. So question is "how do we store the temporary results"? You might've already guessed it, Stack! Yep, a stack is our solution. Because it works perfectly with nested levels and can store tempory "state" on top of it, and can exit the stack with a last-in-first-out order, which is exactly what we need, nice!
Btw, someone might ask "what if we don't find a number after some ")""? And that's perfectly OK, because it just means we don't need to multiply the previous temporary result with some number, or you could think of it as multiplying it by a number "1".
So let's summarize what we have so far.
a) We need a stack to store the tempory results.
b) We start the temporary result by counting the letters and numbers. And we start new with each nested level when we meet a "(", and store previous result onto the stack.
c) When we meet a ")" we know we are about to close a temporary result. And when we meet a number, we multiply it with current temporary result, and add it back to the top of stack.
And notice, we are essentially trying to solve the same problems again and again. For example, for the third example, "K4(ON(SO3)2)2", the sequence we solved it will be "(SO3)2", then "(ON(SO3)2)2" and then "K4(ON(SO3)2)2".
The code should make more sense right now. I added more comments there to make it easier to understand. Let me know if it still does not make sense to you.
Thanks for reading!

Solution

Javascript

const countOfAtoms = function(formula) {
  // we split the formula by "letters", "numbers", "(", ")"
  // so we don't have to check the formula letter by letter.
  formula = formula.match(/[A-Z][a-z]*|\d+|[()]/g);
  const stk = [];
  let temp = {};
  for (let i = 0; i < formula.length; i += 1) {
    if (formula[i] == '(') {
      // A nested level starts here, we record what we have for
      // the temporary results to the stack.
      stk.push(temp);
      // And start new.
      temp = {}; // 
    } else if (formula[i].match(/^\d+$/)) {
      // We see a number, and before the number is a ')',
      if (formula[i-1] == ')') {
        // We multiply the number with the temporary results.
        for (const k in temp) temp[k] *= +formula[i];
        // And we merge the top of the stack to the temporary
        // result so it's always updated.  
        mergeTemp(stk.pop(), temp);
      } else {
        // If it's a letter before the number, we need to 
        // subtract it by 1, because we had it initialized as
        // { X: 1 } to begin with.
        temp[formula[i-1]] += +formula[i] - 1;
      }
    } else if (formula[i] != ')') {
      // If it's just letters, we update its count.
      // Note we might have the same letter on the same level.
      // For example, "H2...H4"
      temp[formula[i]] = temp[formula[i]] + 1 || 1;
    }
  }

  // 1. Make sure we merge all the temporary results on the
  //    outermost level, if thy exist on the stack.
  // 2. This is the bug where a input like "Mg(H2O)N" this would
  //    not include { Mg: 1 } to the final result, because we only 
  //    try merging stack top to temp when we see a number after
  //    ')' on line 21, but in this case, we don't see a number, so
  //    we just move on and leave { Mg: 1 } on the stack.
  // 3. We could potentially remove these kind of parenthesis,
  //    however, I don't think it's worth the trouble.
  while (stk.length) mergeTemp(stk.pop(), temp);

  // Now we have all the key-value pairs, let's sort them
  // and print them in order with desired format.
  let ans = '';
  Object.keys(temp).sort((a, b) => a.localeCompare(b)).map(k => {
    ans += `${k}${temp[k] > 1 ? temp[k] : ''}`;
  });
  return ans;

  function mergeTemp(prev, temp) {
    for (const k in prev) {
      temp[k] = temp[k] + prev[k] || prev[k];
    }
  }
};

References

Stack

any9527/leetcode_726.md

Description

Idea

Solution

Javascript

References