some algorithmic puzzles, tutorials, interview questions... and stuff...

Saturday, October 31, 2009

How to find the missing number from an array

This is what I've been asked at an interview at Microsoft (over the phone): An array of length N (for example N = 6) is given, containing all the elements from 1 to N, except one which is duplicated. So instead of having:

   1:             1, 2, 3, 4, 5, 6
we have an array which has one number duplicated:

   1:             1, 2, 3, 4, 4, 6
(notice a duplicated "4"). One additional detail: the array is not ordered. This is an final example of such an array:

   1:             3, 4, 1, 6, 4, 2

The question is: how do you find the number which is missing?

There are multiple solutions:

1. We could order the array as the first step. This would have complexity O(n*log2n) in the best case considering merge sort, or quick sort in the average case. Next step would be to iterate through the array once, and check which number is the same as the previous one. We have the answer.

2. Previous solution was O(n*log2n) in time complexity and took no additional space to solve (depending on the sorting algorithm). A second solution would be to keep an extra array of boolean values, initialized with false. We can iterate through the array once, and switch the corresponding boolean value from false to true as we go. Once we find a value which is already true, we have the duplicated number. At the end, one of the values in the extra array will remain false. This is the answer. This solution is O(n) in time complexity (much better), but also has O(n) space complexity.

3. The last solution is the best but is a trick actually. Considering there are N numbers in the array, from 1 to N, it means their sum is (N*(N+1))/2. So if we do the sum in the array, it will be X, different than (N*(N+1))/2. Calculating the difference between the values gives us the answer to the questions in O(1) time complexity and O(1) space complexity.


  1. I hate to say but I can't see how your solution works. In addition, there are a few mistakes in your analysis. The sum is N*(N+1)/2 in your case and not N*(N-1)/2 and also the sum X doesn't always have to be less than N*(N+1)/2 (if for example 1 is missing or if I duplicate the missing 5 with a 6). For the example given, if I replace any number other than one with a number one less than it, then the sum will always be 20 (or mutatis mutandi 19 if I replace it with the number always 2 less than it). I don't see how, by your analysis, this would help me to find the missing number (5 in your example).

  2. Hey you have a point here, the sum is N*(N+1)/2 not N*(N-1)/2, I fixed this in the post. However the solution works.

    Assuming you add up the numbers and you get X, and the sum N*(N+1)/2 equals Y (just a notation), then your missing number will be abs(X-Y) - the absolute value of the difference between the two. The absolute value is taken because the missing number could be replaced by either a smaller or a bigger number. I hope this makes more sense.

  3. i still dont understand how u can conclude that 5 is the missing number

  4. Lets take an example :
    1 2 3 3 4
    So Sum = Y = 13
    It should be : X = n*(n+1)/ 2 = 5 * ( 5 +1 ) /2= 15

    so diff = abs(x-y) = 2 , but the missing number is 5 .

  5. then how would you modify the solution to make it work in O(1)?

  6. Anonymous, the example you gave is wrong:
    it should be:
    1 2 3 3 5
    Y = 14
    X = 15
    5 - abs(X - Y) = 4 therefore the missing number is 4

  7. Solution 3 also works but you should consider duplicate numbers in the sum
    ex:1 2 3 3 5
    X = 1+2+3+5 =11
    Y = n(n+1)/2 = 5*6/2 = 15
    Missing Number = 15-11 = 4

  8. Solution 3 can lead to integer overflow when N is large. It is better to use a HashSet in Java and go on adding elements.