Job Interview Question #11 - Summing Numbers

8 July, 2008 – 5:07 pm

I’ve recently started following a blog which sets weekly programming challenges. I’m getting sick of Nintendo DS brain training and thought I’d try something more relevant to my line of work. Also, this gives me a good opportunity to learn more about my favoured scripting language at present, which is Ruby. So this week’s challenge states:

Given a list of n integers and another integer called m, determine (true / false) if there exist 2 numbers in that list which sum up to m.
Example: 2,6,4,9,1,12,7 and m=14 -> 2 and 12 sum up to 14, so the answer is true.
Provide the best algorithm in both manners: performance and memory to solve this puzzle. Don’t forget to mention the complexity of your solution!

Read on for my thoughts in solving this challenge.

Looking at the problem, you obviously start with an array of numbers called n. If the sum of two numbers from the array is m, then you know that the array *may* contain two elements which are equal to m, such as n[a] + n[b] = m.

However you don’t know which two elements of the array might achieve this, nor do you know how many pairs in the array would satisfy the same conditions.

So my first thoughts were to iterate through each element of the array, then for each element, subtract that value from m, and iterate through remaining elements in the array looking for that value.

def test_nosort(m,n) 
  found=0
  n.each do |a|
    b = m - a
    n.each do |x|
      found += 1 if (x.to_i == b)
    end
  end
   puts found if @debug
end

The problem with this approach, is that for a starting array size of n, you have to check all possible combinations or n*n-1. Say for an array of size 100 elements, you have to check each value in the array 100-1 times (excluding current value), which means 100*99 = 9,900 possible combinations!

In the example provided above, I used the #each iterator for the array believing this to be the fastest operation when iterating through an array in Ruby.

This approach is suitable for small sized arrays (< 100 elements), but quickly becomes unusable when scaled up. In hindsight, this approach seems very similar to a bubble sort algorithm, where the list to be sorted has two items compared at a time and swapping them if they’re in the wrong order; iterating as many times until no more swaps are needed. It has a worst-case complexity described as n^2 where n is the number of items being sorted.

In the next approach, I decided to give regex a try ( I love regex :) ), where in this case I still use an #each iterator to loop through the array, but use a regex grep comparison against the original array. The following code demonstrates:

def test_regex(m,n)
  found=0
  n.each do |a|
    b = m - a
    found += n.grep(b).length
  end
  puts found if @debug
end

This approach performed better than the first approach. I also experimented with some other array methods such as #select, #detect and #partition all of which gave varying but slow results.

After a much needed coffee and re-think, I decided to build a hash, where each key was based on the original values in my array, with the key-value equaling a count of all occurrences for that key… Then I could just use the array #each iterator and jump straight to the number of values that match that key. The following code demonstrates:

def test_hash(m,n)
  found=0
  hash = {}
  n.each do |a|
    hash.has_key? a ? hash[a] = hash[a].to_i + 1 : hash[a] = 1
  end
  n.each do |a|
    b = m - a
    found += hash[b] if hash.has_key? b
  end
  puts found if @debug
end

This approach was a marked success over previous approaches and it scales well for large arrays of numbers. You can see the differences in time (using the benchmark gem) for each approach when using an array size of 100 elements as follows:

      user     system      total        real
test_nosort results :
  0.530000   0.000000   0.530000 (  0.534364)
test_sort results   :
  0.530000   0.000000   0.530000 (  0.534992)
test_regex results  :
  0.360000   0.000000   0.360000 (  0.357073)
test_detect results :
  6.390000   0.040000   6.430000 (  6.480277)
test_select results :
  0.400000   0.000000   0.400000 (  0.403032)
test_partn results  :
  0.600000   0.010000   0.610000 (  0.623047)
test_hash results   :
  0.020000   0.000000   0.020000 (  0.021390)
test_binary results :
  0.270000   0.000000   0.270000 (  0.273587)

Even with array sizes of 10,000+ elements the benchmark was in the order of seconds.

However with an even larger array size of say 100,000+ elements, the memory usage of this hash search would be significant. I also tried a binary search but found this had worse performance than a hash. From reading wikipedia though it would appear that a binary tree search might be the best bet in terms of memory usage for large arrays.

The Answer?

For now I am satisfied that a hash key search is the least complex way to deal with this particular challenge, with the only caveat being a ceiling limit on the amount of elements that are in your initial array to begin with e.g. 1 - 10,000 elements. For larger arrays, binary tree algorithms might offer better memory performance but would be more complex to implement in a scripting language like Ruby.

The code I used to test this is attached below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
require 'benchmark'
 
# m = 14
m = rand(100) 
# n = [2,6,4,9,1,12,7]
n = Array.new(100) { |e| e = rand(100) } 
@debug = false
 
def test_nosort(m,n) 
  found=0
  n.each do |a|
    b = m - a
    n.each do |x|
      found += 1 if (x.to_i == b)
    end
  end
   puts found if @debug
end
 
def test_sort(m,n)
  found=0
  n.sort!
  n.each do |a|
     b = m - a
    n.reverse_each do |x|
      found += 1 if (x.to_i == b)
    end
  end
  puts found if @debug
end
 
def test_regex(m,n)
  found=0
  n.each do |a|
    b = m - a
    found += n.grep(b).length
  end
  puts found if @debug
end
 
def test_detect(m,n)
  found=0
  n.each do |a|
    b = m - a
    found += 1 if n.detect { |x| /\b#{b}\b/ =~ x.to_s }
  end
  puts found if @debug
end
 
def test_select(m,n)
  found=0
  n.each do |a|
    b = m - a
    found_array = n.select { |x| x == b }
    found += found_array.length
  end
   puts found if @debug
end
 
def test_partn(m,n)
  found=0
  n.each do |a|
    b = m - a
    found_array = n.partition { |x| x == b }
    found += found_array.length
  end
   puts found if @debug
end
 
def test_hash(m,n)
  found=0
  hash = {}
  n.each do |a|
    hash.has_key? a ? hash[a] = hash[a].to_i + 1 : hash[a] = 1
  end
  n.each do |a|
    b = m - a
    found += hash[b] if hash.has_key? b
  end
  puts found if @debug
end
 
def test_binary(m,n)
  found=0
  n.each do |a|
    b = m - a
    found += 1 if binary_search(n, b)
  end
  puts found if @debug
end
 
def binary_search(array, val, left = 0, right = nil)
  array.sort!
  right = array.size unless right;
  mid = (left + right) / 2;
 
  return nil if left > right
 
  if val == array[mid]
    return mid
  elsif val > array[mid]
    binary_search(array, val, mid + 1, right)
  else
    binary_search(array, val, left, mid - 1)
  end
end
 
Benchmark.bm() do |timer|
  @debug = false
  timer.report("test_nosort results :\n") { 100.times { test_nosort(m,n) }  }  
  timer.report("test_sort results   :\n") { 100.times { test_sort(m,n)   }  }    
  timer.report("test_regex results  :\n") { 100.times { test_regex(m,n)   }  }    
  timer.report("test_detect results :\n") { 100.times { test_detect(m,n)   }  }    
  timer.report("test_select results :\n") { 100.times { test_select(m,n)   }  }    
  timer.report("test_partn results  :\n") { 100.times { test_partn(m,n)   }  }    
  timer.report("test_hash results   :\n") { 100.times { test_hash(m,n)   }  }    
  timer.report("test_binary results :\n") { 100.times { test_binary(m,n)   }  }    
end
 
# >>       user     system      total        real
# >> test_nosort results :
# >>   0.530000   0.000000   0.530000 (  0.534364)
# >> test_sort results   :
# >>   0.530000   0.000000   0.530000 (  0.534992)
# >> test_regex results  :
# >>   0.360000   0.000000   0.360000 (  0.357073)
# >> test_detect results :
# >>   6.390000   0.040000   6.430000 (  6.480277)
# >> test_select results :
# >>   0.400000   0.000000   0.400000 (  0.403032)
# >> test_partn results  :
# >>   0.600000   0.010000   0.610000 (  0.623047)
# >> test_hash results   :
# >>   0.020000   0.000000   0.020000 (  0.021390)
# >> test_binary results :
# >>   0.270000   0.000000   0.270000 (  0.273587)
Share it: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Netscape
  • Reddit
  • Slashdot
  • Technorati
  • YahooMyWeb

Post a Comment

*
To prove that you're not a bot, enter this code
Anti-Spam Image