Skip to main content

Warm-up to Tech Interview Prep

·963 words·5 mins
Adam Sweeney
Author
Adam Sweeney
Code is fun. Sharing is caring.
Tech Interview Prep - This article is part of a series.
Part 2: This Article

Counting Solves Problems!
#

For this second entry in the “Tech Interview Prep” series, we’ll look at a simple problem and a few solutions. 

Problem Statement
#

Given an array of integers, return true if the array contains at least one duplicate and false if all values are unique. 

Before We Write Any Code
#

During a typical technical interview, the problem statement is often intentionally vague. The expectation is to repeat the problem and ask clarifying questions. The answers to these questions will lead to different solutions. Some example questions include:

  • Will the array be sorted?
  • Can it be mutated?
  • Are there any runtime or memory requirements?

Solutions
#

Most of the solutions in this post display runtimes. The code used to generate those runtimes is available at the end of this post.

The Brute-force Method
#

The first thing that often comes to mind is the brute-force solution. We loop through the array and compare every element to all the subsequent elements. The pseudocode would look something like this:

for every element e:
    for all subsequent elements s:
        if e is equivalent to s:
            return true
return false

Here is the output from the program:

Brute force, no dupes - 1968 milliseconds, Result: false
Brute force, early dupe - 0 milliseconds, Result: true
Brute force, dupe at end - 1983 milliseconds, Result: true

The runtime of this solution is O(n2). As the array grows, the runtime of the algorithm also exponentially grows. In other words, this algorithm’s performance worsens exponentially relative to the growth of the input.

It is worth observing that if the duplicate is near the beginning of the array, the brute force algorithm is quite fast. Big-Oh notation only considers the worst-case.

Counting Method One
#

The counting methods utilize an extra data structure, the unordered set. A set can only contain unique values, and an unordered set doesn’t bother with any sorting.

The idea here is to iterate over the array and attempt to add each element to the set. If there are duplicates, at least one element will fail to be added to the set. The size of the set will be less than that of the array. The pseudocode looks like this:

for every element e:
    add e to set v
return e.size != v.size

Here is the output:

Full set, no dupes - 12 milliseconds, Result: false
Full set, early dupe - 10 milliseconds, Result: true
Full set, dupe at end - 9 milliseconds, Result: true

Much better! The runtime of this solution is O(n). The performance of the algorithm grows in lockstep with the size of the input. 

Observing the above algorithm, an opportunity for an efficiency gain presents itself. If a duplicate exists early in the array, the brute force algorithm will return early. The algorithm utilizing the set currently does not. 

Counting Method Two
#

For this method, we’ll just jump right into the pseudocode:

for every element e:
    add e to set v
    if the insertion failed
        return true
return false

The runtime of this solution is still O(n), but it has the benefit of early returning. 

Here is the output:

Set fast return, no dupes - 9 milliseconds, Result: false
Set fast return, early dupe - 0 milliseconds, Result: true
Set fast return, dupe at end - 10 milliseconds, Result: true

Other Methods
#

The focus of this post is on counting. Memory constraints may not allow for counting.

Instead of falling back to the brute force method, the array can be sorted. The sorted array is given a single pass where each element is compared to the next one for equivalence. This method runs in O(nlg n) time. The runtime of this algorithm worsens faster as the input grows, but slowly. It also requires no extra memory for the second data structure. The final tradeoff is that the original array is usually modified. If the extra memory required for a copy is allowed, counting is preferable.

The Code
#

Below is the program that generated the results shown in this post. It was run on my MacBook and compiled using the following command:

clang++ -Wall -Wextra -pedantic -std=c++20 -O2 dupes.cpp.

Across many compilations and executions the exact results varied but the ranges were very consistent.

#include <chrono>
#include <iostream>
#include <numeric>
#include <string>
#include <tuple>
#include <unordered_set>
#include <vector>

bool brute_force(std::vector<int> &nums) {
  for (std::size_t i = 0; i < nums.size() - 1; ++i) {
    for (std::size_t j = i + 1; j < nums.size(); ++j) {
      if (nums[i] == nums[j])
        return true;
    }
  }
  return false;
}

bool full_set(std::vector<int> &nums) {
  std::unordered_set<int> vals;
  for (auto i : nums) {
    vals.insert(i);
  }
  return vals.size() != nums.size();
}

bool set_first_dupe(std::vector<int> &nums) {
  std::unordered_set<int> vals;
  for (auto i : nums) {
    if (auto [itr, succeeded] = vals.insert(i); !succeeded)
      return true;
  }
  return false;
}

auto time_contains_duplicates(std::vector<int> &nums,
                              bool (*f)(std::vector<int> &)) {
  auto start = std::chrono::steady_clock::now();
  bool result = f(nums);
  auto end = std::chrono::steady_clock::now();

  return std::make_tuple(
      std::chrono::duration_cast<std::chrono::milliseconds>(end - start),
      result);
}

void print_results(std::string preface, std::vector<int> &nums,
                   bool (*f)(std::vector<int> &)) {
  auto [time, result] = time_contains_duplicates(nums, f);
  std::cout << preface << " - " << time.count()
            << " ms, Result: " << std::boolalpha << result << '\n';
}

int main() {
  constexpr int numElems = 100'000;

  std::vector<int> noDupes(numElems);
  std::iota(noDupes.begin(), noDupes.end(), 1);

  auto dupeAtBeginning(noDupes);
  dupeAtBeginning[3] = 1;

  auto dupeAtEnd(noDupes);
  dupeAtEnd[numElems - 3] = dupeAtEnd[numElems - 2];

  print_results("Brute force, no dupes", noDupes, brute_force);
  print_results("Brute force, early dupe", dupeAtBeginning, brute_force);
  print_results("Brute force, dupe at end", dupeAtEnd, brute_force);
  std::cout << '\n';
  print_results("Full set, no dupes", noDupes, full_set);
  print_results("Full set, early dupe", dupeAtBeginning, full_set);
  print_results("Full set, dupe at end", dupeAtEnd, full_set);
  std::cout << '\n';
  print_results("Set fast return, no dupes", noDupes, set_first_dupe);
  print_results("Set fast return, early dupe", dupeAtBeginning, set_first_dupe);
  print_results("Set fast return, dupe at end", dupeAtEnd, set_first_dupe);
}
Tech Interview Prep - This article is part of a series.
Part 2: This Article