Chapter 1: Collections

知识解析

本文将 array, hash table, string 统称为collections。 array可以看作 "key是从0开始的连续整数的hash table",string可以看作是character组成的array。注意这里说的都是抽象意义上的array和string概念,与具体语言中的class Arrayclass String相区别。

1.1 Hash Table as ADT

Hash table的功能是基于"key"快速查找value (O(1)O(1)时间) ,存储的基本元素是key-value pair。

Hash table 是一种“归类”的数据结构,由hash_func(key) 得到bucket number,把key, value pair 放入对应的bucket。整个hash table就是一组bucket。hash_func就是hash function,负责把 key 转换成尽可能少的整数,作为key对应的bucket index。【Cracking the Coding Interview】介绍了一种简单而普遍的实现方式。

如果不同的key出现在同一个bucket,就叫做hash table的collision。hash function的设计原则,就是collision越少越好。

Hash 函数的具体实现方法,可以参考open hashing 和 closed hashing的相关资料:Meaning of open hashing and closed hashing

How keys allocated to different buckets via hash function

如果hash table的key的范围与连续的整数对应,比如keys是从az , 或者数字ij,那么可以把array当做hashtable,index与真正的key一一对应。比如用arr[0] 表示 key:akey:i对应的value。这种情况具体选择专门的hash数据结构还是array,取决于key的分布是密集 (array) 还是稀疏 (hash)。

特别地,如果value是只有两种状态,比如truefalse ,那么可以把二进制数或bit-vector作为hash table来记录。

特别地,Hash table还可以用Balanced BST来实现, lookup的时间由 O(1)O(1) 增长为 O(logn)O(logn),但可能节省space cost , 且可以按照key的顺序输出。

1.2 Reverse array / string

  1. 用头尾两个pointer,相向遍历,交换元素直到指针相遇 while l < r ,但不适用于数据流

  2. 顺序遍历元素,push入stack,再依次pop,也可以用递归

1.3 String Matching 的常见算法 (在string中查找substring)

假定源string长度为m,需要查找的substring长度为n

模式识别

1.4 在collection内检索和统计的问题

1.4.1 统计一个元素集中元素出现的次数

e.g. 1 Implement an algorithm to determine if a string has all unique characters. (Hint: 也可以用排序) e.g. 2 Write a method to decide if one string is a permutation of the other. (Hint: 也可以用排序) e.g. 3 Given a newspaper and message as two strings, check if the message can be composed using letters in the newspaper.

1.4.2 检索某一类元素的问题

e.g.1 Given an array of integers, find two numbers such that they add up to a specific number. (Leetcode: 2Sum)

e.g.2 Find all unique triplets in the array which gives the sum of zero.(Leetcode: 3Sum)

1.5 处理array或string问题的几点细节

1.5.1 数组或string内,问题的解与前驱子问题有递推关系

e.g. Given an unsorted array of integers, find the length of the longest consecutive elements sequence. (Leetcode: Longest Consecutive Sequence)

# 1. 前驱子问题的最大长度 + 当前节点的影响 => 到当前节点为止的最大长度
# 2. 要求当前节点对最大长度的影响,就要知道当前节点所在序列的长度,
#    就必须知道num-1和num+1所在序列的上界和下界

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        bounds = {}
        max_length = 0
        for num in nums:
            if num in bounds:
                continue

            # extend the bound via neighbors
            low, high = num, num

            if num - 1 in bounds:
                low = bounds[num-1][0]
            
            if num + 1 in bounds:
                high = bounds[num+1][1]

            # future extension can only be from low and high so only they need correct bounds
            bounds[high] = bounds[low] = bounds[num] = (low, high)

            # maintain max_length
            if high - low + 1 > max_length:
                max_length = high - low + 1
        
        return max_length  

1.5.2 求最值或总和的问题,其实就是1.5.1的最简单特例,只不过我们只需要了解之前紧邻的一个节点的解就能推导出当前解 :sum(n) = sum(n-1) + curr ,在update的过程中用 sum 一个变量记录即可

1.5.3 处理两个不等长的array或string,总是可以以while aIdx < aLength and bIdx < bLength 作为结束条件,再处理可能剩下的元素。

e.g Given two sorted arrays, merge B into A in sorted order. (Leetcode: Merge Sorted Array)

1.5.4 array或string修改后长度变化的问题

e.g. 1 Given input "Mr John Smith", output "Mr%20John%20Smith" e.g. 2 Given input "Mr%20John%20Smith" , output "Mr John Smith"

1.6** Sliding Window 问题

1.7 处理2D-Array的问题

e.g.1 Given an NxN matrix, rotate the matrix by 90 degrees. (从test case出发理解交换规律) e.g.2 If mat[i][j] is 0, then set the entire row and column to 0.

e.g.3 Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in spiral order. (LeetCode: Spiral Matrix)

def spiralOrder(self, matrix: List[List[int]]) -> List[int]:
    if len(matrix) == 0 or len(matrix[0]) == 0: return []

    seq = []
    row, col = 0, 0
    row_l, col_l = 0,0
    row_h, col_h = len(matrix), len(matrix[0])
    
    while True:
        # right
        for col in range(col_l, col_h):
            seq.append(matrix[row_l][col])
        row_l += 1
        if row_l == row_h: return seq
        
        # down
        for row in range(row_l, row_h):
            seq.append(matrix[row][col_h-1])
        col_h -= 1
        if col_l == col_h: return seq

        # left
        for col in reversed(range(col_l, col_h)):
            seq.append(matrix[row_h-1][col])
        row_h -= 1
        if row_l == row_h: return seq
        
        # up
        for row in reversed(range(row_l, row_h)):
            seq.append(matrix[row][col_l])
        col_l += 1
        if col_l == col_h: return seq

1.8 设计数字来映射/hash的情境

e.g. 1 Design a hash function that is suitable for words in a dictionary.(EPI: 12.1) (问题的特征: word不会很长,每一位的可能性有限且密集,可以对应0-25的整数) e.g. 2 A hash function for the state of a chess game.(EPI: 12.2) (问题的特征: 对每一次transition需要很容易的变换hash)

e.g. 3 Implement a method rand7() given rand5().(CtCI:17.11)

Last updated

Was this helpful?