Repeated DNA Sequences

描述

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:

["AAAAACCCCC", "CCCCCAAAAA"].

分析

首先能想到一个简单直接的方法，用一个长度为10的窗口，从左到右扫描，放入 HashMap，并把计数器增一。最后，把 HashMap 中所有计数器大于1的字符串输出来。时间复杂度 O(n), 由于HashMap中存储了所有长度为10的子串，所以空间复杂度O(10n)。

由于字符串中只存在 A, C, G, T 四种字符，我们可以把每个字符映射为2个bit:

A -> 00
C -> 01
G -> 10
T -> 11

每个长度为10的字符串，可以映射为 20 bits, 小于32位，因此可以把这个字符串映射到一个整数。这个方法时间复杂度依旧是O(n)，但空间复杂度下降到了O(n)。

Repeated DNA Sequences

Repeated DNA Sequences

描述

分析

解法1 简单粗暴

解法2 完美哈希

results matching ""

No results matching ""