tandemRepeat static method

List<List<int>> tandemRepeat(
  NucleotideSequence seq,
  int unitLen,
  int minRepeat,
  bool compareOnlyBase, {
  bool fuzzyComp = false,
})

Searches for tandem repeats: short sequence units repeated back to back.

  • seq : The sequence to search.
  • unitLen : The length of the repeat unit, that is, of the subsequence that is repeated.
  • minRepeat : The minimum number of consecutive copies of the unit required for a repeat to be reported.
  • compareOnlyBase : If true, compare only the nucleotide base. If false, also compare replacement decoration and anotherName.
  • fuzzyComp : If true, the ambiguity codes m, r, w, s, y, k, v, h, d, b, and n are allowed in the comparison, and t and u are treated as the same base.
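The fuzzyComp option can be illustrated with a small stand-alone sketch. It is not the library's implementation: the table `iupac` and the function `fuzzyBaseMatch` are hypothetical names, and the matching rule (two symbols match when the sets of bases they stand for overlap) is an assumption about what a fuzzy comparison might do, based on the standard IUPAC ambiguity codes.

```dart
// Hypothetical table, not part of the library: each IUPAC symbol mapped to
// the concrete bases it can stand for. 'u' maps to 't' so that t and u are
// treated as the same base.
const Map<String, String> iupac = {
  'a': 'a', 'c': 'c', 'g': 'g', 't': 't', 'u': 't',
  'm': 'ac', 'r': 'ag', 'w': 'at', 's': 'cg', 'y': 'ct', 'k': 'gt',
  'v': 'acg', 'h': 'act', 'd': 'agt', 'b': 'cgt', 'n': 'acgt',
};

/// Assumed rule: two symbols match when their base sets share a base.
/// Unknown symbols never match.
bool fuzzyBaseMatch(String x, String y) {
  final String a = iupac[x.toLowerCase()] ?? '';
  final String b = iupac[y.toLowerCase()] ?? '';
  for (int k = 0; k < a.length; k++) {
    if (b.contains(a[k])) return true;
  }
  return false;
}

void main() {
  print(fuzzyBaseMatch('t', 'u')); // true: t and u count as the same base
  print(fuzzyBaseMatch('r', 'a')); // true: r stands for a or g
  print(fuzzyBaseMatch('r', 'c')); // false: r never stands for c
}
```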

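Returns : A list of [repeatStartPosition, repeatEndPosition] pairs, one for each start position at which at least minRepeat consecutive copies of the unit were found. repeatEndPosition is repeatStartPosition plus unitLen times the number of copies found.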
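To make the return value concrete, here is a simplified sketch of the same search on a plain String. `tandemRepeatOnString` is a hypothetical stand-in, not a library function: it compares units by exact string equality and ignores compareOnlyBase and fuzzyComp. The argument check is also an addition of this sketch.

```dart
/// Hypothetical stand-in for tandemRepeat that works on a plain String.
/// Returns [start, end] pairs, where end = start + unitLen * copies.
List<List<int>> tandemRepeatOnString(String seq, int unitLen, int minRepeat) {
  if (unitLen < 1 || minRepeat < 1) {
    throw ArgumentError('unitLen and minRepeat must be at least 1');
  }
  final List<List<int>> hits = [];
  // Try every start position that leaves room for minRepeat copies.
  for (int i = 0; i <= seq.length - unitLen * minRepeat; i++) {
    final String pattern = seq.substring(i, i + unitLen);
    int copies = 1; // the unit at i counts as the first copy
    // Extend while the next unit fits in the sequence and equals the pattern.
    while (i + (copies + 1) * unitLen <= seq.length &&
        seq.substring(i + copies * unitLen, i + (copies + 1) * unitLen) ==
            pattern) {
      copies++;
    }
    if (copies >= minRepeat) {
      hits.add([i, i + unitLen * copies]);
    }
  }
  return hits;
}

void main() {
  // Three copies of "CA" occupy indices 2 to 7.
  print(tandemRepeatOnString('GGCACACATT', 2, 3)); // [[2, 8]]
  // Every start position is tested independently, so hits can overlap.
  print(tandemRepeatOnString('ATATATAT', 2, 2));
  // [[0, 8], [1, 7], [2, 8], [3, 7], [4, 8]]
}
```

The second call shows that a long repeat is reported once for each start position that still has at least minRepeat copies ahead of it.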

Implementation

static List<List<int>> tandemRepeat(
    NucleotideSequence seq, int unitLen, int minRepeat, bool compareOnlyBase,
    {bool fuzzyComp = false}) {
  // Collected [start, end] pairs, one per qualifying start position.
  List<List<int>> r = [];
  late NucleotideSequence pattern;
  // Try every start position that leaves room for minRepeat copies.
  for (int i = 0; i <= seq.length() - unitLen * minRepeat; i++) {
    // Number of additional copies found after the first unit at i.
    int repeatCount = 0;
    // Compare the j-th following unit with the unit at i; stop at the
    // first mismatch or when the next unit would run past the end.
    for (int j = 1; j * unitLen + i <= seq.length() - unitLen; j++) {
      if (compareOnlyBase) {
        pattern = seq.subSeqNonInfo(i, i + unitLen);
        if (UtilCompareNucleotide.compareBase(
            seq.subSeqNonInfo(i + j * unitLen, i + (j + 1) * unitLen),
            pattern,
            fuzzyComp)) {
          repeatCount++;
        } else {
          break;
        }
      } else {
        pattern = seq.subSeq(i, i + unitLen);
        if (UtilCompareNucleotide.compare(
            seq.subSeq(i + j * unitLen, i + (j + 1) * unitLen),
            pattern,
            fuzzyComp)) {
          repeatCount++;
        } else {
          break;
        }
      }
    }
    // repeatCount excludes the first unit, hence minRepeat - 1.
    if (repeatCount >= minRepeat - 1) {
      r.add([i, i + unitLen * (repeatCount + 1)]);
    }
  }
  return r;
}
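Because the outer loop tests every start position independently, one long repeat produces several overlapping [start, end] pairs. If a caller wants one range per repeated region, the pairs can be merged afterwards. The following is a minimal sketch of such a post-processing step; `mergeOverlapping` is a hypothetical helper, not part of the library, and it fuses any overlapping ranges regardless of which unit produced them.

```dart
/// Hypothetical helper: merges [start, end] pairs that overlap.
/// Ranges that only touch (end == next start) are kept separate.
List<List<int>> mergeOverlapping(List<List<int>> ranges) {
  // Sort a copy by start position so the input list is left unchanged.
  final List<List<int>> sorted = [...ranges]
    ..sort((a, b) => a[0].compareTo(b[0]));
  final List<List<int>> merged = [];
  for (final List<int> range in sorted) {
    if (merged.isNotEmpty && range[0] < merged.last[1]) {
      // Overlaps the previous range: extend its end if needed.
      if (range[1] > merged.last[1]) {
        merged.last[1] = range[1];
      }
    } else {
      merged.add([range[0], range[1]]);
    }
  }
  return merged;
}

void main() {
  final hits = [
    [0, 8], [1, 7], [2, 8], [3, 7], [4, 8],
  ];
  print(mergeOverlapping(hits)); // [[0, 8]]
}
```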