BioKit is a Dart package for Bioinformatics.

Ensure that you have BioKit installed before continuing.

This document is intended to make you proficient with BioKit in the least amount of time possible; you can read through it sequentially, or if you're reading this on biokit.org, use the heading menu on the right side of the page to jump to a topic of interest.

If you want a deeper look at how BioKit works, view our API Reference.

Creating Sequences

Create a DNA, RNA or Peptide instance:

DNA dnaSeq = DNA(seq: 'ATGCTA');

RNA rnaSeq = RNA(seq: 'AUGCUA');

Peptide pepSeq = Peptide(seq: 'MSLAKR');

DNA and RNA classes must be initialized with a String of at least six valid nucleotides, while the Peptide class requires a minimum of two valid amino acids.

If any monomer in the sequence passed to the seq parameter is not valid for the class, an error is thrown.

Add Sequence Metadata

Optionally, you can add name, id, and desc metadata when you instantiate the class. Using DNA as an example:

DNA dnaSeq = DNA(seq: 'ATGCTA', name: 'My Name', id: 'My ID', desc: 'My Description');

If you do not set a value for the name, id, or desc fields at the time of instantiation, each will receive a default String value.

Get Properties

Return the values of the properties of a DNA, RNA, or Peptide instance:

dnaSeq.seq;
// ATGCTA

dnaSeq.len;
// 6 

dnaSeq.id;
// Default ID

dnaSeq.name;
// Default name

dnaSeq.desc;
// Default description

dnaSeq.type;
// dna

Set Properties

Update the properties of a DNA, RNA, or Peptide instance:

dnaSeq.name = 'New name';

dnaSeq.id = 'New ID';

dnaSeq.desc = 'New description';

Sequence Info

View information about a DNA, RNA, or Peptide instance by calling its info() method or printing it to the console:

dnaSeq.info();
/*
{
   "seq":"ATGCTA",
   "type":"dna",
   "monomers":6,
   "name":"New Name",
   "id":"New ID",
   "desc":"New description"
}
*/

print(dnaSeq);
/*
{
   "seq":"ATGCTA",
   "type":"dna",
   "monomers":6,
   "name":"New Name",
   "id":"New ID",
   "desc":"New description"
}
*/

Random Sequences

Return a random DNA, RNA, or Peptide instance with the random() method and pass the desired length of the sequence to the len parameter:

// A random DNA instance with 20 nucleotides.
DNA dnaSeq = DNA.random(len: 20);

dnaSeq.info();

/*
{
   "seq":"TAACTTCGATCGCTCTGGCA",
   "type":"dna",
   "monomers":20,
   "name":"Default Name",
   "id":"Default ID",
   "desc":"Default description"
}
*/

FASTA Data

BioKit contains a number of methods and functions for working with FASTA formatted data.

Uniprot ID

Return a String of protein data in FASTA format using the static uniprotIdToFASTA() method from the Utils class:


String proteinFASTA = await Utils.uniprotIdToFASTA(uniprotId: 'B5ZC00');

/*
>sp|B5ZC00|SYG_UREU1 Glycine--tRNA ligase OS=Ureaplasma urealyticum ...
MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTKQ
KDVVGLDSAIILNPLVWKASGHLDNFS ...
*/

Note that this method requires network access.

Read String

Use the readFASTA() method to parse FASTA formatted String data.

readFASTA() is able to parse FASTA files containing multiple sequences, and hence returns a List:

List<Map<String, String>> proteinMaps = await Utils.readFASTA(str: proteinFASTA);

/*
[
   {
      "seq":"MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTK ... ",
      "id":"sp|B5ZC00|SYG_UREU1",
      "desc":"Glycine--tRNA ligase OS=Ureaplasma urealyticum serovar 10 (... "
   }
]
*/

Read File

Read in data from a FASTA formatted txt file:

List<Map<String, String>> dnaMaps = await Utils.readFASTA(path: './gene_bank.txt');

/*
[
   {
      "seq":"GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGG ... ",
      "id":"HSBGPG",
      "desc":"Human gene for bone gla protein (BGP)"
   },
   {
      "seq":"CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATA ... ",
      "id":"HSGLTH1",
      "desc":"Human theta 1-globin gene"
   }
]
*/

Write File

Write the contents of a DNA, RNA, or Peptide instance to a FASTA formatted txt file using the toFASTA() method:

// Get the first Map object.
Map<String, String> firstSeq = dnaMaps.first;

// Create a new DNA instance.
DNA dnaSeq = DNA(seq: firstSeq['seq']!, id: firstSeq['id']!, desc: firstSeq['desc']!);

// Write the instance contents to FASTA formatted file.
dnaSeq.toFASTA(path: '../deliverables', filename: 'my_dna_seq');

/*
>HSBGPG Human gene for bone gla protein (BGP)
GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGGG
CACAGCCCAGAGGGTATAAACAGTGCTGGAGGCTGGCGGGGCAGGCCAGCTGAGTCCTGA
GCAGCAGCCCAGCGCAGCCACCGAGACA ...
*/

DNA Analysis Report

Create a DNA analysis report by calling the report() method on a DNA instance:

dnaSeq.report(path: '../deliverables', creator: 'John Doe', title: 'BGP Report');

+ Operator

Return the concatenated sequence result of two or more DNA, RNA, or Peptide instance sequences, of the same type, with the + operator:

RNA rnaSeq1 = RNA(seq: 'AUGCAG');
RNA rnaSeq2 = RNA(seq: 'GCUGAA');

rnaSeq1 + rnaSeq2; 
// "AUGCAGGCUGAA"

Reversing

Reverse a DNA, RNA, or Peptide instance's sequence with the reverse() method:

Peptide pepSeq = Peptide(seq: 'MPAG');

pepSeq.reverse();
// GAPM

Point Mutations

Return the number of positional-differences between two DNA, RNA, or Peptide instance sequences, of the same type, with the difference() method:

DNA dnaSeq1 = DNA(seq: 'ATGCAT');

// Difference: "A" at index 1, and "T" at index 4. 
DNA dnaSeq2 = DNA(seq: 'AAGCTT');

dnaSeq1.difference(oSeq: dnaSeq2)
// 2

Motif Detection

BioKit has a number of functions and methods to convert and detect matches between a motif and the sequence of a DNA, RNA, or Peptide instance.

Find Motifs

Return the indices of all matches between a DNA, RNA, or Peptide instance's sequence and the sequence passed to the findMotif() method's motif parameter:

RNA rnaSeq = RNA(seq: 'GAUAUAUC');

rnaSeq.findMotif(motif: 'AUAU');

/*
{
   "matchCount":2,
   "matchIndices":[
      {
         "match":"AUAU",
         "startIndex":1,
         "endIndex":4
      },
      {
         "match":"AUAU",
         "startIndex":3,
         "endIndex":6
      }
   ]
}
*/

Set overlap to false to return only the match indices that do not overlap:

rnaSeq.findMotif(motif: 'AUAU', overlap: false);

/*
{
   "matchCount":1,
   "matchIndices":[
      {
         "match":"AUAU",
         "startIndex":0,
         "endIndex":3
      }
   ]
}
*/

Shared Motifs

Return the longest shared motif between two DNA, RNA, or Peptide instance sequences, of the same type:

DNA dnaSeq1 = DNA('GATATA');

DNA dnaSeq2 = DNA('AGCATA');

dnaSeq1.sharedMotif(oSeq: dnaSeq2); 
// ATA

Manually Convert Motif to Regex

The findMotif() method automatically converts motifs passed to its motif parameter to regular-expression format, however, you can also perform the conversion manually using the motifToRe() function:

Utils.motifToRe(motif: 'N{P}[ST]{P}'); 
// 'N[^P][S|T|][^P]'

// No change needs to be made.
Utils.motifToRe(motif: 'ATGC');
// ATGC

Splicing

Return a sequence with all occurrences of a motif removed from a DNA, RNA, or Peptide instance's sequence using the splice method, and passing the motif to the motif parameter:

RNA rnaSeq = RNA(seq: 'AUCAUGU');

// Removes all occurrences of 'AU'.
rnaSeq.splice(motif: 'AU');
// CGU

Monomer Frequency

Return the frequency of each monomer in a DNA, RNA, or Peptide instance's sequence with the freq() method:

DNA dnaSeq = DNA(seq: 'AGCTTTTCAGC');

dnaSeq.freq();

/*
{
   "A":2.0,
   "G":2.0,
   "C":3.0,
   "T":4.0
}
*/

Percentage of Total

Return the percentage of the total that each monomer count represents in the sequence by passing true to the norm parameter of the freq() method:

dnaSeq.freq(norm: true);

/*
{
   "A":18.2,
   "G":18.2,
   "C":27.3,
   "T":36.4
}
*/

Ignore the Stop Amino Acid

When the translate() method is called on DNA or RNA instances, BioKit returns an amino acid sequence; when BioKit encounters a stop codon, rather than stoping translation, or ignoring the stop codon, BioKit places an "X" character at that position in the amino acid sequence:

// UAG is a stop codon
RNA rnaSeq = RNA(seq: 'CGGUAGACU'); 

rnaSeq.translate();

/*
{
   "aaSeq":"RXT",
   "nucCount":8,
   "aaCount":3
}
*/

Therefore, If you use the aaSeq key's value to create a new Peptide instance, and then execute the freq() method, the "X" character will be taken into account as part of the calculation:

// Create a Peptide instance using the RNA instance translation product.
Peptide pepSeq = Peptide(seq: rnaSeq.translate()['aaSeq']!);

pepSeq.freq(); 

/*
{
   "R":1.0,
   "X":1.0,
   "T":1.0
}
*/ 

However, if you do not want the "X" character to be taken into account as part of the calculation, pass true to the ignoreStopAA parameter of the freq() method:

pepSeq.freq(ignoreStopAA: true);

/*
{
   "R":1.0,
   "T":1.0
}
*/

Modified Sequence Length

In addition to being able to return the length of a DNA, RNA, or Peptide instance's sequence by using the len getter:

DNA dnaSeq = DNA(seq: 'ATGCGAT');

dnaSeq.len;
// 7 

You can also return the length of the sequence minus a particular monomer by using the lenMinus() method, and passing the monomer you'd like to discount:

dnaSeq.lenMinus(monomer: 'A');
// 5

Generate Combinations

Return all possible combinations of a DNA, RNA, or Peptide instance's sequence using the combinations() method:

Peptide pepSeq = Peptide(seq: 'MSTC');

pepSeq.combinations(); 
// [M, MS, MST, MSTC, S, ST, STC, T, TC]

Sort the combinations by setting sorted to true:

pepSeq.combinations(sorted: true);
// [MSTC, MST, STC, MS, ST, TC, M, S, T]

Codon Frequency

Return the frequency of a codon in a DNA or RNA instance's sequence using the codonFreq() method, passing the codon of interest to the codon parameter:

RNA rnaSeq = RNA(seq: 'AUGAGGAUGCACAUG');

rnaSeq.codonFreq(codon: 'AUG');
// 3 

Be aware that codonFreq() scans the sequence in batches of three nucleotides per step, starting with the first three nucleotides in the sequence. Therefore, the exact codon must be present in a batch in order to be detected.

Complementary Strand

Return the complementary strand to a DNA or RNA instance sequence's with the complementary() method:

DNA dnaSeq = DNA(seq: 'AAACCCGGT');

dnaSeq.complementary();
// TTTGGGCCA

To return the reverse complementary strand, pass true to the rev parameter:

dnaSeq.complementary(rev: true);
// ACCGGGTTT

Guanine & Cytosine Content

Return the percentage of Guanine and Cytosine content in a DNA or RNA instance's sequence with the gcContent() method:

DNA dnaSeq = DNA(seq: 'TCCCTACGCCG');

dnaSeq.gcContent();
// 72.73

Translation

Return the amino acid translation product from a DNA or RNA instance's sequence, using the translate() method:

RNA rnaSeq = RNA(seq: 'AUGGCCAUGGCGCCCAGAACU');

rnaSeq.translate();

/*
{
   "aaSeq":"MAMAPRT",
   "nucCount":20,
   "aaCount":7
}
*/

Return the reverse complementary translation strand by passing true to the rev parameter:

rnaSeq.translate(rev: true); 

/*
{
   "aaSeq":"SSGRHGH",
   "nucCount":20,
   "aaCount":7
}
*/

Modify the index in which translation starts by passing the desired start index to the startIdx parameter:

rnaSeq.translate(startIdx: 2);

/*
{
   "aaSeq":"GHGAQN",
   "nucCount":18,
   "aaCount":6
}
*/

Generate Proteins

Return proteins from open reading frames present in a DNA or RNA instance sequence's with the proteins() method:

DNA dnaSeq = DNA(seq: 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCTGAATGATCCGAGTAGCATCTCAG');

dnaSeq.proteins(); 
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M, M]

Return only unique proteins by passing true to the unique parameter:

dnaSeq.proteins(unique: true);
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M]

Transcription

Return the RNA transcription product from a DNA instance's sequence using the transcribe() method:

DNA dnaSeq = DNA(seq: 'TACGTAA');

dnaSeq.transcribe();
// UACGUAA

Change where transcription starts from by passing the desired start index to the startIdx parameter:

dnaSeq.transcribe(startIdx: 3); 
// GUAA

Restriction Sites

Return restriction sites in a DNA instance's sequence with the restrictionSites() method:

DNA dnaSeq = DNA(seq: 'TGCATGTCTATATG');

dnaSeq.restrictionSites();

/*
{
   "TGCA":[
      {
         "startIdx":0,
         "endIndex":4
      }
   ],
   "CATG":[
      {
         "startIdx":2,
         "endIndex":6
      }
   ],
   "TATA":[
      {
         "startIdx":8,
         "endIndex":12
      }
   ],
   "ATAT":[
      {
         "startIdx":9,
         "endIndex":13
      }
   ]
}
*/

Pass values to the minSiteLen and maxSiteLen parameters to change the restriction site search length.

Transition/Transversion Ratio

Return the transition/transversion ratio between two DNA instance sequences with the tranRatio() method:

DNA dnaSeq1 = DNA(seq: 'GACTGGTGGAAGT');

DNA dnaSeq2 = DNA(seq: 'TTATCGGCTGAAT');

dnaSeq1.tranRatio(oSeq: dnaSeq2); 
// 0.29

Note that if the number of transversions is equal to 0, the method returns -1, as division by 0 is undefined and leads to a result of inf.

Double Helix Geometric Length

Return the geometric length (nm) of a double helix formed by a DNA instance's sequence using the dHelixGeoLen() method:

DNA dnaSeq = DNA(seq: 'ATGCATGC');

dnaSeq.dHelixGeoLen();
// 2.72

Double Helix Turns

Return the number of turns in a double helix formed by a DNA instance's sequence using the dHelixTurns() method:

DNA dnaSeq = DNA(seq: 'ATGCATGCATGCATGC');

dnaSeq.dHelixTurns();
// 1.6 

Reverse Transcription

Return the reverse transcription product from an RNA instance's sequence using the revTranscribe() method:

RNA rnaSeq = RNA(seq: 'AUGCUAGU');

rnaSeq.revTranscribe();
// ATGCTAGT

Monoisotopic Mass

Return the Monoisotopic mass (Da) of a Peptide instance's sequence using the monoMass() method:

Peptide pepSeq = Peptide(seq: 'MSTGARVD');

pepSeq.monoMass();
// 817.38

Modify the number of decimal places by passing a the desired number of decimals to the decimals parameter:

pepSeq.monoMass(decimals: 1);
// 817.4

Return the Monoisotopic mass in kDa by passing true to the kDa parameter:

pepSeq.monoMass(kDa: true);
// 0.82

Libraries

biokit
BioKit is a Dart package for object-orientated Bioinformatics.