
BioKit is a Dart package for Bioinformatics.
Ensure that you have BioKit installed before continuing.
This document is intended to make you proficient with BioKit in the least amount of time possible; you can read through it sequentially, or if you're reading this on biokit.org, use the heading menu on the right side of the page to jump to a topic of interest.
If you want a deeper look at how BioKit works, view our API Reference.
Creating Sequences
Create a DNA, RNA, or Peptide instance:
DNA dnaSeq = DNA(seq: 'ATGCTA');
RNA rnaSeq = RNA(seq: 'AUGCUA');
Peptide pepSeq = Peptide(seq: 'MSLAKR');
DNA and RNA classes must be initialized with a String of at least six valid nucleotides, while the Peptide class requires a minimum of two valid amino acids.
If any monomer in the sequence passed to the seq parameter is not valid for the class, an error is thrown.
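To make the validation rule concrete, here is a minimal sketch in plain Dart. It is not BioKit's implementation: isValidDna is a hypothetical helper, and it assumes that valid DNA nucleotides are the uppercase characters A, T, G, and C.

```dart
// Hypothetical helper, not part of BioKit. Assumes valid DNA nucleotides are
// the uppercase characters A, T, G, and C, with a minimum length of six.
bool isValidDna(String seq, {int minLen = 6}) {
  final validNucleotides = RegExp(r'^[ATGC]+$');
  return seq.length >= minLen && validNucleotides.hasMatch(seq);
}

void main() {
  print(isValidDna('ATGCTA')); // true
  print(isValidDna('ATGXTA')); // false: 'X' is not a DNA nucleotide
  print(isValidDna('ATG')); // false: fewer than six nucleotides
}
```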
Add Sequence Metadata
Optionally, you can add name, id, and desc metadata when you instantiate the class. Using DNA as an example:
DNA dnaSeq = DNA(seq: 'ATGCTA', name: 'My Name', id: 'My ID', desc: 'My Description');
If you do not set a value for the name, id, or desc fields at the time of instantiation, each will receive a default String value.
Get Properties
Return the values of the properties of a DNA, RNA, or Peptide instance:
dnaSeq.seq;
// ATGCTA
dnaSeq.len;
// 6
dnaSeq.id;
// Default ID
dnaSeq.name;
// Default name
dnaSeq.desc;
// Default description
dnaSeq.type;
// dna
Set Properties
Update the properties of a DNA, RNA, or Peptide instance:
dnaSeq.name = 'New name';
dnaSeq.id = 'New ID';
dnaSeq.desc = 'New description';
Sequence Info
View information about a DNA, RNA, or Peptide instance by calling its info() method or printing it to the console:
dnaSeq.info();
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/
print(dnaSeq);
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/
Random Sequences
Return a random DNA, RNA, or Peptide instance with the random() method, passing the desired length of the sequence to the len parameter:
// A random DNA instance with 20 nucleotides.
DNA dnaSeq = DNA.random(len: 20);
dnaSeq.info();
/*
{
"seq":"TAACTTCGATCGCTCTGGCA",
"type":"dna",
"monomers":20,
"name":"Default Name",
"id":"Default ID",
"desc":"Default description"
}
*/
FASTA Data
BioKit contains a number of methods and functions for working with FASTA formatted data.
UniProt ID
Return a String of protein data in FASTA format using the static uniprotIdToFASTA() method from the Utils class:
String proteinFASTA = await Utils.uniprotIdToFASTA(uniprotId: 'B5ZC00');
/*
>sp|B5ZC00|SYG_UREU1 Glycine--tRNA ligase OS=Ureaplasma urealyticum ...
MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTKQ
KDVVGLDSAIILNPLVWKASGHLDNFS ...
*/
Note that this method requires network access.
Read String
Use the readFASTA() method to parse FASTA formatted String data.
readFASTA() is able to parse FASTA files containing multiple sequences, and hence returns a List:
List<Map<String, String>> proteinMaps = await Utils.readFASTA(str: proteinFASTA);
/*
[
{
"seq":"MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTK ... ",
"id":"sp|B5ZC00|SYG_UREU1",
"desc":"Glycine--tRNA ligase OS=Ureaplasma urealyticum serovar 10 (... "
}
]
*/
Read File
Read in data from a FASTA formatted txt file:
List<Map<String, String>> dnaMaps = await Utils.readFASTA(path: './gene_bank.txt');
/*
[
{
"seq":"GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGG ... ",
"id":"HSBGPG",
"desc":"Human gene for bone gla protein (BGP)"
},
{
"seq":"CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATA ... ",
"id":"HSGLTH1",
"desc":"Human theta 1-globin gene"
}
]
*/
Write File
Write the contents of a DNA, RNA, or Peptide instance to a FASTA formatted txt file using the toFASTA() method:
// Get the first Map object.
Map<String, String> firstSeq = dnaMaps.first;
// Create a new DNA instance.
DNA dnaSeq = DNA(seq: firstSeq['seq']!, id: firstSeq['id']!, desc: firstSeq['desc']!);
// Write the instance contents to FASTA formatted file.
dnaSeq.toFASTA(path: '../deliverables', filename: 'my_dna_seq');
/*
>HSBGPG Human gene for bone gla protein (BGP)
GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGGG
CACAGCCCAGAGGGTATAAACAGTGCTGGAGGCTGGCGGGGCAGGCCAGCTGAGTCCTGA
GCAGCAGCCCAGCGCAGCCACCGAGACA ...
*/
DNA Analysis Report
Create a DNA analysis report by calling the report() method on a DNA instance:
dnaSeq.report(path: '../deliverables', creator: 'John Doe', title: 'BGP Report');
+ Operator
Return the concatenated sequence of two or more DNA, RNA, or Peptide instances of the same type with the + operator:
RNA rnaSeq1 = RNA(seq: 'AUGCAG');
RNA rnaSeq2 = RNA(seq: 'GCUGAA');
rnaSeq1 + rnaSeq2;
// "AUGCAGGCUGAA"
Reversing
Reverse a DNA, RNA, or Peptide instance's sequence with the reverse() method:
Peptide pepSeq = Peptide(seq: 'MPAG');
pepSeq.reverse();
// GAPM
Point Mutations
Return the number of positions at which the sequences of two DNA, RNA, or Peptide instances of the same type differ with the difference() method:
DNA dnaSeq1 = DNA(seq: 'ATGCAT');
// Difference: "A" at index 1, and "T" at index 4.
DNA dnaSeq2 = DNA(seq: 'AAGCTT');
dnaSeq1.difference(oSeq: dnaSeq2);
// 2
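The count above is the Hamming distance between the two sequences. The following plain-Dart sketch shows the calculation; hammingDistance is a hypothetical stand-in rather than BioKit's code, and the equal-length requirement is an assumption.

```dart
// Hypothetical stand-in for difference(): counts the positions at which two
// strings differ. Assumes both sequences have the same length.
int hammingDistance(String a, String b) {
  if (a.length != b.length) {
    throw ArgumentError('Sequences must have the same length.');
  }
  var count = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) count++;
  }
  return count;
}

void main() {
  print(hammingDistance('ATGCAT', 'AAGCTT')); // 2
}
```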
Motif Detection
BioKit has a number of functions and methods to convert and detect matches between a motif and the sequence of a DNA, RNA, or Peptide instance.
Find Motifs
Return the indices of all matches between a DNA, RNA, or Peptide instance's sequence and the sequence passed to the findMotif() method's motif parameter:
RNA rnaSeq = RNA(seq: 'GAUAUAUC');
rnaSeq.findMotif(motif: 'AUAU');
/*
{
"matchCount":2,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
},
{
"match":"AUAU",
"startIndex":3,
"endIndex":6
}
]
}
*/
Set overlap to false to return only the match indices that do not overlap:
rnaSeq.findMotif(motif: 'AUAU', overlap: false);
/*
{
"matchCount":1,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
}
]
}
*/
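The difference between overlapping and non-overlapping searches comes down to where the next search starts after a match. The sketch below illustrates this with a literal (non-regex) motif in plain Dart; motifStarts is a hypothetical helper, not BioKit's findMotif(), and it returns only start indices.

```dart
// Hypothetical helper: returns the start index of every occurrence of a
// literal motif. With overlap enabled, the search resumes one character after
// the previous match; otherwise it resumes after the end of that match.
List<int> motifStarts(String seq, String motif, {bool overlap = true}) {
  if (motif.isEmpty) throw ArgumentError('Motif must not be empty.');
  final starts = <int>[];
  var from = 0;
  while (true) {
    final idx = seq.indexOf(motif, from);
    if (idx == -1) break;
    starts.add(idx);
    from = overlap ? idx + 1 : idx + motif.length;
  }
  return starts;
}

void main() {
  print(motifStarts('GAUAUAUC', 'AUAU')); // [1, 3]
  print(motifStarts('GAUAUAUC', 'AUAU', overlap: false)); // [1]
}
```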
Shared Motifs
Return the longest shared motif between the sequences of two DNA, RNA, or Peptide instances of the same type:
DNA dnaSeq1 = DNA(seq: 'GATATA');
DNA dnaSeq2 = DNA(seq: 'AGCATA');
dnaSeq1.sharedMotif(oSeq: dnaSeq2);
// ATA
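Finding the longest shared motif between two sequences is the longest-common-substring problem. The following is a minimal dynamic-programming sketch in plain Dart; longestCommonSubstring is a hypothetical function, and BioKit's sharedMotif() may use a different approach or break ties differently.

```dart
// Hypothetical sketch: longest common substring via dynamic programming.
// prev[j] is the length of the longest common suffix of the prefixes of `a`
// and `b` processed so far. Ties are resolved in favor of the first hit.
String longestCommonSubstring(String a, String b) {
  var prev = List<int>.filled(b.length + 1, 0);
  var bestLen = 0;
  var bestEnd = 0; // Exclusive end index of the best match in `a`.
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0);
    for (var j = 1; j <= b.length; j++) {
      if (a[i - 1] == b[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > bestLen) {
          bestLen = curr[j];
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }
  return a.substring(bestEnd - bestLen, bestEnd);
}

void main() {
  print(longestCommonSubstring('GATATA', 'AGCATA')); // ATA
}
```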
Manually Convert Motif to Regex
The findMotif() method automatically converts motifs passed to its motif parameter to regular-expression format. However, you can also perform the conversion manually using the static motifToRe() method from the Utils class:
Utils.motifToRe(motif: 'N{P}[ST]{P}');
// 'N[^P][S|T|][^P]'
// No change needs to be made.
Utils.motifToRe(motif: 'ATGC');
// ATGC
Splicing
Return a sequence with all occurrences of a motif removed from a DNA, RNA, or Peptide instance's sequence by calling the splice() method and passing the motif to the motif parameter:
RNA rnaSeq = RNA(seq: 'AUCAUGU');
// Removes all occurrences of 'AU'.
rnaSeq.splice(motif: 'AU');
// CGU
Monomer Frequency
Return the frequency of each monomer in a DNA, RNA, or Peptide instance's sequence with the freq() method:
DNA dnaSeq = DNA(seq: 'AGCTTTTCAGC');
dnaSeq.freq();
/*
{
"A":2.0,
"G":2.0,
"C":3.0,
"T":4.0
}
*/
Percentage of Total
Return the percentage of the total that each monomer count represents in the sequence by passing true to the norm parameter of the freq() method:
dnaSeq.freq(norm: true);
/*
{
"A":18.2,
"G":18.2,
"C":27.3,
"T":36.4
}
*/
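The counts and percentages above can be reproduced with a short plain-Dart sketch. monomerFreq is a hypothetical function, not BioKit's freq(); rounding percentages to one decimal place is an assumption inferred from the output shown.

```dart
// Hypothetical sketch: counts each monomer and, when norm is true, converts
// each count to a percentage of the sequence length (one decimal place).
Map<String, double> monomerFreq(String seq, {bool norm = false}) {
  final counts = <String, double>{};
  for (final monomer in seq.split('')) {
    counts[monomer] = (counts[monomer] ?? 0.0) + 1;
  }
  if (!norm) return counts;
  return counts.map((monomer, count) => MapEntry(
      monomer, double.parse((count / seq.length * 100).toStringAsFixed(1))));
}

void main() {
  print(monomerFreq('AGCTTTTCAGC'));
  // {A: 2.0, G: 2.0, C: 3.0, T: 4.0}
  print(monomerFreq('AGCTTTTCAGC', norm: true));
  // {A: 18.2, G: 18.2, C: 27.3, T: 36.4}
}
```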
Ignore the Stop Amino Acid
When the translate() method is called on DNA or RNA instances, BioKit returns an amino acid sequence. When BioKit encounters a stop codon, rather than stopping translation or ignoring the stop codon, it places an "X" character at that position in the amino acid sequence:
// UAG is a stop codon
RNA rnaSeq = RNA(seq: 'CGGUAGACU');
rnaSeq.translate();
/*
{
"aaSeq":"RXT",
"nucCount":8,
"aaCount":3
}
*/
Therefore, if you use the aaSeq key's value to create a new Peptide instance and then execute the freq() method, the "X" character will be taken into account as part of the calculation:
// Create a Peptide instance using the RNA instance translation product.
Peptide pepSeq = Peptide(seq: rnaSeq.translate()['aaSeq']!);
pepSeq.freq();
/*
{
"R":1.0,
"X":1.0,
"T":1.0
}
*/
However, if you do not want the "X" character to be taken into account as part of the calculation, pass true to the ignoreStopAA parameter of the freq() method:
pepSeq.freq(ignoreStopAA: true);
/*
{
"R":1.0,
"T":1.0
}
*/
Modified Sequence Length
You can return the length of a DNA, RNA, or Peptide instance's sequence by using the len getter:
DNA dnaSeq = DNA(seq: 'ATGCGAT');
dnaSeq.len;
// 7
You can also return the length of the sequence minus a particular monomer by using the lenMinus() method and passing the monomer you'd like to discount:
dnaSeq.lenMinus(monomer: 'A');
// 5
Generate Combinations
Return all possible combinations of a DNA, RNA, or Peptide instance's sequence using the combinations() method:
Peptide pepSeq = Peptide(seq: 'MSTC');
pepSeq.combinations();
// [M, MS, MST, MSTC, S, ST, STC, T, TC]
Sort the combinations by setting sorted to true:
pepSeq.combinations(sorted: true);
// [MSTC, MST, STC, MS, ST, TC, M, S, T]
Codon Frequency
Return the frequency of a codon in a DNA or RNA instance's sequence using the codonFreq() method, passing the codon of interest to the codon parameter:
RNA rnaSeq = RNA(seq: 'AUGAGGAUGCACAUG');
rnaSeq.codonFreq(codon: 'AUG');
// 3
Be aware that codonFreq() scans the sequence in batches of three nucleotides per step, starting with the first three nucleotides in the sequence. Therefore, the exact codon must be present in a batch in order to be detected.
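The effect of the three-nucleotide batches is easiest to see in code. This plain-Dart sketch follows the scanning behavior described above; countCodon is a hypothetical function, not BioKit's codonFreq().

```dart
// Hypothetical sketch: steps through the sequence three nucleotides at a
// time from index 0 and counts the batches that equal the codon exactly.
int countCodon(String seq, String codon) {
  var count = 0;
  for (var i = 0; i + 3 <= seq.length; i += 3) {
    if (seq.substring(i, i + 3) == codon) count++;
  }
  return count;
}

void main() {
  print(countCodon('AUGAGGAUGCACAUG', 'AUG')); // 3
  // A single extra leading nucleotide shifts the frame, so no batch is AUG.
  print(countCodon('CAUGAGGAUGCACAUG', 'AUG')); // 0
}
```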
Complementary Strand
Return the complementary strand to a DNA or RNA instance's sequence with the complementary() method:
DNA dnaSeq = DNA(seq: 'AAACCCGGT');
dnaSeq.complementary();
// TTTGGGCCA
To return the reverse complementary strand, pass true to the rev parameter:
dnaSeq.complementary(rev: true);
// ACCGGGTTT
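Complementing is a per-nucleotide substitution using Watson-Crick base pairing, optionally followed by reversing the result. A minimal plain-Dart sketch for DNA follows; complementDna is a hypothetical function rather than BioKit's code, and it assumes the input contains only A, T, G, and C.

```dart
// Hypothetical sketch for DNA only. Assumes every character is A, T, G, or C.
String complementDna(String seq, {bool rev = false}) {
  const pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'};
  final comp = seq.split('').map((n) => pairs[n]!).join();
  return rev ? comp.split('').reversed.join() : comp;
}

void main() {
  print(complementDna('AAACCCGGT')); // TTTGGGCCA
  print(complementDna('AAACCCGGT', rev: true)); // ACCGGGTTT
}
```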
Guanine & Cytosine Content
Return the percentage of Guanine and Cytosine content in a DNA or RNA instance's sequence with the gcContent() method:
DNA dnaSeq = DNA(seq: 'TCCCTACGCCG');
dnaSeq.gcContent();
// 72.73
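GC content is the number of G and C nucleotides divided by the sequence length, expressed as a percentage. The sketch below reproduces the figure above in plain Dart; gcPercent is a hypothetical function, and rounding to two decimal places is an assumption based on the output shown.

```dart
// Hypothetical sketch: percentage of G and C nucleotides, rounded to two
// decimal places.
double gcPercent(String seq) {
  final gc = seq.split('').where((n) => n == 'G' || n == 'C').length;
  return double.parse((gc / seq.length * 100).toStringAsFixed(2));
}

void main() {
  // 8 of the 11 nucleotides are G or C.
  print(gcPercent('TCCCTACGCCG')); // 72.73
}
```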
Translation
Return the amino acid translation product from a DNA or RNA instance's sequence using the translate() method:
RNA rnaSeq = RNA(seq: 'AUGGCCAUGGCGCCCAGAACU');
rnaSeq.translate();
/*
{
"aaSeq":"MAMAPRT",
"nucCount":20,
"aaCount":7
}
*/
Return the translation of the reverse complementary strand by passing true to the rev parameter:
rnaSeq.translate(rev: true);
/*
{
"aaSeq":"SSGRHGH",
"nucCount":20,
"aaCount":7
}
*/
Modify the index at which translation starts by passing the desired start index to the startIdx parameter:
rnaSeq.translate(startIdx: 2);
/*
{
"aaSeq":"GHGAQN",
"nucCount":18,
"aaCount":6
}
*/
Generate Proteins
Return proteins from open reading frames present in a DNA or RNA instance's sequence with the proteins() method:
DNA dnaSeq = DNA(seq: 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCTGAATGATCCGAGTAGCATCTCAG');
dnaSeq.proteins();
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M, M]
Return only unique proteins by passing true to the unique parameter:
dnaSeq.proteins(unique: true);
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M]
Transcription
Return the RNA transcription product from a DNA instance's sequence using the transcribe() method:
DNA dnaSeq = DNA(seq: 'TACGTAA');
dnaSeq.transcribe();
// UACGUAA
Change where transcription starts by passing the desired start index to the startIdx parameter:
dnaSeq.transcribe(startIdx: 3);
// GUAA
Restriction Sites
Return restriction sites in a DNA instance's sequence with the restrictionSites() method:
DNA dnaSeq = DNA(seq: 'TGCATGTCTATATG');
dnaSeq.restrictionSites();
/*
{
"TGCA":[
{
"startIdx":0,
"endIndex":4
}
],
"CATG":[
{
"startIdx":2,
"endIndex":6
}
],
"TATA":[
{
"startIdx":8,
"endIndex":12
}
],
"ATAT":[
{
"startIdx":9,
"endIndex":13
}
]
}
*/
Pass values to the minSiteLen and maxSiteLen parameters to change the restriction site search length.
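All four sites in the output above are reverse palindromes, meaning each one equals its own reverse complement. The plain-Dart sketch below searches for such sites. It is an illustration under assumptions, not BioKit's implementation: reversePalindromes is a hypothetical function, the default lengths of 4 and 12 are assumed, and only start indices are returned.

```dart
// Hypothetical sketch. Assumes the input contains only A, T, G, and C.
String reverseComplement(String seq) {
  const pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'};
  return seq.split('').reversed.map((n) => pairs[n]!).join();
}

// Maps each reverse-palindromic site to the start indices where it occurs.
// The default minimum and maximum lengths are assumptions.
Map<String, List<int>> reversePalindromes(String seq,
    {int minSiteLen = 4, int maxSiteLen = 12}) {
  final sites = <String, List<int>>{};
  for (var len = minSiteLen; len <= maxSiteLen; len++) {
    for (var start = 0; start + len <= seq.length; start++) {
      final site = seq.substring(start, start + len);
      if (site == reverseComplement(site)) {
        sites.putIfAbsent(site, () => <int>[]).add(start);
      }
    }
  }
  return sites;
}

void main() {
  print(reversePalindromes('TGCATGTCTATATG'));
  // {TGCA: [0], CATG: [2], TATA: [8], ATAT: [9]}
}
```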
Transition/Transversion Ratio
Return the transition/transversion ratio between two DNA instance sequences with the tranRatio() method:
DNA dnaSeq1 = DNA(seq: 'GACTGGTGGAAGT');
DNA dnaSeq2 = DNA(seq: 'TTATCGGCTGAAT');
dnaSeq1.tranRatio(oSeq: dnaSeq2);
// 0.29
Note that if the number of transversions is equal to 0, the method returns -1, because division by 0 is undefined and would otherwise lead to a result of inf.
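A transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); any other substitution is a transversion. The following plain-Dart sketch computes the ratio, including the -1 case; transitionTransversionRatio is a hypothetical function, and the equal-length check and two-decimal rounding are assumptions.

```dart
// Hypothetical sketch: ratio of transitions to transversions between two
// equal-length DNA strings. Returns -1 when there are no transversions.
double transitionTransversionRatio(String a, String b) {
  if (a.length != b.length) {
    throw ArgumentError('Sequences must have the same length.');
  }
  const purines = {'A', 'G'};
  var transitions = 0;
  var transversions = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] == b[i]) continue;
    // Both purines or both pyrimidines: transition. Otherwise: transversion.
    if (purines.contains(a[i]) == purines.contains(b[i])) {
      transitions++;
    } else {
      transversions++;
    }
  }
  if (transversions == 0) return -1.0;
  return double.parse((transitions / transversions).toStringAsFixed(2));
}

void main() {
  // 2 transitions and 7 transversions.
  print(transitionTransversionRatio('GACTGGTGGAAGT', 'TTATCGGCTGAAT')); // 0.29
}
```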
Double Helix Geometric Length
Return the geometric length (nm) of a double helix formed by a DNA instance's sequence using the dHelixGeoLen() method:
DNA dnaSeq = DNA(seq: 'ATGCATGC');
dnaSeq.dHelixGeoLen();
// 2.72
Double Helix Turns
Return the number of turns in a double helix formed by a DNA instance's sequence using the dHelixTurns() method:
DNA dnaSeq = DNA(seq: 'ATGCATGCATGCATGC');
dnaSeq.dHelixTurns();
// 1.6
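Both helix figures are consistent with simple per-base-pair arithmetic: a rise of 0.34 nm per base pair and 10 base pairs per turn. These constants are inferred from the outputs above rather than taken from BioKit's source, and the function names in this plain-Dart sketch are hypothetical.

```dart
// Hypothetical sketch. Assumes a rise of 0.34 nm per base pair and
// 10 base pairs per helical turn.
double helixLengthNm(int basePairs) =>
    double.parse((basePairs * 0.34).toStringAsFixed(2));

double helixTurns(int basePairs) => basePairs / 10;

void main() {
  print(helixLengthNm('ATGCATGC'.length)); // 2.72
  print(helixTurns('ATGCATGCATGCATGC'.length)); // 1.6
}
```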
Reverse Transcription
Return the reverse transcription product from an RNA instance's sequence using the revTranscribe() method:
RNA rnaSeq = RNA(seq: 'AUGCUAGU');
rnaSeq.revTranscribe();
// ATGCTAGT
Monoisotopic Mass
Return the monoisotopic mass (Da) of a Peptide instance's sequence using the monoMass() method:
Peptide pepSeq = Peptide(seq: 'MSTGARVD');
pepSeq.monoMass();
// 817.38
Modify the number of decimal places by passing the desired number of decimals to the decimals parameter:
pepSeq.monoMass(decimals: 1);
// 817.4
Return the monoisotopic mass in kDa by passing true to the kDa parameter:
pepSeq.monoMass(kDa: true);
// 0.82
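The values above match the sum of the standard monoisotopic residue masses, with no water added for the peptide termini. The plain-Dart sketch below reproduces them; sumResidueMass and the residueMass table are hypothetical names, and BioKit's internal mass table may differ in precision.

```dart
// Standard monoisotopic residue masses in Da. The table name is hypothetical.
const residueMass = <String, double>{
  'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
  'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
  'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
  'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
  'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
};

// Hypothetical sketch: sums the residue masses, optionally converts the
// total to kDa, and rounds to the requested number of decimal places.
double sumResidueMass(String seq, {int decimals = 2, bool kDa = false}) {
  var total = 0.0;
  for (final aa in seq.split('')) {
    total += residueMass[aa]!;
  }
  if (kDa) total /= 1000;
  return double.parse(total.toStringAsFixed(decimals));
}

void main() {
  print(sumResidueMass('MSTGARVD')); // 817.38
  print(sumResidueMass('MSTGARVD', decimals: 1)); // 817.4
  print(sumResidueMass('MSTGARVD', kDa: true)); // 0.82
}
```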