biokit 1.0.0
BioKit is a Dart package for Bioinformatics.
Ensure that you have BioKit installed before continuing.
This document is intended to make you proficient with BioKit in the least amount of time possible; you can read through it sequentially, or if you're reading this on biokit.org, use the heading menu on the right side of the page to jump to a topic of interest.
If you want a deeper look at how BioKit works, view our API Reference.
Creating Sequences #
Create a DNA, RNA, or Peptide instance:
DNA dnaSeq = DNA(seq: 'ATGCTA');
RNA rnaSeq = RNA(seq: 'AUGCUA');
Peptide pepSeq = Peptide(seq: 'MSLAKR');
DNA and RNA classes must be initialized with a String of at least six valid nucleotides, while the Peptide class requires a minimum of two valid amino acids.
If any monomer in the sequence passed to the seq parameter is not valid for the class, an error is thrown.
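The validation rule above can be illustrated with a minimal sketch in plain Dart. The helper isValidSeq() and the constant dnaAlphabet are hypothetical names used only for illustration; they are not part of BioKit, and the library's own checks may differ.

```dart
// Hypothetical alphabet constant, assumed for illustration only.
const String dnaAlphabet = 'ATGC';

// Hypothetical helper (not a BioKit function): returns true when the
// sequence is long enough and contains only monomers from the alphabet.
bool isValidSeq(String seq, String alphabet, int minLen) {
  if (seq.length < minLen) return false;
  // Every character must belong to the allowed alphabet.
  return seq.split('').every((monomer) => alphabet.contains(monomer));
}
```

For example, isValidSeq('ATGCUA', dnaAlphabet, 6) returns false because "U" is not a DNA nucleotide.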
Add Sequence Metadata #
Optionally, you can add name, id, and desc metadata when you instantiate the class. Using DNA as an example:
DNA dnaSeq = DNA(seq: 'ATGCTA', name: 'My Name', id: 'My ID', desc: 'My Description');
If you do not set a value for the name, id, or desc fields at the time of instantiation, each will receive a default String value.
Get Properties #
Return the values of the properties of a DNA, RNA, or Peptide instance:
dnaSeq.seq;
// ATGCTA
dnaSeq.len;
// 6
dnaSeq.id;
// Default ID
dnaSeq.name;
// Default name
dnaSeq.desc;
// Default description
dnaSeq.type;
// dna
Set Properties #
Update the properties of a DNA, RNA, or Peptide instance:
dnaSeq.name = 'New name';
dnaSeq.id = 'New ID';
dnaSeq.desc = 'New description';
Sequence Info #
View information about a DNA, RNA, or Peptide instance by calling its info() method or printing it to the console:
dnaSeq.info();
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/
print(dnaSeq);
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/
Random Sequences #
Return a random DNA, RNA, or Peptide instance with the random() method and pass the desired length of the sequence to the len parameter:
// A random DNA instance with 20 nucleotides.
DNA dnaSeq = DNA.random(len: 20);
dnaSeq.info();
/*
{
"seq":"TAACTTCGATCGCTCTGGCA",
"type":"dna",
"monomers":20,
"name":"Default Name",
"id":"Default ID",
"desc":"Default description"
}
*/
FASTA Data #
BioKit contains a number of methods and functions for working with FASTA formatted data.
Uniprot ID #
Return a String of protein data in FASTA format using the static uniprotIdToFASTA() method from the Utils class:
String proteinFASTA = await Utils.uniprotIdToFASTA(uniprotId: 'B5ZC00');
/*
>sp|B5ZC00|SYG_UREU1 Glycine--tRNA ligase OS=Ureaplasma urealyticum ...
MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTKQ
KDVVGLDSAIILNPLVWKASGHLDNFS ...
*/
Note that this method requires network access.
Read String #
Use the readFASTA() method to parse FASTA formatted String data.
readFASTA() is able to parse FASTA files containing multiple sequences, and hence returns a List:
List<Map<String, String>> proteinMaps = await Utils.readFASTA(str: proteinFASTA);
/*
[
{
"seq":"MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTK ... ",
"id":"sp|B5ZC00|SYG_UREU1",
"desc":"Glycine--tRNA ligase OS=Ureaplasma urealyticum serovar 10 (... "
}
]
*/
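To show how multi-record FASTA text maps onto a List of Maps, here is a minimal sketch in plain Dart. The function parseFasta() is a hypothetical stand-in, not BioKit's readFASTA(); it assumes that the first whitespace-delimited token of each header is the id and that the remainder is the description.

```dart
// Hypothetical parser (not BioKit's readFASTA()): splits FASTA text into
// one Map per record with "seq", "id", and "desc" keys.
List<Map<String, String>> parseFasta(String text) {
  final records = <Map<String, String>>[];
  for (final rawLine in text.split('\n')) {
    final line = rawLine.trim();
    if (line.isEmpty) continue;
    if (line.startsWith('>')) {
      // Header line: the first token is the id, the rest is the description.
      final header = line.substring(1);
      final space = header.indexOf(' ');
      records.add({
        'seq': '',
        'id': space == -1 ? header : header.substring(0, space),
        'desc': space == -1 ? '' : header.substring(space + 1),
      });
    } else if (records.isNotEmpty) {
      // Sequence line: append it to the most recent record.
      records.last['seq'] = records.last['seq']! + line;
    }
  }
  return records;
}
```

Sequence lines that wrap across several lines are concatenated, which is why each record ends up with a single "seq" value.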
Read File #
Read in data from a FASTA formatted txt file:
List<Map<String, String>> dnaMaps = await Utils.readFASTA(path: './gene_bank.txt');
/*
[
{
"seq":"GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGG ... ",
"id":"HSBGPG",
"desc":"Human gene for bone gla protein (BGP)"
},
{
"seq":"CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATA ... ",
"id":"HSGLTH1",
"desc":"Human theta 1-globin gene"
}
]
*/
Write File #
Write the contents of a DNA, RNA, or Peptide instance to a FASTA formatted txt file using the toFASTA() method:
// Get the first Map object.
Map<String, String> firstSeq = dnaMaps.first;
// Create a new DNA instance.
DNA dnaSeq = DNA(seq: firstSeq['seq']!, id: firstSeq['id']!, desc: firstSeq['desc']!);
// Write the instance contents to FASTA formatted file.
dnaSeq.toFASTA(path: '../deliverables', filename: 'my_dna_seq');
/*
>HSBGPG Human gene for bone gla protein (BGP)
GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGGG
CACAGCCCAGAGGGTATAAACAGTGCTGGAGGCTGGCGGGGCAGGCCAGCTGAGTCCTGA
GCAGCAGCCCAGCGCAGCCACCGAGACA ...
*/
DNA Analysis Report #
Create a DNA analysis report by calling the report() method on a DNA instance:
dnaSeq.report(path: '../deliverables', creator: 'John Doe', title: 'BGP Report');
+ Operator #
Concatenate the sequences of two or more DNA, RNA, or Peptide instances of the same type with the + operator:
RNA rnaSeq1 = RNA(seq: 'AUGCAG');
RNA rnaSeq2 = RNA(seq: 'GCUGAA');
rnaSeq1 + rnaSeq2;
// "AUGCAGGCUGAA"
Reversing #
Reverse a DNA, RNA, or Peptide instance's sequence with the reverse() method:
Peptide pepSeq = Peptide(seq: 'MPAG');
pepSeq.reverse();
// GAPM
Point Mutations #
Return the number of positional differences between the sequences of two DNA, RNA, or Peptide instances of the same type with the difference() method:
DNA dnaSeq1 = DNA(seq: 'ATGCAT');
// Difference: "A" at index 1, and "T" at index 4.
DNA dnaSeq2 = DNA(seq: 'AAGCTT');
dnaSeq1.difference(oSeq: dnaSeq2);
// 2
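Counting positional differences is the Hamming distance between two strings. The sketch below shows the idea in plain Dart; hammingDistance() is a hypothetical helper rather than BioKit's implementation, and it assumes both sequences have the same length.

```dart
// Hypothetical helper (not a BioKit function): counts the positions at
// which two equal-length sequences differ.
int hammingDistance(String a, String b) {
  if (a.length != b.length) {
    throw ArgumentError('Sequences must have the same length.');
  }
  var count = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) count++;
  }
  return count;
}
```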
Motif Detection #
BioKit has a number of functions and methods to convert and detect matches between a motif and the sequence of a DNA, RNA, or Peptide instance.
Find Motifs #
Return the indices of all matches between a DNA, RNA, or Peptide instance's sequence and the sequence passed to the findMotif() method's motif parameter:
RNA rnaSeq = RNA(seq: 'GAUAUAUC');
rnaSeq.findMotif(motif: 'AUAU');
/*
{
"matchCount":2,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
},
{
"match":"AUAU",
"startIndex":3,
"endIndex":6
}
]
}
*/
Set overlap to false to return only the match indices that do not overlap:
rnaSeq.findMotif(motif: 'AUAU', overlap: false);
/*
{
"matchCount":1,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
}
]
}
*/
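The difference between overlapping and non-overlapping search comes down to where the scan resumes after each match. The following plain-Dart sketch illustrates this; motifStarts() is a hypothetical helper that matches a literal motif only, so it does not reproduce findMotif()'s regular-expression handling or its return format.

```dart
// Hypothetical helper (not BioKit's findMotif()): returns the start index
// of every literal occurrence of motif in seq.
List<int> motifStarts(String seq, String motif, {bool overlap = true}) {
  final starts = <int>[];
  if (motif.isEmpty) return starts;
  var from = 0;
  while (true) {
    final idx = seq.indexOf(motif, from);
    if (idx == -1) break;
    starts.add(idx);
    // Overlapping search resumes one position later; non-overlapping
    // search resumes after the end of the current match.
    from = overlap ? idx + 1 : idx + motif.length;
  }
  return starts;
}
```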
Shared Motifs #
Return the longest shared motif between the sequences of two DNA, RNA, or Peptide instances of the same type:
DNA dnaSeq1 = DNA(seq: 'GATATA');
DNA dnaSeq2 = DNA(seq: 'AGCATA');
dnaSeq1.sharedMotif(oSeq: dnaSeq2);
// ATA
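Finding the longest shared motif is the longest common substring problem. The sketch below uses a standard dynamic-programming approach in plain Dart; longestSharedMotif() is a hypothetical helper, and BioKit's sharedMotif() may use a different algorithm or tie-breaking rule.

```dart
// Hypothetical helper (not a BioKit function): returns the first longest
// substring that appears in both a and b.
String longestSharedMotif(String a, String b) {
  var best = '';
  // prev[j] is the length of the common suffix ending at a[i - 2], b[j - 1].
  var prev = List<int>.filled(b.length + 1, 0);
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0);
    for (var j = 1; j <= b.length; j++) {
      if (a[i - 1] == b[j - 1]) {
        // Extend the common suffix that ended one position earlier.
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > best.length) {
          best = a.substring(i - curr[j], i);
        }
      }
    }
    prev = curr;
  }
  return best;
}
```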
Manually Convert Motif to Regex #
The findMotif() method automatically converts motifs passed to its motif parameter to regular-expression format. However, you can also perform the conversion manually using the motifToRe() function:
Utils.motifToRe(motif: 'N{P}[ST]{P}');
// 'N[^P][S|T|][^P]'
// No change needs to be made.
Utils.motifToRe(motif: 'ATGC');
// ATGC
Splicing #
Return a sequence with all occurrences of a motif removed from a DNA, RNA, or Peptide instance's sequence by calling the splice() method and passing the motif to the motif parameter:
RNA rnaSeq = RNA(seq: 'AUCAUGU');
// Removes all occurrences of 'AU'.
rnaSeq.splice(motif: 'AU');
// CGU
Monomer Frequency #
Return the frequency of each monomer in a DNA, RNA, or Peptide instance's sequence with the freq() method:
DNA dnaSeq = DNA(seq: 'AGCTTTTCAGC');
dnaSeq.freq();
/*
{
"A":2.0,
"G":2.0,
"C":3.0,
"T":4.0
}
*/
Percentage of Total #
Return the percentage of the total that each monomer count represents in the sequence by passing true to the norm parameter of the freq() method:
dnaSeq.freq(norm: true);
/*
{
"A":18.2,
"G":18.2,
"C":27.3,
"T":36.4
}
*/
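The counts and percentages above can be reproduced with a short plain-Dart sketch. monomerFreq() is a hypothetical helper, not BioKit's freq(); it assumes percentages are rounded to one decimal place, which matches the values shown.

```dart
// Hypothetical helper (not a BioKit function): counts each monomer and,
// when norm is true, converts the counts to percentages of the total.
Map<String, double> monomerFreq(String seq, {bool norm = false}) {
  final counts = <String, double>{};
  for (final monomer in seq.split('')) {
    counts[monomer] = (counts[monomer] ?? 0.0) + 1;
  }
  if (!norm) return counts;
  // Convert counts to percentages rounded to one decimal place.
  return counts.map((monomer, count) => MapEntry(
      monomer, double.parse((count / seq.length * 100).toStringAsFixed(1))));
}
```

For 'AGCTTTTCAGC', "T" appears 4 times out of 11 monomers, and 4 / 11 * 100 rounds to 36.4.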
Ignore the Stop Amino Acid #
When the translate() method is called on DNA or RNA instances, BioKit returns an amino acid sequence. When BioKit encounters a stop codon, rather than stopping translation or ignoring the stop codon, it places an "X" character at that position in the amino acid sequence:
// UAG is a stop codon
RNA rnaSeq = RNA(seq: 'CGGUAGACU');
rnaSeq.translate();
/*
{
"aaSeq":"RXT",
"nucCount":8,
"aaCount":3
}
*/
Therefore, if you use the aaSeq key's value to create a new Peptide instance and then call the freq() method, the "X" character will be taken into account as part of the calculation:
// Create a Peptide instance using the RNA instance translation product.
Peptide pepSeq = Peptide(seq: rnaSeq.translate()['aaSeq']!);
pepSeq.freq();
/*
{
"R":1.0,
"X":1.0,
"T":1.0
}
*/
However, if you do not want the "X" character to be taken into account as part of the calculation, pass true to the ignoreStopAA parameter of the freq() method:
pepSeq.freq(ignoreStopAA: true);
/*
{
"R":1.0,
"T":1.0
}
*/
Modified Sequence Length #
Return the length of a DNA, RNA, or Peptide instance's sequence with the len getter:
DNA dnaSeq = DNA(seq: 'ATGCGAT');
dnaSeq.len;
// 7
You can also return the length of the sequence minus a particular monomer by using the lenMinus() method, and passing the monomer you'd like to discount:
dnaSeq.lenMinus(monomer: 'A');
// 5
Generate Combinations #
Return all possible combinations of a DNA, RNA, or Peptide instance's sequence using the combinations() method:
Peptide pepSeq = Peptide(seq: 'MSTC');
pepSeq.combinations();
// [M, MS, MST, MSTC, S, ST, STC, T, TC]
Sort the combinations by setting sorted to true:
pepSeq.combinations(sorted: true);
// [MSTC, MST, STC, MS, ST, TC, M, S, T]
Codon Frequency #
Return the frequency of a codon in a DNA or RNA instance's sequence using the codonFreq() method, passing the codon of interest to the codon parameter:
RNA rnaSeq = RNA(seq: 'AUGAGGAUGCACAUG');
rnaSeq.codonFreq(codon: 'AUG');
// 3
Be aware that codonFreq() scans the sequence in batches of three nucleotides per step, starting with the first three nucleotides in the sequence. Therefore, the exact codon must be present in a batch in order to be detected.
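The batch-of-three scan described above can be sketched in plain Dart as follows. codonCount() is a hypothetical helper written from that description, not BioKit's codonFreq() itself.

```dart
// Hypothetical helper (not a BioKit function): counts a codon by reading
// the sequence three nucleotides at a time, starting at index 0.
int codonCount(String seq, String codon) {
  var count = 0;
  // Each step covers exactly one in-frame batch of three nucleotides.
  for (var i = 0; i + 3 <= seq.length; i += 3) {
    if (seq.substring(i, i + 3) == codon) count++;
  }
  return count;
}
```

With this approach 'CAUGAG' contains no 'AUG' codon, because the batches are 'CAU' and 'GAG' even though the characters "AUG" appear at index 1.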
Complementary Strand #
Return the complementary strand to a DNA or RNA instance's sequence with the complementary() method:
DNA dnaSeq = DNA(seq: 'AAACCCGGT');
dnaSeq.complementary();
// TTTGGGCCA
To return the reverse complementary strand, pass true to the rev parameter:
dnaSeq.complementary(rev: true);
// ACCGGGTTT
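As an illustration of base pairing, the sketch below builds a DNA complement and reverse complement in plain Dart. complement() and dnaPairs are hypothetical names, not BioKit's API, and the sketch handles DNA only.

```dart
// Watson-Crick pairing for DNA (hypothetical constant name).
const Map<String, String> dnaPairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'};

// Hypothetical helper (not a BioKit function): returns the complement of
// a DNA sequence, reversed when rev is true.
String complement(String seq, {bool rev = false}) {
  final comp = seq.split('').map((nuc) => dnaPairs[nuc]!).join();
  return rev ? comp.split('').reversed.join() : comp;
}
```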
Guanine & Cytosine Content #
Return the percentage of Guanine and Cytosine content in a DNA or RNA instance's sequence with the gcContent() method:
DNA dnaSeq = DNA(seq: 'TCCCTACGCCG');
dnaSeq.gcContent();
// 72.73
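GC content is the number of "G" and "C" nucleotides divided by the sequence length. A minimal plain-Dart sketch follows; gcPercent() is a hypothetical helper, and rounding to two decimal places is an assumption based on the value shown above.

```dart
// Hypothetical helper (not a BioKit function): percentage of G and C
// nucleotides in a sequence, rounded to two decimal places.
double gcPercent(String seq) {
  final gc = seq.split('').where((nuc) => nuc == 'G' || nuc == 'C').length;
  return double.parse((gc / seq.length * 100).toStringAsFixed(2));
}
```

In 'TCCCTACGCCG', 8 of the 11 nucleotides are "G" or "C", which gives 72.73.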
Translation #
Return the amino acid translation product from a DNA or RNA instance's sequence, using the translate() method:
RNA rnaSeq = RNA(seq: 'AUGGCCAUGGCGCCCAGAACU');
rnaSeq.translate();
/*
{
"aaSeq":"MAMAPRT",
"nucCount":20,
"aaCount":7
}
*/
Return the translation product of the reverse complementary strand by passing true to the rev parameter:
rnaSeq.translate(rev: true);
/*
{
"aaSeq":"SSGRHGH",
"nucCount":20,
"aaCount":7
}
*/
Modify the index at which translation starts by passing the desired start index to the startIdx parameter:
rnaSeq.translate(startIdx: 2);
/*
{
"aaSeq":"GHGAQN",
"nucCount":18,
"aaCount":6
}
*/
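To make the codon-by-codon behavior concrete, here is a plain-Dart sketch of translation with the standard genetic code. translateRna(), bases, and aminoAcids are hypothetical names and this is not BioKit's implementation; it returns only the amino acid string, and it writes "X" for stop codons to mirror the behavior described in this document.

```dart
// Standard genetic code, indexed by 16 * first + 4 * second + third base,
// with bases ordered U, C, A, G. '*' marks a stop codon.
const String bases = 'UCAG';
const String aminoAcids =
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

// Hypothetical helper (not a BioKit function): translates an RNA string
// codon by codon, starting at startIdx. Trailing nucleotides that do not
// form a full codon are ignored.
String translateRna(String seq, {int startIdx = 0}) {
  final protein = StringBuffer();
  for (var i = startIdx; i + 3 <= seq.length; i += 3) {
    final index = 16 * bases.indexOf(seq[i]) +
        4 * bases.indexOf(seq[i + 1]) +
        bases.indexOf(seq[i + 2]);
    final aa = aminoAcids[index];
    // A stop codon becomes "X" instead of ending translation.
    protein.write(aa == '*' ? 'X' : aa);
  }
  return protein.toString();
}
```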
Generate Proteins #
Return proteins from open reading frames present in a DNA or RNA instance's sequence with the proteins() method:
DNA dnaSeq = DNA(seq: 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCTGAATGATCCGAGTAGCATCTCAG');
dnaSeq.proteins();
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M, M]
Return only unique proteins by passing true to the unique parameter:
dnaSeq.proteins(unique: true);
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M]
Transcription #
Return the RNA transcription product from a DNA instance's sequence using the transcribe() method:
DNA dnaSeq = DNA(seq: 'TACGTAA');
dnaSeq.transcribe();
// UACGUAA
Change where transcription starts by passing the desired start index to the startIdx parameter:
dnaSeq.transcribe(startIdx: 3);
// GUAA
Restriction Sites #
Return restriction sites in a DNA instance's sequence with the restrictionSites() method:
DNA dnaSeq = DNA(seq: 'TGCATGTCTATATG');
dnaSeq.restrictionSites();
/*
{
"TGCA":[
{
"startIdx":0,
"endIndex":4
}
],
"CATG":[
{
"startIdx":2,
"endIndex":6
}
],
"TATA":[
{
"startIdx":8,
"endIndex":12
}
],
"ATAT":[
{
"startIdx":9,
"endIndex":13
}
]
}
*/
Pass values to the minSiteLen and maxSiteLen parameters to change the restriction site search length.
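The sites in the example output are all reverse palindromes, meaning each one equals its own reverse complement. Assuming that is the definition in use, the search can be sketched in plain Dart as below. reversePalindromes(), revComp(), and basePairs are hypothetical names, the default lengths of 4 and 12 are illustrative assumptions rather than BioKit's defaults, and the sketch returns only start indices.

```dart
// Watson-Crick pairing for DNA (hypothetical constant name).
const Map<String, String> basePairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'};

// Reverse complement of a DNA string.
String revComp(String seq) =>
    seq.split('').reversed.map((nuc) => basePairs[nuc]!).join();

// Hypothetical helper (not a BioKit function): maps every reverse
// palindrome of length minLen..maxLen to the indices where it starts.
Map<String, List<int>> reversePalindromes(String seq,
    {int minLen = 4, int maxLen = 12}) {
  final sites = <String, List<int>>{};
  for (var len = minLen; len <= maxLen; len++) {
    for (var start = 0; start + len <= seq.length; start++) {
      final site = seq.substring(start, start + len);
      // A site qualifies when it reads the same on the opposite strand.
      if (site == revComp(site)) {
        sites.putIfAbsent(site, () => <int>[]).add(start);
      }
    }
  }
  return sites;
}
```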
Transition/Transversion Ratio #
Return the transition/transversion ratio between two DNA instance sequences with the tranRatio() method:
DNA dnaSeq1 = DNA(seq: 'GACTGGTGGAAGT');
DNA dnaSeq2 = DNA(seq: 'TTATCGGCTGAAT');
dnaSeq1.tranRatio(oSeq: dnaSeq2);
// 0.29
Note that if the number of transversions is equal to 0, the method returns -1, because dividing by 0 would otherwise produce a result of inf.
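A transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); any other substitution is a transversion. The plain-Dart sketch below applies that definition; transitionTransversionRatio() is a hypothetical helper, not BioKit's tranRatio(), and rounding to two decimal places is an assumption based on the value shown above.

```dart
// The two purine bases; C and T are pyrimidines.
const Set<String> purines = {'A', 'G'};

// Hypothetical helper (not a BioKit function): ratio of transitions to
// transversions between two DNA strings, or -1 when there are no
// transversions.
double transitionTransversionRatio(String a, String b) {
  var transitions = 0;
  var transversions = 0;
  for (var i = 0; i < a.length && i < b.length; i++) {
    if (a[i] == b[i]) continue;
    // A transition keeps the base class: purine to purine, or
    // pyrimidine to pyrimidine.
    if (purines.contains(a[i]) == purines.contains(b[i])) {
      transitions++;
    } else {
      transversions++;
    }
  }
  if (transversions == 0) return -1.0;
  return double.parse((transitions / transversions).toStringAsFixed(2));
}
```

For the two sequences in the example there are 2 transitions and 7 transversions, and 2 / 7 rounds to 0.29.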
Double Helix Geometric Length #
Return the geometric length (nm) of a double helix formed by a DNA instance's sequence using the dHelixGeoLen() method:
DNA dnaSeq = DNA(seq: 'ATGCATGC');
dnaSeq.dHelixGeoLen();
// 2.72
Double Helix Turns #
Return the number of turns in a double helix formed by a DNA instance's sequence using the dHelixTurns() method:
DNA dnaSeq = DNA(seq: 'ATGCATGCATGCATGC');
dnaSeq.dHelixTurns();
// 1.6
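The two helix examples are consistent with a rise of 0.34 nm per base pair and 10 base pairs per turn. Treating those constants as assumptions inferred from the outputs above, the arithmetic can be sketched in plain Dart; helixLengthNm() and helixTurns() are hypothetical helpers, not BioKit functions.

```dart
// Assumed constants, inferred from the documented example outputs.
const double nmPerBasePair = 0.34;
const double basePairsPerTurn = 10.0;

// Hypothetical helper: helix length in nm, rounded to two decimals.
double helixLengthNm(String seq) =>
    double.parse((seq.length * nmPerBasePair).toStringAsFixed(2));

// Hypothetical helper: number of helical turns.
double helixTurns(String seq) => seq.length / basePairsPerTurn;
```

Under these assumptions, 8 base pairs give 8 * 0.34 = 2.72 nm and 16 base pairs give 16 / 10 = 1.6 turns.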
Reverse Transcription #
Return the reverse transcription product from an RNA instance's sequence using the revTranscribe() method:
RNA rnaSeq = RNA(seq: 'AUGCUAGU');
rnaSeq.revTranscribe();
// ATGCTAGT
Monoisotopic Mass #
Return the monoisotopic mass (Da) of a Peptide instance's sequence using the monoMass() method:
Peptide pepSeq = Peptide(seq: 'MSTGARVD');
pepSeq.monoMass();
// 817.38
Modify the number of decimal places by passing the desired number of decimals to the decimals parameter:
pepSeq.monoMass(decimals: 1);
// 817.4
Return the monoisotopic mass in kDa by passing true to the kDa parameter:
pepSeq.monoMass(kDa: true);
// 0.82
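The documented value of 817.38 Da matches the sum of standard monoisotopic residue masses with no terminal water added. Under that assumption, the calculation can be sketched in plain Dart; monoisotopicMass() and residueMass are hypothetical names, not BioKit's API.

```dart
// Standard monoisotopic residue masses in Da (hypothetical constant name).
const Map<String, double> residueMass = {
  'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
  'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
  'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
  'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
  'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
};

// Hypothetical helper (not a BioKit function): sums the residue masses,
// optionally converts Da to kDa, and rounds to the requested decimals.
double monoisotopicMass(String seq, {int decimals = 2, bool kDa = false}) {
  var mass = 0.0;
  for (final aa in seq.split('')) {
    mass += residueMass[aa]!;
  }
  if (kDa) mass /= 1000;
  return double.parse(mass.toStringAsFixed(decimals));
}
```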