utf library

Support for encoding and decoding Unicode characters in UTF-8, UTF-16, and UTF-32.
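
The three encodings differ mainly in the width of their code units. As a rough illustration, the sketch below prints the bytes of one character (U+20AC) in each form. It is a standalone example, not this library's API: it relies on dart:convert for UTF-8, and hex, utf16be, and utf32be are hypothetical helpers written for this illustration.

```dart
import 'dart:convert' show utf8;

/// Hypothetical helper: formats bytes as two-digit hex, e.g. "20 ac".
String hex(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join(' ');

/// Hypothetical helper: big-endian UTF-16, two bytes per code unit.
List<int> utf16be(String s) => [
      for (final u in s.codeUnits) ...[(u >> 8) & 0xff, u & 0xff]
    ];

/// Hypothetical helper: big-endian UTF-32, four bytes per codepoint.
List<int> utf32be(String s) => [
      for (final cp in s.runes) ...[
        (cp >> 24) & 0xff,
        (cp >> 16) & 0xff,
        (cp >> 8) & 0xff,
        cp & 0xff,
      ]
    ];

void main() {
  const euro = '\u20ac'; // EURO SIGN
  print(hex(utf8.encode(euro))); // e2 82 ac
  print(hex(utf16be(euro))); // 20 ac
  print(hex(utf32be(euro))); // 00 00 20 ac
}
```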

Classes

IterableUtf8Decoder
Return type of decodeUtf8AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
IterableUtf16Decoder
Return type of decodeUtf16AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
IterableUtf32Decoder
Return type of decodeUtf32AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
Utf8Decoder
Provides an iterator of Unicode codepoints from UTF-8 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The resulting object can be used as an Iterable (e.g. in a for-loop).
Utf8DecoderTransformer
StringTransformer that decodes a stream of UTF-8 encoded bytes.
Utf8EncoderTransformer
StringTransformer that UTF-8 encodes a stream of strings.
Utf16beBytesToCodeUnitsDecoder
Convert UTF-16BE encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1).
Utf16BytesToCodeUnitsDecoder
Convert UTF-16 encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1). Relies on the BOM to determine endianness, and defaults to big-endian.
Utf16CodeUnitDecoder
An Iterator of codepoints built on an Iterator of UTF-16 code units. The parameters can override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
Utf16leBytesToCodeUnitsDecoder
Convert UTF-16LE encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1).
Utf32beBytesDecoder
Convert UTF-32BE encoded bytes to codepoints by grouping 4 bytes to produce each Unicode codepoint.
Utf32BytesDecoder
Abstract parent class that converts encoded bytes to codepoints.
Utf32leBytesDecoder
Convert UTF-32LE encoded bytes to codepoints by grouping 4 bytes to produce each Unicode codepoint.
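
The iterable decoders above translate bytes only as the iterator advances. A minimal standalone sketch of that idea, using a sync* generator, might look like the following. It does not use the library's classes: lazyUtf8Codepoints is a hypothetical name, and its validation is simplified (overlong forms and encoded surrogates are not rejected).

```dart
/// Hypothetical helper, not part of the utf library: lazily yields Unicode
/// codepoints from UTF-8 bytes. Malformed input yields [replacement].
/// Simplification: overlong forms and encoded surrogates are not rejected.
Iterable<int> lazyUtf8Codepoints(List<int> bytes,
    {int replacement = 0xfffd}) sync* {
  var i = 0;
  while (i < bytes.length) {
    final lead = bytes[i++];
    int extra; // number of continuation bytes expected
    int value; // payload bits collected so far
    if (lead < 0x80) {
      extra = 0;
      value = lead;
    } else if ((lead & 0xe0) == 0xc0) {
      extra = 1;
      value = lead & 0x1f;
    } else if ((lead & 0xf0) == 0xe0) {
      extra = 2;
      value = lead & 0x0f;
    } else if ((lead & 0xf8) == 0xf0) {
      extra = 3;
      value = lead & 0x07;
    } else {
      yield replacement; // stray continuation byte or invalid lead byte
      continue;
    }
    var ok = true;
    for (var k = 0; k < extra; k++) {
      if (i >= bytes.length || (bytes[i] & 0xc0) != 0x80) {
        ok = false; // truncated or malformed sequence
        break;
      }
      value = (value << 6) | (bytes[i++] & 0x3f);
    }
    yield (ok && value <= 0x10ffff) ? value : replacement;
  }
}

void main() {
  // 'h', then U+00E9 as two bytes, then one invalid byte (0xff).
  final bytes = [0x68, 0xc3, 0xa9, 0xff];
  // take(2) pulls only two codepoints; the invalid byte is never examined.
  print(lazyUtf8Codepoints(bytes).take(2).toList()); // [104, 233]
  print(lazyUtf8Codepoints(bytes).toList()); // [104, 233, 65533]
}
```

Because nothing is cached, each new iteration over the returned Iterable decodes the bytes again from the start.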

Constants

UNICODE_BOM → const int
0xfeff
UNICODE_BYTE_ONE_MASK → const int
0xff00
UNICODE_BYTE_ZERO_MASK → const int
0xff
UNICODE_PLANE_ONE_MAX → const int
0xffff
UNICODE_REPLACEMENT_CHARACTER_CODEPOINT → const int
Invalid codepoints or encodings may be substituted with the value U+fffd.
0xfffd
UNICODE_UTF16_HI_MASK → const int
0xffc00
UNICODE_UTF16_LO_MASK → const int
0x3ff
UNICODE_UTF16_OFFSET → const int
0x10000
UNICODE_UTF16_RESERVED_HI → const int
0xdfff
UNICODE_UTF16_RESERVED_LO → const int
0xd800
UNICODE_UTF16_SURROGATE_UNIT_0_BASE → const int
0xd800
UNICODE_UTF16_SURROGATE_UNIT_1_BASE → const int
0xdc00
UNICODE_UTF_BOM_HI → const int
0xfe
UNICODE_UTF_BOM_LO → const int
0xff
UNICODE_VALID_RANGE_MAX → const int
0x10ffff
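
The UTF-16 constants above fit together in the surrogate-pair calculation: the offset 0x10000 is subtracted from a supplementary codepoint, the upper ten bits of the result are added to 0xd800, and the lower ten bits are added to 0xdc00. The sketch below works through that arithmetic. It redeclares the values locally under illustrative names rather than importing the library, and toSurrogatePair and fromSurrogatePair are hypothetical helpers.

```dart
// Local copies of the values listed above; the names are illustrative only.
const int offset = 0x10000; // UNICODE_UTF16_OFFSET
const int hiMask = 0xffc00; // UNICODE_UTF16_HI_MASK
const int loMask = 0x3ff; // UNICODE_UTF16_LO_MASK
const int unit0Base = 0xd800; // UNICODE_UTF16_SURROGATE_UNIT_0_BASE
const int unit1Base = 0xdc00; // UNICODE_UTF16_SURROGATE_UNIT_1_BASE

/// Splits a supplementary codepoint (0x10000..0x10ffff) into a surrogate pair.
List<int> toSurrogatePair(int codepoint) {
  if (codepoint < offset || codepoint > 0x10ffff) {
    throw ArgumentError.value(codepoint, 'codepoint', 'not supplementary');
  }
  final base = codepoint - offset; // 20-bit value
  return [
    unit0Base + ((base & hiMask) >> 10), // high (lead) surrogate
    unit1Base + (base & loMask), // low (trail) surrogate
  ];
}

/// Recombines a lead/trail surrogate pair into one codepoint.
int fromSurrogatePair(int lead, int trail) =>
    offset + ((lead - unit0Base) << 10) + (trail - unit1Base);

void main() {
  final pair = toSurrogatePair(0x1f600); // U+1F600
  print(pair.map((u) => u.toRadixString(16)).toList()); // [d83d, de00]
  print(fromSurrogatePair(pair[0], pair[1]).toRadixString(16)); // 1f600
}
```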

Functions

codepointsToString(List<int> codepoints) → String
Generate a string from the provided Unicode codepoints. [...]
codepointsToUtf8(List<int> codepoints, [ int offset = 0, int length ]) → List<int>
Encode code points as UTF-8 code units.
decodeUtf8(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a List of UTF-8 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf8AsIterable(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf8Decoder
Decodes the UTF-8 bytes as an iterable, so the consumer converts only as much of the input as needed. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf16(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-16 encoded bytes. This method always strips a leading BOM. Set the replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for the replacementCodepoint is U+FFFD.
decodeUtf16AsIterable(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf16Decoder
Decodes the UTF-16 bytes as an iterable, so the consumer converts only as much of the input as needed. Determines the byte order from the BOM, or uses big-endian as a default. This method always strips a leading BOM. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
decodeUtf16be(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-16BE encoded bytes. This method strips a leading BOM by default, but can be overridden by setting the optional parameter stripBom to false. Set the replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for the replacementCodepoint is U+FFFD.
decodeUtf16beAsIterable(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf16Decoder
Decodes the UTF-16BE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
decodeUtf16le(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-16LE encoded bytes. This method strips a leading BOM by default, but can be overridden by setting the optional parameter stripBom to false. Set the replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for the replacementCodepoint is U+FFFD.
decodeUtf16leAsIterable(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf16Decoder
Decodes the UTF-16LE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
decodeUtf32(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-32 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf32AsIterable(List<int> bytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf32Decoder
Decodes the UTF-32 bytes as an iterable, so the consumer converts only as much of the input as needed. Determines the byte order from the BOM, or uses big-endian as a default. This method always strips a leading BOM. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf32be(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-32BE encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf32beAsIterable(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf32Decoder
Decodes the UTF-32BE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf32le(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → String
Produce a String from a sequence of UTF-32LE encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
decodeUtf32leAsIterable(List<int> bytes, [ int offset = 0, int length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → IterableUtf32Decoder
Decodes the UTF-32LE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
encodeUtf8(String str) → List<int>
Produce a sequence of UTF-8 encoded bytes from the provided string.
encodeUtf16(String str) → List<int>
Produce a list of UTF-16 encoded bytes. This method prefixes the resulting bytes with a big-endian byte-order marker (BOM).
encodeUtf16be(String str, [ bool writeBOM = false ]) → List<int>
Produce a list of UTF-16BE encoded bytes. By default, this method produces UTF-16BE bytes with no BOM.
encodeUtf16le(String str, [ bool writeBOM = false ]) → List<int>
Produce a list of UTF-16LE encoded bytes. By default, this method produces UTF-16LE bytes with no BOM.
encodeUtf32(String str) → List<int>
Produce a list of UTF-32 encoded bytes. This method prefixes the resulting bytes with a big-endian byte-order marker (BOM).
encodeUtf32be(String str, [ bool writeBOM = false ]) → List<int>
Produce a list of UTF-32BE encoded bytes. By default, this method produces UTF-32BE bytes with no BOM.
encodeUtf32le(String str, [ bool writeBOM = false ]) → List<int>
Produce a list of UTF-32LE encoded bytes. By default, this method produces UTF-32LE bytes with no BOM.
hasUtf16beBom(List<int> utf16EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a big-endian byte-order marker (BOM).
hasUtf16Bom(List<int> utf32EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a byte-order marker (BOM).
hasUtf16leBom(List<int> utf16EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a little-endian byte-order marker (BOM).
hasUtf32beBom(List<int> utf32EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a big-endian byte-order marker (BOM).
hasUtf32Bom(List<int> utf32EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a byte-order marker (BOM).
hasUtf32leBom(List<int> utf32EncodedBytes, [ int offset = 0, int length ]) → bool
Identifies whether a List of bytes starts (based on offset) with a little-endian byte-order marker (BOM).
stringToCodepoints(String str) → List<int>
Provide a list of Unicode codepoints for a given string.
utf8ToCodepoints(List<int> utf8EncodedBytes, [ int offset = 0, int length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT ]) → List<int>
Decode UTF-8 encoded bytes to a list of Unicode codepoints.
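
For UTF-16, the BOM checks and the encode and decode functions above come down to looking for the byte pair FE FF (big-endian) or FF FE (little-endian) and then reading or writing two bytes per code unit. The sketch below illustrates that flow with dart:core only. All four function names are hypothetical stand-ins, not the library's implementation, and error handling is simplified.

```dart
/// Hypothetical stand-ins for the UTF-16 BOM checks described above.
bool startsWithUtf16beBom(List<int> bytes, [int offset = 0]) =>
    offset + 1 < bytes.length &&
    bytes[offset] == 0xfe &&
    bytes[offset + 1] == 0xff;

bool startsWithUtf16leBom(List<int> bytes, [int offset = 0]) =>
    offset + 1 < bytes.length &&
    bytes[offset] == 0xff &&
    bytes[offset + 1] == 0xfe;

/// Encodes [str] as UTF-16 bytes in big-endian order, optionally with a BOM.
/// String.codeUnits already exposes UTF-16 code units, so only the byte
/// order has to be chosen here.
List<int> toUtf16beBytes(String str, {bool writeBom = false}) {
  final out = <int>[];
  if (writeBom) out.addAll([0xfe, 0xff]);
  for (final unit in str.codeUnits) {
    out.add((unit & 0xff00) >> 8); // high byte first
    out.add(unit & 0xff);
  }
  return out;
}

/// Decodes UTF-16 bytes, picking the byte order from a leading BOM and
/// falling back to big-endian when none is present.
/// Simplification: a trailing odd byte is ignored and unpaired surrogates
/// are passed through unchanged.
String fromUtf16Bytes(List<int> bytes) {
  var i = 0;
  var littleEndian = false;
  if (startsWithUtf16beBom(bytes)) {
    i = 2;
  } else if (startsWithUtf16leBom(bytes)) {
    i = 2;
    littleEndian = true;
  }
  final units = <int>[];
  for (; i + 1 < bytes.length; i += 2) {
    units.add(littleEndian
        ? (bytes[i + 1] << 8) | bytes[i]
        : (bytes[i] << 8) | bytes[i + 1]);
  }
  return String.fromCharCodes(units);
}

void main() {
  final encoded = toUtf16beBytes('h\u00e9', writeBom: true);
  print(encoded); // [254, 255, 0, 104, 0, 233]
  print(fromUtf16Bytes(encoded) == 'h\u00e9'); // true
  print(fromUtf16Bytes([0xff, 0xfe, 0x68, 0x00])); // h
}
```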

Typedefs

Utf32BytesDecoderProvider() → Utf32BytesDecoder