utf_convert library
Support for encoding and decoding Unicode characters in UTF-8, UTF-16, and UTF-32.
Classes
- IterableUtf16Decoder
- Return type of decodeUtf16AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
- IterableUtf32Decoder
- Return type of decodeUtf32AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
- IterableUtf8Decoder
- Return type of decodeUtf8AsIterable and variants. The Iterable type provides an iterator on demand and the iterator will only translate bytes as requested by the user of the iterator. (Note: results are not cached.)
- Utf16beBytesToCodeUnitsDecoder
- Convert UTF-16BE encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1).
- Utf16BytesToCodeUnitsDecoder
- Convert UTF-16 encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1). Relies on the BOM to determine endianness, and defaults to big-endian.
- Utf16CodeUnitDecoder
- An Iterator
- Utf16leBytesToCodeUnitsDecoder
- Convert UTF-16LE encoded bytes to UTF-16 code units by grouping 1-2 bytes to produce each code unit (0 to 2^16 - 1).
- Utf32beBytesDecoder
- Convert UTF-32BE encoded bytes to codepoints by grouping 4 bytes to produce the unicode codepoint.
- Utf32BytesDecoder
- Abstract parent class that converts encoded bytes to codepoints.
- Utf32leBytesDecoder
- Convert UTF-32LE encoded bytes to codepoints by grouping 4 bytes to produce the Unicode codepoint.
- Utf8Decoder
- Provides an iterator of Unicode codepoints from UTF-8 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set the replacementCharacter to null to throw an ArgumentError rather than replace the bad value. The return value from this method can be used as an Iterable (e.g. in a for-loop).
- Utf8DecoderTransformer
- StringTransformer that decodes a stream of UTF-8 encoded bytes.
- Utf8EncoderTransformer
- StringTransformer that UTF-8 encodes a stream of strings.
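The lazy behaviour described for the Iterable decoder classes can be sketched as follows. This is a minimal illustration, not taken from the library's own documentation: the import path `package:utf_convert/utf_convert.dart` is an assumption based on the library name, and the sample bytes are made up for the example.

```dart
// Assumption: the library is importable under this package path.
import 'package:utf_convert/utf_convert.dart';

void main() {
  // UTF-8 bytes for "héllo": 'é' (U+00E9) is the two-byte sequence C3 A9.
  final bytes = <int>[0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F];

  // Nothing is decoded here; the iterable only wraps the byte list.
  final lazy = decodeUtf8AsIterable(bytes);

  // take(2) pulls two codepoints from a fresh iterator and then stops,
  // so the remaining bytes are not translated on this pass.
  final firstTwo = lazy.take(2).toList();
  print(firstTwo); // [104, 233]

  // Results are not cached: a second pass decodes again from the start.
  final all = lazy.toList();
  print(all.length); // 5
}
```

Because each pass re-decodes the input, callers that need the codepoints more than once may prefer to materialize them with `toList()` a single time.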
Constants
- UNICODE_BOM → const int
- UNICODE_BYTE_ONE_MASK → const int
- UNICODE_BYTE_ZERO_MASK → const int
- UNICODE_PLANE_ONE_MAX → const int
- UNICODE_REPLACEMENT_CHARACTER_CODEPOINT → const int
- Invalid codepoints or encodings may be substituted with the value U+FFFD.
- UNICODE_UTF16_HI_MASK → const int
- UNICODE_UTF16_LO_MASK → const int
- UNICODE_UTF16_OFFSET → const int
- UNICODE_UTF16_RESERVED_HI → const int
- UNICODE_UTF16_RESERVED_LO → const int
- UNICODE_UTF16_SURROGATE_UNIT_0_BASE → const int
- UNICODE_UTF16_SURROGATE_UNIT_1_BASE → const int
- UNICODE_UTF_BOM_HI → const int
- UNICODE_UTF_BOM_LO → const int
- UNICODE_VALID_RANGE_MAX → const int
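Several of these constants relate to UTF-16 surrogate pairs. Their values are not listed here, so the sketch below defines its own local stand-ins from the UTF-16 encoding form (0xD800, 0xDC00, 0x10000); the names `highSurrogateBase`, `lowSurrogateBase`, `supplementaryOffset`, `tenBitMask`, `toSurrogatePair`, and `fromSurrogatePair` are hypothetical and are not part of the library.

```dart
// Local stand-ins defined by the UTF-16 encoding form itself.
// They are NOT read from the library's constants.
const int highSurrogateBase = 0xD800;
const int lowSurrogateBase = 0xDC00;
const int supplementaryOffset = 0x10000;
const int tenBitMask = 0x3FF;

/// Splits a supplementary codepoint (U+10000..U+10FFFF) into a
/// [high, low] surrogate pair.
List<int> toSurrogatePair(int codepoint) {
  final v = codepoint - supplementaryOffset;
  return [highSurrogateBase | (v >> 10), lowSurrogateBase | (v & tenBitMask)];
}

/// Recombines a high/low surrogate pair into a single codepoint.
int fromSurrogatePair(int high, int low) =>
    supplementaryOffset + ((high & tenBitMask) << 10) + (low & tenBitMask);

void main() {
  final pair = toSurrogatePair(0x1F600);
  print(pair.map((u) => u.toRadixString(16)).toList()); // [d83d, de00]
  print(fromSurrogatePair(pair[0], pair[1]).toRadixString(16)); // 1f600
}
```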
Functions
-
codepointsToString(
List< int> codepoints) → String - Generate a string from the provided Unicode codepoints.
-
codepointsToUtf8(
List< int?> codepoints, [int offset = 0, int? length]) → List<int?> - Encode code points as UTF-8 code units.
-
decodeUtf16(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String -
Produce a String from a sequence of UTF-16 encoded bytes. This method always strips a leading BOM. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf16AsIterable(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf16Decoder -
Decodes the UTF-16 bytes as an iterable, so the consumer converts only as much of the input as needed. Determines the byte order from the BOM, or uses big-endian as a default. This method always strips a leading BOM. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf16be(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String -
Produce a String from a sequence of UTF-16BE encoded bytes. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf16beAsIterable(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf16Decoder -
Decodes the UTF-16BE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf16le(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String -
Produce a String from a sequence of UTF-16LE encoded bytes. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf16leAsIterable(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf16Decoder -
Decodes the UTF-16LE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value. The default value for replacementCodepoint is U+FFFD.
-
decodeUtf32(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String - Produce a String from a sequence of UTF-32 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf32AsIterable(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf32Decoder - Decodes the UTF-32 bytes as an iterable, so the consumer converts only as much of the input as needed. Determines the byte order from the BOM, or uses big-endian as a default. This method always strips a leading BOM. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf32be(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String - Produce a String from a sequence of UTF-32BE encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf32beAsIterable(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf32Decoder -
Decodes the UTF-32BE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf32le(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String - Produce a String from a sequence of UTF-32LE encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf32leAsIterable(
List< int> bytes, [int offset = 0, int? length, bool stripBom = true, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf32Decoder -
Decodes the UTF-32LE bytes as an iterable, so the consumer converts only as much of the input as needed. This method strips a leading BOM by default; set the optional parameter stripBom to false to keep it. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf8(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → String - Produce a String from a List of UTF-8 encoded bytes. The parameters can set an offset into a list of bytes (as int), limit the length of the values to be decoded, and override the default Unicode replacement character. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
decodeUtf8AsIterable(
List< int> bytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → IterableUtf8Decoder - Decodes the UTF-8 bytes as an iterable, so the consumer converts only as much of the input as needed. Set replacementCodepoint to null to throw an ArgumentError rather than replace the bad value.
-
encodeUtf16(
String str) → List< int?> - Produce a list of UTF-16 encoded bytes. This method prefixes the resulting bytes with a big-endian byte-order-marker.
-
encodeUtf16be(
String str, [bool writeBOM = false]) → List< int?> - Produce a list of UTF-16BE encoded bytes. By default, this method produces UTF-16BE bytes with no BOM.
-
encodeUtf16le(
String str, [bool writeBOM = false]) → List< int?> - Produce a list of UTF-16LE encoded bytes. By default, this method produces UTF-16LE bytes with no BOM.
-
encodeUtf32(
String str) → List< int?> - Produce a list of UTF-32 encoded bytes. This method prefixes the resulting bytes with a big-endian byte-order-marker.
-
encodeUtf32be(
String str, [bool writeBOM = false]) → List< int?> - Produce a list of UTF-32BE encoded bytes. By default, this method produces UTF-32BE bytes with no BOM.
-
encodeUtf32le(
String str, [bool writeBOM = false]) → List< int?> - Produce a list of UTF-32LE encoded bytes. By default, this method produces UTF-32LE bytes with no BOM.
-
encodeUtf8(
String str) → List< int?> - Produce a sequence of UTF-8 encoded bytes from the provided string.
-
hasUtf16beBom(
List< int> utf16EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a big-endian byte-order marker (BOM).
-
hasUtf16Bom(
List< int> utf32EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a byte-order marker (BOM).
-
hasUtf16leBom(
List< int> utf16EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a little-endian byte-order marker (BOM).
-
hasUtf32beBom(
List< int> utf32EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a big-endian byte-order marker (BOM).
-
hasUtf32Bom(
List< int> utf32EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a byte-order marker (BOM).
-
hasUtf32leBom(
List< int> utf32EncodedBytes, [int offset = 0, int? length]) → bool - Identifies whether a List of bytes starts (based on offset) with a little-endian byte-order marker (BOM).
-
stringToCodepoints(
String str) → List< int?> - Provide a list of Unicode codepoints for a given string.
-
utf8ToCodepoints(
List< int> utf8EncodedBytes, [int offset = 0, int? length, int replacementCodepoint = UNICODE_REPLACEMENT_CHARACTER_CODEPOINT]) → List<int?>
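The encode, decode, and BOM-detection functions listed above can be combined roughly as follows. This is a sketch under stated assumptions rather than an official example: the import path `package:utf_convert/utf_convert.dart` is assumed from the library name, and the `whereType<int>()` step is one possible way to bridge the `List<int?>` returned by the encoders and the `List<int>` accepted by the decoders.

```dart
// Assumption: the library is importable under this package path.
import 'package:utf_convert/utf_convert.dart';

void main() {
  // encodeUtf16 is documented to prefix a big-endian BOM (FE FF).
  // Encoders return List<int?>, decoders take List<int>, so the
  // nullable element type is narrowed before passing the bytes on.
  final utf16 = encodeUtf16('hi').whereType<int>().toList();
  print(utf16); // [254, 255, 0, 104, 0, 105]
  print(hasUtf16Bom(utf16)); // true
  print(hasUtf16beBom(utf16)); // true
  print(hasUtf16leBom(utf16)); // false

  // decodeUtf16 reads the BOM to pick the byte order, then strips it.
  print(decodeUtf16(utf16)); // hi

  // The explicit-endian encoders write no BOM unless writeBOM is true.
  final le = encodeUtf16le('hi', true).whereType<int>().toList();
  print(hasUtf16leBom(le)); // true
  print(decodeUtf16le(le)); // hi

  // offset and length select a sub-range of the byte list:
  // bytes 1..2 of "héllo" in UTF-8 are C3 A9, which is 'é'.
  final utf8 = encodeUtf8('héllo').whereType<int>().toList();
  print(decodeUtf8(utf8, 1, 2)); // é
}
```

The examples leave replacementCodepoint at its default, since the listed signatures declare it as a non-nullable int.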
Typedefs
- Utf32BytesDecoderProvider = Utf32BytesDecoder Function()