Implementation
static Result encodeMinimally(Input input) {
//@SuppressWarnings("checkstyle:lineLength")
/* The minimal encoding is computed by Dijkstra. The acyclic graph is modeled as follows:
* A vertex represents a combination of a position in the input and an encoding mode where position 0
* denotes the position left of the first character, 1 the position left of the second character and so on.
* Likewise the end vertices are located after the last character at position input.length().
* For any position there might be up to six vertices, one for each of the encoding types ASCII, C40, TEXT, X12,
* EDF and B256.
*
* As an example consider the input string "ABC123" then at position 0 there is only one vertex with the default
* ASCII encodation. At position 3 there might be vertices for the types ASCII, C40, X12, EDF and B256.
*
* An edge leading to such a vertex encodes one or more of the characters left of the position that the vertex
* represents. It encodes the characters in the encoding mode of the vertex that it ends on. In other words,
* all edges leading to a particular vertex encode the same characters (the length of the suffix can vary) using the same
* encoding mode.
* As an example consider the input string "ABC123" and the vertex (4,EDF). Possible edges leading to this vertex
* are:
* (0,ASCII) --EDF(ABC1)--> (4,EDF)
* (1,ASCII) --EDF(BC1)--> (4,EDF)
* (1,B256) --EDF(BC1)--> (4,EDF)
* (1,EDF) --EDF(BC1)--> (4,EDF)
* (2,ASCII) --EDF(C1)--> (4,EDF)
* (2,B256) --EDF(C1)--> (4,EDF)
* (2,EDF) --EDF(C1)--> (4,EDF)
* (3,ASCII) --EDF(1)--> (4,EDF)
* (3,B256) --EDF(1)--> (4,EDF)
* (3,EDF) --EDF(1)--> (4,EDF)
* (3,C40) --EDF(1)--> (4,EDF)
* (3,X12) --EDF(1)--> (4,EDF)
*
* The edges leading to a vertex are stored in such a way that there is a fast way to enumerate the edges ending
* on a particular vertex.
*
* The algorithm processes the vertices in order of their position thereby performing the following:
*
* For every vertex at position i the algorithm enumerates the edges ending on the vertex and removes all but the
* shortest from that list.
* Then it processes the vertices for the position i+1. If i+1 == input.length() then the algorithm ends
* and chooses the the edge with the smallest size from any of the edges leading to vertices at this position.
* Otherwise the algorithm computes all possible outgoing edges for the vertices at the position i+1
*
* Examples:
* The process is illustrated by showing the graph (edges) after each iteration from left to right over the input:
* An edge is drawn as follows "(" + fromVertex + ") -- " + encodingMode + "(" + encodedInput + ") (" +
* accumulatedSize + ") --> (" + toVertex + ")"
*
* Example 1 encoding the string "ABCDEFG":
*
*
* Situation after adding edges to the start vertex (0,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256)
* (0,ASCII) EDF(AB) (4) --> (2,EDF)
* (0,ASCII) C40(ABC) (3) --> (3,C40)
* (0,ASCII) TEXT(ABC) (5) --> (3,TEXT)
* (0,ASCII) X12(ABC) (3) --> (3,X12)
* (0,ASCII) EDF(ABC) (4) --> (3,EDF)
* (0,ASCII) EDF(ABCD) (4) --> (4,EDF)
*
* Situation after adding edges to vertices at position 1
* (0,ASCII) ASCII(A) (1) --> (1,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256)
* (0,ASCII) EDF(AB) (4) --> (2,EDF)
* (0,ASCII) C40(ABC) (3) --> (3,C40)
* (0,ASCII) TEXT(ABC) (5) --> (3,TEXT)
* (0,ASCII) X12(ABC) (3) --> (3,X12)
* (0,ASCII) EDF(ABC) (4) --> (3,EDF)
* (0,ASCII) EDF(ABCD) (4) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) B256(B) (4) --> (2,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BC) (5) --> (3,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) C40(BCD) (4) --> (4,C40)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) TEXT(BCD) (6) --> (4,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) X12(BCD) (4) --> (4,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BCD) (5) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BCDE) (5) --> (5,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) ASCII(B) (4) --> (2,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BC) (6) --> (3,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) C40(BCD) (5) --> (4,C40)
* (0,ASCII) B256(A) (3) --> (1,B256) TEXT(BCD) (7) --> (4,TEXT)
* (0,ASCII) B256(A) (3) --> (1,B256) X12(BCD) (5) --> (4,X12)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BCD) (6) --> (4,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BCDE) (6) --> (5,EDF)
*
* Edge "(1,ASCII) ASCII(B) (2) --> (2,ASCII)" is minimal for the vertex (2,ASCII) so that edge "(1,B256) ASCII(B) (4) --> (2,ASCII)" is removed.
* Edge "(1,B256) B256(B) (3) --> (2,B256)" is minimal for the vertext (2,B256) so that the edge "(1,ASCII) B256(B) (4) --> (2,B256)" is removed.
*
* Situation after adding edges to vertices at position 2
* (0,ASCII) ASCII(A) (1) --> (1,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256)
* (0,ASCII) EDF(AB) (4) --> (2,EDF)
* (0,ASCII) C40(ABC) (3) --> (3,C40)
* (0,ASCII) TEXT(ABC) (5) --> (3,TEXT)
* (0,ASCII) X12(ABC) (3) --> (3,X12)
* (0,ASCII) EDF(ABC) (4) --> (3,EDF)
* (0,ASCII) EDF(ABCD) (4) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BC) (5) --> (3,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) C40(BCD) (4) --> (4,C40)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) TEXT(BCD) (6) --> (4,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) X12(BCD) (4) --> (4,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BCD) (5) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BCDE) (5) --> (5,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BC) (6) --> (3,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) C40(BCD) (5) --> (4,C40)
* (0,ASCII) B256(A) (3) --> (1,B256) TEXT(BCD) (7) --> (4,TEXT)
* (0,ASCII) B256(A) (3) --> (1,B256) X12(BCD) (5) --> (4,X12)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BCD) (6) --> (4,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) EDF(BCDE) (6) --> (5,EDF)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) ASCII(C) (5) --> (3,ASCII)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) B256(C) (6) --> (3,B256)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) EDF(CD) (7) --> (4,EDF)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) C40(CDE) (6) --> (5,C40)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) TEXT(CDE) (8) --> (5,TEXT)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) X12(CDE) (6) --> (5,X12)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) EDF(CDE) (7) --> (5,EDF)
* (0,ASCII) EDF(AB) (4) --> (2,EDF) EDF(CDEF) (7) --> (6,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) B256(C) (5) --> (3,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) EDF(CD) (6) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) C40(CDE) (5) --> (5,C40)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) TEXT(CDE) (7) --> (5,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) X12(CDE) (5) --> (5,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) EDF(CDE) (6) --> (5,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) EDF(CDEF) (6) --> (6,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) ASCII(C) (4) --> (3,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) EDF(CD) (6) --> (4,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) C40(CDE) (5) --> (5,C40)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) TEXT(CDE) (7) --> (5,TEXT)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) X12(CDE) (5) --> (5,X12)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) EDF(CDE) (6) --> (5,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) EDF(CDEF) (6) --> (6,EDF)
*
* Edge "(2,ASCII) ASCII(C) (3) --> (3,ASCII)" is minimal for the vertex (3,ASCII) so that edges "(2,EDF) ASCII(C) (5) --> (3,ASCII)"
* and "(2,B256) ASCII(C) (4) --> (3,ASCII)" can be removed.
* Edge "(0,ASCII) EDF(ABC) (4) --> (3,EDF)" is minimal for the vertex (3,EDF) so that edges "(1,ASCII) EDF(BC) (5) --> (3,EDF)"
* and "(1,B256) EDF(BC) (6) --> (3,EDF)" can be removed.
* Edge "(2,B256) B256(C) (4) --> (3,B256)" is minimal for the vertex (3,B256) so that edges "(2,ASCII) B256(C) (5) --> (3,B256)"
* and "(2,EDF) B256(C) (6) --> (3,B256)" can be removed.
*
* This continues for vertices 3 thru 7
*
* Situation after adding edges to vertices at position 7
* (0,ASCII) ASCII(A) (1) --> (1,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256)
* (0,ASCII) EDF(AB) (4) --> (2,EDF)
* (0,ASCII) C40(ABC) (3) --> (3,C40)
* (0,ASCII) TEXT(ABC) (5) --> (3,TEXT)
* (0,ASCII) X12(ABC) (3) --> (3,X12)
* (0,ASCII) EDF(ABC) (4) --> (3,EDF)
* (0,ASCII) EDF(ABCD) (4) --> (4,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) C40(BCD) (4) --> (4,C40)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) TEXT(BCD) (6) --> (4,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) X12(BCD) (4) --> (4,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) EDF(BCDE) (5) --> (5,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256)
* (0,ASCII) C40(ABC) (3) --> (3,C40) C40(DEF) (5) --> (6,C40)
* (0,ASCII) X12(ABC) (3) --> (3,X12) X12(DEF) (5) --> (6,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) C40(CDE) (5) --> (5,C40)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) TEXT(CDE) (7) --> (5,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) X12(CDE) (5) --> (5,X12)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) EDF(CDEF) (6) --> (6,EDF)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) C40(BCD) (4) --> (4,C40) C40(EFG) (6) --> (7,C40) //Solution 1
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) X12(BCD) (4) --> (4,X12) X12(EFG) (6) --> (7,X12) //Solution 2
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) ASCII(D) (4) --> (4,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) TEXT(DEF) (8) --> (6,TEXT)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) EDF(DEFG) (7) --> (7,EDF)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256) B256(D) (5) --> (4,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) ASCII(D) (4) --> (4,ASCII) ASCII(E) (5) --> (5,ASCII)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) ASCII(D) (4) --> (4,ASCII) TEXT(EFG) (9) --> (7,TEXT)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256) B256(D) (5) --> (4,B256) B256(E) (6) --> (5,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) ASCII(D) (4) --> (4,ASCII) ASCII(E) (5) --> (5,ASCII) ASCII(F) (6) --> (6,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256) B256(D) (5) --> (4,B256) B256(E) (6) --> (5,B256) B256(F) (7) --> (6,B256)
* (0,ASCII) ASCII(A) (1) --> (1,ASCII) ASCII(B) (2) --> (2,ASCII) ASCII(C) (3) --> (3,ASCII) ASCII(D) (4) --> (4,ASCII) ASCII(E) (5) --> (5,ASCII) ASCII(F) (6) --> (6,ASCII) ASCII(G) (7) --> (7,ASCII)
* (0,ASCII) B256(A) (3) --> (1,B256) B256(B) (3) --> (2,B256) B256(C) (4) --> (3,B256) B256(D) (5) --> (4,B256) B256(E) (6) --> (5,B256) B256(F) (7) --> (6,B256) B256(G) (8) --> (7,B256)
*
* Hence a minimal encoding of "ABCDEFG" is either ASCII(A),C40(BCDEFG) or ASCII(A), X12(BCDEFG) with a size of 5 bytes.
*/
final inputLength = input.length;
// Array that represents vertices. There is a vertex for every character and mode.
// The last dimension in the array below encodes the 6 modes ASCII, C40, TEXT, X12, EDF and B256
final List<List<Edge?>> edges = List.generate(
inputLength + 1,
(index) => List.filled(6, null),
);
addEdges(input, edges, 0, null);
for (int i = 1; i <= inputLength; i++) {
for (int j = 0; j < 6; j++) {
if (edges[i][j] != null && i < inputLength) {
addEdges(input, edges, i, edges[i][j]);
}
}
//optimize memory by removing edges that have been passed.
edges[i - 1].fillRange(0, 6, null);
}
int minimalJ = -1;
int minimalSize = MathUtils.maxValue;
for (int j = 0; j < 6; j++) {
if (edges[inputLength][j] != null) {
final edge = edges[inputLength][j]!;
//C40, TEXT and X12 need an extra unlatch at the end
final size = (j >= 1 && j <= 3)
? edge.cachedTotalSize + 1
: edge.cachedTotalSize;
if (size < minimalSize) {
minimalSize = size;
minimalJ = j;
}
}
}
if (minimalJ < 0) {
// IllegalStateException
throw StateError('Failed to encode "$input"');
}
return Result(edges[inputLength][minimalJ]!);
}