The text to speak, specified as either plain text or an SSML document. If
your engine does not support SSML, you should strip out all XML markup and
synthesize only the underlying text content. The value of this parameter
is guaranteed to be no more than 32,768 characters. If your engine cannot
speak that many characters at once, split the utterance into smaller
chunks and queue them internally rather than returning an error.
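
The two fallbacks described here (stripping SSML markup and splitting an over-long utterance) might be sketched as follows. This is a minimal illustration, not part of the API: the helper names `toPlainText` and `splitIntoChunks`, the test for whether the input is SSML, and the 1000-character limit are all assumptions, and a real engine would likely use a proper XML parser and break at sentence boundaries.

```typescript
// Hypothetical per-call limit of an imaginary engine (an assumption).
const MAX_CHUNK_CHARS = 1000;

// Returns speakable text. Input that looks like SSML has its tags removed
// and the five predefined XML entities decoded; plain text keeps its
// characters, so "1 < 2" is not mistaken for markup.
function toPlainText(utterance: string): string {
  const looksLikeSsml = /^\s*(<\?xml|<speak[\s>])/.test(utterance);
  let text = utterance;
  if (looksLikeSsml) {
    text = text
      .replace(/<[^>]*>/g, " ") // drop every tag, keep the text between tags
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&quot;/g, '"')
      .replace(/&apos;/g, "'")
      .replace(/&amp;/g, "&"); // decoded last so "&amp;lt;" yields "&lt;"
  }
  return text.replace(/\s+/g, " ").trim(); // collapse runs of whitespace
}

// Splits text into chunks of at most maxChars characters, breaking at the
// last space inside the limit when one exists and hard-splitting otherwise.
// A hard split may cut through a surrogate pair; this sketch ignores that.
function splitIntoChunks(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    let cut = rest.lastIndexOf(" ", maxChars);
    if (cut <= 0) cut = maxChars; // no usable space: hard split
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

An engine could then call `splitIntoChunks(toPlainText(utterance), MAX_CHUNK_CHARS)` and feed the resulting chunks to its synthesizer one after another, so the caller never sees an error for a long utterance.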