language property
The language of the work in ISO 639-1 format. The language is detected automatically from the information we have about the work: we run the langdetect software library on the words in the work's abstract, or on the title if we do not have the abstract. The source code for this procedure is here.
This method is not perfect. Keep the following in mind:
We do not always assign a language. If too few words are available to guess accurately, the field may be left empty.
We report the language of the metadata, not of the full text. For example, if a work is in French but its title and abstract are in English, we report the language as English.
In some cases the abstract is written in two different languages. When this happens, the reported language will not be accurate.
Implementation
String? language;
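Because the property is declared as `String?`, callers have to handle the case where no language was assigned. The following is a minimal sketch of one way to do that. The `describeLanguage` helper and its small code-to-name map are hypothetical and only for illustration; they are not part of the documented API.

```dart
// Hypothetical helper (not part of the documented API): turns the nullable
// ISO 639-1 code into a display string.
String describeLanguage(String? language) {
  // Illustrative subset of ISO 639-1 codes; extend as needed.
  const names = {'en': 'English', 'fr': 'French', 'de': 'German'};

  // No language was assigned, e.g. too few words to detect one.
  if (language == null) return 'unknown';

  // Fall back to the raw code when it is not in the map above.
  return names[language] ?? language;
}

void main() {
  print(describeLanguage('fr')); // French
  print(describeLanguage(null)); // unknown
  print(describeLanguage('pt')); // pt
}
```

Since the reported value reflects the title or abstract rather than the full text, a display string like this is best treated as a hint rather than a guarantee about the body of the work.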