Dictionary class that uses the Oxford Dictionaries REST API.
Oxford Dictionaries, the Oxford Dictionaries logo, Oxford University Press, OUP, Oxford and/or any other names of products or services provided by Oxford University Press and referred to in this package are either trademarks or registered trademarks of Oxford University Press.
Overview
The `oxford_dictionaries` library uses endpoints of the Oxford Dictionaries API to return lexical data. To use this library, you need to sign up for an account to obtain API keys.
The `OxfordDictionaries` class implements the `DictoSaurus` interface, which includes dictionary, thesaurus and term expansion utilities.
The implementation in this library uses six (out of nine) Oxford Dictionaries API endpoints to populate a DictionaryEntry object and provide some translation services to/from English:
- the EntriesEndpoint retrieves definitions, pronunciations, example sentences, grammatical information and word origins;
- the ThesaurusEndpoint retrieves words that are similar/opposite in meaning to the input word (synonyms/antonyms);
- the LemmasEndpoint checks if a word exists in the dictionary, or what 'root' form (lemma) it links to (e.g., swimming > swim). The lemmas for a given inflected word can be combined with other endpoints to retrieve more information;
- the WordsEndpoint retrieves definitions, examples and other information for a given dictionary word or an inflection. The response contains information about the lemmas to which the given word/inflected form is linked;
- the SearchEndpoint retrieves possible headword matches for a search term. The results are calculated using headword matching, fuzzy matching, and lemmatization; and
- the TranslationsEndpoint retrieves translations for a given word.
The endpoint classes are available in a separate `endpoints` mini-library. More information on the endpoints is available from the Oxford Dictionaries API Documentation.
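For orientation, the sketch below shows the kind of HTTP request each of the six endpoints corresponds to. It is a hypothetical helper, not part of the `oxford_dictionaries` package: the host `od-api.oxforddictionaries.com`, the `/api/v2/...` paths, the language codes (`en-gb`, `en`, `es`) and the `app_id`/`app_key` header names are assumptions based on version 2 of the public API, so check them against the Oxford Dictionaries API Documentation before relying on them.

```dart
// Hypothetical sketch, NOT the package's API: request URIs for the six
// Oxford Dictionaries API endpoints used by the library. Host, paths and
// header names are assumptions based on the public v2 API.
const String odApiHost = 'od-api.oxforddictionaries.com';

/// Per-word endpoints follow `/api/v2/<endpoint>/<language>/<word>`.
Uri entriesUri(String lang, String word) =>
    Uri.https(odApiHost, '/api/v2/entries/$lang/$word');

Uri thesaurusUri(String lang, String word) =>
    Uri.https(odApiHost, '/api/v2/thesaurus/$lang/$word');

Uri lemmasUri(String lang, String word) =>
    Uri.https(odApiHost, '/api/v2/lemmas/$lang/$word');

/// Query endpoints take the word or search term as the `q` parameter.
Uri wordsUri(String lang, String word) =>
    Uri.https(odApiHost, '/api/v2/words/$lang', {'q': word});

Uri searchUri(String lang, String term) =>
    Uri.https(odApiHost, '/api/v2/search/$lang', {'q': term});

/// Translations need both a source and a target language.
Uri translationsUri(String source, String target, String word) =>
    Uri.https(odApiHost, '/api/v2/translations/$source/$target/$word');

/// Every request authenticates with the API keys sent as HTTP headers.
Map<String, String> authHeaders(String appId, String appKey) =>
    {'app_id': appId, 'app_key': appKey};
```

For example, `wordsUri('en-gb', 'swimming')` yields `https://od-api.oxforddictionaries.com/api/v2/words/en-gb?q=swimming`. In practice, the library's endpoint classes are meant to take care of building and sending these requests for you.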
Refer to the references for more background.
Usage
In the `pubspec.yaml` of your Flutter project, add the following dependency:
```yaml
dependencies:
  oxford_dictionaries: <latest_version>
```
In your code file, add the following imports:
```dart
// import the OxfordDictionaries class.
import 'package:oxford_dictionaries/oxford_dictionaries.dart';
// import the endpoint classes from the `endpoints` mini-library.
import 'package:oxford_dictionaries/endpoints.dart';
```
Hydrate an `OxfordDictionaries` instance and get the `DictionaryEntry` for "swimming" from the `OxfordDictionariesEndpoint.words` endpoint:
```dart
import 'package:oxford_dictionaries/oxford_dictionaries.dart';
import 'package:oxford_dictionaries/endpoints.dart';

Future<void> main() async {
  // sign up for an account at (https://developer.oxforddictionaries.com/#plans)
  // to obtain API keys; the values below are placeholders
  const appId = 'YOUR_APP_ID';
  const appKey = 'YOUR_APP_KEY';

  // hydrate an `OxfordDictionaries` instance with API keys and a language
  final dictionary = OxfordDictionaries(
      appId: appId, appKey: appKey, language: OxfordDictionariesLanguage.en_GB);

  // get the `DictionaryEntry` for "swimming" from the `words` endpoint
  final props =
      await dictionary.getEntry('swimming', OxfordDictionariesEndpoint.words);

  // print the definitions for "swimming"
  print(props?.definitionsMap());
  // prints
  // {
  //   PartOfSpeech.noun:
  //     {the sport or activity of propelling oneself through water using the limbs},
  //   PartOfSpeech.verb:
  //     {propel the body through water by using the limbs, or (in the case of a fish or other aquatic animal) by using fins, tail, or other bodily movement,
  //     cross (a particular stretch of water) by swimming,
  //     float on or at the surface of a liquid,
  //     cause to float or move across water, be immersed in or covered with liquid, appear to reel or whirl before one's eyes,
  //     experience a dizzily confusing sensation in one's head}
  // }
}
```
API
The `OxfordDictionaries` class implements the `DictoSaurus` interface, which includes dictionary, thesaurus and term expansion utilities. The implementation uses Oxford Dictionaries API endpoints to populate a `DictionaryEntry` object and to provide some translation services to/from English.
DictionaryEntry is an object model for a term or word with term, stem, lemma and language properties. DictionaryEntry also enumerates term variants with different values for part-of-speech, definition, etymology, pronunciation, synonyms, antonyms and inflections, each with one or more example phrases.
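To make the shape of this object model concrete, here is a deliberately simplified, hypothetical Dart model. `EntrySketch` and `VariantSketch` are illustrative names only, not the package's `DictionaryEntry` API, and part of speech is a plain `String` rather than an enum. The `definitionsByPartOfSpeech()` method groups the variants' definitions by part of speech, similar in shape to the `definitionsMap()` output in the usage example.

```dart
/// Hypothetical, simplified term variant (illustration only).
class VariantSketch {
  const VariantSketch({
    required this.partOfSpeech,
    required this.definition,
    this.etymology,
    this.pronunciation,
    this.synonyms = const <String>{},
    this.antonyms = const <String>{},
    this.inflections = const <String>{},
    this.phrases = const <String>[],
  });

  final String partOfSpeech; // a String here; the real package may use an enum
  final String definition;
  final String? etymology;
  final String? pronunciation;
  final Set<String> synonyms;
  final Set<String> antonyms;
  final Set<String> inflections;
  final List<String> phrases; // example phrases for this variant
}

/// Hypothetical, simplified dictionary entry (illustration only).
class EntrySketch {
  const EntrySketch({
    required this.term,
    required this.stem,
    required this.lemma,
    required this.language,
    this.variants = const <VariantSketch>[],
  });

  final String term;
  final String stem;
  final String lemma;
  final String language;
  final List<VariantSketch> variants;

  /// Groups the definitions of all variants by part of speech.
  Map<String, Set<String>> definitionsByPartOfSpeech() {
    final result = <String, Set<String>>{};
    for (final v in variants) {
      result.putIfAbsent(v.partOfSpeech, () => <String>{}).add(v.definition);
    }
    return result;
  }
}
```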
The `OxfordDictionaries` class uses six (out of nine) endpoints of the Oxford Dictionaries API, implemented in a separate `endpoints` mini-library: `EntriesEndpoint`, `ThesaurusEndpoint`, `LemmasEndpoint`, `WordsEndpoint`, `SearchEndpoint` and `TranslationsEndpoint`. Each endpoint is described in the Overview.
More information on the endpoints is available from the Oxford Dictionaries API Documentation.
Please refer to the online API documentation for more information.
Definitions
The following definitions are used throughout the documentation:
corpus
- the collection of documents for which an index is maintained.
character filter
- filters characters from text in preparation of tokenization.
Damerau–Levenshtein distance
- a metric for measuring the edit distance between two terms by counting the minimum number of operations (insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one term into the other (from Wikipedia).
dictionary (in an index)
- a hash of terms (vocabulary) to the frequency of occurrence in the corpus documents.
document
- a record in the corpus that has a unique identifier (docId) in the corpus's primary key and that contains one or more text fields that are indexed.
document frequency (dFt)
- the number of documents in the corpus that contain a term.
edit distance
- a measure of how dissimilar two terms are by counting the minimum number of operations required to transform one string into the other (from Wikipedia).
etymology
- the study of the history of the form of words and, by extension, the origin and evolution of their semantic meaning across time (from Wikipedia).
Flesch reading ease score
- a readability measure calculated from sentence length and word length on a 100-point scale. The higher the score, the easier it is to understand the document (from Wikipedia).
Flesch-Kincaid grade level
- a readability measure relative to U.S. school grade level. It is also calculated from sentence length and word length (from Wikipedia).
IETF language tag
- a standardized code or tag that is used to identify human languages on the Internet (from Wikipedia).
index
- an inverted index used to look up document references from the corpus against a vocabulary of terms.
index-elimination
- selecting a subset of the entries in an index where the term is in the collection of terms in a search phrase.
inverse document frequency (iDft)
- a normalized measure of how rare a term is in the corpus. It is defined as log (N / dFt), where N is the total number of documents in the corpus and dFt is the document frequency of the term. The iDft of a rare term is high, whereas the iDft of a frequent term is likely to be low.
Jaccard index
- measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets (from Wikipedia).
JSON
- an acronym for "JavaScript Object Notation", a common format for persisting data.
k-gram
- a sequence of (any) k consecutive characters from a term. A k-gram can start with "$", denoting the start of the term, and end with "$", denoting the end of the term. The 3-grams for "castle" are { $ca, cas, ast, stl, tle, le$ }.
lemma or lemmatizer
- lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form (from Wikipedia).
Natural language processing (NLP)
- a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data (from Wikipedia).
Part-of-Speech (PoS) tagging
- the task of labelling every word in a sequence of words with a tag indicating what lexical syntactic category it assumes in the given sequence (from Wikipedia).
Phonetic transcription
- the visual representation of speech sounds (or phones) by means of symbols. The most common type of phonetic transcription uses a phonetic alphabet, such as the International Phonetic Alphabet (from Wikipedia).
postings
- a separate index that records which documents the vocabulary occurs in. In a positional index, the postings also record the positions of each term in the text to create a positional inverted index.
postings list
- a record of the positions of a term in a document. A position of a term refers to the index of the term in an array that contains all the terms in the text. In a zoned index, the postings lists record the positions of each term in the text of each zone.
stem or stemmer
- stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form (generally a written word form) (from Wikipedia).
stopwords
- common words in a language that are excluded from indexing.
term
- a word or phrase that is indexed from the corpus. The term may differ from the actual word used in the corpus depending on the tokenizer used.
term filter
- filters unwanted terms from a collection of terms (e.g. stopwords), breaks compound terms into separate terms and / or manipulates terms by invoking a stemmer and / or lemmatizer.
term expansion
- finding terms with similar spelling (e.g. spelling correction) or synonyms for a term.
term frequency (Ft)
- the frequency of a term in an index or indexed object.
term position
- the zero-based index of a term in an ordered array of terms tokenized from the corpus.
text
- the indexable content of a document.
token
- representation of a term in a text source returned by a tokenizer. The token may include information about the term such as its position(s) (term position) in the text or frequency of occurrence (term frequency).
token filter
- returns a subset of tokens from the tokenizer output.
tokenizer
- a function that returns a collection of tokens from text, after applying a character filter, term filter, stemmer and / or lemmatizer.
vocabulary
- the collection of terms indexed from the corpus.
zone
- the field or zone of a document that a term occurs in, used for parametric indexes or where scoring and ranking of search results attribute a higher score to documents that contain a term in a specific zone (e.g. the title rather than the body of a document).
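The Damerau–Levenshtein and edit distance definitions can be illustrated with the standard dynamic-programming algorithm. This sketch implements the restricted variant (optimal string alignment), in which no substring is edited more than once; it is not taken from this package, and it compares UTF-16 code units, which is adequate for ASCII terms.

```dart
import 'dart:math' as math;

/// Restricted Damerau–Levenshtein distance (optimal string alignment):
/// the minimum number of single-character insertions, deletions and
/// substitutions, plus transpositions of two adjacent characters, needed
/// to turn [a] into [b], with no substring edited more than once.
int damerauLevenshtein(String a, String b) {
  // d[i][j] = distance between the first i chars of a and the first j of b.
  final d = List.generate(
      a.length + 1, (_) => List<int>.filled(b.length + 1, 0));
  for (var i = 0; i <= a.length; i++) {
    d[i][0] = i; // delete all i characters
  }
  for (var j = 0; j <= b.length; j++) {
    d[0][j] = j; // insert all j characters
  }
  for (var i = 1; i <= a.length; i++) {
    for (var j = 1; j <= b.length; j++) {
      final cost = a[i - 1] == b[j - 1] ? 0 : 1;
      d[i][j] = math.min(
        math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
        d[i - 1][j - 1] + cost, // substitution (or match)
      );
      // transposition of two adjacent characters
      if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
        d[i][j] = math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}
```

For example, `damerauLevenshtein('castle', 'catsle')` is 1 (one adjacent transposition), while `damerauLevenshtein('ca', 'abc')` is 3 under this restricted variant, whereas the unrestricted Damerau–Levenshtein distance is 2.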
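k-grams and the Jaccard index are often combined for term expansion, for example to rank spelling-correction candidates by how many k-grams they share with the input term. The following minimal sketch (not the package's implementation) pads a term with a single `$` on each side, as in the "castle" example, and compares two k-gram sets; treating two empty sets as identical is a convention chosen here.

```dart
/// Returns the k-grams of [term] after marking its start and end with `$`.
/// The 3-grams of "castle" are {$ca, cas, ast, stl, tle, le$}.
Set<String> kGrams(String term, [int k = 3]) {
  final padded = '\$$term\$';
  final grams = <String>{};
  for (var i = 0; i + k <= padded.length; i++) {
    grams.add(padded.substring(i, i + k));
  }
  return grams;
}

/// Jaccard index |A ∩ B| / |A ∪ B| of two sets
/// (defined here as 1.0 when both sets are empty).
double jaccard<T>(Set<T> a, Set<T> b) {
  if (a.isEmpty && b.isEmpty) return 1.0;
  return a.intersection(b).length / a.union(b).length;
}
```

For instance, "castle" and "cattle" share 3 of their 9 distinct 3-grams ($ca, tle and le$), so `jaccard(kGrams('castle'), kGrams('cattle'))` is about 0.33.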
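Several of the definitions above (tokenizer, stopwords, postings, term position, document frequency and inverse document frequency) fit together in a small positional inverted index. The sketch below is illustrative only and is not this package's code: its tokenizer is a naive, ASCII-only lower-case-and-split function with a caller-supplied stopword set, it applies no stemmer or lemmatizer, and it uses the natural logarithm for iDft (the base is a convention).

```dart
import 'dart:math' as math;

/// term -> (docId -> term positions). The inner map is the term's postings;
/// each list of positions is a postings list.
typedef PositionalIndex = Map<String, Map<String, List<int>>>;

/// Naive tokenizer: lower-cases the text (character filter), splits on
/// anything that is not an ASCII letter or digit, and drops stopwords
/// (term filter).
List<String> tokenize(String text, Set<String> stopwords) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    .where((t) => t.isNotEmpty && !stopwords.contains(t))
    .toList();

/// Builds a positional inverted index from a corpus of docId -> text.
PositionalIndex buildIndex(Map<String, String> corpus, Set<String> stopwords) {
  final index = <String, Map<String, List<int>>>{};
  for (final doc in corpus.entries) {
    final terms = tokenize(doc.value, stopwords);
    for (var position = 0; position < terms.length; position++) {
      index
          .putIfAbsent(terms[position], () => <String, List<int>>{})
          .putIfAbsent(doc.key, () => <int>[])
          .add(position); // zero-based term position
    }
  }
  return index;
}

/// Document frequency dFt: the number of documents that contain [term].
int documentFrequency(PositionalIndex index, String term) =>
    index[term]?.length ?? 0;

/// Inverse document frequency iDft = log(N / dFt), where [n] is the number
/// of documents in the corpus. Returns 0.0 for unseen terms (a convention).
double inverseDocumentFrequency(PositionalIndex index, String term, int n) {
  final dft = documentFrequency(index, term);
  return dft == 0 ? 0.0 : math.log(n / dft);
}
```

With a three-document corpus in which "sea" occurs in two documents and "castle" in one, iDft("castle") = log 3 is higher than iDft("sea") = log 1.5, matching the intuition that rarer terms carry more weight.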
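The two Flesch readability measures are linear formulas in average sentence length (words per sentence) and average word length (syllables per word). The coefficients below are the commonly published ones (see the Flesch–Kincaid reference); this is a sketch, not the package's implementation, and syllable counting is left to the caller because it needs a language-specific heuristic or dictionary.

```dart
/// Flesch reading ease:
/// 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
double fleschReadingEase(int words, int sentences, int syllables) =>
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);

/// Flesch-Kincaid grade level:
/// 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59.
double fleschKincaidGrade(int words, int sentences, int syllables) =>
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
```

For a 100-word, 5-sentence passage with 150 syllables, the reading ease is about 59.6 and the grade level about 9.9.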
References
- Manning, Raghavan and Schütze, "Introduction to Information Retrieval", Cambridge University Press, 2008
- Cummins, "Information Retrieval", course notes, University of Cambridge, 2016
- Wikipedia (1), "Inverted Index", from Wikipedia, the free encyclopedia
- Wikipedia (2), "Lemmatisation", from Wikipedia, the free encyclopedia
- Wikipedia (3), "Stemming", from Wikipedia, the free encyclopedia
- Wikipedia (4), "Synonym", from Wikipedia, the free encyclopedia
- Wikipedia (5), "Jaccard Index", from Wikipedia, the free encyclopedia
- Wikipedia (6), "Flesch–Kincaid readability tests", from Wikipedia, the free encyclopedia
- Wikipedia (7), "Edit distance", from Wikipedia, the free encyclopedia
- Wikipedia (8), "Damerau–Levenshtein distance", from Wikipedia, the free encyclopedia
- Wikipedia (9), "Natural language processing", from Wikipedia, the free encyclopedia
- Wikipedia (10), "IETF language tag", from Wikipedia, the free encyclopedia
- Wikipedia (11), "Phonetic transcription", from Wikipedia, the free encyclopedia
- Wikipedia (12), "Etymology", from Wikipedia, the free encyclopedia
- Wikipedia (13), "Part-of-speech tagging", from Wikipedia, the free encyclopedia
Issues
If you find a bug, please file an issue.
This project is a supporting package for a revenue project that has priority call on resources, so please be patient if we don't respond immediately to issues or pull requests.
Libraries
- endpoints: Mini-library that exports all the endpoints for the Oxford Dictionaries API.
- oxford_dictionaries: Dictionary class that uses the Oxford Dictionaries REST API.