Grammar and Spell Check Rest API | AI-based

Ginger's Grammar Correction API

Overview

This document describes how to use Ginger’s proofreading API to correct spelling, grammar, punctuation, vocabulary and style issues in documents.

The first section (API details) describes all the information a developer needs to integrate with the API. The next section (Common Use Cases) outlines various ways in which the API can be put to use. It is recommended to read that section before beginning implementation. Certain readers (such as product managers) may find it more useful to begin with it.

The two appendices describe in detail two types of information returned by the API: correction types and correction categories, respectively.

API details

Endpoints

Sandbox: an endpoint for development and integration purposes. https://sb-partner-services.gingersoftware.com/correction/v1/document

Production: https://doc-partner-services.gingersoftware.com/correction/v1/document

Request: To correct a document, create a POST request with the document text in the body of the request. The document text is assumed to be url-encoded.

Sample request to the production endpoint:

https://doc-partner-services.gingersoftware.com/correction/v1/document?apiKey=someApiKey

Headers

Content-Type

The document type should be specified in the Content-Type header. Supported types and subtypes (type/subtype - case insensitive):

text/plain for plain text documents
text/html for html documents

If an unrecognized type or subtype is passed, the server will return HTTP 415. For example, the following will return 415:

application/plain: only type “text” is supported
text/pdf: only subtypes “plain” and “html” are supported

accept-encoding

Compression is supported via the accept-encoding header. When it is set to “gzip”, the response is gzip-encoded.

URL Params

URL params can be passed in any order

In case an illegal param value is passed, the server returns HTTP error 400 with an appropriate error message.

An unknown param name in the query string will be silently ignored.

Following is the list of supported url params and their values.

Parameters

Name: apiKey
Values: the API key string obtained from Ginger.
Comment: Mandatory.

Name: lang
Values: US / UK / Indifferent
Comment: Not mandatory. Defaults to “Indifferent”. The English language locale to be used for correction. UK/US will enforce British English or American English spelling respectively. “Indifferent” will make the correction agnostic to locale, so either variation (e.g. colour or color) will be considered correct.

Name: generateSynonyms
Values: true/false
Comment: Not mandatory. Defaults to false. If set to true, returns contextual synonym suggestions in addition to corrections. For more details about contextual synonyms, see the appendix.

Name: generateRecommendations
Values: true/false
Comment: Not mandatory. Defaults to false. If set to true, returns style and vocabulary recommendations in addition to other corrections. For more details about recommendations, see the appendix.

Name: avoidCapitalization
Values: true/false
Comment: Not mandatory. Defaults to false. If set to true, beginning-of-sentence capitalization will not be checked. For example, when the flag is set to true, the sentence “the boy is tall” will be returned with no corrections (as opposed to suggesting capitalizing the word “the”).
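To illustrate the request format described above, the following Python sketch builds (but does not send) a POST request with the standard library. The function name build_correction_request and its keyword arguments are hypothetical and not part of the API. The sketch also assumes that “url-encoded” means percent-encoding the document text.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

SANDBOX_URL = "https://sb-partner-services.gingersoftware.com/correction/v1/document"


def build_correction_request(text, api_key, lang="Indifferent",
                             generate_synonyms=False,
                             generate_recommendations=False,
                             avoid_capitalization=False,
                             content_type="text/plain",
                             endpoint=SANDBOX_URL):
    """Build (but do not send) a POST request for the correction endpoint."""
    # Checking lang on the client avoids a round trip that would end in HTTP 400.
    if lang not in ("US", "UK", "Indifferent"):
        raise ValueError("lang must be US, UK or Indifferent")
    query = urlencode({
        "apiKey": api_key,
        "lang": lang,
        "generateSynonyms": str(generate_synonyms).lower(),
        "generateRecommendations": str(generate_recommendations).lower(),
        "avoidCapitalization": str(avoid_capitalization).lower(),
    })
    # Assumption: "url-encoded" is read here as percent-encoding of the text.
    body = quote(text).encode("ascii")
    return Request(
        f"{endpoint}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


request = build_correction_request("She not hear anything.", "someApiKey", lang="US")
print(request.get_method(), request.full_url)
```

The returned object can be passed to urllib.request.urlopen, and the JSON body of the response can then be parsed with json.load.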

Response

Below is the response for sending the following two-sentence document for correction:

Her closes the door quiet. She not hear anything.

The response contains an array of three corrections with all their details as described below.

The response consists of two arrays: Corrections and Sentences.

Corrections array:

Each element represents an error found in the document and its suggested corrections.

Confidence:

A discrete value indicating how likely the correction is to be precise. Possible values: 4 (High), 3 (Medium), 2 (Low), 1 (None). The higher the value, the more reliable the correction.

ShouldReplace

A Boolean indicating whether it is recommended to automatically replace the error word with the top suggestion. For a review of various client implementation strategies, see the “Common use cases” section in this doc.

CorrectionType:

A classification of the correction into one of the following high-level categories: 1 – Spelling, 2 – Misused Word, 3 – Grammar, 4 – Synonym, 5 – Recommendation, 6 – Punctuation

TopCategoryId:

The id of the grammatical category the top suggestion belongs to. For a detailed description and a full list of category ids see the appendix to this document.

MistakeText:

The word or words which contain an error.

MistakeDefinition:

Whenever available, a dictionary definition of the mistake word

From, To:

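Zero-based indices of the detected error, relative to the beginning of the document. In the response example below, the error “Her” at the start of the document has From 0 and To 2, so To is the index of the last character of the error.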

LrnFrg:

Whenever available, the grammatical context of the error word/s. A short, not necessarily consecutive, fragment from the original sentence which gives grammatically intact context around the error word/s.

MistakeWordsInLrnFrg:

The indices of the words with errors within the LrnFrg. The indices are zero based, relative to the beginning of the LrnFrg.

Suggestions Array:

An array of suggested corrections for the error. The suggestions are ordered by relevance. Each suggestion contains the following data:

Text: the text with which to replace the error word/s in the original document.

CategoryId: The id of the grammatical category which the suggestion belongs to.
For a detailed description and a full list of category ids see the appendix to this document.

Definition: Whenever available, a dictionary definition of the suggested word.

Recommendations Array:

Similar to the Corrections array. Contains vocabulary and style recommendations, as described in the Recommendations section of the “Contextual Synonyms and Recommendations” appendix. This array will appear only if there are recommendation corrections returned by the API.

Sentences array:

An array in which each element represents a sentence in the original document. A client can use this section to determine how Ginger’s correction engine split the document into sentences, determine which sentences were not corrected and why, compute statistics like the number of sentences in the document and so on.

Each sentence in the array contains the following information:

FromIndex,ToIndex: zero based indices of the sentence, relative to the beginning of the document.

IsEnglish: False means the sentence was flagged as not being in English and thus was not corrected.

ExceededCharactersLimit: Sentences which exceed 300 characters are not corrected. Such sentences will have “true” in this field.

Response Example:

{
  "GingerTheDocumentResult": {
    "Corrections": [
      {
        "CorrectionType": 3,
        "From": 0,
        "Suggestions": [
          {
            "CategoryId": 5,
            "Text": "She"
          }
        ],
        "To": 2,
        "TopCategoryId": 5
      },
      {
        "CorrectionType": 3,
        "From": 20,
        "Suggestions": [
          {
            "CategoryId": 23,
            "Text": "quietly"
          }
        ],
        "To": 24,
        "TopCategoryId": 23
      },
      {
        "CorrectionType": 3,
        "From": 31,
        "Suggestions": [
          {
            "CategoryId": 21,
            "Text": "doesn't hear"
          }
        ],
        "To": 38,
        "TopCategoryId": 21
      }
    ],
    "Sentences": [
      {
        "ExceededCharacterLimit": false,
        "FromIndex": 0,
        "IsEnglish": true,
        "ToIndex": 26
      },
      {
        "ExceededCharacterLimit": false,
        "FromIndex": 27,
        "IsEnglish": true,
        "ToIndex": 49
      }
    ]
  }
}

On success, the server returns HTTP 200.
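As a minimal sketch of how a client might consume the response above, the following Python function replaces each error with its first suggestion. The function name is hypothetical. It assumes that the From and To indices count characters the same way Python strings do, and that a correction without a ShouldReplace field may be replaced; both are assumptions, not statements about the API.

```python
def apply_top_suggestions(text, corrections):
    """Replace each error span with its first suggestion."""
    # Work from the end of the document so that earlier indices stay valid
    # after a replacement changes the length of the text.
    for c in sorted(corrections, key=lambda c: c["From"], reverse=True):
        suggestions = c.get("Suggestions") or []
        # Assumption: a missing ShouldReplace field is treated as True.
        if not suggestions or not c.get("ShouldReplace", True):
            continue
        # In the sample response, To is the index of the last error character,
        # so the text after the error starts at To + 1.
        text = text[:c["From"]] + suggestions[0]["Text"] + text[c["To"] + 1:]
    return text


document = "Her closes the door quiet. She not hear anything."
corrections = [
    {"From": 0, "To": 2, "Suggestions": [{"Text": "She"}]},
    {"From": 20, "To": 24, "Suggestions": [{"Text": "quietly"}]},
    {"From": 31, "To": 38, "Suggestions": [{"Text": "doesn't hear"}]},
]
print(apply_top_suggestions(document, corrections))
# prints: She closes the door quietly. She doesn't hear anything.
```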

Failure Responses

The response for missing/wrong parameters includes an exception type and an error message.

Failure Response example:

Error 400

A mandatory param was not passed. The Message property provides information about the missing parameter. In the first example below it is the mandatory apiKey parameter.

{
  "ErrorType": "Missing parameter",
  "Message": "APIKey is mandatory"
}

{
  "ErrorType": "Authentication error",
  "Message": "The api key is not recognized"
}

{
  "ErrorType": "Authentication error",
  "Message": "The api key is expired or disabled"
}

Error 400

A URL param was passed with an illegal value. The Message property provides detailed information. In the example below it is the lang parameter, which was passed with an illegal value of “USS” (instead of “US”). Other examples: generateSynonyms=falsee, avoidCapitalization=tru, etc.

{
  "ErrorType": "Illegal parameter value",
  "Message": "lang param value is invalid: uss"
}

Error 400

The document sent for correction exceeded the maximum allowed size. The maximum document size is normally between 10,000 and 50,000 characters.

{
  "ErrorType": "Server-side error",
  "Message": "The document is too long"
}

Error 415

An unsupported document type was passed

{
  "ErrorType": "Unsupported media-type in Content-Type header",
  "Message": "text/plaind is unsupported"
}
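The failure bodies above share the ErrorType and Message properties, so a client can map them to a single exception type. The following Python sketch shows one way to do this. The class and function names are hypothetical, and the fallback for a body that is not valid JSON is an assumption about defensive handling rather than documented API behavior.

```python
import json


class CorrectionApiError(Exception):
    """Carries the HTTP status plus the ErrorType and Message of a failure body."""

    def __init__(self, status, error_type, message):
        super().__init__(f"HTTP {status}: {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def parse_error(status, body):
    """Turn a failure response body into a CorrectionApiError."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return CorrectionApiError(
        status,
        payload.get("ErrorType", "Unknown error"),
        payload.get("Message", ""),
    )


sample = '{"ErrorType": "Illegal parameter value", "Message": "lang param value is invalid: uss"}'
print(parse_error(400, sample))
```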

Common use cases

Ginger’s correction API returns suggested corrections for words suspected as errors. It is up to the client of the API to decide how to apply the corrections to the text. The following section outlines common use cases. These can be broadly categorized as interactive and offline scenarios. Ginger’s API contains all the data required to serve both of these scenarios.

There is some overlap between the two, so it is recommended to read both sections below.

Interactive Corrections

In this scenario, a user submits their text for grammar and spelling correction through an online text editor. The following are some common practices for displaying the results of the API.

Highlighting errors

Highlighting or otherwise marking the parts of the text which were identified as errors can help the user focus on correcting them. This can be done using the “From” and “To” fields of each “Correction” object from the “Corrections” array in the response.
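A minimal Python sketch of such highlighting follows. It wraps each error span in plain-text markers; a real editor would more likely use styled elements. The function name and the marker strings are hypothetical, and the sketch assumes that To is the index of the last character of the error, as in the response example.

```python
def highlight_errors(text, corrections, start="[[", end="]]"):
    """Wrap every error span in start/end markers."""
    # Insert markers from the end of the document backwards so that the
    # From/To indices of the remaining corrections stay valid.
    for c in sorted(corrections, key=lambda c: c["From"], reverse=True):
        first, last = c["From"], c["To"]
        text = text[:first] + start + text[first:last + 1] + end + text[last + 1:]
    return text


document = "Her closes the door quiet. She not hear anything."
spans = [{"From": 0, "To": 2}, {"From": 20, "To": 24}, {"From": 31, "To": 38}]
print(highlight_errors(document, spans))
# prints: [[Her]] closes the door [[quiet]]. She [[not hear]] anything.
```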

Displaying alternative suggestions

When an error is identified, three options exist with regard to how to correct it: there is either a single suggested correction, multiple suggested corrections or no suggested corrections.

Single suggested correction: In terms of the API response, this is the case when the “Suggestions” array of a certain “Correction” object contains a single element. The client application should decide how to display this suggestion to the user. This can be done either by replacing the error word automatically or by only highlighting it and letting the user view the suggestion and decide whether to apply it.

Implementations which choose to replace automatically should consider the “ShouldReplace” property of the Correction object. If it is false, it is advised not to perform an automatic replacement.

Multiple suggested corrections: this case is similar to the previous one except that since there are several suggested replacements the user should be able to choose the one they would like to replace the error with. The items in the “Suggestions” array in the response are ordered according to their likelihood, so it is recommended that they are displayed to the user in the same order they are received in the response from the API.

Implementations which choose to replace automatically should thus do so using the first item in the Suggestions array and also consider the “ShouldReplace” property of the Correction object as mentioned above.

No suggested corrections: this is the case when Ginger detects a mistake in the text but is not able to suggest any reasonable correction. Here is a simple example of such a case: “My hshfjedkskhd is new.” In such cases it is still helpful to highlight the mistake text, but it is advisable to make it clear to the user that no suggestions are available, either by a different highlight color, by explicitly stating so in the UI, or both.

Displaying additional information about the correction

Implementations may choose to reflect several other parts of the API response to users:

Confidence: A user will likely benefit from knowing that a certain correction is considered low confidence, and can then be more careful when deciding whether to select one of the suggested alternatives. The Confidence field in the response contains this information. Values of 4 and 3 (High and Medium) can be considered very reliable. Values of 2 and 1 (Low and None) indicate low confidence. It is recommended to differentiate between the two groups in the user interface, e.g. by using a different highlight color in each case.

Correction Types: Corrections can be thought of as belonging to one of several groups or types: spelling mistakes, grammar mistakes, punctuation issues, word usage or vocabulary mistakes, synonyms and style recommendations. This information is contained in the CorrectionType field for each correction. An implementation can choose to use this field in various ways, such as:

Help a user distinguish between different types of mistakes by using different colors for spelling, grammar and vocabulary errors
Distinguish between mistakes and style recommendations and synonym suggestions
Allow users to toggle the display of the different types of corrections on or off

Grammatical category: Each suggestion for correction is classified by the API as belonging to a certain grammatical category or topic. Knowing the category makes it possible to display to the user more fine-grained information about why the correction is being suggested. This both facilitates learning and increases the user’s confidence in the system.

For example, in the sentence “She live in my neighborhood”, the verb “live” is corrected to “lives” so that it properly matches the subject of the sentence. Implementations can choose to explain to users at this point what subject-verb agreement is, or why this correction is being suggested. This can be done, for example, by presenting the explanation alongside the suggested correction, using a tooltip when hovering over the error word and so on.

This information is contained in the “CategoryId” field of each suggestion. To make things more convenient, the category of the first suggestion, which is the most likely one, is also given in the TopCategoryId property of the Correction object itself.

For a full list of categories, their description and examples, see the “Correction Categories” appendix in this document.

Definitions: A suggestion often also carries a dictionary definition in the Definition field of the Suggestion object. This information can be displayed to the user alongside the suggestion to enhance their understanding of the suggested word and thus make them more confident about their selection. Note that the definition is sometimes absent, so the implementation must be prepared to display nothing in this case.
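The fields discussed above can be collected into one display record per correction. The Python sketch below is one possible way to do so. The function name, the style names and the enabled_types toggle are hypothetical; the type and confidence labels are the ones listed in the API details section.

```python
CORRECTION_TYPES = {1: "Spelling", 2: "Misused Word", 3: "Grammar",
                    4: "Synonym", 5: "Recommendation", 6: "Punctuation"}
CONFIDENCE_LABELS = {4: "High", 3: "Medium", 2: "Low", 1: "None"}


def display_info(correction, enabled_types=None):
    """Return what a UI might show for one correction, or None if its type is toggled off."""
    type_name = CORRECTION_TYPES.get(correction.get("CorrectionType"), "Unknown")
    if enabled_types is not None and type_name not in enabled_types:
        return None
    confidence = correction.get("Confidence")
    suggestions = correction.get("Suggestions") or []
    return {
        "type": type_name,
        "confidence": CONFIDENCE_LABELS.get(confidence, "Unknown"),
        # Hypothetical style names: High and Medium share the stronger highlight.
        "style": "strong-highlight" if confidence in (3, 4) else "weak-highlight",
        # Definition may be absent, in which case None is stored.
        "suggestions": [(s["Text"], s.get("Definition")) for s in suggestions],
        "category_id": correction.get("TopCategoryId"),
    }


example = {"CorrectionType": 3, "Confidence": 4, "TopCategoryId": 23,
           "Suggestions": [{"CategoryId": 23, "Text": "quietly"}]}
print(display_info(example))
```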

Sentence level information

Sentence boundaries

Ginger’s API response contains information about where each sentence in the document begins and ends. This information is contained in the FromIndex and ToIndex properties of each Sentence object in the Sentences section. This can be useful for user interfaces which wish to mark sentences which contain errors, for example, or similar needs. To know which sentence contains a specific error, an implementation needs to locate the sentence for which the From and To indices of the correction are contained within the FromIndex and ToIndex of the Sentence.

The document is split into sentences according to standard end-of-sentence punctuation: period, question mark, exclamation mark and combinations of several such punctuation marks. Common abbreviations ending with a period (such as Dr., Mrs. and the like) are accounted for. Text that does not contain end-of-sentence punctuation will be considered a single sentence, which may result in inferior correction quality or in exceeding the maximum allowed size of a single sentence (300 characters).
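The containment check described above, which finds the sentence a correction belongs to, can be sketched in Python as follows. The function name is hypothetical; the field names are those of the response.

```python
def find_sentence(correction, sentences):
    """Return the position in the Sentences array of the sentence containing the correction."""
    for index, sentence in enumerate(sentences):
        if (sentence["FromIndex"] <= correction["From"]
                and correction["To"] <= sentence["ToIndex"]):
            return index
    return None  # no sentence contains the correction


sentences = [{"FromIndex": 0, "ToIndex": 26}, {"FromIndex": 27, "ToIndex": 49}]
print(find_sentence({"From": 20, "To": 24}, sentences))  # prints 0
print(find_sentence({"From": 31, "To": 38}, sentences))  # prints 1
```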

Unproofed sentences

The API response contains information which allows an application to determine which sentences were not proofread and why. This can happen for one of several reasons, each designated by a specific property in the relevant Sentence object of the response.

IsEnglish: false indicates the sentence is not in English and was thus not checked
ExceededCharactersLimit: true means the sentence was not checked because it is longer than the maximum limit of 300 characters. Implementations may wish to flag these sentences in a specific way, prompt the user to split them and recheck and so on.
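A short Python sketch of collecting unproofed sentences and their reasons follows. The function name and the reason strings are hypothetical. Because the field is spelled ExceededCharactersLimit in the text and ExceededCharacterLimit in the response example, the sketch checks both spellings rather than assuming either one.

```python
def unproofed_sentences(sentences):
    """Return (position, reason) pairs for sentences that were not proofread."""
    skipped = []
    for index, sentence in enumerate(sentences):
        if sentence.get("IsEnglish") is False:
            skipped.append((index, "not English"))
        # Both spellings are checked; which one the API returns is not assumed.
        elif (sentence.get("ExceededCharactersLimit")
              or sentence.get("ExceededCharacterLimit")):
            skipped.append((index, "longer than 300 characters"))
    return skipped


sample_sentences = [
    {"IsEnglish": True, "ExceededCharacterLimit": False},
    {"IsEnglish": False, "ExceededCharacterLimit": False},
    {"IsEnglish": True, "ExceededCharacterLimit": True},
]
print(unproofed_sentences(sample_sentences))
# prints: [(1, 'not English'), (2, 'longer than 300 characters')]
```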

Offline Document correction

There are use cases for the correction API which do not involve an interactive user interface. One common case is when a large number of documents is sent for correction as part of a business flow. For example, a law firm may wish to send, at the end of each day, all the documents produced during that day for spelling and grammar proofreading. Each morning, contracts flagged with more than a certain number of errors, or containing certain error types, are passed on for manual review. As another example, a book publisher or a scientific journal reviewing a newly submitted manuscript may wish to proofread it automatically and, based on the results, send it directly to an editor, pass it to a human proofreader or reject it altogether. A website may wish to perform a periodic review of all newly published content, user reviews and so on. Many other such use cases exist. What these cases have in common is that the response from the API is processed automatically by a machine, whose output is data or decisions for the next point in the pipeline, a human-readable report, a new document with proofreading applied to it and so on.
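The manual-review rule from the law firm example could be sketched in Python as below. The function name, the threshold and the set of flagged types are hypothetical values chosen for illustration; result stands for the object found under “GingerTheDocumentResult” in the response.

```python
def needs_manual_review(result, max_errors=10, flagged_types=frozenset()):
    """Decide whether a proofread document should go to a human reviewer."""
    corrections = result.get("Corrections", [])
    # Rule 1: too many errors overall.
    if len(corrections) > max_errors:
        return True
    # Rule 2: at least one error of a type the business flow cares about.
    return any(c.get("CorrectionType") in flagged_types for c in corrections)


result = {"Corrections": [{"CorrectionType": 3}, {"CorrectionType": 3}, {"CorrectionType": 3}]}
print(needs_manual_review(result, max_errors=2))                     # prints True
print(needs_manual_review(result, max_errors=5))                     # prints False
print(needs_manual_review(result, max_errors=5, flagged_types={3}))  # prints True
```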

Summary Information

Interactive and offline implementations alike may find it useful to create summary reports about a single document or a set of documents. These may include both document information and proofreading information.

Document level information can include data such as number of sentences in each document, average sentence length, total document length, percent of non-English sentences and percent of too long sentences.

Proofreading information can show statistics about types of errors (Spelling vs. Grammar vs. Vocabulary), a breakdown of the user’s mistakes by grammatical topic, the overall number of mistakes, mistakes per sentence and the like.

Implementations may even want to consider tracking such statistics over time for profiling or progress monitoring purposes.

All of the information to generate the above statistics is contained in the various fields of the API response discussed in previous sections.
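As a rough sketch, several of these statistics can be computed from one response in Python as follows. The function name and the keys of the returned dictionary are hypothetical; result stands for the object found under “GingerTheDocumentResult”.

```python
from collections import Counter


def summarize(result):
    """Compute simple document and proofreading statistics for one response."""
    corrections = result.get("Corrections", [])
    sentences = result.get("Sentences", [])
    count = len(sentences)
    return {
        "sentences": count,
        "corrections": len(corrections),
        "corrections_per_sentence": len(corrections) / count if count else 0.0,
        # Breakdowns are keyed by the numeric ids returned by the API.
        "by_type": dict(Counter(c.get("CorrectionType") for c in corrections)),
        "by_category": dict(Counter(c.get("TopCategoryId") for c in corrections)),
        "non_english_sentences": sum(1 for s in sentences if s.get("IsEnglish") is False),
    }


result = {
    "Corrections": [
        {"CorrectionType": 3, "TopCategoryId": 5},
        {"CorrectionType": 3, "TopCategoryId": 23},
        {"CorrectionType": 3, "TopCategoryId": 21},
    ],
    "Sentences": [{"IsEnglish": True}, {"IsEnglish": True}],
}
print(summarize(result))
```

Tracking these values per document over time is then a matter of storing the returned dictionaries.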

Appendix: Contextual Synonyms and Recommendations

This section covers in more detail two special types of suggestions: contextual synonyms, and style and vocabulary recommendations.

Contextual Synonyms

English words often have a large number of synonyms. For example, the word “very” has the following and many more: highly, awfully, really, extremely, terribly, deeply, seriously, selfsame, actual and much. However, in the context of a particular sentence, only some of them are applicable. When the generateSynonyms param is set to “true”, if a word in the sentence has synonyms which are relevant in the context of that sentence, Ginger’s API will return them as suggestions.

Example:

My coussin is very picky about the restaurants he dines in.

Without any flags, only the spelling mistake coussin->cousin will be corrected.

With generateSynonyms=true, synonyms for “very” (really) and for “picky” (particular, fussy, finicky) will be returned in addition to the spelling correction.

Recommendations

Recommendations are style and vocabulary suggestions.

There are three types of recommendations currently supported:

Spelling out numbers: 0-9 replaced with the spelled-out number (when in the context of counting something)
Overused Adjectives: alternative adjectives suggested whenever commonplace/banal ones are being used
Flagging of passive voice usage. The active form is not suggested, only the fact that passive voice was used is flagged.

Examples:

There are 2 very good reason for this.

Without any flags, only the mistake reason->reasons will be corrected.

With generateRecommendations=true, in addition to the grammar correction, the “2” will be spelled out as “two”.

I live in a big house.

Without any flags, there will be no correction.

With generateRecommendations=true, there will be two suggestions for the adjective “big” (large, spacious).

The action was taken by her

Without any flags, there will be no correction.

With generateRecommendations=true, the use of the passive voice will be flagged.

Appendix: Correction Categories

General

A correction category specifies the grammatical topic a suggestion returned by the API belongs to. A category is identified by a unique numeric id which is returned in the LrnCatId field, which is part of each member of the Suggestions array in the response:

Each suggestion has a category associated with it. Thus, if there are multiple suggested corrections for a specific word, each will have its own, potentially different, category. For example, in the sentence: “Book is on the shelf” there are two suggested corrections for “Book”: “The book” and “A book”. The first gets LrnCatId = 13 (DefiniteArticle), the second gets LrnCatId = 12 (IndefiniteArticle).

{
  "Suggestions": [
    {
      "LrnCatId": 43,
      "Text": "boy's"
    },
    {
      "LrnCatId": 43,
      "Text": "boys'"
    }
  ]
}

List of Correction Categories

Following is the full list of correction grammatical categories the Ginger API may return. Each category is listed with its id and name.

Category ID: Category name

SplitAndMerge
CommonAndProperNouns
Pronouns
IndefiniteArticle
DefiniteArticle
Tenses
PrimaryVerbs
PresentProgressive
PastSimple
Future
SubjectVerbAgreement
AdverbialModifiers
Prepositions
PrepositionsInOnAtConfusion
Spelling
PresentSimple
PresentPerfect
PastProgressive
PastPerfect
TheInfinitive
Participles
Punctuation
Plurality
ConsecutiveNouns
UK/US English
BeginningOfSentenceCapitalization
MisusedWords
DoubleWords
Synonyms
CommaAddition
ComparativeSuperlative
QuestionMarkAddition
100: Vocabulary
102: InformalLanguage
103: OverusedWord
104: PassiveVoice
105: NumeralSpellingOut
1000: Other
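A client that displays category names needs a lookup from id to name. The Python sketch below includes only the ids that this document states explicitly (12, 13, 30, 100, 102 to 105 and 1000) and falls back to a generic label for all others. The names KNOWN_CATEGORIES and category_name are hypothetical, and the table is deliberately partial.

```python
# Only ids that appear explicitly in this document; the table is incomplete.
KNOWN_CATEGORIES = {
    12: "IndefiniteArticle",
    13: "DefiniteArticle",
    30: "PrepositionsInOnAtConfusion",
    100: "Vocabulary",
    102: "InformalLanguage",
    103: "OverusedWord",
    104: "PassiveVoice",
    105: "NumeralSpellingOut",
    1000: "Other",
}


def category_name(category_id):
    """Return the category name for an id, or a generic label if it is not in the table."""
    return KNOWN_CATEGORIES.get(category_id, f"Category {category_id}")


print(category_name(104))  # prints PassiveVoice
print(category_name(23))   # prints Category 23
```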

Correction Categories – Description and Examples

The following section describes the types of mistakes captured by each category, with examples.

Spelling

Name: Spelling
Description: A word was not spelled correctly
Examples:
Fizix is a great sudgekt
The marble statue had a big hed

Name: Misused
Description: Confusion between words which sound similar or have a similar spelling
Examples:
This problem is to complicated
I wasn't sure what to except
I want to rid a camel
We decided to remove the item form our store

Name: SplitAndMerge
Description: A word was accidentally split in two or vice-versa, two words were accidentally merged into a single word
Examples:
The bed room is comfortable
This type of behavior can mad den me
This looks makebelieve, not real
I amnot going there

Name: UK/US
Description: Usage of British instead of American English spelling or vice-versa. Preference is determined by the value set in the “lang” field in the request
Examples:
The colour purple is my favorite [colour is corrected to color if lang=US]
I live near the town center [center is corrected to centre if lang=UK]

Name: CommonAndProperNouns
Description: A proper or common noun is not capitalized
Examples:
My friend john is not well today
We will always have paris
My english teacher is stern
My freudian slip turned out to be fatal

Name: DoubleWords
Description: Accidental repetition of a word
Examples:
I went to to the store
I won't let let her do it

Grammar – Nouns

Name: IndefiniteArticle
Description: Confusion between a/an or omission of an indefinite article when it is required
Examples:
John is studying for a MBA degree
This is an great show
This is great show

Name: DefiniteArticle
Description: Omission of the definite article (“the”) when it is required
Examples:
I had time of my life on this vacation
These are some of things I have to deal with

Name: ConsecutiveNouns
Description: Wrong usage of two or more nouns in a row, either in possessive form or as modifiers of each other
Examples:
Sheryl went to the tickets office
My wife name is Sara
The boys teacher is not coming today

Name: Plurality
Description: Confusion between the singular and plural form of a noun
Examples:
We bought a number of item
Six people lost their life in the accident
I sleep in a small beds

Name: Pronouns
Description: Using the wrong pronoun or not using a pronoun when one is required
Examples:
I need you help
Mary and me just had a long conversation

Grammar – Adjectives

Name: ComparativeSuperlative
Description: Wrong usage of comparative and superlative structures
Examples:
This movie is bad than anything I have seen
She is the most pretty girl in her class
This circle is more round than it seems

Grammar – Verbs

Name: Tenses
Description: The wrong verb tense or form is being used and the mistake does not fall under one of the more specific verb related categories (16-21 and 34-39)
Examples:
Jane couldn’t located your phone number
It is important to submitted the paper on time
Deploying this in the cloud will allowing scaling

Name: PresentSimple
Description: Error in forming or applying the present simple tense
Examples:
The battery is lasting for only 2 hours

Name: PresentPerfect
Description: Error in forming or applying the present perfect tense
Examples:
I have develop this idea for days
Has we met before?

Name: PresentProgressive
Description: Error in forming or applying the present progressive tense
Examples:
Terry has writing a letter at the moment
I am go to the market

Name: PastSimple
Description: Error in forming or applying the past simple tense
Examples:
I go to the store yesterday
Yesterday I download an app for long distance calls

Name: PastPerfect
Description: Error in forming or applying the past perfect tense
Examples:
My parents has never been to Florida before this summer
Tom have already left work before he realized he forgot his notes

Name: PastProgressive
Description: Error in forming or applying the past progressive tense
Examples:
He was sell fruit on the side of the road
I am having a problem with my connection yesterday

Name: FutureTense
Description: Error in forming or applying the future tense
Examples:
Amy try going to the chess club tomorrow
I'll will review the test later

Name: SubjectVerbAgreement
Description: Mismatch between the plurality of the verb and the subject
Examples:
A bouquet of yellow roses lend color and fragrance to the room
My aunt or my uncle are arriving by train today

Name: TheInfinitive
Description: Errors in forming or using the infinitive form
Examples:
I refuse give up
She expects it is ready by five
The tutor allowed us talking as long as it was in English

Name: Participles
Description: Errors related to incorrect usage or incorrect form of participles
Examples:
We are looking for employees with proved experience
My dream has giving me strength to lead others
It's take Sam two hours to get home

Name: AdverbialModifiers
Description: Incorrect use of adverbs instead of adjectives or vice-versa or using the wrong adverb
Examples:
She is a beautifully woman
He closed the door quiet
He sings good

Grammar – Prepositions

Name: Prepositions
Description: Applying a wrong preposition. The specific common case of in-on-at confusion is categorized separately (30)
Examples:
We arrived to the station
The differences among English, Chinese, and Arabic are significant
Do you have a good picture I can incorporate in the presentation?
You can get there with bus or on foot

Name: PrepositionsInOnAtConfusion
Description: Confusion between the usage of the prepositions in, on and at
Examples:
I arrived in five o'clock
She lives in 666 Elm Street
The relevant data appeared on the rightmost column

Grammar – Other

Name: BeginningOfSentenceCapitalization
Description: The first word in a sentence not capitalized
Examples:
this is not right
what a nice house!

Name: CommaAddition
Description: Not adding a comma where there should be one, e.g. after an introductory phrase, between list items and so on
Examples:
Oh well let's go for it
Students will demonstrate understanding of spoken words syllables and sounds (phonemes)
98 West Pulaski Road Huntington Station NY

Name: QuestionMarkAddition
Description: Not adding a question mark at the end of a sentence
Examples:
Is this really true
How did you manage to do it

Word Usage

Name: Vocabulary
Category Id: 100
Description: Using a semantically unsuitable word in a sentence
Examples:
Can you make me a favor?
I want to stay till the finish of the party
They said us to come quickly

Name: InformalLanguage
Category Id: 102
Description: Usage of slang and informal language. Enabled when generateRecommendations is set to true
Examples:
A lotta things depend on his success
I coulda made it

Name: OverusedWord
Category Id: 103
Description: Usage of banal or general adjectives instead of more specific ones. Enabled when generateRecommendations is set to true.
Examples:
He lives in a big house
He is a nice guy

Style

Name: PassiveVoice
Category Id: 104
Description: Usage of the passive voice. Enabled when generateRecommendations is set to true.
Examples:
The job was done by him
The deed is being done as we speak

Name: NumeralSpellingOut
Category Id: 105
Description: Writing digits instead of spelling out the numbers 0-9
Examples:
We raise 3 cats and a newborn baby
The bad weather delayed me by 5 days

Name: Synonyms
Description: Suggestions of synonyms for the given word
Examples:
The trip was boring, but the road was beautiful [“tiring” will be suggested as a synonym for “boring” and “route” as a synonym for “road”]
How I would love a good night's sleep! [“slumber” will be suggested as a synonym for “sleep”]

Other

Name: General
Category Id: 1000
Description: A mistake that does not fall under any of the previous categories.