Open Source Suggester Spellcheck - Spell Checking Java library

Contents:

1. What is it?
2. Advantages
3. Where to get it?
4. Requirements
5. Basic, Advanced and Enterprise versions of Suggester software
6. Documentation
7. Java Code Samples
8. Web Examples
9. Dictionaries
10. Release Notes
11. Licensing and Legal Issues


1. What is it?

The Suggester Spell Check is a 100% pure Java library that provides a local spell-checking service. It is free to use with the pre-compiled dictionaries. Suggester Spell Check uses the Basic Suggester as its spellchecker.

What is the Suggester software?
The Suggester is a Java program that recommends replacements for unknown words in user queries to local search systems. A system administrator can create a list of preferred words and assign them a higher weight. In its basic implementation, the Suggester serves as a spellchecker.
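
The effect of preferred-word weights can be pictured with a small ranking sketch. It assumes that each candidate's edit distance to the query has already been computed and that a higher weight only breaks ties between equally close candidates. Both this rule and the class name PreferredWordRanker are hypothetical and do not represent the Suggester's actual scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ranks candidate suggestions by edit distance first and
// uses an administrator-assigned weight only to break ties between equally
// close candidates. This rule is an assumption, not the Suggester's scoring.
class PreferredWordRanker {

    // distances: candidate word -> edit distance to the unknown query word
    // weights:   preferred word -> weight (words not listed get weight 0)
    static List<String> rank(Map<String, Integer> distances, Map<String, Integer> weights) {
        Comparator<String> byDistance =
                Comparator.comparing((String w) -> distances.get(w));
        Comparator<String> byWeightDescending =
                Comparator.comparing((String w) -> weights.getOrDefault(w, 0),
                                     Comparator.reverseOrder());

        List<String> words = new ArrayList<>(distances.keySet());
        // Closest first, then heaviest, then alphabetical for a deterministic order.
        words.sort(byDistance
                .thenComparing(byWeightDescending)
                .thenComparing(Comparator.naturalOrder()));
        return words;
    }
}
```

Given the distances {color=1, cold=1, colour=2} and a weight of 5 on "color", the ranking is color, cold, colour. Without any weights, the alphabetical fallback places cold before color.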

2. Advantages

Smart suggestions:
The Suggester selects suggestions by combining the shortest edit-distance measure with the Metaphone phonetic algorithm and a proprietary fuzzy-matching algorithm. The influence of each algorithm can be adjusted in a configuration file. Try the Java applet-based Spelling Suggestions Test to see how it works.
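
The edit-distance component of this combination can be sketched with the standard Levenshtein distance, in which each insertion, deletion or substitution costs 1. This is a minimal illustration only. The class EditDistance and its methods are hypothetical, and the sketch shows neither the Metaphone and fuzzy-matching components nor the way the Suggester weights them.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the Levenshtein edit distance: the fewest single-character
// insertions, deletions or substitutions that turn one word into another.
// Only this one component of the Suggester's scoring is illustrated here.
class EditDistance {

    static int levenshtein(String a, String b) {
        // prev/curr are two rows of the dynamic-programming table.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,       // delete a[i-1]
                                            curr[j - 1] + 1),  // insert b[j-1]
                                   prev[j - 1] + cost);        // substitute or keep
            }
            int[] swap = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Returns the dictionary words at the smallest distance from word,
    // ignoring anything farther away than maxDistance.
    static List<String> closest(String word, List<String> dictionary, int maxDistance) {
        List<String> best = new ArrayList<>();
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = levenshtein(word, candidate);
            if (d > maxDistance) {
                continue;
            }
            if (d < bestDistance) {
                bestDistance = d;
                best.clear();
            }
            if (d == bestDistance) {
                best.add(candidate);
            }
        }
        return best;
    }
}
```

For example, levenshtein("kitten", "sitting") is 3. Given the word list spelling, spilling, peeling, closest("speling", ...) returns only "spelling" (distance 1), because the other two words are at distance 2.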

Local service:
Unlike Google's Spelling API, the Suggester needs only the library and a dictionary file to provide a local spell-checking service that is fully under your control. There is no exposure to the Internet, no risk of connectivity problems, no dependence on the availability of an external service, and no hidden fees.

Multi-lingual:
See section 9 below for the dictionaries and languages available for download.

Custom dictionaries:
The Index Builder lets users create custom dictionaries. It can also extract all words from a dictionary and modify existing dictionaries. The Index Builder is included in the free Basic Suggester download package.

High dictionary compression:
The word dictionary is compressed both on disk and in memory. A basic UK English dictionary contains about 57,000 words and takes about 90 KB. The full English dictionary contains about 200,000 words (including names, abbreviations, geographic places, etc.). It occupies a 236 KB file on disk and about 2 MB in memory. Other languages compress even better. For example, the full Russian dictionary contains more than 1,300,000 words (including variants) and occupies a 315 KB file on disk and, again, about 2 MB in memory. The original Russian word list (in UTF-8 format) is more than 30 MB, so the compressed file is about 1% of the original size.
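
Part of this compression comes from the fact that many dictionary words share prefixes. The sketch below shows the idea using front coding of a sorted word list. Each entry keeps only the number of characters it shares with the previous word, plus the remaining suffix. Front coding is a swapped-in technique chosen for illustration. The Suggester stores its dictionary in a trie (see section 5), and the class FrontCoding is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of front coding (incremental encoding) for a sorted word list: each
// entry stores how many leading characters it shares with the previous word,
// plus the suffix that differs. Used here only to illustrate prefix sharing.
class FrontCoding {

    static final class Entry {
        final int shared;     // characters shared with the previous word
        final String suffix;  // remaining characters of this word

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    static List<Entry> encode(List<String> sortedWords) {
        List<Entry> entries = new ArrayList<>();
        String previous = "";
        for (String word : sortedWords) {
            int shared = commonPrefixLength(previous, word);
            entries.add(new Entry(shared, word.substring(shared)));
            previous = word;
        }
        return entries;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> words = new ArrayList<>();
        String previous = "";
        for (Entry e : entries) {
            String word = previous.substring(0, e.shared) + e.suffix;
            words.add(word);
            previous = word;
        }
        return words;
    }

    // Total number of suffix characters that actually have to be stored.
    static int storedCharacters(List<Entry> entries) {
        int total = 0;
        for (Entry e : entries) {
            total += e.suffix.length();
        }
        return total;
    }
}
```

The five words spell, spelled, speller, spelling and spells contain 33 characters in total. After front coding, only 12 suffix characters remain, plus one shared-prefix count per word.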

High dictionary search and suggestion selection speed:
A case-dependent / case-independent dictionary look-up takes about 0.002 / 0.005 ms per word, which corresponds to about 500,000 / 200,000 words per second. Finding a set of suggestions for an unknown word takes about 40 ms on average on a 1.4 GHz Pentium M (with high-quality suggestions).
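
The two look-up speeds are related by simple arithmetic: words per second = 1000 / milliseconds per word. The sketch below also illustrates one plausible reason why a case-independent look-up costs more: the query must be case-folded before the look-up. It uses a plain HashSet rather than the Suggester's compressed dictionary, and the class CaseLookup is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch contrasting case-dependent and case-independent look-up.
// A plain HashSet stands in for the Suggester's compressed dictionary.
class CaseLookup {
    private final Set<String> exact = new HashSet<>();   // words as written
    private final Set<String> folded = new HashSet<>();  // lower-cased copies

    CaseLookup(Iterable<String> words) {
        for (String w : words) {
            exact.add(w);
            folded.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Case-dependent: "Paris" matches, "paris" does not.
    boolean containsExact(String word) {
        return exact.contains(word);
    }

    // Case-independent: the query is lower-cased first, an extra step per look-up.
    boolean containsIgnoreCase(String word) {
        return folded.contains(word.toLowerCase(Locale.ROOT));
    }

    // Converts an average time per word (in milliseconds) into words per second.
    static long wordsPerSecond(double msPerWord) {
        return Math.round(1000.0 / msPerWord);
    }
}
```

wordsPerSecond(0.002) and wordsPerSecond(0.005) reproduce the figures above: 500,000 and 200,000 words per second.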

Portability:
The Suggester software is written entirely in Java 1.2 and runs on any Java® platform: Windows®, Mac OS®, Unix, Linux. It has been tested on JRE 1.2 and later.

Comparison table of the Suggester and other popular spellchecking software: the Java library "Jazzy", the Dictionary.com web site, Microsoft Word 2000 and the Google search engine (the comparison was done in 2007).

Please note an important limitation: the Basic Suggester is not context-sensitive.
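
A context-insensitive checker examines each word in isolation, so a valid word used in the wrong place goes unnoticed. The sketch below uses a hypothetical WordByWordChecker over a plain set of words. It flags "teh" in "I went to teh store" but reports nothing for "I went too the store", because "too" is itself a dictionary word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration of a context-insensitive checker: every token is checked
// against the word list on its own, with no look at neighbouring words.
class WordByWordChecker {
    private final Set<String> dictionary; // lower-case words

    WordByWordChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Returns the tokens of text that are not in the dictionary.
    List<String> unknownWords(String text) {
        List<String> unknown = new ArrayList<>();
        // Split on anything that is not a letter or an apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !dictionary.contains(token)) {
                unknown.add(token);
            }
        }
        return unknown;
    }
}
```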

3. Where to get it?

The home page of the Suggester Spell Check project is on the SoftCorporation LLC web site at http://www.softcorporation.com/products/spellcheck. There you can find information on how to download the latest release, as well as any other information you might need about this project.
To download, go to the Basic Suggester project page.

4. Requirements

o A Java 1.2 compatible or newer JVM for your operating system.
o There are no other requirements to run Suggester as a Spellchecker.
o Running the Index Builder may require up to 1 GB of virtual memory.

5. Basic, Advanced and Enterprise versions of Suggester software

There are three versions of the Suggester software:
o Basic Suggester - (free open source) uses one dictionary in which all words have the same weight. The Suggester Spell Check uses the Basic Suggester.
o Advanced Suggester - (commercial) can use multiple dictionaries with different weights assigned to each dictionary and each word. It also supports multiple languages.
o Enterprise Suggester - (not ready for distribution) has all the features of the Advanced Suggester and can also compress information at a much higher rate. It achieves this by removing repeated segments of the trie that stores the dictionary, so that each trie segment of an Enterprise Suggester dictionary is unique.
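
Removing repeated trie segments can be sketched as merging structurally identical subtrees, which is the idea behind a DAWG (a minimal acyclic word automaton). This illustration rests on that assumption only. The Enterprise Suggester's actual algorithm and storage format are not described here, and the class TrieMinimizer is hypothetical.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: build a trie, then merge structurally identical subtrees so that
// every remaining node is unique (the idea behind a DAWG). Illustrative only.
class TrieMinimizer {

    static final class Node {
        boolean isWord;                                             // a word ends here
        final TreeMap<Character, Node> children = new TreeMap<>();  // sorted edges
    }

    static Node build(Iterable<String> words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }
        return root;
    }

    static boolean contains(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    // Counts distinct node objects, so shared nodes are counted once.
    static int countNodes(Node root) {
        return countNodes(root, new IdentityHashMap<>());
    }

    private static int countNodes(Node node, IdentityHashMap<Node, Boolean> seen) {
        if (seen.put(node, Boolean.TRUE) != null) {
            return 0; // already counted via another path
        }
        int count = 1;
        for (Node child : node.children.values()) {
            count += countNodes(child, seen);
        }
        return count;
    }

    // Bottom-up: replaces each subtree by a canonical representative. Two nodes
    // are equivalent when they have the same isWord flag and the same labelled
    // edges to the same (already canonical) children. Edges are rewired in
    // place, so the trie passed in is modified.
    static Node minimize(Node root) {
        return canonical(root, new HashMap<>(), new IdentityHashMap<>());
    }

    private static Node canonical(Node node, Map<String, Node> registry,
                                  IdentityHashMap<Node, Integer> ids) {
        StringBuilder signature = new StringBuilder(node.isWord ? "1" : "0");
        for (Map.Entry<Character, Node> edge : node.children.entrySet()) {
            Node child = canonical(edge.getValue(), registry, ids);
            edge.setValue(child); // point the edge at the shared representative
            signature.append(edge.getKey()).append(ids.get(child)).append(',');
        }
        Node existing = registry.get(signature.toString());
        if (existing != null) {
            return existing; // an identical subtree already exists: reuse it
        }
        registry.put(signature.toString(), node);
        ids.put(node, ids.size());
        return node;
    }
}
```

For the words tap, taps, top and tops, the plain trie has 8 nodes. After merging, 5 nodes remain, because the "ta" and "to" branches and everything below them become shared. Look-ups such as contains(root, "tops") still succeed.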

6. Documentation

See the Basic Suggester project for documentation.

7. Java Code Samples

Java code samples are included in the download package. For more information, see How to use Basic Suggester Spell Check.

8. Web Examples

The Advanced and Enterprise versions of the Suggester software make it possible to build a context-sensitive spellchecker, which you can test here:
English Spell Check test.
Russian Spell Check test.

9. Dictionaries

Suggester dictionaries, including an English medical dictionary, can be viewed and downloaded from the Suggester dictionaries page. The full English/American dictionary contains about 200,000 words, including geographic places and frequently used names. The full Russian dictionary contains more than 1,300,000 words (including variants).
Send us an e-mail if you need Suggester Spell Check for other languages.

10. Release Notes


11. Licensing and Legal Issues

For legal and licensing issues, please read the LICENSE.TXT file.

Java(TM) is a trademark of Oracle Corporation.
