Hi guys I need the fastest binary search tree data structure you can find because my class has a competition for the fastest program to implement a spell checker using this tree. The documentation is below.
Ten Great Riters who cudtn even Spell
number 4 made me lol!
Assigned March 21, 2016
Due 23:50:00 April 7, 2016
Fainali, xen, aafte sam 20 iers ov orxogrefkl
riform, wi wud hev a lojikl, kohirnt
speling in ius xrewawt xe Ingliy-spiking
werld.
Excepted from A Plan for the
Improvement of English Spelling
Authorship and date unclear
Usually attributed to Mark Twain
Trees sprout up just about everywhere in
computer science. . .
Don Knuth
The Art of Computer Programming
Vol. IV-A, Combinatorial Algorithms,
Section 4.2.1.6
(2011)
1 COM S 228 Fall 2014 Assignment 4
Trees are pervasive throughout computing. While the binary tree is the most common—seen in such diverse
applications as parsing, static analysis and expression evaluation (Abstract Syntax Trees), compression (as
in Huffman Coding), and network routing (as in Butterfly Switches)—trees of other degree are also common.
Quad- and oct-trees (trees of degree 4 and 8, respectively) are often seen is spatial algorithms; an octtree
can be used, for example, to recursively subdivide a 3-dimensional space applied in a finite element-based
simulation of the interactions of celestial bodies. n-ary trees (trees without any fixed degree) are used to
represent directory structures. And, indeed, the linked lists you just implemented are really a special case of
trees of degree one.
Similarly pervasive are spell checkers, saviors of poor typists and idiots the world over. Modern spell
checkers not only detect incorrect words, but use metrics like edit distance to calculate and rank corrections
very quickly. Additional natural language processing algorithms can process input in an online fashion,
suggesting corrections based on context and even predicting future words with continually higher degrees
of accuracy. Such advanced techniques are cutting edge, cross over into the realms of research, and are well
beyond the scope of this course; however, a basic spell checker that efficiently detects misspellings without
suggesting is already within your capabilities.
For this assignment, you will build such a tool.
1
2 Objectives
In this project, you will employ the following concepts:
• Binary Search Trees
• Solving realistic problems with Java, building solutions that require many hundreds of lines of code
• Java Collections and Iterators
• Optionally (see Section 7), rotations and balanced binary search trees
3 Problem Overview
Your spell checker will operate in both interactive and non-interactive modes. An optional switch, -n, as
the first argument on the command line, will signal your tool that you wish to process this input in noninteractive
mode rather than the default interactive operation. Two file names, the first of a dictionary and
the second of a document to be spell checked are required parameters. Additionally, if the first command
line argument is the switch -e, the program should take only a dictionary and enter dictionary edit mode.
In this section are defined input and output formats, and required classes and interfaces.
3.1 Input
Input will be read from one or two files, the first a dictionary and the second (when in one of the two spell
check modes) a document.
3.1.1 Dictionary
Dictionaries consist of a list of keys in ASCII1
, with one key and a (possibly null) definition per line. The
first word on a line is the key. If the line contains a definition, it will be signaled first by a colon, with the
definition constituting the remainder of the line . Dictionaries may be in any language, so long as it can be
represented with ASCII characters.
More formally, each dictionary entry will have the following format2
:
ˆ(:)?$
where ‘’ is a mixed-case string of ASCII letters that may include hyphens and apostrophes, ‘:’ is a
literal colon, and ‘’ is everything that remains on the line, not including the newline.
1A very simple, “plain text” format. Look it up if you’re curious, but what this really means to you if that you won’t have to deal
with accents or letters that aren’t part of the English alphabet (ASCII is why you have to write “Inigo Montoya” instead of “Inigo ˜
Montoya).
2This is a notation roughly based on that used in regular expressions. ‘ˆ’ matches an empty string at the beginning of a line; it
means that the item following it—in this case the key—will be at the beginning of the line. The parenthesis are used as grouping
operators, and the appended question mark means that there are exactly zero or one instances of the expression within preceding
the group. The ‘$’ matches an empty string at the end of the line, meaning that the thing preceding it—in this case either the key
or the definition (because of the question mark!)—runs to the end of the line. None of the special characters (parenthesis, question
mark, circumflex, dollar sign) are literals in the described strings.
2
3.1.2 Document
Input documents may be any ASCII file.
3.2 Output
Your spell checker must function in interactive and non-interactive spell checking modes, and in dictionary
edit mode. Interactive spell checking mode is the default. All modes are to be be line-based. All modes
should ignore non-alphabetic characters in the document, except for hyphens and apostrophes. Spell check
output must be homomorphic with input under spell checking, meaning that you may not discard, destroy,
or otherwise modify any input except to correct misspellings. Punctuation and spacing, for example, should
not be changed by your tool.
3.2.1 Non-interactive Spell Checking Mode
Input lines from the document are spell checked in their entirety. Each line is then written to the terminal.
If a line has no misspellings, no further action is required. If the line does have misspellings, the first letter
of each misspelled word is highlighted with a circumflex (ˆ) indicating it from below in the following line.
Output continues from the next line of the input until the entire document is written and marked. Lines with
no misspellings are not highlighted in any way, i.e., there is no empty highlight line inserted into the output
after them.
The input file and dictionary are not changed; the only output is to the terminal. See Section 5.0.7 for
example output.
3.2.2 Interactive Spell Checking Mode
Interactive mode marks misspellings in the same fashion as non-interactive mode; however, instead of continuing
immediately with the next line, if a line has misspellings, the interface will prompt the users to accept
or replace each misspelling in the input, starting with the left-most. If the user accepts, the word is inserted
into the dictionary so that future instances of it will not be marked. If the user replaces the misspelling, the
output string is modified to include the replacement in place of the misspelling. Processing continues on
the same line until all misspellings are either accepted or replaced. Note that since we spell check each line
with a replacement after performing the replace operation, replacement words are themselves spell checked
and subject to the same scrutiny as the original words.
In addition to the terminal output. When a file has been interactively spell checked and corrected,
the original document should have a tilde (˜) appended to its name (e.g., les_miserables.txt becomes
les_miserables.txt˜), while the updated content should be written to a file with the original document
name. The dictionary is not updated to disc in spell check mode. Inserts performed while checking a
document persist only for the lifetime of the session.
Accept/replace prompts will be displayed on a new line immediately following marker lines. The precise
format of these lines is undefined, but must clearly convey the following three pieces of information:
1. The misspelled word itself
2. That one option allows the user to accept the spelling, including an indication that an input of the
letter ’a’ will select this option
3
3. That one option allows the user to replace the misspelling, including an indication that an input of the
letter ’r’ will select this option
The following example prompt meets these criteria with a misspelling of ’neighbor’:
nieghbor: [r]eplace/[a]ccept?
If the user chooses to accept the spelling, it is inserted into the dictionary. If the user replaces, a new
prompt is displayed, asking the user for the replacement text. For example:
Replacement text:
See Section 5.0.8 for example input and output, and note that the original document is backed up and
modified to disc in this mode.
4 Dictionary Edit Mode
Dictionary Edit Mode provides a controlled interface to make permanent changes to dictionaries, including
adding new entries, removing entries, correcting entries, and adding, removing, or modifying definitions.
When a user enters dictionary edit mode, they should be prompted (we henceforth refer to this initial prompt
as the main- or top-level prompt)) in a way that clearly conveys this, for example:
Editing english_dictionary.txt
Word:
The user enters a key. The key is checked for validity (it should be composed of letters, hyphens, and
apostrophes). If the key is invalid, the user is notified and the top-level prompt is displayed. For example
(user input in italics):
Word: h4x0r
’h4x0r’ is invalid. Please enter a valid word.
A valid key leads to dictionary search. Found entries lead to a user prompt with four options:
• Remove, selected with input ‘r’, removes the item from the dictionary, and thus from the tree;
• Get definition, selected with input ‘g’, displays the definition, if it exists, and ‘’ otherwise;
• Change definition, selected with input ‘c’, prompts the user for a new definition and updates the entry
with it; and
• Do nothing, selected with input ‘n’, performs no further action.
After performing the selected action, control returns to the main prompt.
If the key is not found, the user is given a prompt with three options:
• Add, selected with input ‘a’, adds the input word to the dictionary without a definition;
• Add with definition, selected with input ‘d’, prompts for a definition and adds the word and definition
to the dictionary; and
4
• Do nothing, selected with input ‘n’, performs no further action.
After performing the selected action, control returns to the main prompt.
The user exits the main loop by entering ’!quit’ at the prompt. The old dictionary is backed up with a
tilde (˜) appended to its name, while the edited dictionary is written to the input dictionary’s file name.
A full, example dictionary edit mode session appears in Section 5.0.9.
4.1 Required Classes
In addition to supplied classes and methods that are already complete in the template, the following classes,
interfaces and associated methods and variables are required. See the associated javaDocs for detailed
requirements.
4.1.1 Class SpellChecker
• public static void doInteractiveMode(String[] args)
• public static void doNonInteractiveMode(String[] args)
• public static void doDictionaryEditMode(String[] args)
4.1.2 public class BinarySearchTree> extends AbstractCollection
• public BinarySearchTree()
• public BinarySearchTree(Node root, int size)
• public boolean add(E item) throws IllegalArgumentException
• public boolean remove(Object item)
• public E get(E item)
• public boolean contains(Object item)
• public void clear()
• public boolean isEmpty()
• public int size()
• public Iterator iterator()
• public ArrayList getPreorderTraversal()
• public ArrayList getPostOrderTraversal()
• public ArrayList getInorderTraversal()
• private class BSTIterator implements Iterator which contains:
– public BSTIterator()
5
– public boolean hasNext()
– public E next() throws IllegalStateException
– public void remove() throws IllegalStateException
4.1.3 Class Node
• public Node getSuccessor()
• public Node getPredecessor()
4.1.4 Class Dictionary
• public Dictionary()
• public Dictionary(BinarySearchTree tree)
• public Dictionary(String filename) throws FileNotFoundException
• public boolean addEntry(String word)
• public boolean addEntry(String word, String definition)
• public boolean hasWord(String word)
• public String getDefinitionOf(String word) throws IllegalArgumentException
• public boolean removeEntry(String word)
• public boolean updateEntry(String word, String newDef)
• public void printToFile(String filename) throws FileNotFoundException
• public int entryCount()
• public ArrayList getSortedEntries()
• public static class Entry implements Comparable which contains:
– public Entry(String w)
– public Entry(String w, String d)
– public int compareTo(Entry other)
– public boolean equals(Object o)
5 Examples
Here we present example input and output.
6
5.0.5 Input Dictionary
jumped
dog:A domesticated animal with a wet nose. Drools a lot.
brown
the
quick:The tender flesh under the finger- and toenails.
over:finished. bygone. kaput.
fox
lazy:Antonym of "COM S 228 student"
5.0.6 Input Document
Teh quiche brown fox jumped over the laxy god.
Over the lazy fox jumped the quick, brown dog.
5.0.7 Non-interactive mode output
Teh quiche brown fox jumped over the laxy god.
ˆ ˆ ˆ ˆ
Over the lazy fox jumped the quick, brown dog.
5.0.8 Interactive mode session output with user input in italics
Teh quiche brown fox jumped over the laxy god.
ˆ ˆ ˆ ˆ
Teh: [r]eplace/[a]ccept? r
Replacement text: The
The quiche brown fox jumped over the laxy god.
ˆ ˆ ˆ
quiche: [r]eplace/[a]ccept? r
Replacement text: quixotic
The quixotic brown fox jumped over the laxy god.
ˆ ˆ ˆ
quixotic: [r]eplace/[a]ccept? a
The quixotic brown fox jumped over the laxy god.
ˆ ˆ
laxy: [r]eplace/[a]ccept? r
Replacement text: lazy
The quixotic brown fox jumped over the lazy god.
ˆ
god: [r]eplace/[a]ccept? a
The quixotic brown fox jumped over the lazy god.
Over the lazy fox jumped the quick, brown dog.
7
5.0.9 Dictionary edit mode session output with user input in italics
Editing dictionary.txt
Word: the
’the’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: g
Word: dog
’dog’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: g
A domesticated animal with a wet nose. Drools a lot.
Word: h4x0r
’h4x0r’ is invalid. Please enter a valid word.
Word: brown
’brown’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: c
Definition: The color of chocolate, sweeeeet chocolate!
Word: jumped
’jumped’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: r
Word: jumped
’jumped’ not found.
[a]dd/add with [d]efinition/do [n]othing: a
Word: jumped
’jumped’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: n
Word: drool
’drool’ not found.
[a]dd/add with [d]efinition/do [n]othing: d
Definition: Foul dog kiss leavings
Word: drool
’drool’ was found.
[r]emove/[g]et definition/[c]hange definition/do [n]othing: g
Foul dog kiss leavings
Word: !quit
Wrote dictionary.txt. Original backed up to dictionary.txt˜.
6 Getting started
Much of the lecture material apropos to this assignment will not have been covered by the time this assignment
is released. That should not prevent you from getting started.
The user interfaces for interactive and dictionary edit modes—which amount to a large portion of the
code for this project—can be developed completely without any other functionality present. Placeholder
methods can be used for the actions on menu selections, with behavior that you define to ensure you can
fully exercise all aspects of your interface. Input can be read from the terminal, e.g., with a Scanner.
8
Additionally, the Dictionary class can be implemented using an ArrayList rather than a BinarySearchTree.
A Dictionary built on an ArrayList will not yield acceptable performance, but ArrayList does provide
the necessary functionality (with different method names, of course) for a working Dictionary implementation.
7 Contest
We are conducting a contest with this assignment. Participation is optional. The prize? Eternal Glory!
The supplied template includes a timer class, and the main method includes the necessary calls to initialize,
start, stop, and display timing information from the timer. In order for you to be eligible for the
competition:
• You may not modify main() except to set the values of the name and section variables.
• You must modify the name and section variables appropriately with your full name and section.
• You may not modify the Timer class.
• All methods used directly or indirectly by the timed code must work as specified in this document, or,
if that code has bugs, those bugs, as judged by the grading TA, must not lead to incorrect output, nor
to incorrectly accelerated execution (e.g., by unsafely skipping required functionality).
• You Dictionary class must be built upon and use your BinarySearchTree implementation. Though
it is theoretically possible to get better performance with other data structures, for example with a
hash table, the purpose of this exercise is to implement and work with trees. Inserting a different data
structure for its better performance characteristics on this project will disqualify you.
If you do not wish to participate in the contest, leave the default values assigned to name and section
in main().
Each section will have one winner who will be acknowledged in lecture and awarded with the prize
of Eternal Glory!, together with all the rights, privileges, and responsibilities thereunto appertaining. The
winner will be the student who’s spell checker correctly processes a document and dictionary of our choosing
in non-interactive mode in the least time. If you hope to win Eternal Glory!, you would do well to
implement a tree balancing algorithm in your BinarySearchTree class. Possible choices include AVL trees
(https://en.wikipedia.org/wiki/AVL tree), red-black trees(https://en.wikipedia.org/wiki/Red-black tree), and
Splay trees (https://en.wikipedia.org/wiki/Splay tree), but you are not limited to these, and may choose any
balancing algorithm and optimizations you like so long as you correctly implement the specified functionality
and adhere to the rules above.
Implementing balanced binary trees is challenging but rewarding, an excellent learning activity, and well
within the capabilities of students in this course. We hope that many of you will choose to do it, but if you do,
please complete the basic, required, unbalanced functionality first—and back it up—before implementing
tree rotations and balanced inserts and removes, that way you won’t run the risk of failing to complete the
base assignment because you were too focused on Eternal Glory!
8 Submission
Place all Java source files for all classes in the edu.iastate.cs228.hw4 package. Remember to save
the directory structure (e.g., edu/iastate/cs228/hw4 for UNIX or edu\iastate\cs228\hw4 for Win-
9
dows) in a zip file. Turn in the zip file. Do not turn in your class files. Your zip file should be named
Forename_Surname_HW4.zip where Forename and Surname are to be replaced by your given name and
family name, respectively. You may submit a draft version of your code early to see if you have any submission
problems with Blackboard. We will grade only your latest assignment.