Listing all the unique words in a piece of text
This lesson demonstrates how to list each unique word in a piece of text.
The uniqueWords function
The uniqueWords function takes one parameter, pString. The function uses a repeat loop to check each word in turn, building an array variable named tWordsList. Each element of tWordsList is associated with a different word; the element's key is the word, and the element's content is a number. For example, if the first word of the string is "Cans", then after the first word is processed, the array tWordsList contains one element, with the key "Cans", which contains the number 1.
When a word is processed, the handler adds 1 to the element corresponding to that word. If there is no array element with that name already, one is created automatically by the add command. In general, changing a variable, a chunk in a variable, or an element in an array variable creates the variable, chunk, or element automatically, if it doesn't already exist. If there is already an element with that name, that is, if the word already exists in the array, 1 is added to that existing element.
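The automatic creation of array elements can be seen in isolation with a minimal sketch. The handler name and the variable tCounts below are illustrative only and are not part of the lesson's code:

```livecode
on mouseUp
   local tCounts
   -- tCounts has no elements yet; add creates tCounts["Cans"]
   -- and sets it to 1
   add 1 to tCounts["Cans"]
   -- the element now exists, so 1 is added to its current value
   add 1 to tCounts["Cans"]
   put tCounts["Cans"] -- puts 2 into the message box
end mouseUp
```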
After all the words have been processed, the function exits the repeat loop. At this point, the array variable tWordsList contains an element for each unique word, whose key is the word itself. The keys of tWordsList, therefore, are a list of all the unique words in the string, one per line.
LiveCode chunk expressions
This form of word-by-word processing is possible because LiveCode uses chunk expressions to manage text. A chunk expression is a way of describing a specific portion of a container. LiveCode can directly address individual words, characters, lines, and items (delimited by any character).
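As a brief illustration of chunk expressions, the sketch below pulls different chunks out of the same string. The function name chunkExamples and the sample text are assumptions made for this example:

```livecode
function chunkExamples
   local tText, tWord, tChar, tItem, tSpaceItem
   put "Cans of beans,Jars of jam" into tText
   put word 2 of tText into tWord -- "of"
   put char 1 of tText into tChar -- "C"
   -- items are delimited by a comma by default
   put item 2 of tText into tItem -- "Jars of jam"
   -- the delimiter can be changed to any character
   set the itemDelimiter to space
   put item 3 of tText into tSpaceItem -- "beans,Jars"
   return tWord & cr & tChar & cr & tItem & cr & tSpaceItem
end chunkExamples
```

Note that a word is delimited by whitespace, so punctuation attached to a word stays part of that word.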
In this example, we use the repeat for each chunk form of the repeat control structure:
repeat for each word tWord in pString
This repeat structure loops through each word in the parameter pString, putting the current word into a variable called tWord. You can also loop through other chunk types in a repeat structure, processing each character, line, or item.
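The same pattern carries over to the other chunk types. As a sketch, a hypothetical uniqueItems function (not part of this lesson) could list the unique items in a comma-delimited list:

```livecode
function uniqueItems pList
   local tItemsList
   -- loop over items instead of words; the default
   -- itemDelimiter is a comma
   repeat for each item tItem in pList
      add 1 to tItemsList[tItem]
   end repeat
   return the keys of tItemsList
end uniqueItems
```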
The uniqueWords function code
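function uniqueWords pString
   local tWordsList
   -- count each word; the element is created on first use
   repeat for each word tWord in pString
      add 1 to tWordsList[tWord]
   end repeat
   -- the keys are the unique words, one per line
   return the keys of tWordsList
end uniqueWords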
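One way to try the function is from a button script, assuming uniqueWords is available in the same script or elsewhere in the message path. The sample sentence is made up for this example. Because the order of an array's keys is not guaranteed, the sketch sorts the result before showing it:

```livecode
on mouseUp
   local tUnique
   put uniqueWords("the cat and the hat") into tUnique
   -- put the list into alphabetical order
   sort lines of tUnique
   -- the message box shows: and, cat, hat, the (one per line)
   put tUnique
end mouseUp
```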
A note on efficiency
This example uses the repeat for each word form of the repeat control structure. When looping over the chunks of a string, this form is the fastest. The following repeat structure is functionally equivalent to the one in this example, but is much slower, because each word x of pString expression has to count words from the beginning of the string:
repeat with x = 1 to the number of words in pString
   add 1 to tWordsList[word x of pString]
end repeat