| K. Bretonnel Cohen's home page |
In all the world, no language has better support for string-handling than Perl. However, Java is becoming increasingly popular, so it behooves the linguist to learn string-handling in Java. This page gives some techniques for performing common string-handling tasks in Java. I'll add to the page as the demands of my job dictate that I learn new techniques. :-)
This page is intended for readers who already have some familiarity with Java.
Contents:
String class versus the StringBuffer class
String class versus the StringBuffer class
Java represents strings with two separate classes, the String and StringBuffer classes. Having these two separate classes probably allows Java to be much more efficient in its use of memory, but it's a minor pain in the rear for the developer.
The two classes differ in that instances of the String class cannot be modified in place, while instances of the StringBuffer class can be modified in place.
Since instances of the String class cannot be modified in place, calling a method such as toUpperCase() or toLowerCase() on a String returns a new String and leaves the original untouched. This can force the use of lots of temporary variables.
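Here is a minimal sketch of what that looks like in practice. The class name ImmutableStringDemo and the method shout() are made up for illustration; only the standard toUpperCase() method comes from the String class itself.

```java
class ImmutableStringDemo {
    // Returns an upper-cased copy; the argument itself is never changed.
    static String shout(String word) {
        return word.toUpperCase();
    }

    public static void main(String[] args) {
        String word = "mita";
        // a temporary variable is needed to hold the result
        String upper = shout(word);
        System.out.println(word);   // the original String is unchanged
        System.out.println(upper);  // the new String holds the result
    }
}
```

If you don't capture the return value, the upper-cased result is simply lost, since word itself never changes.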
If you're building a string piece by piece, the best way to do it is with a StringBuffer and the StringBuffer.append() method. Adding to a String is far slower, because every addition creates a new String object.
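As a rough sketch of the append() approach, the following joins an array of words with spaces. The class name BufferBuildDemo and the method join() are hypothetical names chosen for this example.

```java
class BufferBuildDemo {
    // Joins the words with single spaces, using one StringBuffer
    // that is modified in place on every append().
    static String join(String[] words) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                buffer.append(' ');
            }
            buffer.append(words[i]);
        }
        // convert to an ordinary String once, at the end
        return buffer.toString();
    }

    public static void main(String[] args) {
        String[] words = {"im", "ani", "le?acmi", "ma", "ani"};
        System.out.println(join(words));
    }
}
```

The buffer is converted to a String only once, so no intermediate String objects pile up inside the loop.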
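There are two kinds of tokenizing: splitting input into words, and splitting it into individual characters. The former can be done with the StringTokenizer class. The latter must be approached algorithmically.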
String with your input in it
char to hold individual chars
String.charAt() method
String.length() method
String.indexOf() method
String.charAt() returns the character at an indexed position in the input string. For example, the following code fragment analyzes an input word character-by-character and prints out a message if the input word contains a coronal consonant:
// the next two lines initialize Strings from string constants
String input = "mita";
String coronals = "sztdSZ";
int index;
char tokenizedInput;
// the String.length() method returns the length of a String. String
// indices are zero-based, so the last valid index is length() - 1;
// looping while index < input.length() visits every character.
for (index = 0; index < input.length(); index++) {
    tokenizedInput = input.charAt(index);
    // String.indexOf() returns -1 if the string doesn't contain the character
    // in question. if it doesn't return -1, then you know that it
    // does contain the character in question.
    if (coronals.indexOf(tokenizedInput) != -1) {
        System.out.print("The word <");
        System.out.print(input);
        System.out.print("> contains the coronal consonant <");
        System.out.print(tokenizedInput);
        System.out.println(">.");
    }
}
This produces the output The word <mita> contains the coronal consonant <t>.
StringTokenizer class
StringTokenizer.hasMoreTokens() method
StringTokenizer.nextToken() method
// StringTokenizer lives in java.util, so the file needs:
// import java.util.StringTokenizer;

// make a new String object
String input = "im ani le?acmi ma ani";
// make a new tokenizer object. note that you pass it the
// string that you want parsed
StringTokenizer tokenizer = new StringTokenizer(input);
// StringTokenizer.hasMoreTokens() returns true as long as
// there's more data in it that hasn't yet been given to you
while (tokenizer.hasMoreTokens()) {
    // StringTokenizer.nextToken() returns the
    // next token that the StringTokenizer is holding.
    // (of course, the first time you call it, that
    // will be the first token in the input. :-) )
    String currentToken = tokenizer.nextToken();
    // ...and now you can do whatever you like with
    // that token! (checkForCoronalConsonants() is a method
    // that you supply yourself.)
    checkForCoronalConsonants(currentToken);
}
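To make the fragment above something you can compile and run, here is one possible complete version. It is only a sketch: the class name TokenizerDemo, the method wordsWithCoronals(), and this particular body for checkForCoronalConsonants() are my assumptions, and the coronal set "sztdSZ" is borrowed from the character-by-character example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Coronal consonants, as in the character-by-character example.
    static final String CORONALS = "sztdSZ";

    // Hypothetical helper (not a library method): returns true if the
    // word contains at least one of the coronal consonants.
    static boolean checkForCoronalConsonants(String word) {
        for (int index = 0; index < word.length(); index++) {
            if (CORONALS.indexOf(word.charAt(index)) != -1) {
                return true;
            }
        }
        return false;
    }

    // Splits the input on whitespace and collects the words that
    // contain a coronal consonant.
    static List<String> wordsWithCoronals(String input) {
        List<String> found = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            String currentToken = tokenizer.nextToken();
            if (checkForCoronalConsonants(currentToken)) {
                found.add(currentToken);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(wordsWithCoronals("im ani mita"));
    }
}
```

With no delimiter argument, StringTokenizer splits on whitespace, which is usually what you want for words.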
The Perl5Util class provides equivalents of Perl's m// and s// operators, and the Perl split() function. You can get to the Perl5Util documentation, albeit somewhat circuitously, through this link.
Since Java 1.4, regular expressions are also supported directly by the java.util.regex package. Furthermore, popular functionalities can be accessed through the String class, via the matches(), replaceFirst(), replaceAll(), and split() methods.
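Here is a small sketch of those four String methods in action. The class name StringRegexDemo, the helper method names, and the sample word "sita" are invented for this example; the character class [sztdSZ] reuses the coronal set from earlier on this page.

```java
class StringRegexDemo {
    // matches() must match the WHOLE string, hence the .* on both sides.
    static boolean hasCoronal(String word) {
        return word.matches(".*[sztdSZ].*");
    }

    // Replaces only the first coronal consonant with "T".
    static String markFirst(String word) {
        return word.replaceFirst("[sztdSZ]", "T");
    }

    // Replaces every coronal consonant with "T".
    static String markAll(String word) {
        return word.replaceAll("[sztdSZ]", "T");
    }

    // Splits on runs of whitespace, much like Perl's split(/\s+/).
    static String[] words(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(hasCoronal("mita"));
        System.out.println(markFirst("sita"));
        System.out.println(markAll("sita"));
        System.out.println(words("im ani  le?acmi").length);
    }
}
```

Note that the backslash in a regular expression has to be doubled inside a Java string constant, which is one of the ways this is clumsier than Perl.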
A short quote from Flanagan's Java In A Nutshell, 4th edition, which I strongly suggest that you buy:
The matches(), replaceFirst(), replaceAll(), and split() methods are suitable for when you use a regular expression only once. If you want to use a regular expression for multiple matches, you should explicitly use the Pattern and Matcher classes of the java.util.regex package.
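A minimal sketch of the Pattern and Matcher approach follows. The class name PatternReuseDemo and the method countCoronals() are hypothetical; the point is just that the pattern is compiled once and then reused for every word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once, then reused for every word that is checked.
    private static final Pattern CORONAL = Pattern.compile("[sztdSZ]");

    // Counts how many coronal consonants occur in the word.
    static int countCoronals(String word) {
        Matcher matcher = CORONAL.matcher(word);
        int count = 0;
        // find() advances to the next match each time it is called
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"im", "ani", "mita"};
        for (int i = 0; i < words.length; i++) {
            System.out.println(words[i] + ": " + countCoronals(words[i]));
        }
    }
}
```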