Counting
Counting the occurrences of a word/phrase
I wanted to know how many times a searched-for word or phrase occurs in a block of text. Since I was going to have to split anyway, why not just split on the word/phrase ($find) and subtract one? The array holds the pieces, not the splits, so its size is always one more than the number of matches.
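# Case-insensitive whole-word match on $_; \Q...\E quotes metacharacters in $find.
# The -1 limit keeps trailing empty pieces, so a match at the very end still counts.
my @words = split(/\b\Q$find\E\b/i, $_, -1);
$count += @words - 1 if @words; # split returns no pieces for an empty string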
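If the pieces themselves are not needed, the matches can be counted directly instead of splitting. What follows is only a sketch of that alternative, not the method used above; count_occurrences is a made-up helper name, and the sample text is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the snippet above): count whole-word,
# case-insensitive occurrences of $find in $text.
sub count_occurrences {
    my ($text, $find) = @_;
    # \Q...\E quotes any regex metacharacters in $find.
    # Assigning the list of matches to () and then to a scalar
    # yields the number of matches.
    my $count = () = $text =~ /\b\Q$find\E\b/gi;
    return $count;
}

# Invented sample text; the last match sits at the very end of the string.
print count_occurrences("The cat sat. The end of the", "the"), "\n";   # prints 3
```

The sample ends with a match on purpose: a plain split without a limit drops trailing empty pieces, so the "pieces minus one" approach would report 2 here unless split is given a negative limit.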
Links
- Word counting with \W+ split
- To count the words, I need to break each line up by words, and then add the number of words into the counter, not the number of lines. Just a few tweaks will do it.
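#!/usr/bin/perl
while (<>) {
    @words = split(/\W+/);          # split $_ on runs of non-word characters
    $count{$ARGV} += @words;        # @words in scalar context is its length
}
foreach $file (sort by_count keys %count) {
    print "$file has $count{$file} words\n";
}
# sort comparator: highest word count first
sub by_count {
    $count{$b} <=> $count{$a};
}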
The list @words gets created for each line by splitting the line up by the regular expression /\W+/. This regular expression matches sequences of non-word characters (anything other than letters, digits, and underscore). The split operator drags this regular expression through the string (in this case, the contents of $_, because I didn't specify anything else). Every place the regular expression matches gets ripped out of the string as a delimiter; everything else becomes an element of the list to be returned.
Once I have a list in @words, I can add the length of the list to the count. The name @words in a scalar context is the length of array @words. This will keep the elements of %count as a running total of words now, not lines.
http://www.stonehenge.com/merlyn/UnixReview/col02.html
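One detail of splitting on /\W+/ is worth seeing in isolation: when a line begins with a delimiter (leading spaces, say), split returns an empty first element, and that element is counted as a word. The sketch below shows this and one possible way around it; count_words is a hypothetical helper, not part of the column quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the words in one line by splitting on
# /\W+/ and discarding any empty fields.
sub count_words {
    my ($line) = @_;
    my @words = grep { length } split /\W+/, $line;
    return scalar @words;
}

my $line = "  hello, world\n";

# The raw split keeps the empty field in front of "hello".
my @raw = split /\W+/, $line;
print scalar(@raw), "\n";            # prints 3: "", "hello", "world"

print count_words($line), "\n";      # prints 2
```

Note that /\W+/ also treats an apostrophe as a delimiter, so "don't" counts as two words with either version.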
- Word counting on \s+ split
- Given a text, return a list of words and word counts
while (<>) {
    s/^\s+//;    # Good idea to always do this. If the line
                 # starts with blanks, then the first element
                 # of the array after splitting would be empty
    @words_in_line = split(/\s+/, $_);
                 # splits the line into an array of words
    foreach $word (@words_in_line) {
        $word_count{$word}++;    # tally each word
    }
}
while (($key, $val) = each %word_count) {
    print "$key $val\n";
}
http://www.cs.jhu.edu/~hajic/perlguide.txt
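Because each walks the hash in no particular order, the listing above comes out unsorted. Here is a rough sketch of the same tally printed most-frequent first; word_counts is a hypothetical helper and the two input lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: tally whitespace-separated words in a list of lines.
sub word_counts {
    my %word_count;
    for my $line (@_) {
        # split ' ' is the special-case form: it splits on runs of
        # whitespace and skips leading whitespace, so no separate
        # s/^\s+// step is needed.
        $word_count{$_}++ for split ' ', $line;
    }
    return %word_count;
}

my %wc = word_counts("  the cat\n", "the dog\n");

# Highest count first; ties are broken alphabetically.
for my $word (sort { $wc{$b} <=> $wc{$a} || $a cmp $b } keys %wc) {
    print "$word $wc{$word}\n";
}
```

For the two sample lines this prints "the 2", then "cat 1", then "dog 1".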
- Line, word and character count
# exercise4.2.pl: a program that reads lines from standard input until
# end-of-file, then prints the number of lines, words
# and characters in the input, followed by the input
# in reverse order (both lines and characters).
# usage: exercise4.2.pl
# 2000-03-03 zavrel@uia.ua.ac.be
# initialize a line buffer
@lines = ();
# read lines of input
while (defined($line = <>)) {
    chomp $line;                  # this means that newlines will not be counted
    $nlines++;                    # counts the lines
    @words = split ' ', $line;    # ' ' skips leading whitespace, so no empty first word
    $nwords += @words;            # counts the words
    @chars = split //, $line;
    $nchars += @chars;            # counts the characters
    # reverse the characters in the line
    # and push this onto a stack
    @chars = reverse @chars;
    $string = join "", @chars;
    push @lines, $string;
}
print "lines: $nlines, words: $nwords, characters: $nchars\n";
print "reversed:\n";
# by popping lines off the stack they come out in the
# reverse order; testing @lines rather than the popped value
# keeps empty lines and "0" from ending the loop early
while (@lines) {
    $line = pop @lines;
    print "$line\n";
}
http://lcg-www.uia.ac.be/~erikt/perl/so04.html
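The same counts can be had without building a character array: length gives the number of characters, and reverse in scalar context reverses a string. This is just a sketch of that variation, assuming the input arrives as a list of lines; text_stats is a made-up name and not part of the exercise.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of lines, words and
# characters, plus a reference to the reversed lines in reverse order.
sub text_stats {
    my @input = @_;
    my ($nlines, $nwords, $nchars) = (0, 0, 0);
    my @reversed;
    for my $line (@input) {
        chomp $line;                    # newlines are not counted
        $nlines++;
        my @words = split ' ', $line;   # whitespace-separated words
        $nwords += @words;
        $nchars += length $line;
        # reverse in scalar context reverses the characters;
        # unshift puts the last line first
        unshift @reversed, scalar reverse $line;
    }
    return ($nlines, $nwords, $nchars, \@reversed);
}

my ($l, $w, $c, $rev) = text_stats("ab cd\n", "\n", "xyz\n");
print "lines: $l, words: $w, characters: $c\n";
print "$_\n" for @$rev;
```

For the three sample lines this reports 3 lines, 3 words and 8 characters, then prints "zyx", an empty line, and "dc ba".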