
Counting


Counting the occurrences of a word/phrase

I wanted to know how many times a searched-for word or phrase occurred in a block of text. Since I was going to have to split the text anyway, why not just split on the word/phrase ($find) and subtract one? The split returns the pieces between the matches, not the matches themselves, so the number of pieces would otherwise always be one high.

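# case-insensitive whole-word match on $_; \Q...\E treats $find as literal text (drop it if $find is a pattern),
# and the -1 limit keeps trailing empty pieces so that a match at the very end of the text is counted
my @words = split(/\b\Q$find\E\b/i, $_, -1);
$count += @words - 1 if @words;   # an empty $_ yields no pieces at all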
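
As a cross-check on the split approach, here is a minimal sketch that wraps two ways of counting in subs. The names count_by_split and count_by_match and the sample text are made up for illustration, and \Q...\E assumes that $find is literal text rather than a pattern.

```perl
use strict;
use warnings;

# Count by splitting: pieces minus one. The limit of -1 keeps trailing
# empty pieces, so a match at the very end of the text is still counted.
sub count_by_split {
    my ($text, $find) = @_;
    return 0 if $text eq '';    # splitting an empty string yields no pieces
    my @pieces = split /\b\Q$find\E\b/i, $text, -1;
    return @pieces - 1;
}

# Count by matching: a global match in list context returns every match.
sub count_by_match {
    my ($text, $find) = @_;
    my @hits = $text =~ /\b\Q$find\E\b/gi;
    return scalar @hits;
}

my $text = "The cat sat. The dog sat by the cat";
print count_by_split($text, "cat"), "\n";   # prints 2
print count_by_match($text, "the"), "\n";   # prints 3
```

Both subs should agree on any input. The match version is probably the simpler choice when the split pieces are not needed for anything else.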


Links

Word counting with \W+ split
To count the words, I need to break each line up by words, and then add the number of words into the counter, not the number of lines. Just a few tweaks will do it.

       #!/usr/bin/perl
       while (<>) {
                @words = split(/\W+/);
                $count{$ARGV} += @words;
       }
       foreach $file (sort by_count keys %count) {
                print "$file has $count{$file} words\n";
       }
       sub by_count {
                $count{$b} <=> $count{$a};
       }

The list @words gets created for each line by splitting the line up by the regular expression /\W+/. This regular expression matches sequences of non-alphanumerics. The split operator drags this regular expression through the string (in this case, the contents of $_, because I didn't specify anything else). Every place the regular expression matches gets ripped out of the string as a delimiter -- everything else becomes an element of the list to be returned.

Once I have a list in @words, I can add the length of the list to the count. The name @words in a scalar context is the length of array @words. This will keep the elements of %count as a running total of words now, not lines.
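
To make the split and scalar-context behaviour concrete, here is a small sketch. The sub name count_fields and the sample lines are hypothetical. It also shows one edge case worth knowing about: a line that begins with a delimiter produces an empty first field, which gets counted as a word.

```perl
use strict;
use warnings;

# Return the number of fields that split(/\W+/) produces for one line.
sub count_fields {
    my ($line) = @_;
    my @words = split /\W+/, $line;
    return scalar @words;    # an array in scalar context yields its length
}

print count_fields("the quick, brown fox"), "\n";   # prints 4
# A line that starts with a delimiter gets an empty leading field,
# so this prints 3 even though there are only two words.
print count_fields("  hello world"), "\n";
```

Trailing empty fields are dropped by split, so the newline at the end of each input line does not add to the count.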

http://www.stonehenge.com/merlyn/UnixReview/col02.html
Word counting on \s+ split
Given a text, return a list of words and word counts

while (<>) {    # the filehandle was lost in the original; <> (STDIN or named files) is assumed
       $_ =~ s/^\s+//; # Good idea to always do this. If the line
                        # starts with blanks, then the first element
                        # of the array after splitting would be null
       @words_in_line = split(/\s+/,$_);
                # splits the line into an array of words
       for ($count=0;$count<=$#words_in_line;++$count) {
                $word_count{$words_in_line[$count]}++;
         }
}
while(($key,$val) = each %word_count) {
       print "$key $val\n";
}
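
Since each returns the pairs in no particular order, a sorted report is often easier to read. The following is a minimal sketch of the same word-count idea with sorted output. The sub name word_counts and the sample text are made up for illustration, and it uses split ' ' in place of the s/^\s+// step.

```perl
use strict;
use warnings;

# Build a word => count hash from a block of text. split ' ' (a string
# holding a single space) skips leading whitespace, so no empty first
# element appears.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{$_}++ for split ' ', $text;
    return \%count;
}

my $counts = word_counts("  the cat and the hat and the bat\n");

# Highest count first; ties are broken alphabetically.
for my $word (sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts) {
    print "$word $counts->{$word}\n";
}
```

For the sample text, the first line printed is "the 3", followed by "and 2" and then the three words that occur once.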


http://www.cs.jhu.edu/~hajic/perlguide.txt
Line, word and character count

# exercise4.2.pl: a program that reads lines from standard input until
#                end-of-file, then prints the number of lines, words
#                and characters in the input, followed by the input
#                in reverse order (both lines and characters).
# usage:      exercise4.2.pl
# 2000-03-03 zavrel@uia.ua.ac.be

# initialize a line buffer
@lines = ();

# read lines of input
while(defined($line = <>)){
   
    chomp $line;       # this means that newlines will not be counted
    $nlines++;          # counts the lines
    @words = split ' ', $line;   # ' ' skips leading whitespace, so no empty first field
    $nwords += @words; # counts the words
    @chars = split //,$line;
    $nchars += @chars; # counts the characters

    # reverse the characters in the line
    # and push this onto a stack
    @chars = reverse @chars;
    $string = join "", @chars;
    push @lines, $string;
}
print "lines: $nlines, words: $nwords, characters: $nchars\n";
print "reversed:\n";

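# by popping lines off the stack they come out in the
# reverse order; test with defined() so that an empty
# line or a line containing only "0" does not end the loop early:
while(defined($line = pop @lines)){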
    print "$line\n";
}
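
The counting part of the script above can also be written without splitting each line into characters. The following is a sketch only: the sub name tally and the sample lines are hypothetical, and it relies on the built-ins length and reverse (which reverses a string when called in scalar context).

```perl
use strict;
use warnings;

# Tally lines, words and characters for a list of lines. Newlines are
# removed first, so they are not included in the character count.
sub tally {
    my @lines = @_;
    my ($nlines, $nwords, $nchars) = (0, 0, 0);
    for my $line (@lines) {
        chomp $line;
        $nlines++;
        my @words = split ' ', $line;    # ignores leading whitespace
        $nwords += @words;
        $nchars += length $line;         # no need for split //
    }
    return ($nlines, $nwords, $nchars);
}

my ($l, $w, $c) = tally("hello world\n", "  foo\n");
print "lines: $l, words: $w, characters: $c\n";   # lines: 2, words: 3, characters: 16
print scalar reverse("abc"), "\n";                # prints cba
```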


http://lcg-www.uia.ac.be/~erikt/perl/so04.html