
Python Word Count Method (re.findall)

This article shows how to count the number of words in a string in Python. It introduces wordcount, a method based on a regular expression.
Word Count. How many words are in a string? Here we develop a Python method, wordcount, that uses re.findall to count words. It locates and counts runs of non-whitespace characters with a special pattern.
Example. The re.findall method is the most important part of this solution. It does not stop at the first match. It returns all non-overlapping matches within a string as a list. When we count the elements of that list, we get the number of times the pattern matched.

Pattern: We specify the pattern \S+ in the re.findall method. This means "one or more non-whitespace characters."

Len: We use the len() built-in to count the number of elements in the resulting list. This equals the number of words in the input string.
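
To make the two notes above concrete, here is a small sketch that prints the list re.findall returns before taking its length. The sample string and the variable names are illustrative assumptions and are not part of the article's program.

```python
import re

# Illustrative sample text (an assumption for this sketch).
# It mixes a double space, a tab, and a newline as separators.
text = "one  two\tthree\nfour"

# \S+ matches each run of one or more non-whitespace characters.
tokens = re.findall(r"\S+", text)

print(tokens)       # ['one', 'two', 'three', 'four']
print(len(tokens))  # 4
```

Because the pattern matches the runs between whitespace, the kind and amount of whitespace separating the words does not affect the count.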

Python program that counts words

import re

def wordcount(value):
    # Find all runs of non-whitespace characters.
    words = re.findall(r"\S+", value)
    # Return length of resulting list.
    return len(words)

value = "To be or not to be, that is the question."
print(wordcount(value))

value = "Stately, plump Buck Mulligan came from the stairhead"
print(wordcount(value))

value = ""
print(wordcount(value))

Output

10
8
0

I verified that the method counted the words correctly in both (trivial) non-empty examples, and it returns 0 for the empty string. On more complex samples, such as ones involving markup, results may be less accurate. Note that the second phrase has no trailing punctuation.
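
As a rough illustration of the markup caveat, the sketch below counts a short HTML fragment twice: once as-is, and once after removing anything that looks like a tag. The helper name wordcount_no_tags and the tag pattern <[^>]+> are assumptions made for this example. A pattern like this is not a real HTML parser and will mishandle comments, scripts, and attribute values that contain ">".

```python
import re

def wordcount(value):
    # Same approach as the article: count runs of non-whitespace.
    return len(re.findall(r"\S+", value))

def wordcount_no_tags(value):
    # Hypothetical helper: replace tag-like spans with a space, then count.
    stripped = re.sub(r"<[^>]+>", " ", value)
    return wordcount(stripped)

html = '<p class="intro">Hello <b>big</b> world</p>'

print(wordcount(html))          # 4 (tags are glued onto the words)
print(wordcount_no_tags(html))  # 3 (Hello, big, world)
```

The raw count is inflated because a tag with an attribute contains a space, so the tag itself is split into two tokens.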

So: The example method does not count "word endings" (such as trailing punctuation or spaces) but rather the words themselves.
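
One reading of this point is that counting separators is not the same as counting words. The sketch below compares the pattern-based count with a naive count of space characters. The helper count_spaces_plus_one and the sample string are hypothetical. For ordinary text, str.split() with no arguments should agree with the pattern, since it also splits on runs of whitespace.

```python
import re

def wordcount(value):
    # Count the words themselves: runs of non-whitespace.
    return len(re.findall(r"\S+", value))

def count_spaces_plus_one(value):
    # Naive approach (hypothetical): counts separators, not words.
    return value.count(" ") + 1

# Sample with a double space and a trailing space.
text = "To be  or not "

print(wordcount(text))              # 4
print(count_spaces_plus_one(text))  # 6
print(len(text.split()))            # 4
```

The separator-based count is thrown off by the doubled and trailing spaces, while the two word-based counts are not.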

Summary. The regular-expression-based method for counting words does not exactly match every word-counting implementation. Microsoft Word, for example, uses a slightly different algorithm. But this version is often within 0.05% of its results.
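
To illustrate how a different definition of "word" changes the result, the sketch below compares \S+ with \w+ (runs of letters, digits, and underscores). This is not Microsoft Word's algorithm, which the article does not describe. It is only an example of an alternative rule, and the function names are hypothetical.

```python
import re

def count_nonspace_runs(value):
    # The article's rule: a word is a run of non-whitespace.
    return len(re.findall(r"\S+", value))

def count_word_char_runs(value):
    # Alternative rule (an assumption): a word is a run of \w characters.
    return len(re.findall(r"\w+", value))

for sample in ["rock - paper", "don't stop"]:
    print(sample, count_nonspace_runs(sample), count_word_char_runs(sample))

# rock - paper 3 2
# don't stop 2 3
```

A lone dash counts as a word under \S+ but not under \w+, while an apostrophe splits one word into two under \w+. Small rule differences like these are why two word counters can disagree on the same text.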
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.
Dot Net Perls