Sunday, December 21, 2008

C# - Counting the number of occurrences of a substring

How often have you needed to count how many times a substring appears within a string? This common programming problem can be approached from many different angles.

You could write brute-force code that walks through the string character by character, keeping a tally of how many complete matches with the substring were found.
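The brute-force approach described above could look something like the following minimal sketch. The class and method names (BruteForceCounter, CountOccurrences) are made up for illustration, and it assumes that only non-overlapping matches should be counted, so that its results agree with the other approaches in this post.

```csharp
using System;

static class BruteForceCounter
{
    // Hypothetical helper: counts non-overlapping occurrences of 'value'
    // in 'text' by comparing one character at a time.
    public static int CountOccurrences(string text, string value)
    {
        // Assumption: a null or empty input simply yields zero matches.
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(value))
            return 0;

        int count = 0;
        int start = 0;
        while (start <= text.Length - value.Length)
        {
            // Count how many characters match starting at 'start'.
            int matched = 0;
            while (matched < value.Length && text[start + matched] == value[matched])
                matched++;

            if (matched == value.Length)
            {
                count++;
                start += value.Length; // skip past the complete match
            }
            else
            {
                start++; // no match here, try the next position
            }
        }
        return count;
    }
}
```

Calling BruteForceCounter.CountOccurrences(myString, mySubString) would be one way to use it with the variables from the snippet in this post.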

You could use IndexOf within a while loop, incrementing a counter and moving the start point past each match on every iteration, and exiting the loop when IndexOf returns -1 to signal that there are no more occurrences of the substring.
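As a rough sketch of the IndexOf loop just described, something like the following should work. The names IndexOfCounter and CountOccurrences are hypothetical, and the choice of StringComparison.Ordinal is an assumption made so that the comparison behaves like the ordinal, case-sensitive string.Replace used later in this post.

```csharp
using System;

static class IndexOfCounter
{
    // Hypothetical helper: counts non-overlapping occurrences of 'value'
    // in 'text' by repeatedly calling IndexOf from a moving start point.
    public static int CountOccurrences(string text, string value)
    {
        // Assumption: a null or empty input simply yields zero matches.
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(value))
            return 0;

        int count = 0;
        int index = text.IndexOf(value, StringComparison.Ordinal);
        while (index >= 0) // IndexOf returns -1 when nothing more is found
        {
            count++;
            // Resume the search just after the match that was found.
            index = text.IndexOf(value, index + value.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

Moving the start point by value.Length rather than by 1 is a design choice: it means overlapping matches are not counted twice.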

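But there is only one way I know of to do it without writing a loop yourself: use the Replace method of string. Remove every occurrence of the substring, and the amount by which the string shrinks, divided by the length of the substring, is the number of occurrences.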

Here is a code snippet showing how to use this very simple yet powerful way to count the number of occurrences of a substring within a string.


string myString = "Lorem ipsum dolor sit amet, " +
"consectetur adipisicing elit, sed do eiusmod " +
"tempor incididunt ut labore et dolore magna " +
"aliqua. Ut enim ad minim veniam, quis nostrud " +
"exercitation ullamco laboris nisi ut aliquip " +
"ex ea commodo consequat. Duis aute irure " +
"dolor in reprehenderit in voluptate velit " +
"esse cillum dolore eu fugiat nulla pariatur. " +
"Excepteur sint occaecat cupidatat non proident, " +
"sunt in culpa qui officia deserunt mollit anim " +
"id est laborum.";

string mySubString = "dolor";

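// Removing every occurrence shortens the string by
// (number of occurrences * mySubString.Length) characters.
int Count = (myString.Length -
    myString.Replace(mySubString, string.Empty).Length)
    / mySubString.Length;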


This yields a count of 4 (two occurrences of the word "dolor" and two more inside "dolore"), which is the correct number, without a single loop in your code!
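If you want to reuse the trick, it can be wrapped in a small helper. The sketch below is only one possible shape: ReplaceCounter and CountOccurrences are made-up names, and returning 0 for null or empty input is an assumption. The guard matters because string.Replace throws when the value to replace is null or empty, and an empty substring would also mean dividing by zero.

```csharp
using System;

static class ReplaceCounter
{
    // Hypothetical helper around the Replace technique shown above.
    public static int CountOccurrences(string text, string value)
    {
        // Assumption: treat null or empty input as "no occurrences"
        // instead of letting Replace throw.
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(value))
            return 0;

        // The characters removed by Replace, divided by the substring
        // length, give the number of non-overlapping occurrences.
        int removed = text.Length - text.Replace(value, string.Empty).Length;
        return removed / value.Length;
    }
}
```

Note that Replace only sees non-overlapping matches, so searching "aaaa" for "aa" counts 2 occurrences, not 3.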

This code is case sensitive. To create a case-insensitive version, just convert both myString and mySubString to lowercase before comparing.


string mySubString = "lo";

int Count = (myString.Length -
    myString.ToLower().Replace(mySubString.ToLower(),
    string.Empty).Length) / mySubString.Length;


The case-sensitive version would have returned only 4, because it would have missed the "Lo" at the start of "Lorem", where the L is capitalized. With the case-insensitive version, the code correctly finds all 5 instances of our substring.
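The case-insensitive variant can be packaged the same way. This is a sketch under a couple of assumptions: the names CaseInsensitiveCounter and CountOccurrences are invented for illustration, and ToLowerInvariant is swapped in for ToLower so that the result does not depend on the current culture of the machine running the code.

```csharp
using System;

static class CaseInsensitiveCounter
{
    // Hypothetical helper: counts occurrences of 'value' in 'text',
    // ignoring case, using the same Replace technique.
    public static int CountOccurrences(string text, string value)
    {
        // Assumption: treat null or empty input as "no occurrences".
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(value))
            return 0;

        // Lowercase both strings once, then measure only the lowered copies.
        string lowerText = text.ToLowerInvariant();
        string lowerValue = value.ToLowerInvariant();

        int removed = lowerText.Length
            - lowerText.Replace(lowerValue, string.Empty).Length;
        return removed / lowerValue.Length;
    }
}
```

With this helper, searching "Lorem ipsum dolor" for "lo" counts both the "Lo" in "Lorem" and the "lo" in "dolor".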

This may seem like such a small thing, but everyday programming is filled with these "Old School" tricks!

1 comment:

jay said...

nice way of counting occurances.. cheers