That is true. A very simple Regular expression:
new Regex(mySubString).Matches(myString).Count
But, that simple line of code is not the reason for this post. I've found an even deeper issue, and it all comes down to how you define the problem (doesn't it always).
Lets say that my string is "BOBOBOB"
How many instances of the substring "BOB" is in "BOBOBOB"? 2 or 3?
My way in the previous post and the regular expression both agree, the number is 2.
But, say you are writing a game and you are using a string to store previous moves. Each move can have only two choices: a 0 or a 1. So after a number of moves, the storage string could look like this "0111110010100111110110111101".
Now, how many times did the player choose to play a 1 after playing a 1 the prior two moves, i.e. a "111"? If you did a substring search, you would get a 3. But if you count the number of times "111" appear including overlap, the number is much higher: 8.
So, an AI routine that was based on prior moves would need to find all combinations of "111" even if one set is partially contained within another.
So, here is my LOOP that counts the number of substrings within a string, even if the substring is partially contained within another substring.
for (var i = 0; i <= myString.Length - mySubString.Length; i++)
{
if (myString.Substring(i, mySubString.Length) == mySubString)
{
count++;
}
}
11 comments:
Hello ,
it's should be vice versa in the way we use it (checked right now) .
The correct way is :
new Regex(mySubString).Matches(myString).Count;
For example , to know the number of occurrences of blank space " " in the string str , we write :
new Regex(" ").Matches(str).Count;
Lior .
Oooopppppsssss!
This is sad ... for two reasons.
1 - I should have verified my code before I posted - I usually do, I do not know why I didn't this time, but that is no excuse.
2 - I't has taken this long for someone to notice.
Thanks again, I've corrected the main text.
However , using Regex is not recommended for dealing with large amount of text .
I've just written an algorithm which uses Regex for counting the number of occurrences of a given string within another one ;
the algorithm runs like a turtle ...
Replacing the Regex use with a "naive" method , which scans the string from the beginning ,
improved the actual running time of the whole algorithm dramatically .
Lior
True, its always a balance between using the generic tools in the Language/Framework and "rolling your own".
One works with very little coding but may be so generic or built with so many options that speed is affected for large data sets.
On the otherhand if you only have a little 2k string that you need to "look at", custom code could be a big time waster.
Thanks a lot.
It's help me lot
there is no need to loop, just split the string using the search substring as the separator, and return the resulting array length
So, for simple counting of occurrences of a substring we are now up to 3 different ways (See previous post - link at top of page).
string[] s = {"BOB"};
Console.Write("BOB".Split(s,StringSplitOptions.None).Count() - 1);
Of course you need to subtrack 1 because both Length and Count give you the total number of elements in the array and if you split an empty string, you will get a count of 1.
But, the purpose of this post was to find all substrings even if the substring is partially contained within another substring.
Your way produces the same number for strings BOBBOB and BOBOBOB: 2, where it should return 3 for the second case.
Very, very good. Thanks for sharing!
In this example Regex is rather slow. LINQ is also one of the slowest ways as well. One of the fastest ways to do it without looping is the following:
//So if key = "02", and strToSearch = "01,02,03", the number of occurrences should be (8-6) / 2 = 1
int substr_count(String strToSearch, String strKeyToLookFor)
{
return (strToSearch.Length - strToSearch.Replace(strKeyToLookFor, String.Empty).Length) / strKeyToLookFor.Length;
}
Performance benchmarks can also be seen on this post:
http://stackoverflow.com/questions/541954/how-would-you-count-occurences-of-a-string-within-a-string-c
PS: anyone who has done PHP programming will appreciate the method name. :-)
Wow, anonymous. I'm glad you read my blog ... wait, you didn't.
This was a take-two as the title clearly states. If you had taken the time to read the original post, it would have shown that I posted about this method, oh, 3 and a half years ago!?!
Thanks for your comment.
If you're looking for the fastest way, check out this blog:
http://blogs.davelozinski.com/curiousconsultant/csharp-net-fastest-way-to-check-if-a-string-occurs-within-a-string
as it benchmarks about a dozen different ways, including parallel loops.
Post a Comment