Checking if a string only contains specified characters (Order doesn't matter)
I am trying to validate user input. My current approach is to iterate over every char of the input string and check whether the char is one of the specified characters. Is there a more efficient way?
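For reference, the loop described in the question could look roughly like the sketch below. The original code was not posted, so the class name `LoopValidation`, the method name `ContainsOnly`, and the choice to treat an empty input as valid are all assumptions.

```csharp
using System;

public static class LoopValidation
{
    // Returns true when every character of `input` also appears in `allowed`.
    // Order and repetition do not matter. An empty input counts as valid
    // (an assumption, the thread does not say what should happen).
    public static bool ContainsOnly(string input, string allowed)
    {
        foreach (char c in input)
        {
            // IndexOf scans `allowed` linearly, so this is the "nested loop"
            // version of the check.
            if (allowed.IndexOf(c) < 0)
            {
                return false; // stop at the first character that is not allowed
            }
        }
        return true;
    }
}
```

Usage would be something like `LoopValidation.ContainsOnly(userInput, "abc123")`.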
26 Replies
there could be a shorter way if you use linq, but more efficient... i mean i don't know if you really want to squeeze out performance in this code, if it's really that important
1. write unit tests for code like this
2. don't optimize code until you have diagnosed a bottleneck using proper benchmarking and profiling
3. use BenchmarkDotNet for your benchmarks
optimizing user input validation like this is typically a worthless task. you care more about your validation being accurate, not being fast
in this case... unit tests > benchmarks
@jordinho .IndexOfAny Method on string
why...? looping is just fine here
and in practice most apps tend to use regex anyways
Because IndexOfAny will be the most performant option and it will shorten the code to one line
no one will ever care about performance in this code
ever
it is meaningless to optimize this
you care more about the code being clean, easy to read, and logically sound
did you read the part where I said it will shorten the code to one line?
there are lots of ways to reduce this down to one line
ha-ha you're trolling, right?
no
a HashSet will likely outperform IndexOfAny depending on the number of allowed characters because it is using hashing rather than nested looping through the valid chars
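A minimal sketch of the HashSet idea, not the code from the gist. The allowed set `"abc123"` and the names `HashSetValidation` and `ContainsOnly` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetValidation
{
    // Hypothetical allowed set. Built once and reused for every call.
    private static readonly HashSet<char> Allowed = new HashSet<char>("abc123");

    // Each lookup is a hash probe instead of a scan over the allowed chars.
    public static bool ContainsOnly(string input)
    {
        foreach (char c in input)
        {
            if (!Allowed.Contains(c))
            {
                return false; // first disallowed character ends the check
            }
        }
        return true;
    }
}
```

Whether this actually beats the other options depends on the size of the allowed set and the inputs, so it is worth measuring rather than assuming.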
However, in practice, it is common for people to use Regex for these types of things
regex allows you to have the pattern as a configurable property that can be modified without source code changes if needed, but also regex is a common pattern language with lots of documentation and industry knowledge/standard
Use this pattern in Regex source gen and look at the generated code, probably it will use IndexOfAny
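A regex version of the same check might look like the sketch below. The pattern and the names `RegexValidation` and `IsValid` are assumptions, not taken from the gist. It uses a plain `Regex` instance rather than the source generator to keep the example short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexValidation
{
    // Hypothetical pattern: only the characters a, b, c, 1, 2, 3 are allowed.
    // \A and \z anchor to the very start and very end of the input, so a
    // trailing newline is rejected too. `*` means an empty input is accepted.
    private static readonly Regex AllowedOnly = new Regex(@"\A[abc123]*\z");

    public static bool IsValid(string input) => AllowedOnly.IsMatch(input);
}
```

Because the pattern is just a string, it could be loaded from configuration instead of being hard-coded, which is the flexibility mentioned above.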
You can benchmark IndexOfAny against your HashSet solution and see what happens. IndexOfAny applies vectorization so I doubt anything will outperform it
@David_F here you go. add your recommendation:
https://gist.github.com/ZacharyPatten/5477ad5287ae287b8b9e88daf7464952
@ZP was it hard to write down input.IndexOfAny(array) >= 0 ?
https://learn.microsoft.com/en-us/dotnet/api/system.string.indexofany?view=net-7.0
String.IndexOfAny Method (System)
Reports the index of the first occurrence in this instance of any character in a specified array of Unicode characters. The method returns -1 if the characters in the array are not found in this instance.
or better omit the return value of the call like you did with others
that isn't the correct logic, that will get the first index of the first valid character. it will not get the index of any invalid characters
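To make the objection concrete, here is a small sketch (the names `IndexOfAnyDemo` and `HasAnyAllowed` are hypothetical). When `IndexOfAny` is given the allowed characters, the `>= 0` test only says that at least one allowed character is present somewhere in the input.

```csharp
using System;

public static class IndexOfAnyDemo
{
    // True as soon as ONE allowed character occurs in `input`.
    // This does not prove that every character is allowed.
    public static bool HasAnyAllowed(string input, char[] allowed)
        => input.IndexOfAny(allowed) >= 0;
}
```

For example, `HasAnyAllowed("a!", new[] { 'a', 'b' })` is true even though `'!'` is not in the allowed set, so this expression cannot be used as the validation on its own.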
In your Regex() and HashSet() you don't find indexes either, right?
the point of this code is to validate the entire string (user input). getting the indexes isn't necessary beyond the goal of validating the string.
"In your Regex() and HashSet() you don't find indexes either, right?"
I'm not sure exactly what you are asking
"getting the indexes isn't necessary"
yeah but in the future having the error pointing at or marking a precise place in the string wouldn't be awful to me
Then use the input.AsSpan().IndexOfAnyExcept() method
@ZP I understand that we don't need to continue the validation further if we find a character that is not valid, right?
"IndexOfAnyExcept"
yes that would likely be correct
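A sketch of how the `IndexOfAnyExcept` suggestion could be used, assuming .NET 7 or later (where the span extension is available). The names `SpanValidation`, `FirstInvalidIndex`, and `ContainsOnly` are made up for this example.

```csharp
using System;

public static class SpanValidation
{
    // Index of the first character of `input` that is NOT in `allowed`,
    // or -1 when every character is allowed (including for empty input).
    public static int FirstInvalidIndex(string input, string allowed)
        => input.AsSpan().IndexOfAnyExcept(allowed.AsSpan());

    // The validation itself: valid when no disallowed character was found.
    public static bool ContainsOnly(string input, string allowed)
        => FirstInvalidIndex(input, allowed) < 0;
}
```

This gives both things asked for in the thread: the search stops at the first invalid character, and the returned index can be used to point at the offending position in an error message.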
I ran your benchmark @ZP, Regex is the best and HashSet with LINQ is the worst.
Regex generates this code
yeah I updated the gist too
not that it matters
because all of them have relatively similar performance and will not be a bottleneck
but o well that was fun
keep in mind I didn't increase the # valid characters though. Hashset efficiency will improve as that goes up in comparison to the other methods
I don't understand how HashSet with LINQ allocated so much... the lambda is static...
always avoid delegates if you want performance
struct generic parameters > delegates
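A rough sketch of what "struct generic parameters instead of delegates" can mean here. Everything in it (`ICharPredicate`, `AsciiLowerOrDigit`, `GenericValidation`) is hypothetical and not from the gist. The idea is that the runtime typically specializes generic code for each struct type argument, so the predicate call can be resolved statically instead of going through a delegate invocation.

```csharp
using System;

// Hypothetical predicate contract.
public interface ICharPredicate
{
    bool IsAllowed(char c);
}

// A struct implementation: lowercase ASCII letters and digits only.
public readonly struct AsciiLowerOrDigit : ICharPredicate
{
    public bool IsAllowed(char c)
        => (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

public static class GenericValidation
{
    // Delegate version: every character costs an indirect call.
    public static bool ContainsOnlyDelegate(string input, Func<char, bool> isAllowed)
    {
        foreach (char c in input)
        {
            if (!isAllowed(c)) return false;
        }
        return true;
    }

    // Struct generic version: TPredicate is known at JIT time for each
    // struct type, so IsAllowed is a direct (often inlinable) call.
    public static bool ContainsOnlyStruct<TPredicate>(string input, TPredicate predicate = default)
        where TPredicate : struct, ICharPredicate
    {
        foreach (char c in input)
        {
            if (!predicate.IsAllowed(c)) return false;
        }
        return true;
    }
}
```

A call would look like `GenericValidation.ContainsOnlyStruct<AsciiLowerOrDigit>("abc123")`. As said earlier in the thread, this only matters if profiling shows the validation is actually a bottleneck.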