C
C#17mo ago
cesarz

❔ Unique Integer ID GEN from string input

Hi there, I've been trying to think of a unique integer ID generator but let's just say I'm not too good at this blobthumbsup
public ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
ulong ID = 0;
foreach (char c in chars)
ID = 1000 * ID + c + 67;
return ID;
}
public ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
ulong ID = 0;
foreach (char c in chars)
ID = 1000 * ID + c + 67;
return ID;
}
The one I made is quite simple and it just shifts all the characters by 67 (so that all int values of the characters (33 to 126) are in range of 100 to 193) and adds the numbers together (side by side, so 102 '+' 192 => 102192) It works and the ID's are unique but the issue is quite obvious -> the ID's are way too long. I could ignore the last 4 characters in my initial range (which are { | } ~) and I would be able to squeeze my range into 10 to 99, but still: for ulong that would mean a character length limit of 9... which is quite low. I searched online for some unique integer ID generating algorithms but couldn't find anything useful for my case, or I'm just blind and didn't notice something that actually is useful. Any help in form of advice or links is appreciated.
13 Replies
Tvde1
Tvde117mo ago
how big or small can your strings get?
cesarz
cesarzOP17mo ago
The string input will be a name given to a file in my game so I guess there is no limit to how big these can get, depends on the situation, but most likely not bigger than 32 characters I suppose (even if they are bigger I can just reduce them to 32 chars, with this length there shouldn't be any problems)
Tvde1
Tvde117mo ago
Mathematics Stack Exchange
How can I convert this unique string of characters into a unique nu...
I have an unusual programming problem and the math side of it has me stumped. It's probably a simple answer but math isn't my strongest area. I've generated a unique string of 7 characters which are
cesarz
cesarzOP17mo ago
Thanks, I'll take a look For ulong limit 18_446_744_073_709_551_615 Testing this method using highest-value character z (int 35) we get a limit of 12 characters, any higher that that and there's an overflow... It's better than my previous 9 character limit but still far away from 32 character limit...
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
Queue<int> queue = new Queue<int>();
foreach (char c in chars)
{
switch (c)
{
case >= '0' and <= '9':
queue.Enqueue(c - '0');
break;
case >= 'A' and <= 'Z':
queue.Enqueue(c - 'A' + 10);
break;
case >= 'a' and <= 'z':
queue.Enqueue(c - 'a' + 10);
break;
}
}
int length = queue.Count;
ulong ID = 0;
for (int i = 0; i < length; ++i)
ID += (ulong)queue.Dequeue() * (ulong)Mathf.Pow(36, i);
return ID;
}
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
Queue<int> queue = new Queue<int>();
foreach (char c in chars)
{
switch (c)
{
case >= '0' and <= '9':
queue.Enqueue(c - '0');
break;
case >= 'A' and <= 'Z':
queue.Enqueue(c - 'A' + 10);
break;
case >= 'a' and <= 'z':
queue.Enqueue(c - 'a' + 10);
break;
}
}
int length = queue.Count;
ulong ID = 0;
for (int i = 0; i < length; ++i)
ID += (ulong)queue.Dequeue() * (ulong)Mathf.Pow(36, i);
return ID;
}
Here's the current implementation I could possibly try and reduce the possible characters by also ignoring numbers / or not adding 10 to the letters resulting in a range of 0 to 25 instead of 0 to 35. Numbers are rarely used in my file names anyways so there is little to no risk in there being any collisions with this I suppose. Now there's a 13 character limit... Any progress is good. 13 characters is quite a lot to work with already. I can divide it in 'half' => 6 and 7 characters Take 6 characters from the beginning of the input And 7 characters from the end of the input Unless the input length is < 13, then I just take every character. That would probably work out fine enough? Should probably work fine as long as nobody uses names such as "hhhhhhhhhh" and "hhhhhhhhhhhhhhh" which will then probably generate the same ID
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
Queue<int> inputQueue = new Queue<int>();
Queue<int> idGenQueue = new Queue<int>();
foreach (char c in chars)
{
switch (c)
{
case >= '0' and <= '9':
inputQueue.Enqueue(c - '0');
break;
case >= 'A' and <= 'Z':
inputQueue.Enqueue(c - 'A');
break;
case >= 'a' and <= 'z':
inputQueue.Enqueue(c - 'a');
break;
}
}
int length = inputQueue.Count;
if (length < 13)
{
while (inputQueue.Count > 0)
idGenQueue.Enqueue(inputQueue.Dequeue());
}
else
{
for (int i = 0; i < 6; ++i)
idGenQueue.Enqueue(inputQueue.Dequeue());
while (inputQueue.Count > 7)
inputQueue.Dequeue();
while (inputQueue.Count > 0)
idGenQueue.Enqueue(inputQueue.Dequeue());
}
ulong ID = 0;
length = idGenQueue.Count;
for (int i = 0; i < length; ++i)
ID += (ulong)idGenQueue.Dequeue() * (ulong)Mathf.Pow(26, i);
return ID;
}
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
Queue<int> inputQueue = new Queue<int>();
Queue<int> idGenQueue = new Queue<int>();
foreach (char c in chars)
{
switch (c)
{
case >= '0' and <= '9':
inputQueue.Enqueue(c - '0');
break;
case >= 'A' and <= 'Z':
inputQueue.Enqueue(c - 'A');
break;
case >= 'a' and <= 'z':
inputQueue.Enqueue(c - 'a');
break;
}
}
int length = inputQueue.Count;
if (length < 13)
{
while (inputQueue.Count > 0)
idGenQueue.Enqueue(inputQueue.Dequeue());
}
else
{
for (int i = 0; i < 6; ++i)
idGenQueue.Enqueue(inputQueue.Dequeue());
while (inputQueue.Count > 7)
inputQueue.Dequeue();
while (inputQueue.Count > 0)
idGenQueue.Enqueue(inputQueue.Dequeue());
}
ulong ID = 0;
length = idGenQueue.Count;
for (int i = 0; i < length; ++i)
ID += (ulong)idGenQueue.Dequeue() * (ulong)Mathf.Pow(26, i);
return ID;
}
Here's the final implementation. If anyone has any ideas on how to improve this further then feel free to ping me
Tvde1
Tvde117mo ago
that looks like a really inefficient method, using queues 😬
cesarz
cesarzOP17mo ago
What would you recommend as an alternative? Would an array, where -1 is to be skipped, be a better alternative?
Tvde1
Tvde117mo ago
You're taking the first 13 and last 7 chars of a string, right?
cesarz
cesarzOP17mo ago
-1 would be placed during the first loop with the switch in places of characters which aren't A-Z/a-z/0-9 alongside with an incrementing length variable for the ones which are first 6 and last 7 and some are ignored @#$_&-+()/ etc
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
int length = chars.Length;
int correctChars = 0;
for (int i = 0; i < length; ++i)
{
switch (chars[i])
{
case >= '0' and <= '9':
++correctChars;
chars[i] -= '0';
break;
case >= 'A' and <= 'Z':
++correctChars;
chars[i] -= 'A';
break;
case >= 'a' and <= 'z':
++correctChars;
chars[i] -= 'a';
break;
default:
chars[i] = '-';
break;
}
}
ulong ID = 0;
for (int i = 0, x = 0; i < length; ++i)
{
if (chars[i] == '-')
continue;
if (x != 6)
{
ID += chars[i] * (ulong)Mathf.Pow(26, x++);
--correctChars;
}
else
{
if (correctChars > 7)
{
--correctChars;
continue;
}
++x;
}
}
return ID;
}
public static ulong UniqueIdGen(string str)
{
char[] chars = str.ToCharArray();
int length = chars.Length;
int correctChars = 0;
for (int i = 0; i < length; ++i)
{
switch (chars[i])
{
case >= '0' and <= '9':
++correctChars;
chars[i] -= '0';
break;
case >= 'A' and <= 'Z':
++correctChars;
chars[i] -= 'A';
break;
case >= 'a' and <= 'z':
++correctChars;
chars[i] -= 'a';
break;
default:
chars[i] = '-';
break;
}
}
ulong ID = 0;
for (int i = 0, x = 0; i < length; ++i)
{
if (chars[i] == '-')
continue;
if (x != 6)
{
ID += chars[i] * (ulong)Mathf.Pow(26, x++);
--correctChars;
}
else
{
if (correctChars > 7)
{
--correctChars;
continue;
}
++x;
}
}
return ID;
}
How about this one? Now there are no queues, only the initial char array It skips the incorrect characters (-), adds the first 6 to the ID, if 6 are already added then it skips through all the other correct chars until there are only 7 left
Tvde1
Tvde117mo ago
That'll be much more performant!
cesarz
cesarzOP17mo ago
and idk, maybe an additional check in the loop if there are any correct characters left? There's a possibility that someone will for some reason name a file "Abc######" so the loop would continue for 6 more times for no reason, but idk if it makes too much of a difference
Tvde1
Tvde117mo ago
Measure and you'll find out :p
cesarz
cesarzOP17mo ago
Well, anyways, thanks for the help!
Accord
Accord17mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.
Want results from more Discord servers?
Add your server