Concurrent Dictionary and Delay - C#
Reference: ConcurrentDictionary
In a high-throughput, low-latency environment, we had a piece of code in the critical path that helped with throttling. After version 2.0 of the software was released (internally), we saw that throughput had dropped significantly, by almost 3X compared to version 1.0: overall execution time went from 4 minutes to 11 minutes.
We looked at the change sets between the versions: the history was huge, with hundreds of commits, and the software was running hundreds of threads on a 2-million-LOC stack. So we were unsure which piece of code or which thread was causing the delay. We were all over the place, trying to work out whether it was a system issue, a memory trend, fragmentation, memory/back pressure, excess logging, and so on. After 3 days (yes, a 3-day slog), we found that the delay was coming from the ConcurrentDictionary class!!! 😳
It was an unpleasant surprise, since we didn't expect a .NET Framework class to cause such a huge delay.
1. Version 1.0: [4 minutes]
One API, backed by a data store made of a single Dictionary.
private Dictionary<int, User> userInfos;
public List<User> GetUsers() {
    // Values is a ValueCollection, so copy it into the List<User> the API returns.
    return new List<User>(userInfos.Values);
}
2. Version 2.0: [11 minutes]
In version 2.0, we changed the data store holding the user info from a simple Dictionary to a two-level ConcurrentDictionary.
There were good reasons for using ConcurrentDictionary, which I won't go into here. But after changing the data store, we ended up blindly updating every usage to read from the new one.
The implementation of the API (GetUsers) was updated while its interface stayed the same, but it was in the critical path.
private ConcurrentDictionary<int, ConcurrentDictionary<int, User>> userInfos;
public List<User> GetUsers() {
    var users = new List<User>();
    // Each .Values call builds a snapshot of that dictionary's values.
    foreach (var userGroup in userInfos.Values) {
        foreach (var user in userGroup.Values) {
            users.Add(user);
        }
    }
    return users;
}
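For comparison, here is a minimal sketch of a flattening GetUsers that enumerates the dictionaries directly instead of going through .Values. As far as we know, enumerating a ConcurrentDictionary does not take its locks and does not build an intermediate copy, though it is also not a moment-in-time snapshot. The User class, the UserStore wrapper, and the Add helper are hypothetical stand-ins, and we did not measure whether this alone would have restored the throughput.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the User type, which is not shown in this post.
public class User
{
    public int Id { get; set; }
}

// Hypothetical wrapper around the two-level store.
public class UserStore
{
    private readonly ConcurrentDictionary<int, ConcurrentDictionary<int, User>> userInfos =
        new ConcurrentDictionary<int, ConcurrentDictionary<int, User>>();

    // Hypothetical helper so the sketch can be exercised.
    public void Add(int groupId, User user)
    {
        var group = userInfos.GetOrAdd(groupId, _ => new ConcurrentDictionary<int, User>());
        group[user.Id] = user;
    }

    public List<User> GetUsers()
    {
        var users = new List<User>();
        // Enumerate key/value pairs directly rather than calling .Values,
        // so no per-dictionary snapshot list is created along the way.
        foreach (var group in userInfos)
        {
            foreach (var entry in group.Value)
            {
                users.Add(entry.Value);
            }
        }
        return users;
    }
}
```

The trade-off is that the result may reflect writes that happen while the loop is running, which may or may not be acceptable for a given caller.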
3. Version 3.0: [4 minutes]
After looking into the usages in the critical path, we found that they only needed the count of the active user groups.
So we wrote another API that just returns the count. With this change, we were able to bring the throughput back to the version 1.0 level. Phew!! 😌
private ConcurrentDictionary<int, ConcurrentDictionary<int, User>> userInfos;
public int GetActiveUserGroups() {
    // Only the number of groups is needed, so no users are copied.
    return userInfos.Count;
}
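Had we timed the two calls in isolation early on, the 3-day hunt might have been shorter. Below is a rough sketch of such a check using Stopwatch. The ThrottleTiming class, its method names, and the sizes passed to Build are all made up for illustration, and the numbers it prints will depend on the machine and runtime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for the User type, which is not shown in this post.
public class User
{
    public int Id { get; set; }
}

public static class ThrottleTiming
{
    // Builds a two-level store with the given (made-up) sizes.
    public static ConcurrentDictionary<int, ConcurrentDictionary<int, User>> Build(int groups, int usersPerGroup)
    {
        var store = new ConcurrentDictionary<int, ConcurrentDictionary<int, User>>();
        for (int g = 0; g < groups; g++)
        {
            var group = new ConcurrentDictionary<int, User>();
            for (int u = 0; u < usersPerGroup; u++)
            {
                group[u] = new User { Id = u };
            }
            store[g] = group;
        }
        return store;
    }

    // Version 2.0 style: flatten every group into one list.
    public static List<User> FlattenUsers(ConcurrentDictionary<int, ConcurrentDictionary<int, User>> store)
    {
        var users = new List<User>();
        foreach (var group in store.Values)
        {
            foreach (var user in group.Values)
            {
                users.Add(user);
            }
        }
        return users;
    }

    // Runs the given work repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Func<int> work, int iterations)
    {
        long sink = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += work();
        }
        watch.Stop();
        GC.KeepAlive(sink); // keep the accumulated result observable
        return watch.Elapsed;
    }

    // Prints the elapsed time of both approaches for a side-by-side look.
    public static void Compare(int groups, int usersPerGroup, int iterations)
    {
        var store = Build(groups, usersPerGroup);
        Console.WriteLine("Flatten: " + Time(() => FlattenUsers(store).Count, iterations));
        Console.WriteLine("Count:   " + Time(() => store.Count, iterations));
    }
}
```

Calling something like ThrottleTiming.Compare(100, 1000, 1000) from a test harness gives a quick feel for the gap, but a single-threaded loop like this does not reproduce the contention of hundreds of threads.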
Learning:
In hindsight, the fix looks trivial and straightforward. The challenge was the huge debugging effort needed to find the root cause. Little did we suspect that a .NET Framework class could cause such a delay and slow down the overall execution time. It was a great learning experience for everyone on the team involved in the debugging. BTW, we are talking about a code base of a few million lines of code.
BTW: ConcurrentDictionary exists for a reason, and using it as our data store was the right choice. As the reference puts it:
Represents a thread-safe collection of key/value pairs that can be accessed by multiple threads concurrently.
For a class to be thread safe, it typically needs some form of synchronization, such as a lock or mutex. A lock has its own cost in execution time (thread scheduling, preemption, etc.), so think carefully about where and how often it is used.
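To make that cost a bit more concrete: to our understanding, single-key reads such as TryGetValue on a ConcurrentDictionary avoid locking, while members such as Values, Keys, and Count work on the whole dictionary and acquire its internal locks. Values also returns a copy, which the small sketch below demonstrates. The SnapshotDemo class and its sample data are invented for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotDemo
{
    // Returns { number of items in the earlier Values snapshot, live Count }.
    public static int[] Run()
    {
        var names = new ConcurrentDictionary<int, string>();
        names.TryAdd(1, "alice");

        // Values copies the current contents into a separate collection.
        ICollection<string> snapshot = names.Values;

        // This later write is not visible through the snapshot taken above.
        names.TryAdd(2, "bob");

        // A single-key read; it sees the new entry immediately.
        string name;
        bool found = names.TryGetValue(2, out name);

        return new[] { snapshot.Count, found ? names.Count : -1 };
    }
}
```

Run() returns { 1, 2 }: the snapshot still holds one item while the dictionary itself holds two. A critical path that calls Values on every request pays for that copy each time.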