My application facilitates online consultations between clients and employees across various companies.
Employees: Belong to specific companies and can be in one of four states: idle, offline, busy, or init (initializing a session).
Clients: Request sessions on specific topics.
Booking Process:
Clients send booking requests via websockets to a company.
The backend searches for an idle employee within that company.
If found, the backend changes the employee's status to init and sends a request to the employee to accept or reject the session.
If rejected, the backend searches for another idle employee within the same company and repeats the process.
Upon acceptance, the employee’s status is changed to busy. On rejection, it reverts to idle.
Clients can only send booking requests if idle employees are available within the company.
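The status transitions described above can be sketched as a pure function (module, event, and status names here are illustrative, not from an actual codebase):

```elixir
defmodule EmployeeStatus do
  # Allowed transitions in the booking flow:
  # idle -> init (backend picks the employee), init -> busy (accept),
  # init -> idle (reject), busy -> idle (session ends).
  @transitions %{
    {:idle, :pick}   => :init,
    {:init, :accept} => :busy,
    {:init, :reject} => :idle,
    {:busy, :finish} => :idle
  }

  def transition(status, event) do
    case Map.fetch(@transitions, {status, event}) do
      {:ok, next} -> {:ok, next}
      :error -> {:error, :invalid_transition}
    end
  end
end
```

Keeping the transition table in one place makes it easy to reject invalid updates no matter which process asks for them.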
Technical Implementation:
I’m using an ETS table to store employee data.
Challenges:
Efficient Data Access: What is the optimal key structure for the ETS table to quickly locate idle employees within a specific company?
Should I use a tuple key {company_id, employee_id} and rely on match operations? Since I only have the company_id at lookup time, I would have to match on the company_id alone.
Should I store a list of employees under each company_id key?
Would creating separate ETS tables per company be more efficient?
Are there indexing strategies to optimize lookups?
Concurrency and Scalability: How can I handle a high volume of concurrent client requests without encountering bottlenecks or race conditions?
Is a single process sufficient for ETS manipulation and access?
If using multiple processes, what locking mechanisms should I implement to ensure data consistency? Should I use shards?
Are there alternative approaches to improve scalability without compromising data integrity?
Key Concerns:
Fast Lookup: I need to find idle employees quickly (ideally O(1) time complexity).
Scalability: The system should handle thousands of concurrent client requests.
Data Consistency: Ensure correct employee status updates and avoid race conditions.
I would greatly appreciate any insights, suggestions, or best practices to address these challenges and design a robust, scalable solution for managing employee bookings.
Nothing tried so far. In the current implementation, every company has a GenServer whose state holds the list of online employees and a company status flag based on whether any employee is idle. It works, but I want a better implementation, which is why I started researching and found the ideas mentioned above. I'm thinking of going with ETS managed by a single process. I should just start implementing, but I want a solid foundation that covers normal load. Maybe I'm wrong and these are the premature optimizations I talked about.
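For reference, the per-company GenServer approach I have now looks roughly like this (module, registry, and function names are made up for illustration):

```elixir
defmodule CompanyBooker do
  use GenServer

  # One GenServer per company; its state holds the set of idle employee ids,
  # so all checkout/release operations for a company are serialized.
  def start_link(company_id),
    do: GenServer.start_link(__MODULE__, company_id, name: via(company_id))

  defp via(company_id), do: {:global, {:company_booker, company_id}}

  def checkout_idle(company_id),
    do: GenServer.call(via(company_id), :checkout_idle)

  def release(company_id, employee_id),
    do: GenServer.cast(via(company_id), {:release, employee_id})

  @impl true
  def init(company_id), do: {:ok, %{company_id: company_id, idle: MapSet.new()}}

  @impl true
  def handle_call(:checkout_idle, _from, %{idle: idle} = state) do
    case Enum.at(idle, 0) do
      nil -> {:reply, {:error, :no_idle_employee}, state}
      id -> {:reply, {:ok, id}, %{state | idle: MapSet.delete(idle, id)}}
    end
  end

  @impl true
  def handle_cast({:release, id}, state),
    do: {:noreply, update_in(state.idle, &MapSet.put(&1, id))}
end
```

Serializing through one process per company avoids race conditions for free; the question is whether a single process per company ever becomes the bottleneck.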
Are you talking about Mnesia or Postgres?
But ETS or Redis have a speed advantage: they are much faster for data access than a database. I do already have database tables for company, client, and employee.
Your speed concerns seem overblown. Are you sure you’re not over-optimizing for a problem that you’ll never have? Do you have hard requirements to respond within 1ms or less?
The central thing to think about when designing with ETS is “what operations does this need to support?”. There are specific things that ETS can do very efficiently, so you want to structure your code to align with those.
For instance:
ordered_set tables have an efficient next operation for a given key, even if the specific key is not present in the table.
So imagine an ETS table with the structure:
{{company_id, timestamp_ms}, employee_id}
where an employee “becoming available” inserts a tuple.
The scheduler can ask :ets.next(some_table, {company_id, 0}) - and get back the key with a matching company_id and the lowest timestamp (then look up its value).
There are some corner-cases to watch out for:
if there are NO tuples with that company_id, next will return one from the next company. The scheduler could either deal with this by inserting a sentinel “no more employees” value, or by checking the result.
two employees that become available at exactly the same millisecond will collide on the same key, so one of their inserts (with :ets.insert_new) will fail. Increasing the timestamp precision to microseconds would reduce this chance, or the scheduler could retry.
since employee_id isn’t part of the key, an employee could have multiple tuples with different timestamps. This may be a feature, but if it isn’t see below.
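A minimal sketch of that scheme (table name, company ids, and employee ids are illustrative):

```elixir
# ordered_set keyed by {company_id, timestamp_ms}; value is the employee id.
table = :ets.new(:availability, [:ordered_set, :public])

:ets.insert(table, {{:acme, 1_000}, :alice})
:ets.insert(table, {{:acme, 2_000}, :bob})
:ets.insert(table, {{:globex, 1_500}, :carol})

# :ets.next/2 returns the smallest key strictly greater than the given one,
# so {company, 0} finds that company's earliest-available employee - if any.
result =
  case :ets.next(table, {:acme, 0}) do
    {:acme, _ts} = key ->
      [{^key, employee_id}] = :ets.lookup(table, key)
      # claim the employee by deleting the tuple
      :ets.delete(table, key)
      {:ok, employee_id}

    _other_company_or_end_of_table ->
      # a key from a different company, or :"$end_of_table"
      {:error, :no_idle_employee}
  end
```

The case clause that checks the company_id in the returned key is one way to handle the "no tuples for this company" corner case without a sentinel value.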
The other thing is to use multiple ETS tables to handle different questions. For instance, to handle the "employee can have multiple entries" case, you might have an auxiliary ordered_set table like:
{{company_id, employee_id}, timestamp_ms}
The scheduler would attempt to insert into this table first when recording an employee’s availability.
The above table can also answer useful-sounding questions like:
“is this company’s employee available?” (:ets.member)
“since when has this company’s employee been available?” (:ets.lookup_element)
“what employees are available for this company?” (:ets.next etc)
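For example (table name and ids are illustrative; this uses :ets.match for the last question, as an alternative to walking keys with :ets.next):

```elixir
# auxiliary ordered_set keyed by {company_id, employee_id}; value is the
# timestamp at which the employee became available.
index = :ets.new(:availability_index, [:ordered_set, :public])
:ets.insert(index, {{:acme, :alice}, 1_000})
:ets.insert(index, {{:acme, :bob}, 2_000})

# "is this company's employee available?"
true = :ets.member(index, {:acme, :alice})

# "since when has this company's employee been available?"
1_000 = :ets.lookup_element(index, {:acme, :alice}, 2)

# "what employees are available for this company?"
# matching on the company prefix; ordered_set yields results in key order
[[:alice], [:bob]] = :ets.match(index, {{:acme, :"$1"}, :_})
```

Using :ets.insert_new/2 on this table when recording availability is what prevents an employee from ending up with multiple entries.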
You raise a valid point about the potential for over-optimization. While a 1ms response time isn’t a strict requirement, I do believe that minimizing latency is crucial in this scenario. The nature of the booking system demands a good user experience. Delays in finding available employees could lead to frustrated clients and missed opportunities.
While my concerns might not warrant microsecond-level optimizations, I’m keen on avoiding unnecessary database round-trips for each booking request. This is why I’m leaning towards in-memory solutions like ETS or Redis. They generally offer faster retrieval times than traditional databases, helping to keep latency low and responsiveness high.
While the proposed two-table ETS structure elegantly solves the issue of efficient employee lookup and data integrity, I’m still concerned about the potential bottleneck of using a single process to manage table manipulation.
I understand that a single process guarantees that a company won’t accidentally assign the same employee to multiple clients, as requests are handled sequentially. However, as the number of booking requests grows, this approach may become a performance bottleneck.
On the other hand, introducing multiple processes to handle the workload raises the risk of race conditions. There’s a small but real chance that two processes could simultaneously attempt to assign the same employee to different clients, leading to conflicts and inconsistencies.
I’m considering using a concurrency mechanism like ConCache for locking to mitigate the risk of race conditions in a multi-process environment. However, I’m still weighing the trade-offs between the simplicity of a single process and the scalability of a multi-process approach.
For the initial MVP, I plan to implement a single process solution to validate the overall design and functionality. However, I recognize the need for a more scalable solution in the long run to accommodate increasing traffic and ensure a smooth user experience.
I would greatly appreciate any insights or advice on the most suitable concurrency strategy for this scenario. Are there other locking mechanisms or architectural patterns I should consider? How can I best balance the need for data consistency with the desire for a high-performance, scalable system?
Let me give you my 5 cents on the problem you are trying to solve: if you don’t have a hard requirement for response time, then as others mentioned above, use a database.
There are multiple reasons why the ETS solution is harder to maintain, some of them:
all the data stored in ETS tables will vanish when you restart the server or redeploy (unless you are doing hot-code reloading, but that in itself is a hard thing to get right);
you can have potential problems, as you mentioned, with mailbox overflow if you don't know exactly what you are doing;
the complexity of having persistence will most likely outweigh the small performance gain you will get from storing the data in memory.
What I would personally do is try implementing a prototype using Ecto + SQLite, check response times at your expected maximum throughput, and decide whether that is a good or bad solution.
You can also later add a caching layer in ETS by using one of the available caching libraries like Cachex (which also handles message overflow for you). Having the source of truth in the database will save you a lot of trouble and will offer you a structured way to store your data.
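With the database as the source of truth, claiming an idle employee can be made race-safe with a guarded update - a sketch, assuming a hypothetical MyApp.Repo and an Employee schema with a string :status field (works the same way on SQLite or Postgres):

```elixir
import Ecto.Query

def claim_idle_employee(company_id) do
  Repo.transaction(fn ->
    candidate =
      from(e in Employee,
        where: e.company_id == ^company_id and e.status == "idle",
        limit: 1
      )
      |> Repo.one()

    with %Employee{id: id} <- candidate,
         # guard on status again: if a concurrent request already claimed
         # this employee, update_all matches 0 rows and we roll back
         {1, _} <-
           Repo.update_all(
             from(e in Employee, where: e.id == ^id and e.status == "idle"),
             set: [status: "init"]
           ) do
      id
    else
      _ -> Repo.rollback(:no_idle_employee)
    end
  end)
end
```

The compare-and-swap in the update's where clause is what makes this safe even with many web processes hitting the same company concurrently, without any application-level locking.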
Look up how cachex does transactions in its source code. That could potentially solve both your performance and race condition worries at the same time.
But as @D4no0 said, I’d use cachex together with SQLite. It will live inside the OS process of your app so any extra latency will be extremely minimal.
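Roughly, a Cachex transaction serializes writes to the listed keys, which makes a check-then-set on an employee's status atomic. A sketch, assuming Cachex is a dependency and a cache named :employees has been started with Cachex.start_link/1:

```elixir
# Other transactions touching employee_id are blocked until this one
# finishes, so the get + put below cannot interleave with a competing claim.
claim = fn employee_id ->
  Cachex.transaction(:employees, [employee_id], fn cache ->
    case Cachex.get(cache, employee_id) do
      {:ok, "idle"} ->
        Cachex.put(cache, employee_id, "init")
        :claimed

      _ ->
        :unavailable
    end
  end)
end
```

Note this only protects the cache; if the database is the source of truth, the status write still needs to reach it as well.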
Because it's much easier to query stuff with SQL. (D)ETS and Mnesia aren't known for making some of the data fetching, filtering, and sorting scenarios easy.
So if I go with the DB approach (for the MVP), I can also spin up a worker process per company to handle the client requests sent via WebSocket, and the DB will handle the data race problem when multiple requests target the same resource.
The DB will handle it, yes, but I also don’t think you need to spawn any processes yourself. Your app simply responds to requests, right? Though you also mentioned WebSockets so maybe it’s a two-way communication?
Since every user communication goes through a WebSocket, and each WebSocket is a process itself, I could use that too. But I think it's good to have centralized control over these things.