Concurrent data exchange via queues - without locks : Possible?

This is the first post in a two-part series on the topic.

Concurrency and complexities

In computer science, concurrency means two or more tasks (similar or different) execute in overlapping time, which typically involves contention for access to shared resources. The contended resource may be a database, a file, a socket, an object, or even a single memory location.

Concurrent code execution raises two concerns: mutual exclusion and visibility of change. Mutual exclusion governs how contended updates to a resource are serialized between multiple workers; in Java it is typically achieved with a lock or a semaphore, which ensures only one writer makes an update at a time while the other threads wait their turn. Visibility of change is about controlling when such updates become visible/available to other threads. In a concurrent environment, synchronizing contended write access is the most costly operation in terms of wasted CPU cycles and latency, as it requires complex coordination between the contending threads.
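As a minimal sketch of both concerns (class and method names here are illustrative, not from any particular library): two threads increment a shared counter, a `ReentrantLock` provides mutual exclusion, and releasing the lock establishes the happens-before edge that makes each update visible to the next thread that acquires it.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count = 0;

    public void increment() {
        lock.lock();           // mutual exclusion: one thread at a time
        try {
            count++;           // contended write, now safe
        } finally {
            lock.unlock();     // unlock publishes the change to the next acquirer
        }
    }

    public long get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter c = new LockedCounter();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // 200000 — no lost updates
    }
}
```

Every one of those 200,000 increments paid the cost of acquiring and releasing the lock, which is exactly the overhead discussed above.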

Can we avoid mutual exclusion? Yes, if we have only one writer: by ensuring the resource is modified by a single thread, mutual exclusion becomes unnecessary. Once updated, the resource only needs to be made visible/available to the other threads.
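A sketch of the single-writer idea (again with illustrative names): only one thread ever writes the field, so no lock is needed for exclusion; marking it `volatile` handles the remaining concern, visibility, by guaranteeing readers see the latest write.

```java
public class SingleWriterCell {
    // volatile solves visibility without a lock; with a single writer,
    // there is no write-write contention left to exclude.
    private volatile long latest = 0;

    // Called by the single writer thread only.
    public void publish(long value) {
        latest = value;   // plain write; no mutual exclusion needed
    }

    // Safe to call from any number of reader threads.
    public long read() {
        return latest;
    }

    public static void main(String[] args) {
        SingleWriterCell cell = new SingleWriterCell();
        cell.publish(42);
        System.out.println(cell.read()); // 42
    }
}
```

Note that the correctness of this sketch rests entirely on the discipline that `publish` is invoked from exactly one thread; the compiler will not enforce that for you.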

Concurrent data exchange via queue

Many applications use some sort of queue to exchange data between different processing stages, where one or more producers want to push data into the queue, and one or more consumers take that data out of the queue for processing. You can imagine what kind of coordination is required between the producers and consumers in this scenario.

Let's first talk about producers: as we saw above, concurrent write access to the queue requires a locking mechanism between multiple threads (mutual exclusion). This contention may be avoided if we ensure only one producer ever writes to the queue. Simple? Not really. Let's go further.
Once a producer adds data to the queue, the addition generally changes the queue's 'size', 'head' and 'tail'; if care is not taken, a consumer may read the wrong size or read data from the wrong place, which means the addition requires exclusion between producer and consumer as well. On the consumer's side, taking data out (deleting it) again changes size, head & tail. And with multiple consumers, coordination is required between the consumers too.
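To make the problem concrete, here is a deliberately naive, NOT thread-safe array-backed ring buffer (illustrative code, not any production queue). The comments mark exactly which fields every enqueue and dequeue mutates; any of them read by one side while the other side is mid-update can be stale or inconsistent, which is why lock-based queues guard all three.

```java
public class NaiveRingQueue {
    private final Object[] buffer;
    private int head = 0;   // next slot to read  — mutated by dequeue()
    private int tail = 0;   // next slot to write — mutated by enqueue()
    private int size = 0;   // mutated by BOTH sides: the core contention point

    public NaiveRingQueue(int capacity) {
        buffer = new Object[capacity];
    }

    public boolean enqueue(Object item) {
        if (size == buffer.length) return false;   // full
        buffer[tail] = item;
        tail = (tail + 1) % buffer.length;         // a consumer may observe the old tail
        size++;                                    // read-modify-write: not atomic
        return true;
    }

    public Object dequeue() {
        if (size == 0) return null;                // empty
        Object item = buffer[head];
        buffer[head] = null;                       // release the slot for GC
        head = (head + 1) % buffer.length;
        size--;                                    // races with the producer's size++
        return item;
    }
}
```

Single-threaded, this works fine; run a producer and a consumer thread against it concurrently and the racing `size++`/`size--` alone is enough to lose or duplicate elements.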

Problem of queues

Queues use either a linked list or an array as backing storage, and however large, they typically need to be bounded, since nobody has truly unlimited memory. An unbounded queue can end in an OutOfMemoryError if producers work too hard and consumers cannot keep up, unless the developer makes the producers aware of that condition.

For a simple implementation of a concurrent queue, we can use a single lock on the whole queue, but that creates a serious throughput bottleneck (throughput being the number of operations completed per unit of time). Now imagine a solution that requires multiple queues at different stages: the cumulative cost of enqueue & dequeue of work units at each stage multiplies.
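The JDK's own `java.util.concurrent.ArrayBlockingQueue` is a real instance of this design: a single lock guards both ends, so every put and take serializes against every other. It is also bounded, so a producer that outruns its consumer blocks or fails fast (back-pressure) instead of growing the queue toward an OutOfMemoryError:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.put("a");
        queue.put("b");
        // queue.put("c") would now block until a consumer takes an element;
        // offer() instead fails fast, signalling back-pressure to the producer.
        System.out.println(queue.offer("c"));  // false: queue is full, no OOM risk
        System.out.println(queue.take());      // "a" — FIFO order
    }
}
```

Correct and safe, but every operation above went through the same lock — the bottleneck the rest of this series is about removing.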

The use of a queue also presents a garbage-collection problem: once dequeued, the data objects need to be reclaimed/collected.

Solution

A team working on building a faster trading platform faced the above problems and devised a simple solution that supports data exchange without locks between producers & consumers.

We'll look at the solution in detail in the next blog. Stay tuned.