RabbitMQ and Quorum queues

I had a RabbitMQ cluster with two nodes, and the applications sometimes failed to reconnect. Finally I had time to investigate and fix the issue.

I ran some experiments. Killing instance 0 caused an irreversible failure of the applications, while killing instance 1 did not disrupt anything. Interestingly, killing 1 and then 0 put the applications in a retry state that eventually led to reconnection, while killing 0 and then 1 again caused the irreversible failure.

This happened with classic queues, both durable and transient, and the applications logged the following error.

2022-06-28 12:40:31.265 [ERROR] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-2] [traceId: ] run(SimpleMessageListenerContainer.java:1205) - Consumer threw missing queues exception, fatal=true
 org.springframework.amqp.rabbit.listener.QueuesNotAvailableException: Cannot prepare queue for listener. Either the queue doesn't exist or the broker will not allow us to use it.
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.handleDeclarationException(BlockingQueueConsumer.java:693)
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:627)
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:607)
     at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1348)
     at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1193)
     at java.base/java.lang.Thread.run(Thread.java:834)
 Caused by: org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[myqueue]
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:743)
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:620)
     ... 4 common frames omitted
 Caused by: java.io.IOException: null
     at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
     at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
     at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
     at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:1012)
     at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:46)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
     at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:1157)
     at com.sun.proxy.$Proxy103.queueDeclarePassive(Unknown Source)
     at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:721)
     ... 5 common frames omitted
 Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rabbit@rabbitmq-0.rabbitmq-headless.svc.cluster.loca
     at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
     at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
     at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
     at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
     at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
     ... 14 common frames omitted
 Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rabbit@rabbitmq-0.rabbitmq-headless.svc.cluster.loca
     at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:517)
     at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:341)
     at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:182)
     at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:114)
     at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:739)
     at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47)
     at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:666)
     ... 1 common frames omitted

I suspected that the issue was related to queue mirroring (the 404 NOT_FOUND reply points at the queue's home node, instance 0), so I changed the code to declare durable quorum queues.

  return QueueBuilder.durable("myqueue").quorum().build();

With this change, the error also changed.

2022-06-28 13:49:45.709 [ERROR] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#1-2] [traceId: ] run(SimpleMessageListenerContainer.java:1214) - Consumer received fatal exception on startup
 org.springframework.amqp.rabbit.listener.exception.FatalListenerStartupException: Mismatched queues
     at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.redeclareElementsIfNecessary(AbstractMessageListenerContainer.java:1897)
     at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1347)
     at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1193)
     at java.base/java.lang.Thread.run(Thread.java:834)
 Caused by: org.springframework.amqp.AmqpIOException: java.io.IOException
     at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:70)
     at org.springframework.amqp.rabbit.connection.RabbitAccessor.convertRabbitAccessException(RabbitAccessor.java:113)
     at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2194)
     at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2140)
     at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2120)
     at org.springframework.amqp.rabbit.core.RabbitAdmin.initialize(RabbitAdmin.java:604)
     at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.attemptDeclarations(AbstractMessageListenerContainer.java:1916)
     at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.redeclareElementsIfNecessary(AbstractMessageListenerContainer.java:1893)
     ... 3 common frames omitted
 Caused by: java.io.IOException: null
     at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
     at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
     at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
     at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:968)
     at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:46)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
     at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:1157)
     at com.sun.proxy.$Proxy145.queueDeclare(Unknown Source)
     at org.springframework.amqp.rabbit.core.RabbitAdmin.declareQueues(RabbitAdmin.java:709)
     at org.springframework.amqp.rabbit.core.RabbitAdmin.lambda$initialize$12(RabbitAdmin.java:606)
     at org.springframework.amqp.rabbit.core.RabbitTemplate.invokeAction(RabbitTemplate.java:2229)
     at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2188)
     ... 8 common frames omitted
 Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - inequivalent arg 'x-queue-type' for queue 'myqueue' in vh
     at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
     at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
     at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
     at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
     at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
     ... 20 common frames omitted
 Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - inequivalent arg 'x-queue-type' for queue 'myqueue' in vh
     at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:517)
     at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:341)
     at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:182)
     at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:114)
     at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:739)
     at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47)
     at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:666)
     ... 1 common frames omitted

The 406 PRECONDITION_FAILED reply complains about an inequivalent 'x-queue-type' argument, so I guessed that the existing queues no longer matched the type I was now declaring. I deleted the queues on RabbitMQ and restarted the service, and the queues were recreated as quorum queues. With quorum queues, the services reconnected without problems when I killed the cluster. That was promising; however, the secondary node of the cluster never became the leader for my queue, and that looked like a problem.
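To make the 406 error more concrete, here is a toy model of an argument equivalence check. It is not RabbitMQ's implementation; the class and method names are made up for illustration, and the assumption is simply that a redeclaration is rejected when an argument such as 'x-queue-type' differs from what the existing queue was created with.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical, simplified model of a queue redeclaration check.
// It only illustrates the idea behind "inequivalent arg"; the real broker logic is more involved.
public class QueueArgsCheck {

    // Returns the name of the first argument whose value differs between
    // the existing queue and the requested declaration, if there is one.
    public static Optional<String> firstInequivalentArg(
            Map<String, Object> existing, Map<String, Object> requested) {
        TreeSet<String> names = new TreeSet<>(existing.keySet());
        names.addAll(requested.keySet());
        for (String name : names) {
            if (!Objects.equals(existing.get(name), requested.get(name))) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the old queue was declared without any queue type argument.
        Map<String, Object> existing = Map.of();
        // The new code asks for a quorum queue.
        Map<String, Object> requested = Map.of("x-queue-type", "quorum");

        System.out.println(firstInequivalentArg(existing, requested));
        System.out.println(firstInequivalentArg(requested, requested));
    }
}
```

In this model the first call reports 'x-queue-type' as the offending argument, and the second call (same arguments on both sides) reports nothing. Deleting the queue removes the "existing" side, so the next declaration has nothing to conflict with.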

I spun up a third instance of RabbitMQ, because a cluster should always have an odd number of nodes. The new instance joined the cluster, which immediately elected queue leaders as expected.
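The odd-number rule comes down to majority arithmetic: a quorum queue needs more than half of its replicas online to elect a leader. A small sketch of that arithmetic follows (the class and method names are mine, not part of any RabbitMQ API). It may also explain why the second node never took over in the two-node cluster.

```java
import java.util.List;

// Majority arithmetic for a consensus-based queue with one replica per node.
public class QuorumMath {

    // Smallest number of replicas that is strictly more than half.
    public static int majority(int nodes) {
        return nodes / 2 + 1;
    }

    // How many nodes can be lost while a majority is still online.
    public static int tolerated(int nodes) {
        return nodes - majority(nodes);
    }

    public static void main(String[] args) {
        for (int n : List.of(2, 3, 4, 5)) {
            System.out.printf("%d nodes: majority %d, tolerates %d failure(s)%n",
                    n, majority(n), tolerated(n));
        }
    }
}
```

With two nodes the majority is two, so no failure is tolerated. Three nodes tolerate one failure, and a fourth node adds nothing over three, which is why an even-sized cluster buys no extra safety.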

Every time I killed a node, the applications reconnected to another node. Killing the leader led to the election of a new one, and even killing the entire cluster was handled correctly, with reconnection attempts that continued indefinitely.

Problem solved.

Tags: RabbitMQ