Let’s look at the specific scenario:
- The first logic app reads files from blob storage (triggered by EventGrid). The files are debatched and messages are sent to the EventGrid one by one.
- The second logic app reads messages from the EventGrid and sends these messages to Azure Service Bus after transforming them to a common data format.
- The third logic app reads the messages from the Service Bus using peek-lock. Messages are either completed or sent to the Service Bus again using deferred or scheduled messaging.
The first two logic apps seemed to work correctly, but the third logic app showed strange behavior when handling larger message quantities. After sending 3000 messages to the Service Bus, we saw the first 2800 messages being processed smoothly. The last 200 messages, however, only trickled through. The logic app ran at very irregular intervals, processing one message at a time. Sometimes it processed one message per minute, sometimes five messages per minute, sometimes one message in two minutes. Very strange behavior indeed. Most probably this behavior was caused by some sort of retry mechanism kicking in.
Anyhow, we continued searching. In the end it turned out we ran into all sorts of Azure limitations. It’s very hard to pinpoint the exact problem, but it’s good to refer to the following link.
The first problem was with the second logic app receiving messages from the EventGrid. If you look at the trigger history of the second logic app, it seems like all triggers are being processed. If you look more carefully, you will notice that the time each trigger takes to complete grows to multiple minutes instead of just a few seconds. So the assumption that the first two logic apps were running correctly was actually wrong. Larger amounts of messages quickly lead to an overload. In other words, you will have to spread out the message load. You can do this by replacing the EventGrid (push-push) with Azure Service Bus (push-pull). The first logic app sends messages to the Service Bus, and the second logic app reads messages from the Service Bus in a loop construct. This way we prevent 3000 concurrent logic app runs from being triggered at once via the EventGrid. This in turn also prevents the third logic app from being flooded.
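The difference between push and pull can be sketched in a few lines of plain Python. This is only an illustration, not a Logic Apps workflow definition: the in-memory `deque` stands in for the Service Bus queue, and the helper `drain_in_batches`, the batch size of 20 and the cap of 50 batches per run are assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, List


def drain_in_batches(queue: Deque[str], batch_size: int, max_batches: int) -> List[List[str]]:
    """Pull at most max_batches batches of at most batch_size messages from the queue."""
    batches: List[List[str]] = []
    while queue and len(batches) < max_batches:
        # The consumer decides how many messages it takes; nothing is pushed to it.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        batches.append(batch)
    return batches


# Hypothetical load: 3000 messages waiting in the queue.
queue: Deque[str] = deque(f"msg-{i}" for i in range(3000))

runs = 0           # number of consumer runs needed to empty the queue
largest_batch = 0  # largest number of messages handled in a single batch
while queue:
    batches = drain_in_batches(queue, batch_size=20, max_batches=50)
    runs += 1
    largest_batch = max(largest_batch, max(len(batch) for batch in batches))
```

With a push model, every one of the 3000 messages would start its own run. Here the consumer takes at most 1000 messages per run, so the load is spread over a handful of runs instead of arriving all at once.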
The third logic app makes service calls via an HTTP action. Here we ran into a limit of 2500 concurrent outgoing calls. Initially we had a Service Bus trigger running every minute. This construct was replaced by an EventGrid trigger on every new Service Bus message. This trigger was in turn followed by a loop construct processing 50 batches of 20 messages.
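The idea of staying under a limit on concurrent outgoing calls can be illustrated with a small Python sketch. It is not how Logic Apps implements the limit. The function `process_all`, the cap of 20 calls in flight and the `asyncio.sleep(0)` standing in for the real HTTP call are all assumptions made for the example.

```python
import asyncio
from typing import List


async def process_all(messages: List[str], max_in_flight: int) -> int:
    """Process every message with at most max_in_flight calls running at once.

    Returns the peak number of simultaneous calls that was observed.
    """
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def call_service(message: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # wait here when the cap has been reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the real outgoing HTTP call
            in_flight -= 1

    await asyncio.gather(*(call_service(message) for message in messages))
    return peak


# Hypothetical load: 50 batches of 20 messages, with at most 20 calls in flight.
peak_calls = asyncio.run(process_all([f"msg-{i}" for i in range(1000)], max_in_flight=20))
```

However many messages are waiting, the number of simultaneous calls never exceeds the chosen cap, which is the property the batching loop is meant to provide.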
The problem was solved with acceptable performance, but at the cost of a lot of extra work. In my opinion, this is quite a disqualifier for the EventGrid solution. Look carefully at the scenario and the type of messages sent (data messages or event messages) before opting to use Azure EventGrid.
Side note: you can check the logic app for throttling behavior by going to the Metrics section. There you can select the metric Trigger Throttled Events, Action Throttled Events or Run Throttled Events. The throttling behavior is then shown as a graph.