The Mule is, by nature, a messenger; it carries messages (or information) from one isolated system to another…
… and we all know what happened to the messenger in ye olde days!
The messenger always got shot when delivering bad news.
The modern IT landscape is not very different. Here’s how things generally work when errors occur.
The Problem
The Mule receives an inbound message. It starts delivering that message to the target system(s). During this process, any one of the following events might occur:
- A target endpoint is down; message undeliverable
- Target endpoint receives message & sends an error response
- Source system created an invalid record because of faulty data
- An interim API could not match the record; required data now missing
- An external API made a minor change & now starts rejecting previously valid requests
- Client Credentials expire for an external API
- Target system runs out of disk space & can’t process any more messages
- Target database throws a data truncated error (Database structural change)
- Network error occurred
- Target system running hot & generates response timeouts
… and the list goes on.
In every one of the above scenarios, a failure occurred due to an issue outside of the Mule platform and beyond the Mule’s control.
When this happens, the Mule reacts! It logs the error and, if the design / implementation team built anything more sophisticated, the Mule application deals with the error in some way. It might route the failure to a Dead-Letter Queue on a Message Broker, or it might send an error notification of some sort.
In short, the Mule sends a message to its stakeholders that something went wrong.
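To make that behaviour concrete, here is a minimal, language-neutral sketch in Python rather than Mule configuration. Every name in it (deliver_with_fallback, send, dead_letters, notify) is hypothetical and chosen for illustration; a real Mule application would express the same idea with its own error handlers and a real message broker.

```python
import logging
from collections import deque

logger = logging.getLogger("integration")


def deliver_with_fallback(message, send, dead_letters, notify):
    """Try to deliver a message; on failure, log it, park it and tell someone.

    `send` and `notify` are hypothetical callables supplied by the caller;
    `dead_letters` is any list-like stand-in for a Dead-Letter Queue.
    """
    try:
        send(message)
        return True
    except Exception as exc:  # the failure originates in the target system
        logger.error("Delivery failed for %r: %s", message, exc)
        # Keep the undelivered message together with the reason it failed.
        dead_letters.append({"message": message, "error": str(exc)})
        notify(f"Delivery failed: {exc}")
        return False


def endpoint_down(message):
    """Simulated target endpoint that is unavailable."""
    raise ConnectionError("target endpoint is down")


dead_letters = deque()
notifications = []
delivered = deliver_with_fallback(
    {"id": 1}, endpoint_down, dead_letters, notifications.append
)
```

Note that the sketch records the reason for the failure alongside the message. That detail is what later lets someone decide where the fault really lay.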
When this happens, the relevant stakeholders receive the bad news. They respond with varying degrees of angst and rage. Who brought them this message? Their messenger — the Mule.
And, so, the messenger gets shot!
They begin to ask questions like, ‘Is Mule up to the job?’
Or they make statements like, ‘Mule is always failing. Is this product really stable enough for an enterprise environment?’
This is unfair in the extreme. Consider the above list of fairly common errors. In these cases, the Mule neither caused the error nor had the ability to correct the problem on external systems over which it had zero control. The Mule did its job and reported the issue.
But… the messenger always gets shot!
The Solution Conundrum
There is one important, and all too common, variation on the above scenario.
Very often, an issue in an external application can be worked around by the Mule application that interacts with it.
For instance, one of my projects uncovered a bug in an external application: the external API was unable to insert an employee record with the isActive flag set to false. There was no good reason why this should not be possible on said external API. Furthermore, it was a vital project requirement.
Fixing this in the external application would have taken months and caused critical project delays.
So we implemented a simple solution in the Mule System API: always set isActive to true on the initial insert and then, if the requested value was false, send an immediate update request to change the value. This solved the problem and produced the required outcome at a stroke, and it took only a few minutes to implement.
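The insert-then-update workaround can be sketched roughly as follows. This is Python for illustration only, not the Mule implementation from the project. FakeEmployeeApi is a hypothetical stand-in that simulates the external API's bug, and create_employee is an assumed name for the wrapper logic.

```python
class FakeEmployeeApi:
    """Hypothetical stand-in for the external API, including its bug:
    an insert with isActive set to false is rejected."""

    def __init__(self):
        self.records = {}

    def insert(self, employee_id, payload):
        if not payload["isActive"]:
            raise ValueError("insert with isActive=false rejected")
        self.records[employee_id] = dict(payload)

    def update(self, employee_id, changes):
        self.records[employee_id].update(changes)


def create_employee(api, employee_id, payload):
    """Work around the bug: insert as active, then correct the flag."""
    wanted_active = payload["isActive"]
    # Step 1: always insert with isActive forced to true.
    api.insert(employee_id, {**payload, "isActive": True})
    # Step 2: if the caller asked for an inactive record, fix it immediately.
    if not wanted_active:
        api.update(employee_id, {"isActive": False})


api = FakeEmployeeApi()
create_employee(api, "E100", {"name": "Sample Employee", "isActive": False})
```

The caller still passes the value it actually wants. The wrapper hides the two-step dance, so it can be removed cleanly once the external application is fixed.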
This type of situation is quite common on integration projects. Ironically, it usually backfires on the Mule development team. Instead of singing the Mule’s praises for being the ONLY component able to resolve the issue, non-technical users usually interpret this the other way around and blame the Mule for causing the problem in the first place.
By non-technical reasoning, if the problem was fixed in Mule, then clearly it must have been a Mule problem to begin with. So, instead of thanking the Integration team, stakeholders heap blame on them (and Mule) for the issue.
The Solution
This is why we need to protect our Mule solutions with a bulletproof vest.
One way to achieve this is to be proactive and maintain an Errors List for each project. Excel and Google Sheets are great tools for this. Your list might contain headline, description, workaround and solution columns. Frankly, you can make this table as comprehensive as you like.
The crucial field, however, is the Error Root Cause field. This should be a defined list of values like the following:
- Requirements Change Request
- New Use-Case
- Invalid Data
- Environment Issue
- Connectivity Issue
- External Application Issue
- Mule App Bug
- Mule App Config Issue
- Mule Platform Issue
Every issue raised must be assigned to one of the Root Causes in your list. From there, you can create a chart that shows how many instances of each Root Cause occurred or have been identified. This might produce a chart like the one below.
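The numbers behind such a chart come from a simple tally of the Root Cause column. Here is a minimal sketch in Python, assuming the Errors List has been exported as rows with a hypothetical root_cause field. The sample rows are invented placeholders that show the mechanics only; they are not data from any project.

```python
from collections import Counter

# The defined list of Root Cause values, in reporting order.
ROOT_CAUSES = [
    "Requirements Change Request",
    "New Use-Case",
    "Invalid Data",
    "Environment Issue",
    "Connectivity Issue",
    "External Application Issue",
    "Mule App Bug",
    "Mule App Config Issue",
    "Mule Platform Issue",
]


def tally_root_causes(issues):
    """Count issues per Root Cause, rejecting values outside the defined list."""
    counts = Counter()
    for issue in issues:
        cause = issue["root_cause"]
        if cause not in ROOT_CAUSES:
            raise ValueError(f"Unknown root cause: {cause!r}")
        counts[cause] += 1
    # Report every category, including those with zero issues.
    return [(cause, counts[cause]) for cause in ROOT_CAUSES]


def render_chart(tally):
    """Render the tally as a plain-text bar chart, one row per Root Cause."""
    width = max(len(cause) for cause, _ in tally)
    return "\n".join(
        f"{cause.ljust(width)} | {'#' * count} {count}" for cause, count in tally
    )


# Invented placeholder rows, purely to exercise the functions.
SAMPLE_ISSUES = [
    {"headline": "Placeholder issue 1", "root_cause": "Invalid Data"},
    {"headline": "Placeholder issue 2", "root_cause": "Invalid Data"},
    {"headline": "Placeholder issue 3", "root_cause": "Connectivity Issue"},
    {"headline": "Placeholder issue 4", "root_cause": "Mule App Bug"},
]

tally = tally_root_causes(SAMPLE_ISSUES)
print(render_chart(tally))
```

Rejecting unknown values is deliberate: the report only works if every issue is forced into exactly one of the agreed categories.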
One important point to note: start this early in your project and make stakeholders aware of this report before you begin any testing. If you are reactive and produce this report after the accusations begin to fly, you will simply look defensive. Be proactive. If your stakeholders know about this report before testing starts, and if they receive regular updates, it will make the root cause of each issue very clear to non-technical users. They will see whether the problem lies with a given application, the environment, the data or the messenger (Mule) itself.
We can’t blame non-technical users when they don’t understand the complexities of our solutions. All they see is a bunch of errors — every one of them reported by Mule. In their eyes, that usually means the error occurred within Mule. How can they see it any other way, unless we are able to explain it to them?
I would submit that the vast majority of the projects on which I have worked look fairly similar to the chart above: loads of data-related, external API and network errors versus very few actual Mule issues (after initial code test & debug cycles, of course).
It must be said: if an issue is found in your Mule application or on the Platform, own it! Furthermore, if the issue can be resolved in a way that is architecturally sound, fix it. This is not about finger-pointing or mud-slinging. If your Mule application is not performing as it should, take responsibility and figure out how to fix the problem. A true whisperer is never afraid to take responsibility for something over which they have control. A true whisperer will go back to the drawing board and find the solution.
However, for the most part, this simple tool will save you a world of pain when you enter the System / User Acceptance Testing phase of your project. This is the time when project managers become jittery and when deadlines begin to slip. This is the time when accusations will be levelled and when stakeholders will start asking hard questions about the products chosen for their particular solution. This is the time when you need a user-friendly chart to explain to stakeholders in simple terms where the issues have actually occurred.
This is a bulletproof vest that will divert stakeholders’ attention away from the messenger and focus it on the actual root cause of every issue. It will make for happier stakeholders and developers alike.