Correlation is an easy way to allow different systems to communicate with one another. Usually, we use a correlation ID as a common reference between multiple systems.

When a system completes a given task, it can notify other systems in the network, which in turn use the correlation ID to determine which of their tasks the notification relates to.

However, there are times when the humble correlation ID causes a solution to trip over its own bootlaces…

One such instance occurred in a solution we produced for the broadcast industry. In this solution, editors would define time codes to indicate how a piece of raw content should be sliced and diced to produce the output for a given show. They would also specify the output format, a file name, and the list of source files from which to compile this output.

This done, their system would send a message to our middleware, which transformed it into a request that the Media Transcoder would understand. The request specified everything defined by the editors, and the Transcoder would get to work.

The Media Transcoder responded with a Job ID, which we could use as a correlation ID. When the job completed, several hours later, the Transcoder would trigger an event into the system specifying the Job ID, along with the output file name, the transcoded format and some key metadata, including the physical start and end timecodes as defined by the editors.

This all went swimmingly, as long as jobs completed successfully. If a particular transcode job failed, the Media Transcoder would trigger a Failed Job event and our system would pick that up and notify other systems within the API network.

You would think that is a workable solution… and you would be wrong.

The Challenge: Failed Jobs

It turns out that these Transcode jobs can fail for any number of reasons. Furthermore, due to the high levels of urgency in the broadcast media business, there is a team of dedicated personnel who monitor these jobs and deal with failures as they occur.

So, when a job fails for some unexpected reason, one of these users quickly restarts the system, clears some disk space or does whatever else is required, and then retries the job. One idiosyncrasy of the Media Transcoder is that it does not simply retry a job; it creates a duplicate with a new Job ID and runs that job instead. The successful result is still the same. Only the Job ID is different.

This enables the Media Transcoder to maintain an audit trail of successful and failed jobs.

However, you can see where we have a problem!

We now have a successful job with no corresponding correlation (i.e. Job) ID in the middleware or anywhere else in the API landscape. While the job relates to an expected task, initiated by the editors, the system has no way of correlating it back to the original Job ID, which remains listed as failed.
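To make the mismatch concrete, here is a minimal sketch in Python. The job IDs, the task label and the `on_job_event` handler are hypothetical and only illustrate a lookup keyed by Job ID; they are not taken from the actual middleware.

```python
# Pending tasks keyed by the Job ID the transcoder returned at submission.
# All values here are made up for illustration.
pending_by_job_id = {"job-1001": "Show A, segment 1"}


def on_job_event(job_id: str, status: str) -> str:
    """Match an incoming transcoder event to a pending task by Job ID."""
    task = pending_by_job_id.get(job_id)
    if task is None:
        # The retried job carries a new Job ID, so nothing matches.
        return "unmatched"
    return f"{task}: {status}"


# The original job fails and is matched correctly.
failed = on_job_event("job-1001", "failed")

# An operator retries; the transcoder runs a duplicate under a new Job ID.
retried = on_job_event("job-1002", "completed")

print(failed)
print(retried)
```

The completed event is perfectly valid, but the only key the middleware holds is the one that belongs to the failed job.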

The Solution

Instead of using the auto-generated Job ID, which is too brittle in the given landscape, we elected to generate an MD5 hash from key metadata that appears both in the request to the Media Transcoder and in its completion event, including:

  • Output File Location
  • Output Transcode Format
  • Output Start and End Time Codes

This allowed us to generate a unique Correlation ID when initiating each Transcode job. Whenever a Transcode job completed, we could then recreate that Correlation ID from the output metadata provided by the Transcoder, no matter which job actually produced the required output file.
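The idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than the code we shipped: the function name, the field names, the sample values, the separator and the normalisation step (trimming whitespace, lower-casing the format) are all hypothetical. The one thing that matters is that both sides build the hash from exactly the same fields in exactly the same way.

```python
import hashlib


def correlation_id(output_location: str, output_format: str,
                   start_timecode: str, end_timecode: str) -> str:
    """Derive a repeatable Correlation ID from job metadata.

    The normalisation below is an assumption; adapt it to whatever the
    request and the completion event actually contain.
    """
    fields = [
        output_location.strip(),
        output_format.strip().lower(),
        start_timecode.strip(),
        end_timecode.strip(),
    ]
    # Join with a separator that cannot appear in the fields, so that
    # ("ab", "c") and ("a", "bc") never produce the same input string.
    payload = "\x1f".join(fields)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Computed when the job is submitted (sample values are made up).
at_submit = correlation_id("/out/show_a_seg1.mxf", "MXF",
                           "00:01:00:00", "00:12:30:00")

# Recomputed from the completion event, whichever Job ID produced it.
at_completion = correlation_id("/out/show_a_seg1.mxf", "mxf",
                               "00:01:00:00", "00:12:30:00")

print(at_submit == at_completion)
```

MD5 serves here purely as a compact fingerprint, not as a security measure. Note also that two requests with identical output location, format and timecodes will share an ID, which is the intended behaviour for retries.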

In essence, we created a unique signature for each job, one that could be recreated on completion to identify the task it belonged to.

This allowed the Media Transcode team to operate as they have always done while still enabling the new integration requirements for the business.

Building your Correlation ID from a job signature is a useful and robust way of managing correlation in an integration environment that relies on external systems and processes over which you have no control.
