Tuesday, September 2, 2025

#1083 Monitoring Integration Processing Time Statistics


Introduction

I originally began this post with the goal of explaining how OIC users can best go about monitoring asynchronous flows. However, it soon became clear to me that most of what I cover applies equally to synchronous flows.

Monitoring, or Observability as we term it in OIC, covers a large area. However, at the end of the day, OIC users have concrete questions they want answered, such as -
  • What is the average processing time for my async integration createOrder?
  • When do I see peaks?
  • What else is happening when those peaks occur?
  • Are there anomalies? e.g. a scheduled job that usually takes 30 minutes is now taking 1 hour.
  • I don't sit in front of a monitoring dashboard all day, so can I be alerted to such events?
The above is not an exhaustive list, so just treat it as a starting point for this post.

OCI Log Analytics nicely complements OIC Observability, and I will be covering how we can leverage both of these tools to answer such questions.

OIC Observability

The starting point, as usual, is a very simple integration -

async-processOrder is a dummy integration that does nothing more than wait a given number of seconds before completing. The request payload field waitInSecs controls how long it waits.



I've run 11 flows with different values for waitInSecs - let's get an overview.

Note the default view, highlighting min, max, std. deviation and mean - 

I can un-check/check as required -

Note also the support for percentiles - 

I can also see the flow count breakdown over time - 


Now let's do a load test - starting point -

I run a load test from SoapUI and see the async queue building up -


The test completes, but requests are still queued -

Let's look at the figures above -

  • Received is 1967
  • Processed is 1773
  • Succeeded is 1773
That means 194 flows are either in progress or queued. 

I check my async concurrency limit in the OIC Observability Dashboard - 

As you can see, I have a limit of 50. This can, of course, be increased by adding more message packs to this instance.

Since at most 50 async flows can be executing at any one time, I can safely say that at least 144 of the 194 "open" flows are still queued.
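The arithmetic above can be captured in a few lines of Python. This is only a sketch: flow_backlog is a hypothetical helper, and it assumes, as above, that every open flow is either running or queued and that the async concurrency limit caps how many can be running at once.

```python
def flow_backlog(received: int, processed: int, async_concurrency_limit: int):
    """Return (open_flows, min_queued) from the Observability counters.

    open_flows: flows received but not yet processed (running or queued).
    min_queued: lower bound on queued flows, since at most
    async_concurrency_limit async flows can be executing at once.
    """
    open_flows = received - processed
    min_queued = max(open_flows - async_concurrency_limit, 0)
    return open_flows, min_queued

# Figures from the load test above.
open_flows, min_queued = flow_backlog(received=1967, processed=1773, async_concurrency_limit=50)
print(f"open: {open_flows}, queued (at least): {min_queued}")  # open: 194, queued (at least): 144
```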

I run a couple more load tests and review the graph again -

Note the max execution time here of ca. 48 seconds. For the load test, the async integration was invoked with the following payload -

All the integration does is execute the Wait action, and this load test had the request field waitInSecs set to 30. So we can say that this flow was in the queue for ca. 18 seconds before being dequeued.
Earlier tests had lower values for this field, hence the variation in the graph.
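The queue time estimate is just a subtraction, but it only holds because this test integration does nothing but wait. Here is a minimal sketch; estimated_queue_wait is a hypothetical helper, and the assumption that observed execution time is roughly queue time plus the Wait duration follows the reasoning above rather than any documented OIC formula.

```python
def estimated_queue_wait(observed_execution_secs: float, wait_action_secs: float) -> float:
    """Rough time a flow spent queued before being dequeued.

    Only meaningful for a flow whose sole work is a Wait action, as in this
    test integration: observed execution time is then ~ queue time + wait.
    """
    # Clamp at zero in case the observed time is slightly below the wait value.
    return max(observed_execution_secs - wait_action_secs, 0.0)

# Max execution time from the graph (ca. 48 s) with waitInSecs = 30.
print(estimated_queue_wait(48, 30))  # 18
```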

Naturally, your integrations will contain orchestration logic, service invokes etc., so you won't be able to make such precise statements.

But I hope you get the idea. 

 

Now to the final widget on the page - 

I've highlighted an icon on the left -

This can be dragged to decrease/increase the time interval, e.g. the default view covers 1 day and I may want to focus on a particular part of that day.
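Conceptually, the flow count breakdown over time buckets flows by start time, and dragging the slider narrows the window being bucketed. The following Python sketch illustrates that idea on hypothetical start times; the hourly bucket size and the flow_counts_by_hour helper are my assumptions, not how OIC actually builds the widget.

```python
from collections import Counter
from datetime import datetime

def flow_counts_by_hour(start_times, window_start, window_end):
    """Count flows per hour, keeping only flows started in [window_start, window_end)."""
    counts = Counter()
    for t in start_times:
        if window_start <= t < window_end:
            # Truncate the timestamp to the start of its hour to form the bucket key.
            counts[t.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))

# Hypothetical flow start times on a single day.
starts = [
    datetime(2025, 9, 2, 9, 5),
    datetime(2025, 9, 2, 9, 40),
    datetime(2025, 9, 2, 14, 10),
    datetime(2025, 9, 2, 14, 15),
    datetime(2025, 9, 2, 14, 55),
    datetime(2025, 9, 2, 22, 30),
]

# Full day versus a narrowed window (12:00 to 18:00), similar to dragging the slider.
full_day = flow_counts_by_hour(starts, datetime(2025, 9, 2), datetime(2025, 9, 3))
afternoon = flow_counts_by_hour(starts, datetime(2025, 9, 2, 12), datetime(2025, 9, 2, 18))
for hour, count in afternoon.items():
    print(hour.strftime("%H:%M"), count)  # 14:00 3
```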




OCI Log Analytics

I have posted many times on OCI Log Analytics; I've even stopped referring to it as OCI Logging Analytics!

This time we will look at getting insight into the time taken to execute flows over a certain time window.

I execute the following integration - 


I ran it with waitInSecs set to 30, 20 and 10 seconds.




Now to Log Explorer in OCI Log Analytics -
  
What we're looking at here is the result of a Time Taken Analysis query.

Here is the actual query I used, and a BIG THANKS to my OCI Log Analytics colleague, Sreeji, for this!

'Log Source' = 'OCI Integration Activity Stream Logs' and Integration = ASYNCWITHWAIT | link 'OPC Request ID' | eventstats distinctcount(Instance) as Instances, distinctcount(Identifier) as Integrations, distinctcount('User ID') as Users, distinctcount('OCI Resource Name') as Environments | stats unique(Instance) as Instance, unique(Identifier) as 'Integration Id', unique(Endpoint) as Endpoint, unique('User ID') as User, unique('OCI Resource Name') as 'OCI Env' | extract field = 'Integration Id' '(?P<Integration>[^!]*)' | rename 'Group Duration' as 'Time Taken' | classify 'Start Time', 'Integration Id', Integration, 'Time Taken' as 'Flows Execution Time Analysis' | fields target = ui -Instances, -Integrations, -Users, -Environments

This complex query will be productized in an out-of-the-box dashboard, but, of course, you can start using it now.
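To get a feel for what the link 'OPC Request ID' step contributes, here is a rough Python analogue: group the activity stream records by request ID and take the span between the earliest and latest record as the time taken (the query renames 'Group Duration' to 'Time Taken'). This is only an illustration of the idea, not how Log Analytics implements link; the record keys and the flow_durations helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def flow_durations(records):
    """Group log records by request ID and compute each group's duration.

    Each record is a dict with hypothetical keys 'opc_request_id',
    'integration' and 'time' (a datetime). Duration is the span between
    the earliest and latest record in the group, in seconds.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["opc_request_id"]].append(rec)

    results = []
    for request_id, recs in groups.items():
        times = [r["time"] for r in recs]
        results.append({
            "opc_request_id": request_id,
            "integration": recs[0]["integration"],
            "start_time": min(times),
            "time_taken_secs": (max(times) - min(times)).total_seconds(),
        })
    # Order by start time, as you would read flows on a timeline.
    return sorted(results, key=lambda r: r["start_time"])

# Hypothetical activity stream records for two ASYNCWITHWAIT flows.
records = [
    {"opc_request_id": "req-1", "integration": "ASYNCWITHWAIT", "time": datetime(2025, 9, 2, 10, 0, 0)},
    {"opc_request_id": "req-1", "integration": "ASYNCWITHWAIT", "time": datetime(2025, 9, 2, 10, 0, 30)},
    {"opc_request_id": "req-2", "integration": "ASYNCWITHWAIT", "time": datetime(2025, 9, 2, 10, 5, 0)},
    {"opc_request_id": "req-2", "integration": "ASYNCWITHWAIT", "time": datetime(2025, 9, 2, 10, 5, 10)},
]
for row in flow_durations(records):
    print(row["opc_request_id"], row["integration"], row["time_taken_secs"])
```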

I run the integration again, this time specifying a wait of 120 seconds -







I now run another integration, validateOrder, 3 times - this is a sync integration.

I also run my async integration twice -

In Log Explorer, I remove the integration filter, i.e. I change the query from

'Log Source' = 'OCI Integration Activity Stream Logs' and Integration = ASYNCWITHWAIT | link 'OPC Request ID' | eventstats ...
to 
'Log Source' = 'OCI Integration Activity Stream Logs' | link 'OPC Request ID' | eventstats ...
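If you keep this query in a script, you may prefer to generate both variants rather than edit the filter by hand. The sketch below only builds the query string (the tail is copied verbatim from the query shown earlier); it does not submit anything to OCI Log Analytics, and time_taken_query is a hypothetical helper of my own.

```python
from typing import Optional

LOG_SOURCE_FILTER = "'Log Source' = 'OCI Integration Activity Stream Logs'"

# Everything after the first pipe in the Time Taken Analysis query shown earlier.
QUERY_TAIL = (
    "link 'OPC Request ID' "
    "| eventstats distinctcount(Instance) as Instances, distinctcount(Identifier) as Integrations, "
    "distinctcount('User ID') as Users, distinctcount('OCI Resource Name') as Environments "
    "| stats unique(Instance) as Instance, unique(Identifier) as 'Integration Id', "
    "unique(Endpoint) as Endpoint, unique('User ID') as User, unique('OCI Resource Name') as 'OCI Env' "
    "| extract field = 'Integration Id' '(?P<Integration>[^!]*)' "
    "| rename 'Group Duration' as 'Time Taken' "
    "| classify 'Start Time', 'Integration Id', Integration, 'Time Taken' as 'Flows Execution Time Analysis' "
    "| fields target = ui -Instances, -Integrations, -Users, -Environments"
)

def time_taken_query(integration: Optional[str] = None) -> str:
    """Build the query, optionally restricted to a single integration."""
    where = LOG_SOURCE_FILTER
    if integration:
        where += f" and Integration = {integration}"
    return f"{where} | {QUERY_TAIL}"

print(time_taken_query("ASYNCWITHWAIT"))  # single integration
print(time_taken_query())                 # all integrations
```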

I run the query and now see the stats for both integrations - 

Summa Summarum

As the 19th-century American humourist Seba Smith was wont to say, "there are more ways than one to skin a cat". Rather morbid, but do check out his short story, The Money Diggers. Likewise, there are many ways to monitor/observe what's going on in OIC. I mention the following tools when discussing this topic with customers -

  • OIC Observability
  • OCI Service Metrics for Integration
  • OCI Logging
  • OCI Dashboards
  • OCI Alarms
  • OCI Log Analytics

Here we looked at monitoring processing time statistics using OIC Observability and OCI Log Analytics.

Please note, this post is not an exhaustive discourse on OIC monitoring, but I do hope it helps answer some of the questions posed in the introduction. 
