Wednesday, June 12, 2024

#1018 - Monitoring scheduled jobs with OCI Service Metrics



Many customers make use of OIC scheduled integrations, often with tight schedules. Sometimes these schedules cannot be adhered to e.g. I have a job running every 3 minutes, importing  sales orders to Netsuite. Job1 starts at 09:00 am, job2 is scheduled to start at 09:03. However, job2 cannot start until job1 has completed. Job1 could be delayed for a variety of reasons - 

  • volume of data to be processed
  • complexity of the integration's orchestration logic
  • capacity of the downstream app (e.g. Netsuite) to process the data
Normally, the bottleneck is not OIC itself. However, there are still OIC limits that are applied to scheduled jobs. First port of call is the OIC3 Service Limits doc. Here you will see the following under concurrency -

Asynchronous concurrent executions: 50.
Asynchronous concurrent execution includes scheduled + triggered + connectivity agent.

So there is a limit here, however, compared to OIC gen2, which had stricter limits on scheduled jobs, this should not be an issue. Maybe though, just have it in the back of your mind, when planning your schedules.

As I have already mentioned, delays in the execution of scheduled jobs, may be caused by tight schedules and resource contention.  

The goal of this post is to see how we can monitor our scheduled jobs and isolate such issues. We have a plethora of tools at our disposal. 

OIC3 Observability allows us to monitor scheduled jobs -


OIC Observability also gives the duration view of the activity stream - letting us see, at a glance, which actions are taking the most time -


However, OIC does not currently provide an aggregated, time-based view of this data. An aggregated view of resources being used e.g. multiple scheduled jobs using the same adapter, would also be very useful.

So let's see how OCI Metrics can assist us here.
 
Simple use case here - 2 scheduled jobs that invoke an Oracle DB via the DB adapter.
The processing logic is banal, but will suffice for demo purposes -


































AA-DB-Job
has an integration property defined -  waitTimeInSecs controls the Wait action. This property is also passed to a DB stored procedure, via the Invoke action - the procedure just does a sleep.

 CREATE OR REPLACE PROCEDURE wait_a_while
(
   waitPeriod IN NUMBER
)
AS
BEGIN
  sys.dbms_session.sleep(waitPeriod);
  EXCEPTION
   WHEN VALUE_ERROR THEN
      dbms_output.put_line('Something went wrong');
END;
/

I clone this integration and rename the clone to BB-DB-Job.
Simple stuff!

I do 4 runs -
  • AA-DB-Job with property set to 10 secs
  • AA-DB-Job with property set to 20 secs
  • BB-DB-Job with property set to 20 secs
  • BB-DB-Job with property set to 30 secs - this I execute, while the previous job is still running.
Now to OCI Service Metrics - here is the first widget, based on the MessagesReceivedCount metric. 


Note the difference, when I set the interval to 5 minutes -

Takeaway here? The interval may be important to you. 

The next widget is based on the MessagesSuccessfulCount metric.

I now add a new scheduled job, CC-NoDB-Job, that just contains a Wait action. I execute this once.

Note I've also added a new widget - Number of DB Adapter Invokes

This is configured as follows -

Note the use of the dimension - adapterIdentifier.
 
The next widget looks at the durations of the DB invokes -

The scale is in msecs - 
remember my test runs? 
AA-DB-Job 10 secs and 20 secs, then
BB-DB-Job 20 secs and 30 secs.

I now run AA-DB-Job and BB-DB-Job again with waits of 60 and 50 secs respectively.

Here we see that both jobs have been started around the same time.

We also see concurrency at DB adapter invoke level - 









 


Finally I add a widget, based on a saved search of the OIC activity stream for my scheduled integrations.

This widget is based on the following query - 

search "yourActivityStream log group" |data.integrationFlowIdentifier='AA_DB_JOB!01.00.0000' or data.integrationFlowIdentifier='BB_DB_JOB!01.00.0000' or data.integrationFlowIdentifier='CC_NODB_JOB!01.00.0000' | sort by datetime desc



Summa, summarum - this dashboard gives us an overview of integrations running and their use of specific resources; in this case, the DB.

The top 2 widgets, per default,  will show all integrations, ergo, it includes more than scheduled jobs. However, this may be very useful when considering resource contention issues.

You can, however, use OCI Logging Analytics to create widgets based on the OIC activity stream and limit this to your scheduled jobs -

I have discussed OCI Logging Analytics in other posts. Please see those here.

The widget above is driven by the following query - 


'Log Source' = 'OCI Integration Activity Stream Logs' and 'OCI Resource Name' = yourOICOCID and Integration in (cc_nodb_job, aa_db_job, bb_db_job) | timestats distinctcount(Instance) as logrecords by Integration | sort -logrecords'

The syntax may look complex, but I built it by selecting fields from the Fields palette. I'll recreate the above query, step by step.

Select the Log Source - here I select the OIC activity stream.

 
Note the query has changed - 
I have 3 OIC instances pushing activity stream data to the same OCI Logging Analytics log group, but here, I just want to select a specific instance - 





I now want to limit the query to my scheduled integrations. Here I need to filter on them. The field I use is Integration.



The query is now set to - 'Log Source' = 'OCI Integration Activity Stream Logs' and 'OCI Resource Name' = myOICOCID and Integration in (cc_nodb_job, aa_db_job, bb_db_job) | stats count as logrecords by 'Log Source' | sort -logrecords

The default visualisation is Pie - 

Naturally, this is not very useful, except for telling me that instances of my 3 scheduled integrations generated a total of 194 log messages.

What I want to know is how many instances of each scheduled job executed in the specified timeframe.

Each instance flow has its own integration id, so what I need is a distinct count of these. Here I edit the query manually - 
replacing stats count as logrecords with timestats distinctcount(Instance) as logrecords by Integration. We've already met the Integration field, Instance is the field containing the instance id. 

The visualisation immediately changes to Records with Histogram


I could also choose the Pie visualisation -






Tuesday, June 4, 2024

#1017 OIC 24.06 Release New Features

The mid-summer release of OIC3 has a lot in it. So, without further ado, let's begin.

For me, the most compelling feature is the design time dependencies viewer we get in Projects.

DesignTime Dependencies

Note the new icon on the Projects page -

I have a couple of artefacts in this project, so let's view the dependencies - 

Note the filters available -


Clicking on Integrations, I see the following - 




I now click on the integration, AA-Subscribe2NewSR -

Clicking on Events

You get the idea!
Clicking on Connections, I see the following. As you can see, the REST connection, BB-REST-Trigger, is not being used. 


Net, net, very enlightening, even for this small project.

Observability

View activity stream by duration - 






Note the support now for rollups.

OIC Projects support for Events

As you have seen above, Pub/Sub support is now available within the context of a project.

OIC Projects - simplifying Deployment creation

With 24.06, we have improved the UI for creating deployments.






Service Limits Increase

Activated Integration limit increased from 700 to 800.


Active Integration Count by Projects






Debug Tracing reset after 24 hours


Setting trace to debug causes all actions and payloads to be logged. This is an expensive operation. Therefore, starting with 24.06, the trace level is reset from Debug to Production after 24 hours.

You can set the trace level to Audit, if they need to capture inbound payloads, etc.

B2B Export Design time Artefacts








OIC Logging

there are new fields in the OIC activity stream as well as new fields pushed to OCI Logging.

OIC activity stream
























I run the integration and download the activity stream from OIC.
























Note the net new field, billedMsgUnits. Also note the 2 replay fields - replayableFrom and replayableContent. The latter 2 are there because I checked Allow to run again.

I run the integration flow again - 



and check out the activity stream - 


Activity Stream in OCI Logging

There is a new field here - instanceCompletionState. The value, "ProcessCompleted" signals the flow completed successfully.

If there's an error - 

parentEventId is another new field surfaced in the activity stream pushed to OCI Logging -


the value -1, indicates this is the first event. 

Here's another example - 

The replayedParentInstance field, shown in a previous OIC Activity Stream screenshot, is also pushed to OCI Logging. The field name used in OCI Logging is replayedParentInstanceId. 

Having this in OCI Logging makes it very easy to create an OCI Dashboard widget based on this -

Clicking on Open with logging search -


and I see the salient details.
 
 

OIC Connectivity


Naturally, there are a plethora of new features in respect of Connectivity, please see the post here for details.