Friday, October 31, 2025

#1100 - Log Analytics alerting on Aborted OIC Instances

Introduction

An OIC customer wants to be informed when an OIC instance aborts. This follows on from one of my recent posts on monitoring aborted instances. Please recognise the difference between flows that fail due to errors and those that abort. Here we are concerned purely with the latter.

I covered the creation of an OIC Errors Dashboard in the previous post. It contained the following widget, among many others - 

In this post I cover triggering an alarm from OCI Log Analytics when an integration flow aborts.

Alarms in OCI Log Analytics

I begin with the activity stream log in Log Analytics. I can access this via Administration - Sources - 

Here I click Customize

and then click on Labels - Add Conditional Label -

As you can see, I added the following - 


I now click on Detection Rules

Note the use of the conditional label. 

Here I specified my own metric namespace, resource group and name. Also note the dimensions I have included. Here I added Integration, Identifier and Project Id.

The context-sensitive menu for my detection rule contains the option Create Alarm, which I click.


  
The query is - 
new-oic-instance-aborted-metric[5m]{rule_ocid = "ocid1.loganalyticsingesttimerule.myOCID"}.grouping().sum()


This I amend to - 

new-oic-instance-aborted-metric[1m]{rule_ocid = "ocid1.loganalyticsingesttimerule.myOCID", Integration =~ "*", Identifier =~ "*", Project_ID =~ "*"}.sum() > 0

I need to include the dimensions in the query, so that they appear in the alarm email.
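The amended query follows a regular pattern, so it can be assembled programmatically. The following is a minimal Python sketch, assuming the metric name, rule OCID placeholder and dimension names shown above; the helper name build_alarm_query is hypothetical and not part of any OCI SDK.

```python
def build_alarm_query(metric, interval, rule_ocid, dimensions, threshold=0):
    """Assemble an alarm query string that keeps the given dimensions.

    Hypothetical helper: it only formats text, it does not call OCI.
    """
    selectors = [f'rule_ocid = "{rule_ocid}"']
    # A wildcard match on each dimension keeps it in the query,
    # mirroring the amended query in the post.
    selectors += [f'{name} =~ "*"' for name in dimensions]
    return f'{metric}[{interval}]{{{", ".join(selectors)}}}.sum() > {threshold}'


query = build_alarm_query(
    metric="new-oic-instance-aborted-metric",
    interval="1m",
    rule_ocid="ocid1.loganalyticsingesttimerule.myOCID",  # placeholder from the post
    dimensions=["Integration", "Identifier", "Project_ID"],
)
print(query)
```

The printed string should match the amended query above, which makes it easy to add or drop a dimension without retyping the whole expression.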

I complete the Alarm, giving it a name etc. -

I also define the email body and set the severity to critical -

Let's test it out. I'll run an async integration and then abort it.

My Dashboard -

Now to the Alarm - I can check the Alarm Data section - 

The alarm has fired, so I check my email - 

I can return to the Dashboard and punchout -

This brings me to Log Explorer, where I can see who actually aborted the flow - oh no! It was me!

Summa Summarum

This is an excellent feature in Log Analytics, and a BIG thank you to my esteemed colleague Varun K. for pointing it out to me. You can, of course, extrapolate from this and create other alarms, based on whatever conditions you want. Remember the conditional label I created? It's really as simple as that.

Wednesday, October 29, 2025

#1099 - Detailed Error Dashboards in OCI Log Analytics

 

Introduction

I can easily create an OIC Errors dashboard with OCI Log Analytics.

The hardest part is coming up with the widgets to include. My choices are below, but please extrapolate from my simple demo.

OIC Errors Breakdown  

I decided on the following widgets -
  • Aborted Instances by Integration
  • Error Breakdown by Action Type
  • Integrations with Faults
  • Connectivity Agent Errors by Action
  • Integrations with Assign Errors
  • Integrations with Mapping Errors
  • Invoke Errors by Action
  • Errored Sync Flows by Integration
Let's go through each of these...

Integrations with Mapping Errors

I'm starting with this widget, even though it's out of sequence, because it is a good example of how I crafted the queries.

Here is the underlying activity stream log message -

{
  "datetime": 1761724999408,
  "logContent": {
    "data": {
      "actionType": "Mapper",
      "errorId": "u1I_CLSdEfCkHu9h7DM6qA",
      "eventId": "u1I_B7SdEfCkHu9h7DM6qA",
      "executedTime": "2025-10-29T08:03:19.408Z",
      "instanceId": "tGUtBrSdEfCH0M9O6IAqnQ",
      "integrationFlowIdentifier": "READWRITELARGEFI_1!01.00.0000",
      "message": "Error processing message in Mapper  ,error details : XPath expression failed to execute, error summary : Error during evaluation of XPath \"ora:doXSLTransformForDoc('resources/processor_342/resourcegroup_345/req_34ea3884757a4ce1b4a650b2f7e42c8a.xsl', $messagecontext_18, 'getFileRefFromFS', $messagecontext_312)\" Error at line 28, column 206, <fault><requestId>...</requestId><errorType>InternalServiceLimitError</errorType><origin>stagefile-service-564cd4c67f-wlzbs</origin><errorCode>SF1001</errorCode><faultName>{http://schemas.oracle.com/bpel/extension}runtimeFault</faultName><retriable>false</retriable><reason>Error occurred while processing activity readContent. Error : File size 150.20 MB exceeds maximum threshold size of 10.00 MB.Please make sure that the file size does not exceed threshold</reason><details>File size 150.20 MB exceeds maximum threshold size of 10.00 MB.Please make sure that the file size does not exceed threshold</details></fault>",
      "opcRequestId": "...",
      "parentEventId": "uzic8bSdEfCE4DFbv7Kdwg",
      "projectCode": "...",
      "userId": "niall.commiskey@oracle.com"
    },
    "id": "bcdefa02-b49d-11f0-b9ec-798ce9fc19fc",
    "oracle": {
      "compartmentid": "...",
      "ingestedtime": "2025-10-29T08:03:22.042Z",
      "loggroupid": "...",
      "logid": "...",
      "tenantid": "..."
    },
    "source": "...",
    "specversion": "1.0",
    "time": "2025-10-29T08:03:19.408Z",
    "type": "com.oraclecloud.integration.integrationinstance.activitystream"
  },
  "regionId": "us-phoenix-1"
}
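The message field above carries an embedded fault fragment, which holds the most useful details (error code, reason). The following is a minimal Python sketch for pulling those fields out, assuming the fragment between the fault tags is well-formed XML, as it is in this sample; the function name extract_fault is hypothetical and the sample message is a shortened copy of the one above.

```python
import re
import xml.etree.ElementTree as ET


def extract_fault(message):
    """Return the fields of an embedded <fault> fragment as a dict.

    Returns an empty dict when the message carries no fault fragment.
    Assumes the fragment is well-formed XML.
    """
    match = re.search(r"<fault>.*?</fault>", message, re.DOTALL)
    if match is None:
        return {}
    fault = ET.fromstring(match.group(0))
    return {child.tag: (child.text or "") for child in fault}


# Shortened copy of the Mapper error message from the log entry above.
message = (
    "Error processing message in Mapper  ,error details : XPath expression failed to execute, "
    "<fault><errorType>InternalServiceLimitError</errorType>"
    "<errorCode>SF1001</errorCode><retriable>false</retriable>"
    "<reason>Error occurred while processing activity readContent. Error : File size 150.20 MB "
    "exceeds maximum threshold size of 10.00 MB.</reason></fault>"
)

fault = extract_fault(message)
print(fault["errorCode"], fault["errorType"])
```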

Here is the Log Explorer query - 

'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error processing message in Mapper%' | timestats distinctcount(Instance) as 'Flow Instances with Mapping Errors', distinctcount(Instance) by Integration
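Most of the widget queries in this post share the same shape: a filter on the log source, one extra condition, and a distinct count of instances grouped by a field. The following is a minimal Python sketch of that template; build_widget_query is a hypothetical helper that only formats the query text, which would still be pasted into Log Explorer by hand.

```python
LOG_SOURCE = "OCI Integration Activity Stream Logs"


def build_widget_query(condition, alias, group_by):
    """Format a Log Explorer query following the pattern used in this post."""
    return (
        f"'Log Source' = '{LOG_SOURCE}' and {condition} "
        f"| timestats distinctcount(Instance) as '{alias}', "
        f"distinctcount(Instance) by {group_by}"
    )


mapping_query = build_widget_query(
    condition="Message like '%Error processing message in Mapper%'",
    alias="Flow Instances with Mapping Errors",
    group_by="Integration",
)
print(mapping_query)
```

Swapping the condition, alias and group-by field yields the other widget queries, for example a condition of 'Action Type' in (Abort) for aborted instances.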

The Log Explorer view is as follows - 

Note the Group By Integration - 

I added this field from the Other field list -

Invoke Errors by Action


The query here is - 'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error processing message in Invoke%' | timestats distinctcount(Instance) as 'Flow Instances with Invoke Errors', distinctcount(Instance) by Action

Note the group by Action.

So why am I not grouping on Integration? Because it's not included in the log message - 

{
  "datetime": 1761722207212,
  "logContent": {
    "data": {
      "actionName": "write2File",
      "actionType": "Invoke",
      "endPointConnectionId": "AA_LOCALFILE",
      "endPointName": "write2File",
      "endPointType": "file",
      "errorId": "OwtBybSXEfCwkOV7JiyKKQ",
      "eventId": "OwtByLSXEfCwkOV7JiyKKQ",
      "executedTime": "2025-10-29T07:16:47.212Z",
      "instanceId": "jPvHSrSWEfCH0M9O6IAqnQ",
      "message": "Error processing message in Invoke write2File ,error details : oracle.cloud.cpi.common.core.CpiException: No response received within response time out window. Connectivity Agent may not be running, or temporarily facing connectivity issues to Oracle Integration Cloud Service. Please check the health of the Connectivity Agent in Agent Monitoring page. For additional insight into request eed027e0-c1f2-49c8-bc16-59417b0e9593 processing refer to the Connectivity Agent logs., error summary : oracle.cloud.cpi.common.core.CpiException: No response received within response time out window. Connectivity Agent may not be running, or temporarily facing connectivity issues to Oracle Integration Cloud Service. Please check the health of the Connectivity Agent in Agent Monitoring page. For additional insight into request eed027e0-c1f2-49c8-bc16-59417b0e9593 processing refer to the Connectivity Agent logs.",
      "opcRequestId": "...",
      "parentEventId": "jhou_7SWEfCkHu9h7DM6qA",
      "userId": "niall.commiskey@oracle.com"
    },
    "id": "3c54b135-b497-11f0-b9ec-798ce9fc19fc",
    "oracle": {... 

The log field actionName surfaces in Log Explorer as Action, hence my use of it.
I could also group by endPointConnectionId (e.g. AA_LOCALFILE) or endPointType (e.g. file, ftp etc.).
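To make the grouping choice concrete, here is a minimal Python sketch that counts distinct instances per field value over raw activity stream records. It is only a local analogue of distinctcount(Instance) by a field, not how Log Analytics evaluates the query. Only the first record is taken from the log entry above; the other records and the function name distinct_instances_by are hypothetical.

```python
from collections import defaultdict


def distinct_instances_by(records, field):
    """Count distinct instanceId values per value of the given data field."""
    groups = defaultdict(set)
    for record in records:
        data = record["logContent"]["data"]
        groups[data.get(field, "(none)")].add(data["instanceId"])
    return {key: len(ids) for key, ids in groups.items()}


def make_record(action_name, endpoint_type, instance_id):
    """Build a trimmed-down record holding only the fields used here."""
    return {"logContent": {"data": {
        "actionName": action_name,
        "endPointType": endpoint_type,
        "instanceId": instance_id,
    }}}


records = [
    make_record("write2File", "file", "jPvHSrSWEfCH0M9O6IAqnQ"),  # from the post
    make_record("write2File", "file", "jPvHSrSWEfCH0M9O6IAqnQ"),  # hypothetical repeat event
    make_record("write2File", "file", "hypothetical-instance-2"),
    make_record("readFtp", "ftp", "hypothetical-instance-3"),
]

print(distinct_instances_by(records, "actionName"))
print(distinct_instances_by(records, "endPointType"))
```

The repeat event for the same instance is counted once, which is the point of using a distinct count rather than a plain count.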
 

Aborted Instances by Integration

You may want to read my post on monitoring aborted instances before continuing.

Underlying query is - 

'Log Source' = 'OCI Integration Activity Stream Logs' and 'Action Type' in (Abort) | timestats distinctcount(Instance) as 'Aborted Flow Instances', distinctcount(Instance) by Integration

Error Breakdown by Action Type

Underlying query is -
'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error%' | timestats distinctcount(Instance) as 'Flow Instances with Errors', distinctcount(Instance) by 'Action Type'

A couple of words on 'receive' - the errors here refer to sync integrations, where the error is returned in the reply to the caller.

Integrations with Faults   

Here I am referring to execution of the Throw New Fault action in integrations.

Underlying query is - 
'Log Source' = 'OCI Integration Activity Stream Logs' and 'Action Type' in (Raise_new_error) | timestats distinctcount(Instance) as 'Flow Instances with Faults', distinctcount(Instance) by Integration

Connectivity Agent Errors by Action

Underlying query is - 
'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error%Connectivity Agent may not be running%' | timestats distinctcount(Instance) as 'Flow Instances with Connectivity Agent Errors', distinctcount(Instance) by Action

Integrations with Assign Errors

Underlying query is -
'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error processing message in Assignment%' | timestats distinctcount(Instance) as 'Flow Instances with Assignment Errors', distinctcount(Instance) by Integration

Errored Sync Flows by Integration

Underlying query is - 
'Log Source' = 'OCI Integration Activity Stream Logs' and Message like '%Error in reply to%' and 'Action Type' = receive | timestats distinctcount(Instance) as 'Sync Flows with Errors', distinctcount(Instance) by Integration

Summa Summarum

The above are just some examples of what you can create in OCI Log Analytics. I hope they are of some use to you, and also give you a basis for creating your own.

Happy Monitoring!

Tuesday, October 28, 2025

#1098 - OIC Monitoring - checking for aborted instances

Introduction 

OIC flow instances can abort for the following reasons -

  • timeout - e.g. async flow taking longer than 6 hours.
  • flow aborted by OIC admin.

So how can I see such instances?

OIC Observability

Set the status filter to Aborted to just see aborted instances.

OIC Factory API

https://design.integration.us-phoenix-1.ocp.oraclecloud.com/ic/api/integration/v1/monitoring/instances?integrationInstance=yourOICInstance&q={timewindow: '1h', status:'ABORTED'}

{
    "dataFetchTime": "2025-10-28T07:44:29.235+0000",
    "hasMore": false,
    "id": "instances",
    "items": [
        {
            "creationDate": "2025-10-28T07:44:06.204+0000",
            "date": "2025-10-28T07:44:21.311+0000",
            "duration": 109,
            "fifo": false,
            "flowType": "ASYNC_ONE_WAY",
            "hasRecoverableFaults": false,
            "id": "4cIoOrPREfCH0M9O6IAqnQ",
            "instanceId": "4cIoOrPREfCH0M9O6IAqnQ",
            "instanceReportingLevel": "Production",
            "integration": "ASYNCTEST|01.00.0000|AA_PROJECT2",
            "integrationId": "ASYNCTEST",
            "integrationName": "asyncTest",
            "integrationVersion": "01.00.0000",
            "invokedBy": "niall.commiskey@oracle.com",
            "isDataAccurate": true,
            "isLitmusFlow": false,
            "isLitmusSupported": false,
            "isPurged": false,
            "lastTrackedTime": "2025-10-28T07:44:21.311+0000",
            "links": [
                {
                    "href": "https://.../ic/api/integration/v1/monitoring/instances/4cIoOrPREfCH0M9O6IAqnQ?integrationInstance=....",
                    "rel": "self"
                },
                {
                    "href": "https://.../ic/api/integration/v1/monitoring/instances/4cIoOrPREfCH0M9O6IAqnQ?integrationInstance=....",
                    "rel": "canonical"
                }
            ],
            "litmusResultStatus": "",
            "mepType": "ASYNC_ONE_WAY",
            "nonScheduleAsync": true,
            "opcRequestId": "5OLUCU9R8EPMK53I4O9J4P9OHQS3WWSI/X3PTAN6JYB3GKUVWIYHYWMYVOJBRBIFG/1Y53WJY8OE8WF8TYR5X7GSR8PWXE27VM",
            "outboundQueueNames": [],
            "processingEndDate": "2025-10-28T07:44:06.671+0000",
            "projectCode": "AA_PROJECT2",
            "projectFound": true,
            "projectName": "AA-Project2",
            "receivedDate": "2025-10-28T07:44:06.562+0000",
            "replayable": false,
            "replayed": false,
            "status": "ABORTED",
            "trackings": [
                {
                    "name": "orderNr",
                    "primary": true,
                    "value": "123"
                }
            ]
        }
    ],
    "links": [
        {
            "href": "https://.../ic/api/integration/v1/monitoring/instances?integrationInstance=....",
            "rel": "self"
        },
        {
            "href": "https://.../ic/api/integration/v1/monitoring/instances?integrationInstance=....",
            "rel": "canonical"
        }
    ],
    "totalRecordsCount": 1,
    "totalResults": 1
}

The above response is for an async flow I aborted. Note the following - 
            "hasRecoverableFaults": false,
            "mepType": "ASYNC_ONE_WAY",
            "nonScheduleAsync": true,
            ...
            "replayable": false,
            "replayed": false,
            "status": "ABORTED",
            "trackings": [
                {
                    "name": "orderNr",
                    "primary": true,
                    "value": "123"
                }
            ]

Aborted flows cannot be recovered (re-submitted). Note the message exchange pattern value, which is set to "ASYNC_ONE_WAY". Also note the tracking fields.
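The request URL and the fields highlighted above can be handled in a few lines of Python. This is a minimal sketch under stated assumptions: the function names build_instances_url and summarise are hypothetical, authentication and the actual HTTP call are left out, and the sample response is a trimmed copy of the one above. Percent-encoding the q parameter is a general URL-safety choice, not something the post confirms as required.

```python
from urllib.parse import quote, urlencode


def build_instances_url(host, oic_instance, timewindow="1h", status="ABORTED"):
    """Build the monitoring/instances URL with a percent-encoded q filter."""
    q = f"{{timewindow: '{timewindow}', status:'{status}'}}"
    params = urlencode({"integrationInstance": oic_instance, "q": q}, quote_via=quote)
    return f"https://{host}/ic/api/integration/v1/monitoring/instances?{params}"


def summarise(response):
    """Pick out the fields discussed above for each returned instance."""
    rows = []
    for item in response.get("items", []):
        primary = next((t for t in item.get("trackings", []) if t.get("primary")), None)
        rows.append({
            "instanceId": item["instanceId"],
            "integration": item["integration"],
            "mepType": item["mepType"],
            "replayable": item["replayable"],
            "tracking": f'{primary["name"]}={primary["value"]}' if primary else "",
        })
    return rows


url = build_instances_url(
    "design.integration.us-phoenix-1.ocp.oraclecloud.com", "yourOICInstance"
)

# Trimmed copy of the response shown above.
sample = {"items": [{
    "instanceId": "4cIoOrPREfCH0M9O6IAqnQ",
    "integration": "ASYNCTEST|01.00.0000|AA_PROJECT2",
    "mepType": "ASYNC_ONE_WAY",
    "replayable": False,
    "status": "ABORTED",
    "trackings": [{"name": "orderNr", "primary": True, "value": "123"}],
}]}

print(url)
print(summarise(sample))
```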

Now to scheduled integrations - here is an extract from the response - 

            "mepType": "SCHEDULED",
            "nonScheduleAsync": false,
            "opcRequestId": "oci-...",
            "outboundQueueNames": [],
            "processingEndDate": "2025-10-28T09:56:35.832+0000",
            "projectCode": "AA_PROJECT2",
            "projectFound": true,
            "projectName": "AA-Project2",
            "receivedDate": "2025-10-28T09:56:35.750+0000",
            "replayable": false,
            "replayed": false,
            "status": "ABORTED",
            "trackings": [
                {
                    "name": "param_dateLastRun",
                    "primary": true,
                    "value": "2025-10-27T00:00:00.000Z"
                }
            ]

Note the message exchange pattern value, which is set to "SCHEDULED". Also note the tracking fields.
I've used a schedule parameter that holds the date of the last run as my primary tracking field.

OCI Logging

This is the relevant activity stream log entry in OCI Logging - 


{
  "datetime": 1761645414299,
  "logContent": {
    "data": {
      "actionName": "ABORT",
      "actionType": "Abort",
      "eventId": "buZXXrPkEfCgjGnZvXiYSA",
      "executedTime": "2025-10-28T09:56:54.299Z",
      "instanceId": "Y7JMh7PkEfConmFtqrFmpg",
      "integrationFlowIdentifier": "LOADNEWORDERS!01.00.0000",
      "message": "Instance aborted by niall.commiskey@oracle.com  , Source: User , Reason: Abort the in-progress/recoverable instances",
      ...
      "parentEventId": "ALAFIN",
      "projectCode": "AA_PROJECT2",
      "userId": "niall.commiskey@oracle.com"
    },
    "id": "702f03b0-b3e4-11f0-a6d2-ff2cb7d37a5c",
    "oracle": {
    ...
    "specversion": "1.0",
    "time": "2025-10-28T09:56:54.299Z",
    "type": "com.oraclecloud.integration.integrationinstance.activitystream"
  },
  "regionId": "us-phoenix-1"
}

Note the message value - "Instance aborted by niall.commiskey@oracle.com  , Source: User , Reason: Abort the in-progress/recoverable instances"
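The message value has a regular shape, so the user, source and reason can be split out of it. The following is a minimal Python sketch, assuming the message always follows the layout of the sample above; other abort messages (for example from a timeout) may look different, in which case the hypothetical parse_abort_message helper returns None.

```python
import re

ABORT_PATTERN = re.compile(
    r"Instance aborted by (?P<user>\S+)\s*,\s*"
    r"Source:\s*(?P<source>.*?)\s*,\s*"
    r"Reason:\s*(?P<reason>.*)"
)


def parse_abort_message(message):
    """Split an abort message into user, source and reason, or return None."""
    match = ABORT_PATTERN.search(message)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


# Message value copied from the log entry above.
message = (
    "Instance aborted by niall.commiskey@oracle.com  , Source: User , "
    "Reason: Abort the in-progress/recoverable instances"
)

details = parse_abort_message(message)
print(details)
```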
     
Also note the actionName & actionType values. I can use these in a search filter -

I can save the search - 

I can use the saved search as the basis for an OCI Dashboard widget - 

Granted, this is of limited use, as it does not include the name/id of the relevant integration.

However, I can click the open with logging search link -

Still not optimal, so let's look at OCI Log Analytics -

OCI Log Analytics

Let's try out the following query in Log Explorer -

'Log Source' = 'OCI Integration Activity Stream Logs' and 'Action Type' in (Abort) | stats distinctcount(Instance) as 'Aborted Flow Instances', trend(distinctcount(Instance))

I now group by Integration

This looks good - now to include this in a dashboard - 

Summa Summarum

As my auld grandmother from the Monaghan/Armagh border used to say - there's many ways to skin a cat!

I hope you find the observability options I've covered useful. The next step is to raise an alarm on aborted instances, which is the topic of a future post.