Wednesday, May 28, 2025

#1071 OIC invoking OCI AI Vision Service


Introduction

Yet another post in the OIC for OCI AI Services series. Today we're looking at the AI Vision service. Firstly, what does this service offer?

Why begin with a picture of McSorley's? Because we'll use this image in some of the following invokes of the OCI AI Vision service.

What does AI Vision offer?


You can check out the OCI AI Vision home page here.

Net net, the service offers the following -
  • Image Classification
  • Text Detection
  • Face Detection
  • Object Detection
  • Video Analysis

Let's look at the 3 basic steps when using AI Vision -

  • Ingesting data - e.g. images from Object Storage or elsewhere. You can use OIC to pull in data from virtually anywhere; we ship with a native action for OCI Object Storage as well as a plethora of adapters. A quick SDK sketch of this step follows the list.
  • Understanding data - here's where AI Vision does its magic: recognising images, parsing text etc. OIC can easily invoke AI Vision; this is what we'll cover today.
  • Using the intel - here we take the result(s) from AI Vision and use them in our business processes. OIC is THE business process automation toolkit, so let's kick the tyres!
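
Just to make that first step concrete, here's a minimal sketch of pulling an image from OCI Object Storage with the OCI Python SDK - the bucket and object names are placeholders I've invented for illustration; in OIC you would simply use the native OCI Object Storage action.

import oci

# Load credentials from the default ~/.oci/config profile
config = oci.config.from_file()
object_storage = oci.object_storage.ObjectStorageClient(config)

# Fetch the image - bucket and object names are placeholders
namespace = object_storage.get_namespace().data
obj = object_storage.get_object(
    namespace_name=namespace,
    bucket_name="images-bucket",
    object_name="mcsorleys.jpg",
)

# Save the image locally for the AI Vision examples that follow
with open("mcsorleys.jpg", "wb") as f:
    f.write(obj.data.content)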

OCI AI Vision 

Here is the Vision menu in OCI. 

Object Detection

This feature allows you to identify objects and their locations within an image, along with a confidence score for each.

I try it out - 

Now for a picture with more action in it - 

Yes, the above screenshot does not include the confidence values.

But you get the idea. I want to know what's going on in the image, AI Vision tells me that, assigning a degree of confidence to what it finds.

So how can we do this in OIC?

The first thing I do is check out the Python code - 
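
For context, here's a minimal sketch of the same object detection call via the OCI Python SDK - treat the model/class names and the sample file as illustrative, and check the SDK docs for your version:

import base64
import oci

# Default ~/.oci/config profile; image file and compartment OCID are placeholders
config = oci.config.from_file()
client = oci.ai_vision.AIServiceVisionClient(config)

with open("mcsorleys.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

details = oci.ai_vision.models.AnalyzeImageDetails(
    features=[oci.ai_vision.models.ImageObjectDetectionFeature()],
    image=oci.ai_vision.models.InlineImageDetails(data=image_b64),
    compartment_id="yourCompartment_ocid",
)

# analyzeImage returns the detected objects with their confidence scores
result = client.analyze_image(details).data
for detected in result.image_objects:
    print(detected.name, detected.confidence)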

Now to the API docs for OCI AI Vision; here I find the API endpoints -

I'm in PHX (Phoenix), so I will use - 

https://vision.aiservice.us-phoenix-1.oci.oraclecloud.com

Now to the API for object detection -

POST /20220125/actions/analyzeImage

The complete URL - https://vision.aiservice.us-phoenix-1.oci.oraclecloud.com/20220125/actions/analyzeImage

Request Payload - the basic input here is the image for analysis.

Let's just go with OBJECT_DETECTION here.

The final request payload is as follows -

{
  "features": [
    {
      "featureType": "OBJECT_DETECTION"
    }
  ],
  "image": {
    "source": "INLINE",
    "data": "base64"
  },
  "compartmentId": "yourCompartment_ocid}}"
}
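
To give a feel for what the OIC invoke will do for us, here's a rough sketch of the same request fired directly at the endpoint above, signed using the OCI request signer from the Python SDK - all values are placeholders, and in OIC the connection's security policy takes care of the signing:

import base64
import requests
from oci.config import from_file
from oci.signer import Signer

# Sign requests with the API key from the default ~/.oci/config profile
config = from_file()
signer = Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

url = "https://vision.aiservice.us-phoenix-1.oci.oraclecloud.com/20220125/actions/analyzeImage"

# The "data" element carries the base64-encoded image
with open("mcsorleys.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "features": [{"featureType": "OBJECT_DETECTION"}],
    "image": {"source": "INLINE", "data": image_b64},
    "compartmentId": "yourCompartment_ocid",
}

print(requests.post(url, json=payload, auth=signer).json())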

The response payload is as follows  -

{
  "imageObjects": [{
    "name": "Person",
    "confidence": 0.98758954,
    "boundingPolygon": {
      "normalizedVertices": [{
        "x": 0.6116622686386108,
        "y": 0.584307074546814
      }, {
        "x": 0.6986929178237915,
        "y": 0.584307074546814
      }, {
        "x": 0.6986929178237915,
        "y": 0.9633761644363403
      }, {
        "x": 0.6116622686386108,
        "y": 0.9633761644363403
      }]
    }
  }, {
    "name": "Chair",
    "confidence": 0.984481,
    "boundingPolygon": {
      "normalizedVertices": [{
        "x": 0.2508918046951294,
        "y": 0.7415730953216553
      }, {
        "x": 0.32072916626930237,
        "y": 0.7415730953216553
      }, {
        "x": 0.32072916626930237,
        "y": 0.9100103378295898
      }, {
        "x": 0.2508918046951294,
        "y": 0.9100103378295898
      }]
    }
  }, {
    "name": "Footwear",
    "confidence": 0.9828044,
    "boundingPolygon": {
      "normalizedVertices": [{
        "x": 0.5381702184677124,
        "y": 0.9290227890014648
      }, {
        "x": 0.5808274149894714,
        "y": 0.9290227890014648
      }, {
        "x": 0.5808274149894714,
        "y": 0.9576336741447449
      }, {
        "x": 0.5381702184677124,
        "y": 0.9576336741447449
      }]
    }
  }, {
    "name": "Person",
    "confidence": 0.9810399,
    "boundingPolygon": {
      "normalizedVertices": [{
        "x": 0.5125582814216614,
        "y": 0.5717782378196716
      }, {
        "x": 0.5918540954589844,
        "y": 0.5717782378196716
      }, {
        "x": 0.5918540954589844,
        "y": 0.9574788808822632
      }, {
        "x": 0.5125582814216614,
        "y": 0.9574788808822632
      }]
    }
  }, {
    "name": "Footwear",
    "confidence": 0.97873676,
    "boundingPolygon": {
      "normalizedVertices": [{
        "x": 0.5209354758262634,
        "y": 0.9121176600456238
      }, {
        "x": 0.5540853142738342,
        "y": 0.9121176600456238
      }, {
        "x": 0.5540853142738342,
        "y": 0.9327118396759033
      }, {
        "x": 0.5209354758262634,
        "y": 0.9327118396759033
      }]
    }
  }],
  "labels": null,
  "ontologyClasses": [{
    "name": "Chair",
    "parentNames": ["Furniture"],
    "synonymNames": []
  }, {
    "name": "Footwear",
    "parentNames": ["Clothing"],
    "synonymNames": []
  }, {
    "name": "Person",
    "parentNames": [],
    "synonymNames": []
  }, {
    "name": "Clothing",
    "parentNames": [],
    "synonymNames": []
  }, {
    "name": "Furniture",
    "parentNames": [],
    "synonymNames": []
  }],
  "imageText": null,
  "objectProposals": null,
  "detectedFaces": null,
  "detectedLicensePlates": null,
  "imageClassificationModelVersion": null,
  "objectDetectionModelVersion": "2.0.3",
  "textDetectionModelVersion": null,
  "objectProposalModelVersion": null,
  "faceDetectionModelVersion": null,
  "licensePlateDetectionModelVersion": null,
  "errors": []
}
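
A quick word on the boundingPolygon values - the vertices are normalized, i.e. expressed as fractions of the image width and height, so getting back to pixel positions is just a multiplication. For example (image dimensions below are made up):

# Convert a normalizedVertex back to pixel co-ordinates
width, height = 1024, 768  # placeholder image dimensions
vertex = {"x": 0.6116622686386108, "y": 0.584307074546814}
print(round(vertex["x"] * width), round(vertex["y"] * height))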

I create the connection in OIC -

Then on to the integration -

The AI Vision Invoke is configured as follows -

You've already seen the request and response payloads, so I'll skip them.

I only want to return a précis of the AI Vision response, so my trigger response has been defined as follows - 

{
  "imageObjects" : [ {
    "name" : "Person",
    "confidence" : 0.98758954
  }, {
    "name" : "Chair",
    "confidence" : 0.984481
  } ],
  "ontologyClasses" : [ {
    "name" : "Chair",
    "parentNames" : [ "Furniture" ]
  }, {
    "name" : "Footwear",
    "parentNames" : [ "Clothing" ]
  } ]
}
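
The trimming itself is just an OIC mapping, but purely to illustrate the shape of that précis, here's the equivalent filtering of the full analyzeImage response in Python (the file name is a placeholder):

import json

# Parse the full analyzeImage response shown earlier (placeholder file name)
with open("analyze_image_response.json") as f:
    full_response = json.load(f)

precis = {
    "imageObjects": [
        {"name": o["name"], "confidence": o["confidence"]}
        for o in full_response["imageObjects"]
    ],
    # keep only the ontology classes that have a parent
    "ontologyClasses": [
        {"name": c["name"], "parentNames": c["parentNames"]}
        for c in full_response["ontologyClasses"]
        if c["parentNames"]
    ],
}
print(json.dumps(precis, indent=2))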
 
I complete the mapping and test - 

Regarding the image I used -

McSorley's is an institution in New York, the oldest pub in the city, in the hands of the Irish up until this very day. They only serve 2 types of beer: a dark beer, which is rather unpalatable, and a lager, which is to everyone's taste. The beer is served in very small glasses; ergo, you don't order 1, you order 4 - and if you're with me and the bauld Peter Meleady, 24.

Image Classification

According to the docs - "Image classification assigns classes and confidence scores based on the scene and contents of an image."

So this is subtly different from the OBJECT_DETECTION feature detailed above.
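
The request payload has the same shape as before - only the featureType changes. Here's an illustrative variant (maxResults being an optional cap on the number of classification labels returned):

{
  "features": [
    {
      "featureType": "IMAGE_CLASSIFICATION",
      "maxResults": 5
    }
  ],
  "image": {
    "source": "INLINE",
    "data": "base64"
  },
  "compartmentId": "yourCompartment_ocid"
}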

The tailored response to this API invoke is as follows - 

This invoke, as expected, does not return any X, Y co-ordinates.

Face Detection

As the name suggests, this detects faces and their X, Y positions in the image.

Text Detection

Let's try this out in OCI - 

Looks good! Now to OIC -


Just to note here, we need to set featureType to TEXT_DETECTION.

Here's the Request payload for the api invoke -

{
  "features": [
    {
      "featureType": "TEXT_DETECTION"
    }
  ],
  "image": {
    "source": "INLINE",
    "data": "base64"
  },
  "compartmentId": "yourCompartment_ocid}}"
}
 
The response payload I initially set to {}. I then run the integration in debug mode and copy and paste the JSON response shown in the activity stream.

I configure the REST trigger to return only a subset of this data -

Video Analysis 


The video analysis includes - 

  • Label Detection
  • Object Detection
  • Text Detection
  • Face Detection

Summa Summarum

AI Vision is yet another cool AI service in the OCI stack. This post is just an introduction to the service, but I hope it has whetted your appetite!

Bon appetit!
  


 



