Introduction
Have you ever needed to extract and process data from complex content in any document format? Intelligent Document Processing, courtesy of OIC, allows you to do so easily. Much of the data we deal with today doesn't adhere nicely to XSDs or JSON payloads. The data we are often interested in is embedded in documents and images (PDFs, JPEGs etc.), so we need to extract the salient details from them. Think of a sales person having a coffee with a customer: she takes a picture of the receipt, and this is her input to expenses. OIC can take the pain out of this, converting unstructured data into business data. This is the basis for End to End Process Automation.
I will implement the following use case: a sales person uploads expense documents to OCI Object Storage. Object Storage emits an event, which triggers an integration; this post covers all the details of how that actually happens. The integration then invokes an AI service that determines the type of document and extracts the key fields, e.g. the document is a receipt for 2 coffees at The Hare of The Dog Coffee Dock - only €4 each, and these were Irish Coffees! The integration then routes the receipt's salient data to the relevant downstream system, in our case an OIC business process, which allows the sales person's manager to approve the expense.
The high level technical flow of my demo is as follows -
OCI Object Storage -> OCI Events -> OCI Notification Service -> OIC Integration -> OCI AI -> OIC Process
OCI Object Storage is the document store.
OCI Events Service is used to create a rule that publishes new object events from Object Storage.
OCI Notification Service - has the Topic to which the event is published. It also has a topic subscriber, which will be configured to invoke an OIC integration.
OIC Integration A will be invoked by the subscriber and will, in turn, invoke Integration B. The latter will retrieve the document from Object Storage and then invoke OCI AI Vision to analyze the document. The response will contain the document type, e.g. Receipt or Invoice, as well as key fields such as the items ordered.
OIC Process will be used to implement the Approval workflow.
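The routing decision at the end of this flow boils down to a lookup on the detected document type. Here is a minimal Python sketch of that idea, not OIC code: it assumes the AI response yields a type string such as RECEIPT or INVOICE, and the target names (`expense_approval_process`, `accounts_payable_integration`, `manual_review_queue`) are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical mapping from detected document type to a downstream target.
# In this demo, receipts go to an OIC Process for expense approval.
ROUTES: Dict[str, str] = {
    "RECEIPT": "expense_approval_process",
    "INVOICE": "accounts_payable_integration",
}


def route_document(document_type: str, default: str = "manual_review_queue") -> str:
    """Return the (hypothetical) downstream target for a detected document type."""
    # Normalise case and whitespace so "Receipt" and "RECEIPT" route the same way.
    return ROUTES.get(document_type.strip().upper(), default)


print(route_document("Receipt"))  # expense_approval_process
```

Unknown types fall through to a default target rather than failing, which keeps odd documents visible to a human.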
Now alternative flows are possible, for example -
The document is stored in Content & Experience Cloud (CECS), which triggers an OIC Process (the Process is bound to a CECS folder). The Process invokes an OIC Integration, which retrieves the document from CECS and invokes OCI AI to analyze it. The result is returned to the Process, which visualises the salient document data in a Process form for approval. The Process also has access to the original document as a Process attachment.
However, this post will focus on OIC and the OCI Services mentioned above and, as regular readers of my blog know, my examples are always simple; I show you how to do it, and you can extrapolate. This post is no exception.
Here is the demo receipt I will be using -
No Irish Coffees here, unfortunately.
First stop is OCI Object Storage.
OCI Object Storage - create Bucket
Create a bucket for the documents -
OCI Notification Service - create Topic
Create a Topic for the new doc event.
OCI Events Service - create Rule
Create a rule for the new doc event - this defines what happens when a doc is uploaded to my OCI Object Storage bucket.
I can add more conditions to specify the compartment and bucket names.
I then specify the action to be executed, once the rule conditions have been met.
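Behind the console form, a rule is stored as a JSON condition. The sketch below shows what such a condition might look like for this demo, plus a simplified matcher in Python that mimics the idea: the event type must be in the listed values, and each attribute filter must match. The condition layout follows my understanding of the OCI Events condition format; the compartment name `expenses-demo` and bucket name `expense-docs` are hypothetical, and the matcher is an illustration, not how the Events service actually evaluates rules.

```python
from typing import Any, Dict

# Rule condition roughly as the console might generate it (names are hypothetical).
RULE_CONDITION: Dict[str, Any] = {
    "eventType": ["com.oraclecloud.objectstorage.createobject"],
    "data": {
        "compartmentName": ["expenses-demo"],
        "additionalDetails": {"bucketName": ["expense-docs"]},
    },
}


def matches(condition: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every leaf list in the condition must contain
    the corresponding value found in the event."""
    for key, expected in condition.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Recurse into nested filters such as data.additionalDetails.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

Each extra condition added in the console simply becomes another leaf that has to match.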
So that's clear enough, I hope. We created a bucket for our documents and a topic for publishing the "new doc uploaded" event, then created a rule joining the two.
We can also view the payload that will be published to the topic -
Next step is to create a REST based integration to process the new doc event -
Oracle Integration - create Integration
Note that the request payload is set to the example event payload above.
I simply log the event type -
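Outside OIC, this first integration's job amounts to parsing the incoming event and logging its type. A minimal Python sketch, assuming the CloudEvents-style envelope that OCI Events publishes, with `eventType` at the top level; the logger name is an arbitrary choice.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("notification_trigger")


def handle_notification(raw_body: str) -> str:
    """Parse the event delivered by the Notification subscription and log its type."""
    event = json.loads(raw_body)
    # Fall back to a marker value if the payload has no eventType field.
    event_type = event.get("eventType", "<missing>")
    logger.info("Received event type: %s", event_type)
    return event_type
```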
All that's missing now is the Topic Subscription. This subscription will subscribe to the topic and then invoke the OIC integration we just created.
I copy the integration endpoint and return to OCI Notifications Service -
OCI Notification Service - create Subscription
Here is my integration endpoint: https://myOICInstance/ic/api/integration/v1/flows/rest/AA_NOTIFICA_TRIGGER_DEMO/1.0/sub2Notification
I augment this with the username and password. The format is as follows -
https://user:password@myOICInstance/ic/api/integration/v1/flows/rest/AA_NOTIFICA_TRIGGER_DEMO/1.0/sub2Notification
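If the password contains characters such as `@`, `:` or `/`, it must be percent-encoded before being embedded in the URL; otherwise the host part is parsed incorrectly. A small Python sketch using only the standard library; the user name and password in the example are hypothetical, and `myOICInstance` is the placeholder host from above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(endpoint: str, user: str, password: str) -> str:
    """Embed percent-encoded credentials into an HTTPS endpoint URL."""
    parts = urlsplit(endpoint)
    # Drop any credentials already present, keeping host (and port) as written.
    host = parts.netloc.rsplit("@", 1)[-1]
    # safe="" forces ':', '@' and '/' inside the credentials to be percent-encoded.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return urlunsplit((parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment))


print(with_basic_auth(
    "https://myOICInstance/ic/api/integration/v1/flows/rest/AA_NOTIFICA_TRIGGER_DEMO/1.0/sub2Notification",
    "demo.user", "p@ss:word"))
# https://demo.user:p%40ss%3Aword@myOICInstance/ic/api/integration/v1/flows/rest/AA_NOTIFICA_TRIGGER_DEMO/1.0/sub2Notification
```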
We need to confirm this subscription: the Notification Service invokes the OIC endpoint, passing a confirmation URL, and we need to take this URL and execute it in a browser.
The confirmation URL format is as follows -
https://yourNotificationService/20181201/subscriptions/subscriptionOCID/confirmation?token=yourToken&protocol=CUSTOM_HTTPS
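Pasting the URL into a browser is simply an HTTP GET, so it can also be scripted. The Python sketch below assumes the URL format shown above: it first extracts the subscription OCID, token and protocol for a sanity check, then issues the GET with the standard library. The host, OCID and token are placeholders.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen


def parse_confirmation_url(url: str) -> Dict[str, str]:
    """Extract the subscription OCID, token and protocol from a URL of the form
    .../20181201/subscriptions/<ocid>/confirmation?token=...&protocol=..."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    idx = segments.index("subscriptions")  # raises ValueError if the segment is absent
    query = parse_qs(parts.query)          # note: decodes escapes, for display only
    return {
        "subscription_id": segments[idx + 1],
        "token": query["token"][0],
        "protocol": query.get("protocol", [""])[0],
    }


def confirm_subscription(url: str, timeout: float = 10.0) -> int:
    """Confirm the subscription with a plain GET on the original URL; returns the HTTP status."""
    with urlopen(url, timeout=timeout) as response:
        return response.status
```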
I put this in my browser and get the subscription confirmation -
I check the subscription status -
Next step - let's publish a valid message to the Topic -
Note the event type value.
Final test in this section - let's upload the receipt to the bucket and then check out the request payload of the triggered integration.
I navigate to OCI Object Storage and do the final step here - configure the bucket to emit events -
Now I upload my receipt -
I check out the payload -
Note all the salient information - doc name, bucket name, compartment name etc.
Now to OCI AI Service - in this post I will be leveraging the Vision AI Service -
OCI AI - Vision AI Service
I test it with my receipt; by the way, this is also the demo available in Document AI.
AI analyzes the doc - sees it's a receipt and also parses the salient fields -
This is invoking the analyzeDocument REST API and, as you can see, I can view the request and response payloads. These can be used when defining the REST invoke in my OIC integration.
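To make the request side concrete, here is a Python sketch of an analyzeDocument request body with the document passed inline as base64, which suits a flow that has already fetched the file from Object Storage. The field names (`compartmentId`, `features[].featureType`, `document.source`, `document.data`) and the feature values are my understanding of the Vision API at the time of writing and should be checked against the request payload shown in the tester; the compartment OCID in any example is a placeholder.

```python
import base64
from typing import Any, Dict, Sequence


def build_analyze_document_request(
    doc_bytes: bytes,
    compartment_id: str,
    features: Sequence[str] = ("DOCUMENT_CLASSIFICATION", "KEY_VALUE_DETECTION"),
) -> Dict[str, Any]:
    """Build an (assumed) analyzeDocument body for an inline, base64-encoded document."""
    return {
        "compartmentId": compartment_id,
        # One entry per requested analysis, e.g. classification and key/value extraction.
        "features": [{"featureType": f} for f in features],
        "document": {
            "source": "INLINE",
            "data": base64.b64encode(doc_bytes).decode("ascii"),
        },
    }
```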
The OCI Vision REST API docs are available here.
OIC - create Integration to invoke Vision API and parse result
The integration has a REST trigger with the following Request Payload -
{"compartment":"c1",
"bucket":"b1",
"doc": "d1"}
Next step is to retrieve the doc from Object Storage -
The three request fields are used in the mapping.
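For reference, fetching an object over the Object Storage REST API is a GET on `/n/{namespaceName}/b/{bucketName}/o/{objectName}`. That path needs the Object Storage namespace rather than the compartment, so in this Python sketch the namespace and region are extra, hypothetical inputs. The object name is percent-encoded because document names can contain spaces. Real requests must also be signed with OCI API key credentials, which is not shown here.

```python
from urllib.parse import quote


def object_url(region: str, namespace: str, bucket: str, object_name: str) -> str:
    """Build the Object Storage GetObject URL for a document (request signing not shown)."""
    host = f"https://objectstorage.{region}.oraclecloud.com"
    # Encode each path component so spaces and slashes in names don't break the path.
    return (f"{host}/n/{quote(namespace, safe='')}"
            f"/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}")
```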
Once we have the doc, invoke the analyzeDocument api from AI Vision -
Then process the response -
Remember, I mentioned that the response format is available in the OCI AI Vision Document tester -
So this is what I need to parse.
The "interesting" data is in the documentFields structure -
The line items (a coffee and 2 waters) are found in the Line_Item_Group structure -
So my OIC integration has a couple of For-Each actions to access and parse these values.
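The nested For-Each logic can be sketched in Python as follows. It assumes the response shape seen in the tester: `pages[].documentFields[]`, each field with `fieldType`, `fieldLabel.name` and `fieldValue`, where a `LINE_ITEM_GROUP` field holds line items whose own `fieldValue.items` are the individual columns, plus a `detectedDocumentTypes` list with confidences. These key names are assumptions to verify against your own response, not a definitive schema.

```python
from typing import Any, Dict


def label(field: Dict[str, Any]) -> str:
    """Name of a field, e.g. MerchantName or ItemName (assumed key layout)."""
    return field.get("fieldLabel", {}).get("name", "")


def value_text(field: Dict[str, Any]) -> str:
    """Prefer the raw text; fall back to the normalised value as a string."""
    value = field.get("fieldValue", {})
    return value.get("text") or str(value.get("value", ""))


def parse_receipt(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten key/value fields and line items from an analyzeDocument-style response."""
    result: Dict[str, Any] = {"documentType": None, "fields": {}, "lineItems": []}
    types = response.get("detectedDocumentTypes") or []
    if types:
        # Pick the most confident classification.
        result["documentType"] = max(types, key=lambda t: t.get("confidence", 0)).get("documentType")
    for page in response.get("pages", []):                   # For-Each over pages
        for field in page.get("documentFields", []):         # For-Each over fields
            if field.get("fieldType") == "LINE_ITEM_GROUP":
                for item in field.get("fieldValue", {}).get("items", []):  # each line item
                    row = {label(f): value_text(f) for f in item.get("fieldValue", {}).get("items", [])}
                    result["lineItems"].append(row)
            else:
                result["fields"][label(field)] = value_text(field)
    return result
```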
I test the integration -
The Response -
Final step is just to have the subscribing integration invoke this one.
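That hand-off is a simple mapping from the event payload to the `{"compartment", "bucket", "doc"}` request defined earlier. A Python sketch, assuming the event carries the compartment name in `data.compartmentName`, the bucket in `data.additionalDetails.bucketName` and the object name in `data.resourceName`; check these paths against the payload you captured.

```python
from typing import Any, Dict


def to_analysis_request(event: Dict[str, Any]) -> Dict[str, str]:
    """Map an Object Storage create-object event to Integration B's request payload."""
    data = event["data"]
    return {
        "compartment": data["compartmentName"],
        "bucket": data["additionalDetails"]["bucketName"],
        "doc": data["resourceName"],
    }
```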
Summary
Part 1 has covered automated document retrieval, classification and data extraction; in part 2 we will look at the approvals business process. OIC, in concert with other OCI Services, makes it very easy to implement End to End Business Process Automation. The OCI AI Service supports many document types, and also makes it very easy to create custom models for your specific document types - subject matter for a separate blog post. Such Intelligent Document Processing use cases reinforce the business value-add of OIC and the OCI ecosystem. Remember, it's almost always not just integration!