A recent use case was for an insurance company based in the United States. We were collecting data from the documents that insurance brokers filled out, and we had to split each document into a few segments based on that data. We also had to confirm whether each document came from an authentic source. The brokers had a stamp and a legal insurance number, and we maintained dictionaries containing images of their signatures. Once we received a document from a broker, we split the whole document into segments and then validated the signature segment to confirm it came from an authentic source. We checked that the extracted signature and the stored image looked similar, requiring at least 80% similarity.
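As a rough illustration, a signature-similarity check of that kind could be sketched as follows, using SSIM as the similarity metric. The library, function names, and preprocessing here are assumptions for illustration, not the project's actual implementation:

```python
# Minimal sketch of an "at least 80% similarity" signature check
# (assumed approach; the project's real metric is not described above).
from skimage.io import imread
from skimage.transform import resize
from skimage.metrics import structural_similarity

SIMILARITY_THRESHOLD = 0.80  # "at least 80% similarity"

def is_authentic(extracted_path: str, reference_path: str) -> bool:
    # Load both signatures as grayscale images in the [0, 1] range.
    extracted = imread(extracted_path, as_gray=True)
    reference = imread(reference_path, as_gray=True)
    # Resize the extracted signature to match the stored reference.
    extracted = resize(extracted, reference.shape)
    score = structural_similarity(extracted, reference, data_range=1.0)
    return score >= SIMILARITY_THRESHOLD
```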
We were extracting the IPIN number with Microsoft Intelligent OCR, and we were able to extract almost 85% to 90% of the numbers. The number consisted of digits printed on a stamp that we had provided to the brokers, so there was less complexity because there was less human intervention. They were not writing those numbers by hand, where it could be difficult for us to determine whether a digit was a four or a nine. With a digitized number printed on the stamp, it was easier for us to read. This is the use case that we just finished and deployed, and it is processing 150 to 230 requests on a daily basis.
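For illustration, a digit-only OCR read like that one could be sketched as below. This uses pytesseract as a stand-in; the project itself used Microsoft Intelligent OCR configured through UiPath activities rather than code:

```python
# Minimal sketch of reading a stamped, digit-only number (assumed approach).
import pytesseract
from PIL import Image

# Treat the stamp as a single text line and whitelist digits, since the
# IPIN is a printed numeric code rather than handwriting.
DIGITS_ONLY = "--psm 7 -c tessedit_char_whitelist=0123456789"

def read_stamp_number(image_path: str) -> str:
    stamp = Image.open(image_path)
    return pytesseract.image_to_string(stamp, config=DIGITS_ONLY).strip()
```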
I have mostly been automating banking, financial services, and insurance (BFSI) processes.
With Document Understanding, we have been able to process both structured and unstructured documents. It does not matter whether a document is structured or unstructured. The only requirement is that the data should be concise and consistent. If we are getting 70% unstructured data and 30% structured data, we are good to go, but we should know how much structured and how much unstructured data we are getting. If we get an image, we serialize the documents based on it. Either it is a standardized process, or we have to use some APIs or some logic to make the data structured. We initially filter documents based on the image view. If the visibility of the data is below our threshold of 45% or 65%, the data is not structured enough, and we move the document to a different folder to process later. If it is standard and structured, we process it immediately and do not need to worry about breaking it into chunks. Once we have achieved 45% or 65% of our target, we have a positive output in hand, and we can then work on the remaining part to make it more centralized, which makes things a bit easier for us.
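As an illustration, the routing step could look like the following sketch. The visibility score, threshold value, and folder names are assumed placeholders; how visibility was actually scored in our pipeline is not covered above:

```python
# Minimal sketch of visibility-based document routing (assumed names).
import shutil
from pathlib import Path

VISIBILITY_THRESHOLD = 0.45  # below this, defer the document

def route_document(doc_path: Path, visibility: float) -> Path:
    """Move low-visibility documents to a deferred folder, others onward."""
    target = Path("process_now" if visibility >= VISIBILITY_THRESHOLD
                  else "process_later")
    target.mkdir(exist_ok=True)
    return Path(shutil.move(str(doc_path), str(target / doc_path.name)))
```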
With Document Understanding, we are able to handle things like varying document formats, handwriting, and signatures. The approach we take depends on the nature of the data we are getting. For example, a requirement from the insurance company was to mandatorily verify whether the source was authentic. They had metrics at their end defining which brokers were legal and which were not. It was not challenging for us to extract that data from their backend because they already had all the information. We just used their APIs, read the data out, and compared it.
In terms of human validation required for Document Understanding output, we needed to confirm whether the data coming from Document Understanding was correct. If it was not correct, we moved it to the process folder. When we marked a value as incorrect, it asked us for the exact location of the field we were looking for, such as the grand total. We defined that location, it got stored in the knowledge base system, and the document was then processed. It can run as an attended bot or as an unattended bot. It totally depends on how much data or knowledge the model has gained from humans; day by day, with more knowledge, it becomes more capable of processing the data independently.
The average handle time depends on the number of cores the machine's CPU has. If your machine has a 14-core to 16-core CPU, about three minutes are required to process a 3 MB file. It also depends on the number of pages and the complexity. If the data visibility is clear and there are no more than five pages, it can process the file in three minutes.
After automating the process with Document Understanding, it takes two minutes to process a single PDF. I do not have exact data on how much time humans used to take. They were probably putting in nine hours per day, and after automating the process with Document Understanding, they are putting in two hours per day, so they are saving seven hours per day. Monthly, that is a saving of about 150 hours.
In terms of error reduction, we were getting a lot of machine errors in the beginning, but as the process got smoother and the knowledge base system stabilized, the machine errors reduced, and so did the human errors.
Document Understanding helped free up the client's staff's time for other projects. Before automation, they had seven people on the team. After automating the process, they cut their budget and reduced the manpower from seven to four, freeing three staff members for other projects. They saved 35% to 45% of the manpower.
Document Understanding has better machine learning (ML) capabilities, and that is why I prefer it.
It would be much easier if UiPath increased the page count. Currently, they allow one million pages for $10,000 per month. I would prefer that they increase the page count or reduce the price for processing documents, for example, to $6,000 per month for 2 to 3 million pages. It would then be much easier for companies with a low budget to use this product.
I have been using UiPath Document Understanding for more than two years.
It is stable. They always come up with a proper and stable approach.
It is scalable. If they increase the page count or file count, our solution will not have any issues, and it will process them. The more you train the bots, the more efficient the processes become.
They were helpful. If you have a paid license key, they will help you a lot.
I have worked with IQ Bot, but as Document Understanding became more stable and better known in the market, I started to move from IQ Bot to Document Understanding. I used IQ Bot when Document Understanding did not exist. In 2021, when UiPath came out with the Document Understanding solution, I left IQ Bot behind and started developing my skills in Document Understanding. I have expertise in both, but Document Understanding has better ML capabilities, so I prefer it.
My whole six years of development experience have been in the BFSI sector. I did only one retail sector project, and for that, we did not use UiPath Document Understanding. We used Magic OCR, which is not a Document Understanding or IQ Bot model. Those who are not willing to invest that much in UiPath or Automation Anywhere prefer to automate by using some open APIs. We used Magic OCR to scale each picture into a proper frame. We scaled the images to our dimensions or frame and then performed all the required activities. If a cash memo came in, we had defined a few parameters for it, such as the grand total, discount, advance payment, and overdue payments.
UiPath provides two options: the first is the public cloud, and the second is on-premises. It is based on the package that you purchase from them. If you purchase the cloud version, they will share the public cloud with you. If you go with the on-premises option, they will ask you to arrange a server. They deploy or install Orchestrator on the IIS server, and we operate it from there.
We are using it on the cloud because AI Fabric and a lot of other functionality are available on the cloud. Our cloud provider is Microsoft Azure.
The deployment process depends on the approach or SOPs of the company. The company I have been working with recently has its own DevOps team, but one of the companies I worked with did not believe in DevOps. The developers were the ones gathering the data, developing the requirements, and fulfilling them by doing the development and then deploying to production. It depends on the company model. I have worked in both scenarios, and there was not much of an issue with deploying the Document Understanding model. It is package-based: we added the package, deployed it directly on Orchestrator, and operated the processes from there.
In terms of maintenance, it does not require any maintenance from our side.
The ROI is in terms of efficiency: time savings for humans and the accuracy of the results.
I would recommend Document Understanding. I prefer it over IQ Bot because UiPath offers multiple flavors of machine learning models. If a person is capable, they can also achieve the same thing with programming.
I would rate Document Understanding an eight out of ten, but they can improve the costing part.