What is our primary use case?
We are a logistics company that generates a lot of documents every day, and those documents need to be processed and their data extracted. That's our primary use case for this solution: large-scale data extraction.
What is most valuable?
The fact that IBM Datacap uses ABBYY's core engine for OCR is quite useful for us. It makes them one of the key players as far as OCR extraction is concerned.
They've also packaged it very well. You have field extraction batches, which enable you to auto claim the documents, and you have structurally laid-out templates, which enable you to mark the specific zone from which data should be extracted.
It saves us a lot of manual work.
The automation on offer is good.
Technical support, for the most part, is helpful.
What needs improvement?
A lot of things could be improved, and the number of support tickets that we have created reflects that. Looking at those tickets would show what needs work.
For example, more often than not, we are forced to rely on template creation in order to bring the accuracy of extraction to 100%.
Even though the training works, it's not as good as it is made out to be. In our case, we've got thousands of suppliers, and it is just not humanly possible for us to keep creating templates for each one of them. What we need is for the out-of-the-box extraction to be more accurate, or for the training to improve to the point where training alone gives us a reasonable level of accuracy. I know that 100% is too high a target; however, even 85% to 90% would be much better. Something that could do that out of the box would solve most of our problems.
ABBYY introduced a reporting data warehouse last December; however, it has its own fair share of problems. The accuracy and completeness of out-of-the-box recognition should be available as part of the default reports in the tool itself. Right now, that is missing.
ABBYY provides a document definition for an invoice; however, that invoice is a supplier invoice. In our organization, we deal with commercial invoices most of the time. If you put supplier invoices into the system, you get around 80% to 85% accuracy out of the box. With commercial invoices, however, you only get around 60%.
Based on the input that we have provided, ABBYY is now working on including commercial invoices in that default invoice document definition.
There are also a lot of minor things that have come up. For example, ABBYY claims that you can export the default template that is generated as part of the training, make changes to it, and then import it back. What we found was that this applies only to certain things. The training of the line items cannot be exported as a template. They said it has to do with the way machine learning is applied. However, these are all things that increase the development time.
For how long have I used the solution?
I have a couple of years' experience with this solution. I've been working with it since about 2019.
What do I think about the stability of the solution?
The performance can improve. Even though we tried the solution provided by ABBYY, which was to add more processing stations, it all still boils down to how your application is configured. For example, in our case, we have about five processing stations in production. When we started off, it took almost 60 seconds to process one page because of the complex nature of our application. We had around five document types, each of which has to be classified based on a set of keywords, and then the document definitions have to be applied on top of them. After that, each page goes through the generic layout, the additional layout, and so forth. Almost every page used to take about 60 seconds on average. We then made a lot of tweaks to the application and brought it down to around 22 seconds per page.
That said, even then, in the domain that we are working in, a 100-page document sometimes has to be processed within around 30 minutes. For a 100-page document, the processing alone takes close to an hour. Then there are additional stages like verification, so by the time the data is exported and uploaded into the target system, it has taken too long.
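To make the gap between the per-page time and the 30-minute target concrete, here is a rough back-of-the-envelope sketch. It is only an illustration: the function name is made up, and it assumes the pages of a single document are processed one after another at the average per-page time, with no queueing or other overhead, so it is a lower bound rather than a measurement.

```python
import math


def processing_minutes(pages, seconds_per_page, parallel_stations=1):
    """Estimate wall-clock minutes to process one document.

    Assumption (not from ABBYY): pages are split evenly across the
    given number of stations and each page takes the average time.
    """
    pages_per_station = math.ceil(pages / parallel_stations)
    return pages_per_station * seconds_per_page / 60


# Per-page averages quoted in the review: 60 s before tuning, 22 s after.
before_tuning = processing_minutes(100, 60)
after_tuning = processing_minutes(100, 22)

print(f"Before tuning: {before_tuning:.1f} min")
print(f"After tuning:  {after_tuning:.1f} min")
print("Meets 30-minute target:", after_tuning <= 30)
```

Even under these optimistic assumptions, the tuned figure is above the 30-minute window before verification and export are counted, and what we see in practice is higher than this estimate.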
How are customer service and technical support?
With the normal ABBYY support, you go to support.abbyy.com and you raise tickets. The response is almost immediate, and you get constant follow-ups. However, what I've also seen is that there are a lot of complex problems for which we have not gotten the solution we need. While the responsiveness has been great, if you asked me whether all the problems that we raised have been solved, the answer is no. On the other hand, we also have a chance to interact with the ABBYY professional services team directly due to the contract that has been established with our organization. They are a wonderful bunch of people, and we have a personal rapport with them. We get in touch with them on a weekly basis and our queries get resolved.
How was the initial setup?
In our organization, it is the IT team that takes care of the installation. I do not have much information about the process and I wasn't really a part of it.
What other advice do I have?
We're just a customer and an end-user.
I'd advise users, when starting their first project on FlexiCapture, to choose one that can accommodate a long turnaround time. For example, if you're going to go for a use case wherein documents have to be processed within the next 30 minutes, you're going to face a lot of problems. ABBYY is a wonderful tool; however, it has its own set of constraints. It's very important to understand the constraints of the tool.
The strength of the ABBYY FlexiCapture license is the OCR engine. However, if you're expecting it to process millions of documents within a very small amount of time, that is not going to happen.
It's very, very important for the final operators who are going to use the tool to understand what OCR is all about. The problems that we face are not technical at all. They are about convincing people who have been doing operations manually to change their processes. If you have people who look at documents, read the data, and enter it into a system, and you then tell them that the solution will extract that information automatically and they just have to verify it, they'll need to change their approach.
It's very important to get the business team who will be doing the verification into a meeting, in order to help them understand what OCR is, and also its limitations.
You will need to specify the stock field and the zones. OCR is able to learn on its own and understand how to extract data from them. However, there are certain things that you will have to teach people. You have to tell them that a page is converted into black and white, that gray areas have to end up as either black or white, and that certain characters may therefore be recognized as something else. OCR also reports a level of confidence for what gets extracted. All this has to be explained to the business team so that they understand the tool and its limitations. If you do this, you can ensure the success of the project.
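When explaining these two ideas to a business team, a tiny illustration can help. The sketch below is not how ABBYY works internally; it is a simplified example with made-up function names, a fixed threshold of 128, and an arbitrary 0.90 confidence cutoff, showing how gray pixels get forced to black or white and how low-confidence fields could be flagged for manual verification.

```python
def binarize(gray_rows, threshold=128):
    """Map each 0-255 grayscale pixel to 0 (black) or 255 (white).

    A fixed threshold is an assumption for illustration; real OCR
    engines use more sophisticated methods.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray_rows]


def needs_verification(fields, min_confidence=0.90):
    """Return the names of fields whose confidence is below the cutoff."""
    return [name for name, (_value, conf) in fields.items() if conf < min_confidence]


# A faint pen stroke: the two mid-gray pixels land on opposite sides of the
# threshold, which is how part of a character can vanish or appear.
stroke = [[30, 120, 135, 240]]
print(binarize(stroke))

# Hypothetical extraction result: (value, confidence) per field.
fields = {
    "invoice_number": ("INV-1O23", 0.62),  # letter O read where a zero was printed
    "total": ("450.00", 0.98),
}
print(needs_verification(fields))
```

The point for operators is that a low confidence value is a prompt to look at the source image, not a sign that the tool is broken.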
Therefore, while ABBYY, as a tool, is great, there is a lot of work that needs to be done before you start implementing it.
I'd rate the solution overall at a nine out of ten. We have had to initiate a lot of fixes; overall, however, it's quite good.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.