Challenge and Opportunity
A Keiter Technologies client conducted a survey over a few weeks and received several thousand responses, primarily by mail. The responses were scanned, which made them available digitally as PDFs, but there was no direct way to extract the text from them. Without some form of machine learning solution, the client would have had to count every response manually and link each one to the correct user in its database.
Normally, a problem like this might merit training a custom model to extract the relevant text from each PDF. However, because the survey ran for only a few weeks, the client would not have collected enough data to train such a model until the survey was over and the model was no longer useful.
Approach
The Keiter Technologies team used a pretrained Optical Character Recognition (OCR) model to read the text in the PDFs. They identified a pattern that allowed the OCR model to recognize hand-checked boxes. The team then sorted the PDFs into directories based on the predicted responses and reviewed the data for discrepancies.
Results
Using the OCR model saved many hours of manual data entry. Sorting the PDFs by predicted response also makes it easier to carry out any recount that is requested or needed.
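To illustrate the sort-and-recount idea from the approach, here is a minimal Python sketch. It is not the team's actual implementation: the case study does not name the OCR model or the checkbox pattern it relied on. The sketch assumes the OCR text for each PDF has already been extracted, and it assumes (hypothetically) that a hand-checked box shows up in that text as "[X]" or "☒" immediately before the option label. The names predict_choice, sort_responses, and recount are illustrative only.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

# Hypothetical pattern: assume the OCR output renders a hand-checked box
# as "[X]", "[x]", or "☒", followed by the one-word option label.
CHECKED = re.compile(r"(?:\[\s*[xX]\s*\]|☒)\s*(?P<label>[A-Za-z]+)")


def predict_choice(ocr_text: str) -> str:
    """Return the checked option, or 'needs_review' if the result is ambiguous."""
    labels = {m.group("label").lower() for m in CHECKED.finditer(ocr_text)}
    # Exactly one distinct checked option is treated as a confident prediction;
    # zero or several checked options are routed to manual review.
    if len(labels) == 1:
        return labels.pop()
    return "needs_review"


def sort_responses(ocr_by_pdf: dict[Path, str], out_dir: Path) -> Counter:
    """Copy each PDF into out_dir/<predicted choice>/ and return the tally."""
    tally = Counter()
    for pdf_path, text in ocr_by_pdf.items():
        choice = predict_choice(text)
        target = out_dir / choice
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(pdf_path, target / pdf_path.name)
        tally[choice] += 1
    return tally


def recount(out_dir: Path) -> Counter:
    """Recount responses by counting the PDFs in each prediction directory."""
    return Counter(
        {d.name: sum(1 for _ in d.glob("*.pdf")) for d in out_dir.iterdir() if d.is_dir()}
    )
```

Because each prediction becomes a directory, a recount is just a count of files per directory, and anything in the needs_review directory can be checked by hand and moved to the right place.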
The post IDS Case Study: Using OCR to Tally Survey Responses first appeared on Keiter CPA.