Mileage Extraction From Odometer Pictures for Automating Auto Insurance Processes
1. Introduction
In a competitive, client-driven auto insurance landscape, businesses are constantly changing the way they collaborate with customers to improve attraction and retention. Improved customer experiences and more efficient interactions with customers lead to satisfaction, which is one of the top differentiators that affect customer loyalty. Digitization and process automation allow service providers to unveil timely opportunities to offer effective and time-saving interactions that improve the customer experience.
With the incorporation of more sophisticated safety features in modern cars, the increase in claims cost due to the replacement of modern devices is outpacing the decline in claim frequency. Hence, there is pressure on insurance companies to create a more effective way to handle auto claims. Filing a claim is one of the few direct interactions customers have with their insurer, and it comes at a time when they are under stress and will most likely appreciate a streamlined process.
However, the typical experience most insurers offer today when you have an accident involves a claim-filing procedure that can be slow, expensive (for the insurer) and may involve several insurance representatives. The same thought applies when a potential new customer is inquiring about a new insurance policy.
When asking for a quote for a new policy, potential customers can upload photographs from their phone to a web-based app, where they can be analyzed in seconds to retrieve key information about the car. This results in a quick and accurate quote. By reducing human error and accelerating the information collection process, we can make processes that involve customer interactions smoother, hence simplifying the policy and claims ecosystem for the agent, the client and the insurer.
Optical character recognition (OCR) is a widely researched problem in computer vision. Text extraction from scanned documents or from pictures taken under controlled lighting has seen significant improvement with the advent of deep learning architectures. Still, text extraction from images in the wild is very challenging, and general-purpose OCRs do not work well on images from uncontrolled sources. In this paper, we describe a novel solution for extracting mileage readings from odometer images. In the insurance domain, particularly for auto insurance quotes and claims processing, there are three key pieces of information: license plate number, odometer mileage reading and vehicle identification number (VIN). License plate recognition and VIN recognition from images are very popular problems and commercial solutions exist for both. It is important to note that VIN recognition is a significantly easier problem since for modern cars the VIN number plates are standardized. To the best of our knowledge, little or no work has been done on odometer mileage extraction from images and there are no reliable commercial solutions available for this use-case.
There are several open source and commercial OCRs available on the market, such as Tesseract [1] and the built-in OCR toolbox in Matlab [2], to name a few. These OCR systems are designed to read characters from high quality pictures taken by scanners or a camera under good lighting conditions. They use image pre-processing and character segmentation techniques that are very specific to document images. They are trained to recognize printed characters, which differ from characters on an odometer display, since odometer images contain huge variation in color, intensity, font, and texture. For all these reasons, these OCR systems perform poorly on odometer images. The Google Cloud Vision API [3] is another interesting commercial option that does a better job extracting text from images in the wild, but its performance on odometer images is nowhere close to our accuracy expectations and does not meet our performance requirements.
We separate the mileage extraction problem into two parts: identifying the odometer display and extracting the characters inside the display. We leveraged existing object detection architectures to solve each part and finally designed a post-processing algorithm to extract the mileage number. We tested two different object detection architectures: the Single Shot Detector (SSD) [4] and Faster RCNN [5]. Our system differs from open source OCRs such as Tesseract and other commercial OCRs both in the system architecture and in the dataset used for training. We used hand-labeled odometer pictures to train the character recognition, which makes our model much more customized to odometer characters than any other OCR. We also designed the post-processing algorithm to distinguish the mileage reading from other characters in the odometer display such as the tripmeter reading, temperature, etc.
The remainder of the paper flows as follows: In section 2 we present relevant related work that uses recent machine learning techniques to extract text from pictures taken in non-restrictive environments, as well as background on the Faster RCNN and SSD object detectors. In section 3 we describe the data used to train our system, which is described in detail in section 4 (system workflow). After that, we share results derived from our empirical evaluation of the system in section 5, followed by a description of how the system is being deployed in section 6. We end the paper with conclusions and lessons learned and discuss future work in section 7.
2. Preliminaries
2.1. Related Work
As mentioned before, automated license plate recognition (ALPR) is a mostly commercially solved problem. Besides traffic monitoring, this technology is used in many applications such as highway toll collection, border and customs checkpoints, parking access control systems and, more recently, homeland security. The ALPR problem is similar in some aspects to the problem proposed here, since most ALPR systems break the problem down into similar sub-tasks: number plate detection, character segmentation and character recognition. Deep convolutional networks have been used recently to improve accuracy in ALPR systems [6], and Bulan et al. [7] propose the use of synthetically generated images to improve CNN performance while reducing the need for human labeling. More comprehensive surveys of such systems can be found in Sanap and Narote [8], Sonavane et al. [9], and Du et al. [10].
Faster RCNNs have been successfully used to extract text from pictures taken in the wild. For example, in Nagaoka et al. [11], the authors propose an architecture that takes into consideration the characteristics of text by using multiresolution feature maps to detect texts of various sizes simultaneously. A Faster RCNN approach is also used in Rosetta [12], a recently proposed scalable system to extract text from web images.
There are many recent real-world applications for detecting text in images where the Faster RCNN and the Single Shot Detector (SSD) architectures have been used successfully. A good representative example of such a system is presented in Yang et al. [13], where the goal is to extract (detect and recognize) text from biomedical literature figures.
However, to the best of our knowledge there is little or no work related to extracting mileage readings from odometer pictures.
2.2. Faster RCNN
Early object detectors used pyramidal sliding windows over the input image followed by an image classifier to detect objects at various locations and scales. The Fast RCNN architecture introduced by Girshick [5] made significant improvements over these architectures by using selective search for region proposals and convolutional feature maps as input. Although Fast RCNN was significantly faster than the previous architectures, the region proposal technique was still too slow for most real-time applications. Faster RCNN, introduced in Ren et al. [14], solves this problem by using a dedicated region proposal network.
Faster RCNN can be roughly viewed as a combination of two networks: the region proposal network (RPN) and a classifier, as shown in Figure 1. The RPN takes a convolutional feature map as input and outputs a set of rectangular object proposals and an objectness score for each proposal. Before that, the first step is translating the image into convolutional feature maps by passing it through a series of convolution layers. In Faster RCNN, the RPN is modeled with a fully convolutional network [15]. Region proposals are generated by sliding a small sub-network over this convolutional feature map output. The sub-network looks at n × n spatial windows of the input feature maps and projects them into a lower-dimensional feature vector. At the end of the sub-network architecture there are two sibling fully connected layers: a box-regression layer and a box-classification layer. The regression layer outputs delta coordinates to adapt the reference anchor coordinates for each spatial window. The box-classification layer predicts the probability of an anchor box being either background or an object. For the next stage of processing, only the anchors with high scores are retained. The second part of the Faster RCNN architecture is a classifier that predicts the class label for the regions proposed by the RPN. The classifier also contains a regression layer that outputs offset coordinates to further tighten the proposed box. The output regions from the RPN are passed through an ROI pooling layer to map them to a fixed shape before feeding them to the classifier. The classifier consists of a fully connected layer that outputs softmax scores across all the class labels.
Figure 1. Faster RCNN detector.
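To make the role of the box-regression layer concrete, the following is a minimal sketch of the standard Faster RCNN box-delta decoding, in which predicted deltas shift an anchor's center and rescale its width and height in log-space. The function name and tuple layout are illustrative, not taken from the paper.

```python
import numpy as np

def decode_box_deltas(anchor, deltas):
    """Apply predicted regression deltas (dx, dy, dw, dh) to a reference
    anchor given as (x_center, y_center, width, height)."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = deltas
    x = xa + dx * wa     # shift the center proportionally to the anchor size
    y = ya + dy * ha
    w = wa * np.exp(dw)  # rescale width and height in log-space
    h = ha * np.exp(dh)
    return x, y, w, h
```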
2.3. SSD
The Single Shot MultiBox Detector (SSD) was introduced by Liu et al. [4]. The Faster RCNN algorithm produces accurate results, but the network is still too computationally intensive for use in some real-time applications [4]. The SSD algorithm proposed a series of improvements over the existing object detection architectures to accelerate running time. The main idea behind SSD is predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps. SSD then generates predictions from feature maps at different scales, thereby producing predictions for all of them. Similarly to the Faster RCNN algorithm, the input to SSD is a convolutional feature map. In the original paper, the convolutional feature map is generated by passing an image through the Conv5_3 layer of a VGG-16 network. The feature map is downscaled using convolutional filters to get feature maps at multiple scales. Figure 2 shows the original feature map along with six downscaled ones. Each feature map is processed independently using different convolutional models to detect objects at particular scales. There is a set of default boxes associated with each cell of the feature maps. The convolutional model predicts offset coordinates relative to the default boxes and class scores for each box. The offset coordinates move and tighten the default boxes for a better localization of objects. The architecture is trained end-to-end by minimizing the weighted sum of the localization loss and the classification loss.
Figure 2. The Single Shot Detector extracts detections from feature maps at multiple scales.
2.4. Transfer Learning
The success of deep learning is driven largely by the large datasets available for training models. Nevertheless, data acquisition and annotation are costly and time consuming. Both the SSD and the Faster RCNN detectors are deep architectures with a large number of parameters; hence, training them from scratch with a small dataset can lead to overfitting.
Transfer learning allows deep networks to be trained on one domain and reused on a different domain. The first few convolution layers of a CNN trained on images learn universal representations of image features. These layers can be reused to build an image classifier with a different dataset. The reused layers can either be fine-tuned on the new network or kept frozen, allowing only the newly added layers to be updated. There are several different ways to adopt transfer learning in object detection. Figures 1, 2 show that the first step for both the SSD and the Faster RCNN detector is transforming the images into convolutional feature maps using a feature extractor. This feature extractor can be constructed from the first few layers of pre-trained image classification architectures such as VGG [16], Inception [17], Resnet [18], etc., trained on a large image classification dataset such as ImageNet [19]. When training the object detection model, the layers in the feature extractor can either be kept frozen or updated with a very small learning rate, depending on the size of the dataset. Another way of adopting transfer learning in a detection domain is by training a detection model end-to-end on a large object detection dataset such as Pascal VOC [20] or MS COCO [21], and fine-tuning it with a new dataset.
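As a minimal standalone illustration of the freeze-vs-fine-tune choice described above (a Keras classifier sketch, not the detection setup actually used in the paper), one can reuse an ImageNet-pretrained backbone as a feature extractor and train only the newly added layers:

```python
import tensorflow as tf

# Reuse an ImageNet-pretrained backbone as the feature extractor.
backbone = tf.keras.applications.ResNet101(include_top=False, weights="imagenet")
backbone.trainable = False  # keep pretrained layers frozen; alternatively,
                            # fine-tune them with a very small learning rate

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(11, activation="softmax"),  # e.g., digits 0-9 + non-digit
])
model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```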
3. Data
Training object detection architectures such as SSD and Faster RCNN requires a large corpus of annotated training samples. Our initial dataset contained only around six thousand (6,000) odometer images. These images were uploaded by customers when filing an auto insurance claim. Before any further processing, we manually filtered the dataset to remove images with potential personally identifiable information (PII). We also removed images that do not contain odometers. The final gathered dataset has a total of 6,209 odometer images. The images came from uncontrolled sources and hence, in general, the quality of the images in the dataset is poor. Most images suffer from non-uniform illumination, insufficient lighting, wrong orientation and low picture resolution.
3.1. Labeling
The process of labeling the dataset can be divided into two stages. In the first stage, we aimed to manually segment the odometer display by drawing a bounding box enclosing the display. Here, the term odometer display refers to the LCD screen of a digital odometer or the mechanical meter of an analog odometer. In the second stage, our goal was to generate boxes enclosing each individual character inside the odometer display and label the characters with the respective digit.
Both of the annotation stages involved labor intensive and repetitive tasks. Hence, we resorted to crowdsourcing as a feasible solution for these tasks. There are several commercially available platforms that facilitate crowdsourced labeling tasks. We used two popular crowdsourcing platforms: Amazon Mechanical Turk (AMT) [22] and Figure Eight (previously known as Crowdflower) [23].
Amazon Mechanical Turk is one of the largest crowdsourcing platforms operating today. At any given time, it has hundreds of active workers ready to work on a given task. It provides the flexibility to build customized user interfaces using HTML, CSS and JavaScript. It also provides some basic customizable templates for annotation tasks like sentiment analysis, image classification, NER, etc.
For the first stage of the annotation procedure, i.e., manually segmenting the odometer display, we used AMT. For this task, we modified the UI open-sourced by Russell et al. [24]. The modified UI allows workers to draw a box over the image, drag it and resize it. We collected three boxes from different labelers for each image in order to capture possible annotation errors.
Figure Eight is another crowdsourcing platform that works similarly to AMT. In addition to supporting HTML, CSS and JavaScript for UI design, it has rich UI templates for labeling different objects in images. It has built-in functionality such as zoom-in, zoom-out, scrolling, etc. that was very relevant for us when drawing character-level bounding boxes. The zoom-in functionality facilitates the ability to draw tighter boxes. This platform also monitors the quality of the work done by its workers. All workers have to pass tests before they can work on any annotation job. For all these reasons, we found the quality of the annotations on Figure Eight to be better than the ones obtained when using AMT, but this comes at an extra price. Hence, for each of our first and second annotation stages, we chose between the two platforms depending on the trade-off between the cost of labeling and the quality of the annotations.
For any sort of annotation job completed through crowdsourcing, it is important that the workers understand the expected result of the solicited annotations. It is essential to provide clear and detailed labeling instructions, covering all the corner cases while at the same time being as precise as possible. We completed the annotation tasks in several batches; we evaluated the annotation quality for each batch and identified the key sources of confusion among the workers. We then changed the instructions accordingly before sending out the next batch. Figures 3, 4 show some sample odometer images and the annotation labels.
Figure 3. Sample odometer images.
Figure 4. Sample annotations. (A) Labeling the odometer display. (B) Labeling characters.
Table 1 and Figure 5 show the distribution of the characters in the dataset. 73% of all labeled characters are digits while only 27% of them are letters. With 52 possible alphabet letters (26 lowercase and 26 uppercase), the number of samples for each alphabet class is too small and highly imbalanced. This later inspired us to group all the alphabet characters together into a single class when training the character recognition model.
Table 1. Dataset and distribution.
Figure 5. Distribution of characters; X represents non-digit characters.
We also collected additional data from the labelers about the quality of the images in our dataset. During an initial manual inspection, we noticed that a significant portion of the images in the dataset were not of good quality. To confirm this, during annotation we asked the annotators to rate the image quality of the characters into different categories. Table 2 shows the distribution of the images across five categories. Note that a significant portion of the images (21%) is marked as being of poor or extremely poor quality.
Table 2. Image quality distribution.
4. General Workflow of the System
The proposed solution consists of two cascaded object detection classifiers followed by a post-processing algorithm (see Figure 6). Algorithms for object detection have seen significant improvement over the last few years. In order to leverage the effectiveness of these models, we split our problem into two sub-problems that can directly be seen as problems in the object detection domain:
• The first is odometer localization, where the goal is to locate the odometer display given an input image.
• The second is character recognition, where the goal is to locate and recognize characters within the odometer display.
Figure 6. Pipeline of the proposed architecture.
We next proceed to explain each of these sub-problems in detail.
4.1. Odometer Localization
The first stage of the pipeline is to isolate and extract the odometer display from the rest of the image. There are commonly two types of odometers: analog and digital. Digital odometers have LCD displays containing a mileage reading that may be accompanied by other information such as temperature, time, fuel status, etc. The analog odometer consists of a mechanical rolling meter. Although there is large variation in the appearance of analog and digital odometers, we do not differentiate between the two types at this stage. In order to train the odometer localization model, we trained an object detection model on odometer images where the odometer display box is the object of interest. The position of the odometer display is supplied as the coordinates (x-center, y-center, height, width) of the odometer display box. Object detection algorithms are usually trained to localize and classify objects in an image. However, for odometer localization there is a single class, i.e., odometer display, so the only output we want from the model is the localization coordinates. During inference, the localization model takes an image and outputs the coordinates (x-center, y-center, width, height) of the odometer display.
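Since the model exchanges boxes in (x-center, y-center, width, height) form, but cropping the display for the next stage needs corner coordinates, a small conversion helper of the following kind is implied (the helper itself is our illustration, not code from the paper):

```python
def center_to_corners(x_center, y_center, width, height):
    """Convert an (x-center, y-center, width, height) box to corner coordinates."""
    x_min = x_center - width / 2.0
    y_min = y_center - height / 2.0
    return x_min, y_min, x_min + width, y_min + height

# e.g., cropping the detected display from a numpy image for stage two:
# x0, y0, x1, y1 = center_to_corners(*display_box)
# crop = image[int(y0):int(y1), int(x0):int(x1)]
```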
4.2. Character Recognition
The second stage of the pipeline consists of a character recognition model. This is an object recognition model trained on the images and labels generated in the second stage of annotation. The training images for this stage come from the odometer displays labeled in the first stage. We crop the odometer display for each image in the dataset and feed it to the model along with the annotations from the second stage. The second stage produces annotations of the position (x-center, y-center, height, width) of each individual character and the corresponding class label. We do make some changes to the class labels before training the classifier. Since we only care about getting the mileage number in the images, it is sufficient to recognize only the digits in the images and not the rest of the alphabet characters. Furthermore, if we look at the distribution of characters in Table 1, we have very few samples per class for the letters of the alphabet. Training a model to recognize individual alphabet characters means we would have very few examples for most class labels and we would risk overfitting. Instead, we categorize characters into 11 different class labels: ten for the digits 0–9 and one "non-digit" class for all the letters.
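The class remapping amounts to a one-line rule; the following sketch (names are ours) shows how raw character annotations collapse into the 11 training classes:

```python
DIGITS = set("0123456789")

def remap_label(char):
    """Keep the ten digit classes as-is; collapse every letter into a single
    'non-digit' class, written here as 'X'."""
    return char if char in DIGITS else "X"
```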
4.3. Post-processing
The character recognition stage identifies individual characters inside the odometer display along with their coordinates. In the last part of the pipeline, we want to isolate the digits that are part of the mileage reading. The post-processing step combines nearby characters to form words/numbers and selects the most likely number as the mileage reading. In some digital odometers, we can find additional information being displayed alongside the mileage reading. Some of the most frequently seen additional pieces of information include temperature, time, warning messages, trip meter reading, fuel status, etc. It is essential to distinguish the actual mileage reading from other numbers being displayed on the screen. Similarly, for analog odometers we notice two variants: most models have six digits while a few older models have seven digits. Usually the 7th digit changes every 1/10th of a mile and is not considered a significant part of the mileage reading.
In order to deal with special cases like this, we designed a post-processing algorithm that takes care of all these corner cases. The algorithm is described in detail below in Algorithm 1.
Algorithm 1: Post-processing algorithm
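A minimal sketch of the grouping-and-selection idea behind Algorithm 1 is shown below. The clustering thresholds and the tie-breaking rule are our own illustrative choices, not the authors' exact algorithm, and the handling of decimals and 7-digit analog meters is omitted for brevity.

```python
def extract_mileage(detections, gap_factor=1.0):
    """Cluster character boxes that sit on the same line into groups, then pick
    the most likely group as the mileage. Each detection is a tuple
    (label, x_center, y_center, width, height, score), with labels '0'-'9'
    or 'X' for non-digits."""
    chars = sorted(detections, key=lambda d: d[1])  # left-to-right
    groups, current = [], []
    for det in chars:
        if current:
            prev = current[-1]
            same_line = abs(det[2] - prev[2]) < prev[4] / 2       # similar y-center
            adjacent = (det[1] - prev[1]) < gap_factor * prev[3]  # small x gap
            if not (same_line and adjacent):
                groups.append(current)
                current = []
        current.append(det)
    if current:
        groups.append(current)

    # Keep all-digit groups; pick the longest one (the mileage usually has the
    # most digits), breaking ties by mean detection score.
    digit_groups = [g for g in groups if all(d[0].isdigit() for d in g)]
    if not digit_groups:
        return None
    best = max(digit_groups, key=lambda g: (len(g), sum(d[5] for d in g) / len(g)))
    return int("".join(d[0] for d in best))
```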
5. Evaluation and Empirical Results
5.1. Experimental Settings
We randomly selected a small portion of the training set and used it as a validation set for all experiments. The hyperparameter selection for all architectures is based on performance on the validation set. We used the object detection API included in TensorFlow models [25] to train and evaluate the models. Huang et al. [26] provide an in-depth comparison of the speed and accuracy of the different meta-architectures supported by the API. We used an Amazon Web Services (AWS) Elastic Compute Cloud instance containing 8 GPUs with 12 GB of memory each for training and testing the models. For both the odometer localization task and the character recognition task, we trained SSD and Faster RCNN architectures with several choices of CNN model for feature extraction, such as Inception v2 [27], Resnet101 [18], Inception Resnet [28], Mobilenet [29], etc. We experimented with both approaches to transfer learning described in the previous section: (a) we fine-tuned a detection model trained on the MS COCO dataset and, (b) we used a classification model trained on the ImageNet dataset for feature extraction and trained the remaining layers from scratch. We found that using the detection model trained on the MS COCO dataset gave the best results. Furthermore, SSD got the best performance with Inception v2 as the feature extractor and Faster RCNN got the best results with Inception Resnet as the feature extractor. We report the mean average precision of the best performing SSD and Faster RCNN for the two stages: odometer localization and character recognition. We report the final accuracy and error analysis for the Faster RCNN architecture, which is the winner between the two architectures for both stages.
The best performing Faster RCNN model is a fine-tuned version of a Faster RCNN detector originally trained on the MS COCO dataset. The MS COCO detector was trained with the Inception Resnet architecture (detailed in Szegedy et al. [28]) as the feature extractor and the 90 different categories in the MS COCO dataset as output objects. We fine-tuned this model by modifying the last layer to detect one class (odometer display) for odometer localization. Similarly, for character recognition we modified the last layer to output eleven classes (0, 1, …, 9, X). We used a grid anchor generator with scales of 0.25, 0.5, 1.0, and 2.0, aspect ratios of 0.5, 1.0, and 2.0, and strides of 8 for both height and width. This means a total of 12 proposal boxes for each anchor position in the grid. The post-processing stage is set to reject all detections with score < 0.3. The IOU threshold is set to 0.6 for non-maximum suppression. The loss being minimized is the sum of the localization loss and the classification loss, both of which are equally weighted. We used a learning rate of 0.0003 and trained the model for 50,000 steps with a batch size of 8.
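The reported anchor settings imply 12 anchor shapes per grid position (4 scales × 3 aspect ratios). The short sketch below enumerates them; the 256-pixel base anchor size is our assumption (it is the TensorFlow Object Detection API default), not a value stated in the paper.

```python
import itertools

BASE_SIZE = 256  # assumed base anchor size, in pixels
scales = [0.25, 0.5, 1.0, 2.0]
aspect_ratios = [0.5, 1.0, 2.0]  # ratio = width / height

anchors = []
for scale, ratio in itertools.product(scales, aspect_ratios):
    height = scale * BASE_SIZE / ratio ** 0.5
    width = scale * BASE_SIZE * ratio ** 0.5
    anchors.append((width, height))

print(len(anchors))  # 12 proposal boxes per anchor position, stride 8 in x and y
```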
5.2. Results
A common evaluation technique for object detection models is to measure mean average precision (mAP) [20] at a certain threshold of the Intersection over Union (IOU) ratio. A prediction is a true positive if the IOU ratio between the predicted bounding box and the actual box is greater than the IOU threshold. Table 3 shows the mAP values (at IOU = 0.5) of the SSD and the Faster RCNN models for both the odometer localization and the character recognition task. The results clearly indicate that the Faster RCNN algorithm is the winner for both tasks.
Table 3. Mean average precision of the Faster RCNN and SSD architectures for the odometer localization and character recognition stages.
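For reference, the true-positive criterion above uses the usual intersection-over-union computation, sketched here for corner-encoded boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when iou(predicted, actual) exceeds 0.5.
```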
Our mileage extraction model contains two object detectors working in conjunction. Rather than detecting an object/character, the objective is to extract the actual mileage reading. To do so, the model has to predict every single digit correctly. For our system, getting those numbers right is more important than getting perfect localization of the odometer display or the individual characters.
In order to measure system performance, we defined a binary measure of end-to-end system accuracy in the following fashion: the model gets a score equal to 1 if the extracted mileage equals the annotated mileage and 0 otherwise. Furthermore, in most business use-cases, it is sufficient to get the mileage within a given error range. For example, if a model predicts the mileage to be 45,607 when the actual mileage is 45,687, then there is an error of 80 miles. For use cases such as insurance quote generation or claims processing, a perfectly acceptable margin of error is around a thousand (1,000) miles. Taking this into account, we introduce one additional end-to-end system evaluation metric in the following way: the model gets a score = 1 if |extracted mileage − annotated mileage| < threshold and 0 otherwise (where threshold = 1,000 miles).
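Both metrics reduce to a few lines; the following sketch (function name ours) computes them over a set of (predicted, annotated) mileage pairs, with a failed extraction passed as None:

```python
def end_to_end_accuracy(pairs, threshold=1000):
    """Return (exact-match accuracy, accuracy within the absolute error bound)
    over (predicted, annotated) mileage pairs."""
    exact = sum(p == a for p, a in pairs)
    within = sum(p is not None and abs(p - a) < threshold for p, a in pairs)
    n = len(pairs)
    return exact / n, within / n
```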
Since the overall quality of the images in our odometer dataset is not very good, we performed a further analysis of the effect of image quality on the performance of the model. We created a subset of the test set comprised of only the good quality images. These images were selected from the test set based on their corresponding annotator rating. This "good-quality images" subset ended up containing 362 images. Figure 7 shows the end-to-end system accuracy of the Faster RCNN model for both the original test set and the "good-quality images" subset. For the original test set, we obtain an end-to-end accuracy of 85.4% using Faster RCNN for both stages. Similarly, we achieve an accuracy of 88.8% within an error bound of 1,000 miles. For the "good-quality images" subset, we get a general accuracy of 90% and an accuracy of 91.4% within an error bound of 1,000 miles. It is important to note the improvement of 5% in test set accuracy associated with the improvement in image quality. This result presents an opportunity to improve performance by validating the quality of uploaded images in real time and providing immediate feedback and guidance to the customer to generate better quality pictures. Sample results for odometer localization and character recognition are shown in Figures 8, 9.
Figure 7. Accuracy results comparison.
Figure 8. Selected examples of odometer localization. In case of multiple detections, only the most confident box is shown.
Figure 9. Selected examples of character recognition. The character recognition model scans for characters within the region (green box) proposed by the localization model. Characters in red are predictions from the character recognition model. X represents a non-digit character.
5.3. Error Analysis
To identify key weaknesses of the model and opportunities for improvement, we performed a more detailed error analysis. For all the incorrect predictions, we manually assigned the error to one of the three stages in the pipeline. Figure 10 shows the distribution of the incurred test set errors among the odometer localization, the character recognition and the post-processing stages. The localization errors occur when the localization model cannot properly detect the odometer display, either because it did not find the display or because the proposed bounding box is not accurate enough to include all the characters in the display. It is evident from Figure 10 that a large portion of the errors come from the character recognition stage. Errors in this stage include not detecting or recognizing characters inside the odometer display. This error could be minimized by improving the character recognition model. As we mentioned before, image quality is an important factor in improving accuracy and we need to put more effort into ensuring that the uploaded images meet a minimum quality standard.
Figure 10. Detailed error analysis by stage.
The post-processing algorithm accounts for 15% of the total error. This error comprises cases such as failure to group digits together, failure to distinguish the mileage from other numbers in the display, identifying the digit after the decimal point as part of the mileage, etc.
6. Deployment Architecture
The deployment of the odometer mileage detector is a work in progress. However, we are reusing a deployment framework used in the past for similar image recognition models in our company. In this section, we describe that framework.
Containerized deployment is very popular nowadays. Containers are independent, easily configurable and easily scaled to multiple machines. Microservices running inside containers provide isolation from the actual system ingesting the service and provide the flexibility to work independently and quickly. We deploy the model as a microservice running in a Docker container. Docker allows packaging code and dependencies into a Docker image that runs inside a Docker container. Docker containers run uniformly on any operating system.
Figure 11 shows the overall architecture used for deployment. We use tools provided by the Amazon Web Services (AWS) ecosystem to launch, scale, orchestrate and run the Docker container. A detailed description of each of these tools can be found on the official site [30]. The fundamental component is the Docker container hosting the odometer mileage extraction model. We use the Amazon Elastic Container Registry (ECR) to host Docker images and the Amazon Elastic Container Service (ECS) to run the containers. We use the Amazon Systems Manager Parameter Store (SMPS) to store runtime parameters and Amazon CodeBuild to build the Docker image. Furthermore, Amazon Elastic Beanstalk (EBS) is used to orchestrate the deployment to ECS, as well as to provision and configure other resources such as the LoadBalancer, AutoScaling groups, etc. EBS facilitates logging, monitoring and sending notifications to the developers about unexpected service interruptions. We believe that the Continuous Integration/Continuous Delivery (CI/CD) principle [31] is a crucial part of any data science project. We want to be able to train new models or update the code base and deploy them into production automatically with minimum effort. This allows data scientists to focus more on improving models rather than spending time on deployment. For CI/CD, we use Jenkins. As soon as we push changes to a git repository, Jenkins builds an image, runs tests and deploys the model to production. Here is a step by step breakdown of the deployment procedure:
• Push changes to the git repository hosted in Bitbucket.
• Jenkins monitors changes in the git repository and initiates the build process.
• Jenkins builds the code, runs tests and builds the image.
• Jenkins pushes the image to ECR and issues a deploy to ECS.
• ECS pulls the new image from ECR and runs it inside a Docker container.
• EBS receives an HTTP request with an odometer image.
• The ELB distributes the load across multiple containers and EBS launches additional container instances if necessary.
• The container processes the image and sends the mileage back to the user app (a sketch of this client request follows the list).
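A hedged client-side sketch of the request in the last two steps is shown below; the endpoint URL, field names and response shape are hypothetical, since the paper only states that the app posts an odometer image over HTTP and receives the mileage in response.

```python
import requests

# Hypothetical endpoint and response fields, for illustration only.
with open("odometer.jpg", "rb") as f:
    resp = requests.post("https://odometer-service.example.com/extract",
                         files={"image": f}, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g., {"mileage": 45687}
```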
Figure 11. Deployment architecture.
The client mobile app makes an HTTP request to the odometer server and receives a mileage number in response. It auto-fills the odometer mileage reading into the form. The user has the option to validate, and correct the mileage reading if necessary, before submitting the form. The odometer picture is uploaded to an on-premises server along with the form during submission.
7. Conclusions and Future Work
In this work we developed a novel solution to the insurance-related problem of extracting mileage readings from odometer images. We leveraged existing object recognition technology and designed a post-processing algorithm to identify and extract mileage readings. The developed system was able to achieve high accuracy in mileage extraction despite having poor quality images. We have also provided a complete implementation design, including the tools and technology we are using to deploy, scale and manage the model in production.
Our detailed error analysis provides insights into the shortcomings of the system and unveils opportunities to improve it. We can further improve the performance of the model by using image guidance and enforcing minimum requirements on image quality. For example, when a user takes a picture of the odometer, the app display could contain a bounding box and the user would be asked to align the odometer display inside that bounding box. This technique is commonly used in several applications that read data from credit cards, personal checks, etc. Image guidance could help mitigate the need for a highly accurate localization model, and hence the errors associated with that model could be reduced significantly. This will also ensure that the images are taken directly facing the odometer display and with a proper orientation.
We are also exploring methods to estimate prediction confidence for the predicted mileage digits. If we are able to estimate prediction confidence, we can automatically accept images when we are confident that we are predicting the correct mileage reading, and ask the user to repeat the procedure or enter the mileage by hand if we fail to produce a confident enough prediction.
Data Availability Statement
The datasets generated for this study cannot be released publicly due to the privacy concerns of the customers. Requests to access these datasets should be directed to the corresponding author.
Author Contributions
SA implemented the project, ran experiments, and worked on the manuscript. GF initiated the project, managed it, and worked on the manuscript.
Conflict of Interest
SA and GF were employed by the company American Family Insurance.
References
1. Smith R. An overview of the Tesseract OCR engine. In: Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR). Parana (2007). p. 629–33. doi: 10.1109/ICDAR.2007.4378659
3. Hosseini H, Xiao B, Poovendran R. Google's Cloud Vision API is not robust to noise. CoRR. (2017) abs/1704.05051.
4. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: single shot multibox detector. In: European Conference on Computer Vision. Amsterdam: Springer (2016). p. 21–37.
5. Girshick R. Fast R-CNN. In: The IEEE International Conference on Computer Vision (ICCV). Beijing (2015).
6. Masood SZ, Shu G, Dehghan A, Ortiz EG. License plate detection and recognition using deeply learned convolutional neural networks. CoRR. (2017) abs/1703.07330.
7. Bulan O, Kozitsky V, Ramesh P, Shreve M. Segmentation- and annotation-free license plate recognition with deep localization and failure identification. IEEE Trans Intell Trans Syst. (2017) 18:2351–63. doi: 10.1109/TITS.2016.2639020
9. Sonavane K, Soni B, Majhi U. Survey on automatic number plate recognition (ANR). Int J Comput Appl. (2015) 125:1–4. doi: 10.5120/ijca2015905920
10. Du S, Ibrahim M, Shehata MS, Badawy WM. Automatic License Plate Recognition (ALPR): a state-of-the-art review. IEEE Trans Circ Syst Video Technol. (2013) 23:311–25. doi: 10.1109/TCSVT.2012.2203741
11. Nagaoka Y, Miyazaki T, Sugaya Y, Omachi S. Text detection by faster R-CNN with multiple region proposal networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 6. Kyoto: IEEE (2017). p. 15–20.
12. Borisyuk F, Gordo A, Sivakumar V. Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London: ACM (2018). p. 71–9.
13. Yang C, Yin X-C, Yu H, Karatzas D, Cao Y. ICDAR2017 robust reading challenge on text extraction from biomedical literature figures (DeTEXT). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. Kyoto: IEEE (2017). p. 1444–7.
14. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. Montreal, QC: Curran Associates, Inc. (2015). p. 91–9.
15. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA (2015). p. 3431–40.
16. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR. (2014) abs/1409.1556.
17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA (2015). p. 1–9.
18. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA (2016). p. 770–8.
19. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision. (2015) 115:211–52.
20. Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A. The pascal visual object classes challenge: a retrospective. Int J Comput Vision. (2015) 111:98–136. doi: 10.1007/s11263-014-0733-5
21. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: common objects in context. In: European Conference on Computer Vision. Zurich: Springer (2014). p. 740–55.
24. Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: a database and web-based tool for image annotation. Int J Comput Vision. (2008) 77:157–73. doi: 10.1007/s11263-007-0090-8
26. Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In: IEEE CVPR. Vol. 4. Honolulu, HI (2017).
27. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA (2016). p. 2818–26.
28. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI. San Francisco, CA (2017). p. 12.
29. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint. (2017) arXiv:1704.04861.