Research and development of expert system for suspicious and fraudulent phenomena detection in non-structured data

An expert system that analyses the text and visual parts of claim documentation separately, with the main purpose of detecting insurance fraud. It uses lexical algorithms for text analysis and local descriptors for image analysis.

The aim of the project is to develop a modular expert system for insurance services that works with unstructured and semi-structured data records and detects suspicious or fraudulent phenomena. Documents are being digitised in all areas of human activity. This brings a significant increase in the volume of information and data in digital form, and with it a growing demand for processing unstructured and semi-structured data. Apart from traditional database records and metadata, such data include unstructured texts and other components such as images, sound recordings or videos. These records are also stored in volumes orders of magnitude larger than in previous years. Making use of them requires novel technological solutions. Expert systems offer a mechanism for processing unstructured data and mining usable information from it.
The insurance business is one of the significant domains where expert systems have great potential for detecting suspicious or fraudulent phenomena. Numerous insurance frauds can be revealed by comparing the images enclosed in the documentation of individual claims. This comparison is based on image analysis using local descriptors. Another possible approach is to look for significant similarities in the texts describing the claims. Experience from previous analyses and Proof-of-Concept projects delivered in the Czech Republic and Slovakia by the proponents of this FRAUDES project showed that the success rate of insurance fraud detection can be increased by approximately 20–25 % when expert analysis of unstructured data is implemented. It is important to note that there are several million real-estate insurance claims a year. The proposed project aims to develop an expert system that would help insurance companies detect fraud in non-life insurance (vehicles, real estate or movable property) as well as in life insurance claims.
The modular system will be able to analyse the complete documentation of an insurance claim and compare it with other records in the database, with data from other insurance companies, and with data from other sources (typically the internet). The analysis will focus on two main parts: the analysis of unstructured or semi-structured texts, and the analysis of images and videos. The goal of the text analysis is to look for common traits or duplications in the medical reports enclosed in the documentation, because creating, editing or copying parts of genuine medical reports is the most common way fraud is attempted in life insurance. The goal of the image analysis is to recognise when images of the same damaged object (e.g. a car) are documented in more than one insurance claim. Using local descriptors, it is possible to detect similarities between photos of a damaged car even if the photos were taken from different angles or at different zoom levels. The team's previous experience with projects delivered for the Ministry of the Interior of the Czech Republic and Czech Insurance Company showed that detecting fraudulent behaviour with such an expert system is realistic. However, it also showed that current algorithms are not usable in a real-life environment because they are too slow. This is the main reason why new, faster software needs to be developed, which will be achieved by using different descriptors at its core. Another goal is to design and implement scalable algorithms for the collection, persistence and parallel analysis of text and images. The whole expert system will have a modular structure so that it can also be used in business domains other than insurance.
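The text-analysis goal described above, finding copied passages across medical reports, can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes a simple word-shingle representation and Jaccard similarity, and the function names, the sample reports, the shingle length `k` and the `threshold` value are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lower-cased text (punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(reports: dict[str, str], threshold: float = 0.5, k: int = 3):
    """Return (claim_a, claim_b, score) for report pairs at or above the threshold.

    Assumption: an exhaustive pairwise comparison, which is quadratic in the
    number of reports and only suitable for small illustrative inputs.
    """
    sets = {claim_id: shingles(text, k) for claim_id, text in reports.items()}
    hits = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            hits.append((id_a, id_b, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical sample data: claims A and B share most of their wording.
reports = {
    "A": "Patient suffered a fracture of the left wrist after a fall on ice.",
    "B": "Patient suffered a fracture of the left wrist after a fall at home.",
    "C": "Vehicle rear bumper damaged in a parking lot collision.",
}
hits = suspicious_pairs(reports)
```

With millions of claims a year, an exhaustive pairwise comparison like this would be far too slow; a scalable variant would typically replace it with an index such as MinHash combined with locality-sensitive hashing.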
Project ID: 11 723
Start date: 
Project Duration: 
Project costs: 1 140 000.00 €
Technological Area: Information Technology/Informatics
Market Area: 
