To answer such a question, unsupervised clustering algorithms have been proposed. Since a way to choose the clustering criterion autonomously is still largely missing, these algorithms rely on a criterion defined beforehand. To this end, pattern discovery methods come into play. Detecting patterns in data is of high importance in many applications: for example, intrusion detection in Internet traffic, monitoring the movement of people traced through cellphone call logs, and making sense of gene transcription maps, protein interactions, money transfers and credit histories, among other artificial intelligence tasks.
Techniques for statistical pattern discovery
Techniques for pattern discovery must be able to detect patterns in noisy and incomplete data, which makes them essentially probabilistic, and to detect high-order patterns based on the interrelationships between data features. Classical techniques detect statistically significant features through frequency analysis of their occurrence in the data. To make a discovered pattern intuitive to a human reader, the relationships between the detected features must also be reported.
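As a rough illustration of frequency-based significance testing, the sketch below counts how often two features co-occur and applies a chi-square test of independence. It is only one possible reading of "frequency analysis": the function name cooccurrence_test, the representation of records as sets of feature names, and the toy data are assumptions made for this example, and the p-value formula is valid only for a 2x2 table (one degree of freedom).

```python
import math
from collections import Counter


def cooccurrence_test(records, a, b):
    """Chi-square test of independence for two binary features.

    `records` is an iterable of sets of feature names (a hypothetical
    data layout). Returns (chi2, p_value) for the 2x2 contingency table.
    """
    records = list(records)
    n = len(records)
    counts = Counter((a in r, b in r) for r in records)
    observed = (
        (counts[(True, True)], counts[(True, False)]),
        (counts[(False, True)], counts[(False, False)]),
    )
    row = [sum(observed[i]) for i in range(2)]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    # If a feature is always or never present, no dependence can be measured.
    if n == 0 or 0 in row or 0 in col:
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data: "a" and "b" appear together far more often than chance predicts.
toy_records = (
    [{"a", "b"}] * 40 + [{"a"}] * 10 + [{"b"}] * 10 + [set()] * 40
)
chi2, p_value = cooccurrence_test(toy_records, "a", "b")
print(f"chi2={chi2:.2f}, p={p_value:.2e}")
```

A small p-value suggests the pair is worth reporting as a candidate pattern together with the relationship (here, co-occurrence) that links its features.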
This requirement naturally leads to a tree (graph) representation of the data features for pattern discovery. Much as in a random forest, the nodes of the graph hold decision rules that maximize the information gained from a statistically significant feature and its relationship with others. Statistical tests placed at each node allow us to prune the graph's nodes until a decision is made regarding the set of features making up a pattern. Of course, this forces us to define an underlying basic statistical relationship between features, which is then used in the hypothesis tests.
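The node-level logic described above can be sketched as follows: pick the feature with the highest information gain, then keep the split only if a hypothesis test finds it significant. This is a minimal sketch under stated assumptions, not a definitive implementation: features and labels are assumed to be binary, the helper names (entropy, information_gain, best_split) and the threshold alpha are hypothetical, and the G-test (whose statistic equals 2N times the information gain in nats) is swapped in as one convenient choice of test.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on a binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for r, y in zip(rows, labels) if bool(r[feature]) == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def best_split(rows, labels, features, alpha=0.05):
    """Return (feature, gain, p_value) for the best split, or None if pruned.

    Assumes binary features and binary labels, so the G-test has one
    degree of freedom.
    """
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    gain = information_gain(rows, labels, best)
    # G statistic: 2 * N * mutual information in nats (gain is in bits).
    g_stat = max(2 * len(labels) * math.log(2) * gain, 0.0)
    p_value = math.erfc(math.sqrt(g_stat / 2))
    if p_value >= alpha:
        return None  # prune: the split is not statistically significant
    return best, gain, p_value


# Toy data: the label follows "x" exactly, while "noise" is unrelated to it.
toy_rows = [{"x": i < 20, "noise": i % 2 == 0} for i in range(40)]
toy_labels = [row["x"] for row in toy_rows]
print(best_split(toy_rows, toy_labels, ["x", "noise"]))
print(best_split(toy_rows, toy_labels, ["noise"]))
```

Applying best_split recursively to the two subsets produced by each accepted split would grow the tree, with pruned nodes becoming leaves.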
Non-parametric tests can be incorporated into the decision function at each node, relieving us of the need to assign a model to the data. However, well-defined statistical models are useful when modeling the pattern of a rare event, or of an event occurring in the heavy tails of the data distribution. A pattern, that is, a tree representation of the interrelations between events, can then be assigned a weight and a probability; it can also be searched for iteratively, as the database fills up, or periodically offline.
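One example of a non-parametric test that could sit inside a node's decision function is a permutation test, which makes no assumption about the distribution of the data. The sketch below compares the means of two groups; the function name permutation_test, the choice of the difference of means as the statistic, the number of permutations, and the fixed seed are all assumptions made for illustration.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value; no distributional model is assumed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_difference(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_difference(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling as one of the permutations,
    # which keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Toy data: two clearly separated groups.
low = list(range(1, 11))
high = list(range(11, 21))
print(permutation_test(low, high))
```

A node could split on a feature only when the p-value falls below a chosen threshold, in the same way as with a parametric test.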
The algorithm developer stands at the intersection of statistics, machine learning, signal processing and pattern recognition. We at RSIP Vision require our algorithm developers to possess broad interdisciplinary knowledge, the integration of which allows us to construct cutting-edge algorithms for many leading companies around the globe.