Computer Vision News - December 2018
Baggett elaborates: “Think about all of the complexities embedded in the Chrome browser. HTML is sort of its own monstrously complicated area, which vastly complicates the problem of simply using computer vision models for this task.”

Often, when looking at a logo in a marketing email or a transactional email, like a Bank of America logo, your brain assumes that the logo appears in isolation in the email. This is almost never the case: emails usually have a giant section of imagery that combines notionally separate elements. Baggett and his team cannot simply take each image that appears in the mail and run it through a model. Instead, they must analyze one big image and semantically separate its components, such as the logo, the text, and so on. This presents a problem of image segmentation and even some OCR-related issues.

Not every email containing a logo indicates a phishing attack either. If an email has a Facebook logo, for instance, the image might just invite readers to like the sender on Facebook. Inky must take this into account as well so as not to generate a flood of false positives.

Combating phishing is “just an arms race”. As Inky improves its models, attackers improve their tactics. Baggett points out: “Imagine an email might come in that claims to be from American Express. One of the things the attackers are doing now is they are simultaneously trying to fool the machines into believing the mail is legitimate or not even transactional. They are trying to fool the machines, and they are also trying to fool the humans. They will make a mail that looks really convincing like, let’s say, an American Express transactional mail. Maybe it tells you your card was used in China. Then they will make subtle modifications to that mail to try to pass through the mail protection software.”

He continues: “One of the things they will do is try to cloak all of the strings in the mail that are brand indicative.
For example, they might take the A in American Express and replace it with a different or similar looking Unicode character, or they might modify American Express slightly.” Inky must therefore apply approximate string matching across the entire text of the mail to recognize that a string, although similar, is not really American Express.

The team continues active work in this area to go from Generation 4 to Generation 5, particularly on performance. Performing heavyweight computation on every single mail is costly, so they are working on reducing costs by making these algorithms faster.

Facebook needs a lot of labeled examples, say a billion face images, in order to train a face recognition model. Inky likewise has a lot of example emails, but they are not labeled. One of their scientists, who did his PhD work in a semi-supervised learning environment with a very large
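To make the cloaking trick and the approximate string matching described above concrete, here is a minimal Python sketch. It is an illustration only, not Inky's implementation: the `CONFUSABLES` table, the `skeleton` and `find_cloaked_brand` helpers, and the 0.85 threshold are all assumptions made for this example. The idea is to fold look-alike Unicode characters back to plain letters, then slide a window over the text and flag strings that resemble a brand name without being an exact copy of it.

```python
import difflib
import string
import unicodedata

# Hypothetical, deliberately tiny look-alike table (an assumption for this
# sketch). A real system would need a far larger mapping.
CONFUSABLES = {
    "\u0410": "A",  # Cyrillic capital A
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u0391": "A",  # Greek capital Alpha
}


def skeleton(text):
    """Fold text to a comparable form: decompose, drop accents, map look-alikes."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    return text.casefold()


def find_cloaked_brand(text, brand, threshold=0.85):
    """Return (window, score) pairs that resemble `brand` but are not an exact copy."""
    target = skeleton(brand)
    words = [w.strip(string.punctuation) for w in text.split()]
    n = len(brand.split())
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        # Similarity in [0, 1] between the folded window and the folded brand.
        score = difflib.SequenceMatcher(None, skeleton(window), target).ratio()
        # A genuine, unmodified brand string is not suspicious by itself.
        if score >= threshold and window.casefold() != brand.casefold():
            hits.append((window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # The first letter of "American" below is a Cyrillic capital A (U+0410).
    mail = "Your \u0410merican Express card was used in China."
    print(find_cloaked_brand(mail, "American Express"))
```

The sketch compares every window against every brand, which is exactly the kind of per-mail computation the article describes as costly; a production system would presumably need indexing or cheaper pre-filters before running a comparison like this.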