Chris Gates's primary research is on the application of machine learning and data mining to malware classification, risk estimation, active learning, automation, and other problems in security and scalable machine learning. Chris received his Ph.D. in Computer Science on the topic of quantifying and communicating risk using machine learning. He graduated from Purdue University in 2014, where he was advised by Ninghui Li and was part of the Center for Education and Research in Information Assurance and Security (CERIAS).
Since joining the company, Chris has authored several papers and patents and has worked both on pure research and with product teams. The projects Chris has worked on have reached hundreds of millions of users, helping to keep them and their information safer.
Selected Academic Papers
In Proceedings of the 2019 Conference on Human Factors in Computing Systems (CHI 2019)
To identify needs for improvement in security products, we study security concerns raised in Norton Security customer support chats. We found that many consumers face technical support scams and are susceptible to them. Findings also show the value of customer support centers, in that 96% of customers who reach out for support in relation to scams have not paid the scammers.
In Proceedings of the 27th USENIX Security Symposium (USENIX 2018)
In this paper, we collect seven datasets, including the largest corpus of code-signing certificates, and we combine them to analyze the revocation process from end to end. Effective revocations rely on three roles: (1) discovering the abusive certificates, (2) revoking the certificates effectively, and (3) disseminating the revocation information for clients. We assess the challenge of discovering compromised certificates and the subsequent revocation delays. We show that erroneously setting revocation dates causes signed malware to remain valid even after the certificate has been revoked. We also report failures in disseminating the revocations, leading clients to continue trusting the revoked certificates.
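The effect of a mis-set revocation date can be illustrated with a small sketch. It assumes the common code-signing rule that a timestamped signature is still accepted when its timestamp precedes the certificate's revocation date. The function name `signature_still_trusted` and all dates are hypothetical and are not taken from the paper.

```python
from datetime import datetime
from typing import Optional


def signature_still_trusted(signed_at: datetime,
                            revoked_at: Optional[datetime]) -> bool:
    """Return True if a timestamped signature would still be accepted.

    Assumed rule: signatures timestamped before the revocation date
    stay valid; signatures timestamped on or after it are rejected.
    """
    if revoked_at is None:
        return True  # certificate was never revoked
    return signed_at < revoked_at


# Hypothetical timeline: the key is compromised in January and
# malware is signed with it in March.
key_compromised = datetime(2017, 1, 15)
malware_signed = datetime(2017, 3, 1)

# Revocation date set too late (after the malware was signed):
late_revocation = datetime(2017, 6, 1)
print(signature_still_trusted(malware_signed, late_revocation))   # True

# Revocation date set to the compromise date:
print(signature_still_trusted(malware_signed, key_compromised))   # False
```

Under these assumptions, the malware signature is rejected only when the revocation date is set at or before the moment the malware was signed.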
In Proceedings of the 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017)
Mapping binary files into software packages enables malware detection and other tasks, but is challenging. By combining installation data with file metadata that we summarize into sketches, from millions of machines and billions of files, we can use efficient approximate clustering techniques to map files to applications automatically and reliably.
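As a rough illustration of summarizing file metadata into sketches and comparing them approximately, the following uses MinHash as a stand-in technique. The summary above does not say which sketching or clustering method the paper uses, so the choice of MinHash, the function names, and the metadata tokens are all assumptions made for this example.

```python
import hashlib


def minhash(tokens, num_hashes=64):
    """Build a MinHash signature for a non-empty set of metadata tokens."""
    signature = []
    for seed in range(num_hashes):
        # One seeded 64-bit hash per signature slot; keep the minimum.
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical metadata tokens for three files.
file_a = {"dir:app1", "signer:vendor1", "name:core.dll", "ver:1.2"}
file_b = {"dir:app1", "signer:vendor1", "name:ui.dll", "ver:1.2"}
file_c = {"dir:app2", "signer:vendor2", "name:tool.exe", "ver:9.0"}

sig_a, sig_b, sig_c = minhash(file_a), minhash(file_b), minhash(file_c)
same_app = estimated_similarity(sig_a, sig_b)
other_app = estimated_similarity(sig_a, sig_c)
print(same_app > other_app)
```

Files whose signatures agree in many slots are candidates for the same application. A clustering step, not shown here, would group them without comparing full metadata for every pair.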
IEEE Transactions on Visualization and Computer Graphics (TVCG), 24(1), 2018, Presented at the 2017 IEEE Conference on Visual Analytics Science and Technology (VAST), 2017
We present VIGOR, a novel interactive visual analytics system for exploring and making sense of graph query results. VIGOR contributes an exemplar-based interaction technique and a feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world cybersecurity problems.
In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN 2019)
In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. The focus is to find strategies to organize distributed agents to jointly select a compact subset of data that can be used to train a global model. The global model should achieve nearly the same performance as if the central learner had access to all the data, but the central learner only has access to the selected subset, and each agent only has access to their own data. The goal of this research is to find good strategies to train global models while giving some control back to agents.
In Proceedings of the 7th ACM Conference on Data and Application Security and Privacy (CODASPY)
94% of the software files that Symantec saw in a 1-year dataset appeared only once on a single machine. We examine the primary reasons for which both benign and malicious software files appear as singletons, and design a classifier to distinguish between these two classes of singleton software files.
In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)
We set out to predict which security events and incidents a security product would have detected had it been deployed, based on the events produced by other security products that were in place. We discovered that the problem is tractable, and that some security products are much harder to model than others, which makes them more valuable.