Yun Shen

Dr. Yun Shen's current research interests focus on applying data-driven approaches to better understand malicious activity on the Internet. Through the collection and analysis of large-scale datasets, he develops novel and robust mitigation techniques to make the Internet a safer place. His research involves a mix of quantitative analysis, machine learning, and systems design.

Before joining NortonLifeLock in 2012 (then called Symantec), he was a researcher at HP Labs Bristol, working on privacy-enhancing technologies and cloud computing infrastructure. Prior to this, he conducted government-funded research on intelligence analysis at the University of Bristol. He has authored a number of papers in international journals and conferences, as well as several US patents. Dr. Shen received his PhD in Computer Science from the University of Hull (UK), where his research focused on indexing and retrieval of distributed XML data. He received his bachelor's degree in Computer Science from Sichuan University (China).

Selected Academic Papers

  • ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks
    Yun Shen, Gianluca Stringhini
    To appear at the 28th USENIX Security Symposium (USENIX 2019)

    We present ATTACK2VEC, a system that uses temporal word embeddings to model how attack steps are exploited in the wild, and track how they evolve.

  • Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization
    Yufei Han, Yuzhe Ma, Christopher Gates, Kevin A. Roundy and Yun Shen
    To appear at the 2019 International Joint Conference on Neural Networks (IJCNN)

    In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. The focus is on finding strategies to organize distributed agents so that they jointly select a compact subset of data that can be used to train a global model. The global model should achieve nearly the same performance as if the central learner had access to all the data, even though the central learner only has access to the selected subset and each agent only has access to its own data. The goal of this research is to find good strategies for training global models while giving some control back to agents.

  • IoT Security and Privacy Labels
    Yun Shen, Pierre-Antoine Vervier
    In Proceedings of the ENISA Annual Privacy Forum (APF 2019)

    We devise a concise, informative IoT labelling scheme that conveys high-level security and privacy facts about an IoT device to consumers, in order to raise their security and privacy awareness.

  • Waves of Malice: A Longitudinal Measurement of the Malicious File Delivery Ecosystem on the Web
    Colin C. Ife, Yun Shen, Steven J. Murdoch, Gianluca Stringhini
    To appear at the 14th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2019)

    We present a longitudinal measurement of malicious file distribution on the Web.

  • Tiresias: Predicting Security Events Through Deep Learning
    Yun Shen, Enrico Mariconti, Pierre-Antoine Vervier, and Gianluca Stringhini
    In Proceedings of the 25th ACM Conference on Computer and Communications Security (ACM CCS 2018)

  • Before Toasters Rise Up: A View Into the Emerging IoT Threat Landscape
    Pierre-Antoine Vervier, Yun Shen
    In Proceedings of the 21st International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2018)

  • Multi-label Learning with Highly Incomplete Data via Collaborative Embedding
    Yufei Han, Guolei Sun, Yun Shen, Xiangliang Zhang
    In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018)

    We proposed a weakly supervised multi-label learning approach based on the idea of collaborative embedding. It provides a flexible framework for efficient multi-label classification in both transductive and inductive modes by coupling the reconstruction of missing features and the assignment of weak labels in a joint optimisation framework.

  • Marmite: Spreading Malicious File Reputation Through Download Graphs
    Gianluca Stringhini, Yun Shen, Yufei Han, Xiangliang Zhang
    In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)

    We presented Marmite, a system that can detect malicious files by leveraging a global download graph and label propagation with Bayesian confidence.

  • Accurate spear phishing campaign attribution and early detection
    Yufei Han, Yun Shen
    In Proceedings of the 23rd ACM Conference on Computer and Communications Security (ACM SIGSAC 2016)

    In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph-based semi-supervised learning model for campaign attribution and detection.

  • Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph
    Ibrahim Alabdulmohsin, Yufei Han, Yun Shen, Xiangliang Zhang
    In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)

    We propose a novel Bayesian label propagation model to unify multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source code or inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with high accuracy.

  • Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
    Yufei Han, Yun Shen
    In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016)

    We propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of pairwise constraints are measured and incorporated into a partially supervised graph embedding model.

  • Insights into rooted and non-rooted Android mobile devices with behavior analytics
    Yun Shen, Nathan Evans, Azzedine Benameur
    In Proceedings of the 31st ACM/SIGAPP Symposium on Applied Computing (ACM SAC 2016)

    We presented the first quantitative analysis of mobile devices that compares rooted devices to non-rooted devices. We attempted to map high-level ideas about the characteristics of users who root their devices to the low-level data at our disposal.

  • All your Root Checks are Belong to Us: The Sad State of Root Detection
    Nathan Evans, Azzedine Benameur, Yun Shen
    In Proceedings of the 13th ACM International Symposium on Mobility Management and Wireless Access (MobiWac 2015)

    We analyzed security-focused applications as well as BYOD solutions that check for evidence that a device is “rooted”.