Yun Shen

Researcher

Dr. Yun Shen's current research focuses on applying data-driven approaches to better understand malicious activity on the Internet. Through the collection and analysis of large-scale datasets, he develops novel and robust mitigation techniques to make the Internet a safer place. His research involves a mix of quantitative analysis, machine learning, and systems design.

Before joining NortonLifeLock in 2012 (then called Symantec), he was a researcher at HP Labs Bristol, working on privacy-enhancing technologies and cloud computing infrastructure. Prior to this, he conducted government-funded research on intelligence analysis at the University of Bristol. He has authored a number of papers in international journals and conferences and holds several US patents. Dr. Shen received his PhD in Computer Science from the University of Hull (UK), where his research focused on indexing and retrieval of distributed XML data. He received his bachelor's degree in Computer Science from Sichuan University (China).

Selected Academic Papers

ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks

In Proceedings of the 28th USENIX Security Symposium (USENIX 2019)
We present ATTACK2VEC, a system that uses temporal word embeddings to model how attack steps are exploited in the wild, and track how they evolve.

Accurate spear phishing campaign attribution and early detection

In Proceedings of the 23rd ACM Conference on Computer and Communications Security (ACM SIGSAC 2016)
In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph-based semi-supervised learning model for campaign attribution and detection.

MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications

In Proceedings of the 2nd IEEE International Conference on Big Data 2014 (IEEE BigData 2014)
We introduce a new framework called MR-TRIAGE leveraging multi-criteria data clustering (MCDC) to perform scalable data clustering on large security data sets and further implement a set of efficient algorithms in a 3-stage MapReduce paradigm.

Study of collective user behaviour in Twitter: a fuzzy approach

Journal of Neural Computing and Applications, Volume 25, Issue 7–8, December 2014
We proposed a new approach that applies, for the first time, the mass assignment-based fuzzy association rules mining (MASS-FARM) algorithm to Twitter data analysis, in order to automatically extract useful and meaningful knowledge from large-scale data sets.

Insights into Rooted and Non-Rooted Android Mobile Devices with Behavior Analytics

In Proceedings of the 31st ACM/SIGAPP Symposium on Applied Computing (ACM SAC 2016)
We presented the first quantitative analysis of mobile devices that compares rooted devices to non-rooted devices. We attempted to map high-level ideas about the characteristics of users who root their devices to the low-level data at our disposal.

Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection

In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016)
We propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of pairwise constraints are measured and incorporated into a partially supervised graph embedding model.

Tiresias: Predicting Security Events Through Deep Learning

In Proceedings of the 25th ACM Conference on Computer and Communications Security (ACM CCS 2018)

Waves of Malice: A Longitudinal Measurement of the Malicious File Delivery Ecosystem on the Web

In Proceedings of the 14th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2019)
We present a longitudinal measurement of malicious file distribution on the Web.

IoT Security and Privacy Labels

In Proceedings of the ENISA Annual Privacy Forum (APF 2019)
We devise a concise, informative IoT labelling scheme that conveys high-level security and privacy facts about an IoT device to consumers, in order to raise their security and privacy awareness.

Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization

In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN 2019)
In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. The focus is to find strategies to organize distributed agents to jointly select a compact subset of data that can be used to train a global model. The global model should achieve nearly the same performance as if the central learner had access to all the data, but the central learner only has access to the selected subset, and each agent only has access to their own data. The goal of this research is to find good strategies to train global models while giving some control back to agents.

Before Toasters Rise Up: A View Into the Emerging IoT Threat Landscape

In Proceedings of the 21st International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2018)

Multi-label Learning with Highly Incomplete Data via Collaborative Embedding

In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018)
We proposed a weakly supervised multi-label learning approach based on the idea of collaborative embedding. It provides a flexible framework for efficient multi-label classification in both transductive and inductive modes by coupling the reconstruction of missing features with weak label assignment in a joint optimisation framework.

Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph

In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)
We propose a novel Bayesian label propagation model to unify multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach needs neither to examine the source code nor to inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with high accuracy.

All your Root Checks are Belong to Us: The Sad State of Root Detection

In Proceedings of the 13th ACM International Symposium on Mobility Management and Wireless Access (MobiWac 2015)
We analyzed security-focused applications as well as BYOD solutions that check for evidence that a device is “rooted”.

Marmite: Spreading Malicious File Reputation Through Download Graphs

In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)
We presented Marmite, a system that can detect malicious files by leveraging a global download graph and label propagation with Bayesian confidence.

The Tangled Genealogy of IoT Malware

In Proceedings of the 36th Annual Computer Security Applications Conference (ACSAC 2020)

Understanding Worldwide Private Information Collection on Android

In Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS 2021)

ANDRUSPEX: Leveraging Graph Representation Learning to Predict Harmful App Installations on Mobile Devices

In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P 2021)
