
How much private information was gathered from my phone?

Understanding Worldwide Private Information Collection on Android

Data has become the commodity that sustains much of the digital ecosystem. As smart devices, especially smartphones, become more central to our daily lives, mobile phones have turned into reliable sources of rich information about us (e.g., where we go and what we do).

Most mobile apps request access to some information about you and obtain certain permissions on the device you are using. In most cases, information is shared and device permissions are enabled with your explicit consent. Once consent is given, however, it is impractical for users to recall which app collects what information, let alone trace where the information is transmitted and which actors may further process, use, and control the collected data.

Therefore, it remains very challenging to obtain a comprehensive view of the information collected by mobile apps. It is natural to ask: how many apps on my smartphone collect private information, what kind of private information do these apps collect, and which companies process and store it? In a study conducted together with researchers from Boston University, we investigated 22 categories of information that may affect the user’s privacy, listed in Table 1. Our goal is to address these questions and understand worldwide private information collection on Android phones by analyzing the flows of information (i.e., which app sends what information to which domain) generated by 2.1M unique apps installed by 17.3M users over 21 months between 2018 and 2019.

Table 1. The 22 categories of information collected by mobile apps that may affect the user’s privacy.

Is Private Information Collection Pervasive in Mobile Apps?

It is now a common practice that the apps installed on your smartphone request information about you and the device (e.g., your name, your email address, your location information) before you can use them. We try to understand if private information collection is pervasive in mobile apps.

By analyzing our dataset, we discover that, on average, a mobile app sends private information to 2 unique domains. We also observe that over 57.6K apps (installed on 12.8M devices collectively) collect at least 5 unique categories of private information and send them to at least 5 unique domains. Our findings confirm that private information collection in mobile apps is both widespread and diverse, highlighting the need for an additional security and privacy layer on the device.
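The two statistics above can be derived from flow records of the form (app, information category, destination domain). The following is a minimal sketch under that assumption; the record layout, the sample data, and the function names are hypothetical and are not taken from the study's actual pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flow records: (app, information category, destination domain).
flows = [
    ("app.a", "location", "ads.example.com"),
    ("app.a", "device", "ads.example.com"),
    ("app.a", "device", "stats.example.net"),
    ("app.b", "sim card", "ads.example.com"),
]


def summarize(flows):
    """Return, per app, the set of categories collected and the set of domains contacted."""
    categories = defaultdict(set)
    domains = defaultdict(set)
    for app, category, domain in flows:
        categories[app].add(category)
        domains[app].add(domain)
    return categories, domains


def mean_domains_per_app(flows):
    """Average number of unique destination domains per app."""
    _, domains = summarize(flows)
    return mean(len(d) for d in domains.values())


def heavy_collectors(flows, min_categories=5, min_domains=5):
    """Apps that send at least min_categories categories to at least min_domains domains."""
    categories, domains = summarize(flows)
    return sorted(
        app
        for app in categories
        if len(categories[app]) >= min_categories and len(domains[app]) >= min_domains
    )
```

Counting unique domains and categories per app with sets keeps repeated transmissions of the same information from inflating the totals.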

Figure 1. Top 25 data controllers ranked by the fraction of devices they collect private information from. These 25 data controllers collect private information from a total of 13.9M devices covering 80.2% of all devices used in this study.
Figure 2. Heatmap illustration of top 12 types of private information collected by global top 20 domains. Each row is normalized to [0, 1] by a PIC domain’s total device penetration rate. The darker the red implies that the more devices that a PIC domain

Who collects and processes private information?

We further analyze who ultimately obtains and processes the information collected by mobile apps. We leverage our patented technology to uncover the ownership of the domains to which the private information was transmitted. We then rank the owners by the fraction of devices from which they collect private information. Figure 1 depicts the top 25 data processors and controllers, which together accumulate private information from 13.9M devices. Notably, 2 out of 3 devices have their information collected by either Facebook or Alphabet. Figure 2 depicts the top 12 types of private information collected by the global top 20 domains. We observe that the companies behind these domains consistently collect four types of private information from users: device, SIM card, location, and settings information. Such information enables them to track users more systematically.
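To illustrate the ranking step, here is a minimal sketch that ranks domain owners by the fraction of devices they receive private information from. The observation format, the sample data, and the `owner_of` mapping are hypothetical; the ownership resolution used in the study is not reproduced here and is replaced by a plain lookup table.

```python
from collections import defaultdict

# Hypothetical observations: (device id, destination domain).
observations = [
    ("d1", "graph.example-social.com"),
    ("d1", "ads.example-search.com"),
    ("d2", "ads.example-search.com"),
    ("d3", "cdn.example-other.net"),
    ("d2", "img.example-search.com"),
]

# Hypothetical domain-to-owner table standing in for the ownership analysis.
owner_of = {
    "graph.example-social.com": "SocialCo",
    "ads.example-search.com": "SearchCo",
    "img.example-search.com": "SearchCo",
    "cdn.example-other.net": "OtherCo",
}


def rank_owners(observations, owner_of):
    """Return (owner, fraction of all devices) pairs, highest fraction first."""
    all_devices = {device for device, _ in observations}
    devices_by_owner = defaultdict(set)
    for device, domain in observations:
        # Fall back to the domain itself when its owner is unknown.
        owner = owner_of.get(domain, domain)
        devices_by_owner[owner].add(device)
    ranking = [
        (owner, len(devices) / len(all_devices))
        for owner, devices in devices_by_owner.items()
    ]
    # Sort by descending fraction, breaking ties alphabetically for stable output.
    ranking.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranking
```

Grouping by owner rather than by domain matters because one company may operate many domains, and a device should be counted once per owner no matter how many of that owner's domains it contacts.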

Figure 3. Sankey diagrams illustrating private information flows between EU27 and top 20 domain locations.

GDPR and its impact on private information flow

The European Union’s (EU) General Data Protection Regulation (GDPR) entered into effect on May 25th, 2018. As Figure 3 shows, the implementation of GDPR did not substantially change the flow of personal data from EU countries to countries outside the EU. Our observations of these data flows show that confinement within the EU is low. Germany and Ireland are the only two European countries that host a reasonable portion of the private information originating from Europe, while the United States dominates private information collection in the EU.
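One way to quantify confinement is the share of EU-origin flows whose destination domain is also hosted in the EU. The sketch below assumes that definition, which is our illustration and not necessarily the metric used in the study; the flow format and sample data are hypothetical.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU27 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})

# Hypothetical flows: (country of the device, country hosting the destination domain).
flows = [
    ("FR", "US"),
    ("FR", "IE"),
    ("DE", "US"),
    ("US", "US"),
    ("ES", "DE"),
]


def eu_confinement(flows):
    """Fraction of EU-origin flows that terminate inside the EU, or None if there are none."""
    eu_origin = [(src, dst) for src, dst in flows if src in EU27]
    if not eu_origin:
        return None
    inside = sum(1 for _, dst in eu_origin if dst in EU27)
    return inside / len(eu_origin)
```

A value close to 1 would mean most private information from EU devices stays in the EU; a low value corresponds to the pattern described above, where most flows leave the EU.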

Why do you see intrusive ads?

Potentially harmful applications (PHAs) are apps that could put users, user data, or devices at risk (e.g., trojans and spyware). We identify 1.2M PHAs installed on 3.8M devices. Globally, 116K PHAs (installed on 393K devices) collect operator information and 63K PHAs (installed on 280K devices) also collect running app information. As Figure 4 illustrates, such aggressive private information collection enables adversaries to better profile users and may lead to intrusive monetization. For example, 590K devices with PHAs present are affected by notification bar ads (i.e., ads displayed as app notifications) and 317K devices suffer from shortcut ads (i.e., targeted ads placed on the home screen).

Figure 4. Heatmap illustration of private information collection by PHAs in different regions.

Implications for the research community and policymakers

Our findings highlight a number of challenges faced by the research community when studying private information collection on Android. We show that looking at device penetration is critical to observing the distribution of information collection actors in the wild. We also hope that our study will encourage policymakers to think critically about how private information is used by and shared among companies, and how accountability and customer choice can be truly guaranteed.

Implications for consumers

Protecting your private information can help reduce your risk of identity theft. We have the following recommendations for users who want to take more control over their privacy on their mobile devices.

Read Privacy Policies

A privacy policy can be long and complex. However, it tells you how an app maintains access to, safety of, and control over the personal information it collects, how it uses that information, and whether it provides information to third parties.

Turn off ad personalization

Every Android device has a unique Advertising ID. It allows app developers and the Google ad network to recognize your device among the billions of Android devices and then target you with ads. You can opt out of ad personalization so that your Advertising ID is no longer used to target you.

Raise privacy awareness

There are steps you can take to protect your personal information while you use your mobile phone for daily tasks. It is important to raise your own privacy awareness and make informed choices to minimize unwanted information disclosure. The latest tips on how to raise your privacy awareness can be found here; they equip you with best practices for taking control of what you share online.

Innovations from Norton Labs are for research, evaluation, and consumer feedback purposes. NortonLifeLock does not give any warranties as to the suitability or usability of these prototypes and recommends safeguarding data and reviewing all terms and conditions before use.

Copyright © 2021 NortonLifeLock Inc. All rights reserved. NortonLifeLock, the NortonLifeLock Logo, the Checkmark Logo, Norton, LifeLock, and the LockMan Logo are trademarks or registered trademarks of NortonLifeLock Inc. or its affiliates in the United States and other countries.

 

About the Author

Yun Shen

Senior Principal Research Engineer

Current research interests focus on applying data-driven approaches to better understand malicious activity on the Internet. Yun Shen's research involves a mix of quantitative analysis, machine learning, and systems design. He has been part of SRL since 2012.
