Clean Gmail quickly and painlessly

In this article I describe how I analyzed my mailbox, which held almost 3000 unread messages.

Since you and I now have a couple of free hours a day that we would usually spend on the road, it's worth using them for something we keep putting off, like sorting out the contents of your mailbox and cleaning it up. I will tell you how I analyzed my inbox, found the most active "writers" and unsubscribed from 50 mailings within ten minutes. It goes without saying that I will share all the tools, so you will be able to repeat my experience.


In the spring of 2017, I once again created a new mailbox and of course promised myself to keep it clean and tidy. The idyll didn't last long - by January 2020, the Inbox alone held 2500 emails.

Towards the end of 2019, I had to check my mail a couple of times a day - not a pleasant thing to do, given the mess in there. But there was no choice: otherwise an important letter would simply get lost in a pile of mailings.

Researching the problem

We could always just delete all the letters outright. But, first of all, it's dangerous - what if there's something important? Secondly, it won't protect us from a bunch of letters in the future. Since we are in this situation, let's take advantage of it and try to find some insights in the data.

Getting an e-mail archive

So, we need to download the mail in some convenient format. Gmail has a great API, but since I estimated the size of my mailbox at a couple of hundred megabytes, I decided to go a different way.

Spoiler: the decision was correct - in the end the data took up almost 700 megabytes!

Google has a very interesting service that few people know about - Google Takeout. With it, you can request your personal data from Google: files from Google Drive, photos from Google Photos, notes from Google Keep and much more. At your request, Google collects the data and gives it to you as an ordinary archive. Right now we are only interested in mail:

http://takeout.google.com/

The export took a few minutes, and I received a file with all my mail from the last three years.

Spoiler for developers: for the data analysis I used the usual data-science stack - Python 3.7, Jupyter, Pandas, Matplotlib and Python's standard mailbox module (for reading mbox files). All the code can be found in the GitHub repository - https://github.com/pavlovdog/my-gmail-research.

Studying the data

After downloading and pre-processing the data, we can see what it looks like. I limited myself to what I think are the most interesting parts of the letters:

  • the sender (From)
  • email subject
  • reply address
  • mail categories (X-Gmail-Labels)
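As a rough illustration, here is a minimal sketch of how these fields could be pulled out of a Takeout mbox file. It uses only Python's standard library rather than Pandas, and it is not the code from the repository: the function names, the dictionary keys and the assumption that X-Gmail-Labels is a comma-separated list are mine.

```python
import mailbox
from email.utils import parseaddr, parsedate_to_datetime


def parse_date(raw):
    """Return a datetime for a Date header, or None if it is missing or malformed."""
    if not raw:
        return None
    try:
        return parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None


def load_headers(path):
    """Read an mbox file and keep only a few header fields per message."""
    box = mailbox.mbox(path, create=False)  # fail instead of creating an empty file
    rows = []
    try:
        for message in box:
            # Assumption: labels are stored as one comma-separated header value.
            labels = str(message["X-Gmail-Labels"] or "")
            rows.append({
                "from": parseaddr(str(message["From"] or ""))[1].lower(),
                # MIME-encoded subjects are left undecoded in this sketch.
                "subject": str(message["Subject"] or ""),
                "reply_to": parseaddr(str(message["Reply-To"] or ""))[1].lower(),
                "labels": [part.strip() for part in labels.split(",") if part.strip()],
                "date": parse_date(message["Date"]),
            })
    finally:
        box.close()
    return rows
```

Calling `load_headers("path/to/archive.mbox")` (the path is a placeholder) returns a plain list of dictionaries, which can be passed to `pandas.DataFrame` if you prefer to continue with Pandas.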

Monthly dynamics

Let's get the big picture and start with the most basic indicator - how many messages I received each month over the three years. Take a look at the following chart:

A couple of questions come to mind right away:

  • Why did the number of incoming messages drop so sharply from June 2018 to January 2019? It's simple - around that time I first tried to clean up my mail and unsubscribed from unnecessary mailings. As you can see, it worked, but not for long.
  • Why did the number of incoming emails rise sharply from January 2019 to June 2019? I will explain the reasons below, after a more thorough analysis.
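The monthly counts behind such a chart are easy to reproduce. Below is a minimal sketch that assumes you already have the parsed message dates as `datetime` objects; the function name is hypothetical and the sample dates are invented, not taken from my mailbox.

```python
from collections import Counter
from datetime import datetime


def messages_per_month(dates):
    """Count messages per calendar month, keyed as 'YYYY-MM' in chronological order."""
    counts = Counter(d.strftime("%Y-%m") for d in dates if d is not None)
    return dict(sorted(counts.items()))


# Invented sample data, only to show the shape of the result.
sample = [datetime(2018, 6, 3), datetime(2018, 6, 17), datetime(2019, 1, 9), None]
monthly = messages_per_month(sample)
```

The resulting dictionary maps each month to its message count and can be passed straight to a bar chart. Months without any mail are simply absent, so fill them with zeros if you need a continuous axis.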

How does Gmail categorize emails?

At the time, I had only a couple of personal labels, so the picture below is a good reflection of how Google categorizes emails. You'll probably see a similar picture in your own mailbox:

We can see that Google is pretty good at separating out Promotions - they make up at least 50% of all incoming emails, and no important emails ended up there.
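A breakdown like this can be computed from the X-Gmail-Labels values. The sketch below assumes each message's labels have already been split into a list; the function name and the sample labels are illustrative only.

```python
from collections import Counter


def label_shares(label_lists):
    """Return (label, share of messages carrying it) pairs, most common first."""
    total = len(label_lists)
    if total == 0:
        return []
    # set() ensures a label repeated on one message is counted once.
    counts = Counter(label for labels in label_lists for label in set(labels))
    return [(label, count / total) for label, count in counts.most_common()]


# Invented sample: two messages, both labelled as promotions.
shares = label_shares([["Inbox", "Category Promotions"], ["Category Promotions"]])
```

Because a single message can carry several labels, the shares do not have to add up to 100%.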

The most active senders

Let's look at who wrote to me the most over the three years.

  • GitHub. It's simple - I'm a developer and GitHub is my second home. Notifications about issues, security alerts, user activity. All of this is very useful, but I don't even open most of these messages, which means they should be kept out of the inbox.
  • Crunchbase. I definitely don't pay this resource enough attention for it to deserve an honorable second place.
  • Product Hunt. Same story as Crunchbase.
  • Airbnb. I use it a lot, but that's no reason to write me an entire epistolary novel.
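One way to get such a ranking is to group messages by the domain of the From address. This is a minimal sketch under that assumption (grouping by domain is my simplification, and the function name and sample headers are made up):

```python
from collections import Counter
from email.utils import parseaddr


def top_senders(from_headers, n=10):
    """Rank sender domains by the number of messages they sent."""
    counts = Counter()
    for header in from_headers:
        address = parseaddr(header)[1].lower()
        if "@" in address:
            # Group by domain so that different addresses of one service add up.
            counts[address.rsplit("@", 1)[1]] += 1
    return counts.most_common(n)


# Invented sample headers.
ranking = top_senders([
    "GitHub <notifications@github.com>",
    "noreply@github.com",
    "Airbnb <express@airbnb.com>",
])
```

Grouping by full address or by display name works the same way; the domain is just a convenient default when one service writes from several addresses.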

The activity of the senders by months

To dig deeper, let's look at senders by month - this will help us determine where the sharp increase in the number of letters in May 2019 came from.

We see that GitHub unexpectedly sent me 25 emails in January 2019, although it had never been among the active senders before. In the same month, I discovered Product Hunt, Crunchbase and BetaPage, and off it went. These three services alone send me about 60 emails per month!

In May 2019, I used Airbnb for the first time, and another strong player with 20 emails per month appeared in my inbox. By the end of 2019, my mailbox was finished off by the emails from the corporate GitLab and Sentry.
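A per-month view like this boils down to counting senders separately for each month. Here is a minimal sketch, assuming the input is a list of (date, sender) pairs; the function name and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def senders_by_month(records, n=3):
    """Map each 'YYYY-MM' month to its n most active senders with their counts."""
    per_month = defaultdict(Counter)
    for date, sender in records:
        per_month[date.strftime("%Y-%m")][sender] += 1
    return {month: per_month[month].most_common(n) for month in sorted(per_month)}


# Invented sample records.
activity = senders_by_month([
    (datetime(2019, 1, 5), "github.com"),
    (datetime(2019, 1, 9), "github.com"),
    (datetime(2019, 1, 20), "producthunt.com"),
    (datetime(2019, 5, 2), "airbnb.com"),
])
```

A sender that shows up in the result for the first time in some month is a good candidate for the question "where did this spike come from?".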

Let's get things in order

A rather obvious plan emerges:

  • Move the entire Inbox folder to the archive.
  • Unsubscribe from all the mailing lists - I don't read them anyway.
  • I probably won't be able to filter all the letters in one go, so for a couple of weeks I need to review the mailbox regularly and deal with individual senders.
  • The most important thing is to be consistent and not start using the mailbox as a trash bin again.

Unsubscribing from mailing lists

To unsubscribe from the mailing lists, you can just go through your inbox and look for a link with the text "Unsubscribe" in each letter. If there is no such link, you can always block the sender, and they will never bother you again.
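If you prefer to script the search, many bulk senders also put their unsubscribe address into the List-Unsubscribe header defined in RFC 2369. The sketch below only reads that header and does not unsubscribe from anything. It assumes the sender actually sets the header (not everyone does), and the function name and sample message are made up.

```python
import re
from email import message_from_string


def unsubscribe_targets(message):
    """Return the URIs listed in a message's List-Unsubscribe header, if any."""
    header = str(message.get("List-Unsubscribe", ""))
    # The header holds one or more URIs, each wrapped in angle brackets.
    return [uri.strip() for uri in re.findall(r"<([^>]+)>", header)]


# Invented sample message.
sample = message_from_string(
    "From: news@example.com\n"
    "List-Unsubscribe: <mailto:unsub@example.com>, <https://example.com/unsub?id=1>\n"
    "\n"
    "Hello!\n"
)
targets = unsubscribe_targets(sample)
```

Messages for which the list comes back empty are the ones where you still have to look for a link in the body or block the sender.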

There is another, in theory faster, way: you can use services that automatically perform the steps above. Here is a small list of the ones I have come across:

Unfortunately, I can't test all three services on one mailbox, so I decided to try unroll.me. To be clear, this service does not unsubscribe you in the literal sense of the word - instead, it removes from your inbox the mail from senders you have marked as Unsubscribe.

The service is convenient to use, but I did not see Product Hunt and Airbnb in its list of subscriptions. I manually unsubscribed from the first one, but Airbnb does not have an Unsubscribe link in its emails. I had to open the application and change the notification settings by hand - definitely a minus for the company's karma.

Spoiler: in the end, I still went through the incoming mailings sender by sender and, along the way, properly unsubscribed from everyone marked in unroll.me.

Result


Thank you so much for reading the article :) Stay healthy and remember that behind every big crisis there is a big bounce. If you liked the article, feel free to subscribe to my Telegram blog.

See ya 😊