Ditching worthless friends with Facebook data and JavaScript

Friendships are hard to maintain. So much energy is wasted maintaining friendships that might not actually provide any tangible returns. I find myself thinking, “Sure, I’ve known her since kindergarten, she introduced me to my wife, and she let me crash at her place for 6 months when I was evicted, but is this really a worthwhile friendship?”

I need to decide which friends to ditch. But what are the criteria? Looks? Intelligence? Money?

Surely, the value of an individual is subjective. There’s no way to benchmark it empirically, right? WRONG. There is one surefire way to measure the worth of a friend: the number of emoji reactions received on Facebook Messenger.

More laughing reactions means that’s the funny friend. The one with the most angry reactions is the controversial one. And so on. Simple!

Counting manually is out of the question; I need to automate this task.

Scraping the chats would be too slow. There’s an API, but I don’t know if it would work for this. It looks scary and the documentation has too many words! I eventually found a way to get the data I need:

The next day, I get an email notifying me that the archive is ready to download (all 8.6 GB of it) under the “Available Copies” tab. The zip file has the following structure:

The directory I am interested in is inbox. The [chats] directories have this structure:

These files can get pretty big, so don’t be surprised if your fancy IDE faints at the sight of them. The chat I want to analyze is about 5 years old, which resulted in over a million lines of JSON.

The JSON file is structured like this:

I want to focus on messages. Each message has this format:

And I found what I was looking for! All the reactions listed right there.

I see the file input field on my page, and the parsed JavaScript object is logged to the console when I select the JSON. It can take a few seconds due to the absurd length. Now I need to figure out how to read it.
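A minimal sketch of what such a page script might look like, not necessarily the code used here. The element id `chat-file` and the helper name `parseChat` are assumptions made for illustration; the wiring only runs when a DOM is available.

```javascript
// Pure helper: turn the text of the exported file into a JavaScript object.
function parseChat(text) {
  return JSON.parse(text);
}

// Browser-only wiring; skipped when there is no DOM (for example in Node).
if (typeof document !== "undefined") {
  // "chat-file" is a hypothetical id for <input type="file" id="chat-file">.
  const input = document.getElementById("chat-file");
  if (input) {
    input.addEventListener("change", async () => {
      const file = input.files[0];
      if (!file) return;
      // Blob.text() reads the whole file as a string; big files take a moment.
      console.log(parseChat(await file.text()));
    });
  }
}
```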

The participants object from the original JSON already has a similar format. Just need to add that counts field:
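As a rough sketch, assuming `participants` is an array of `{ name }` objects (an assumption about the export layout; the names below are made up), adding the field could look like this:

```javascript
// Hypothetical sample mirroring the `participants` array of the export.
const participants = [{ name: "Alice" }, { name: "Bob" }];

// Return a copy of each participant with an empty `counts` map
// (emoji string -> number of reactions received).
function withCounts(list) {
  return list.map((person) => ({ ...person, counts: {} }));
}

const people = withCounts(participants);
console.log(people);
```

Copying instead of mutating keeps the parsed JSON untouched, which helps when re-running the tally.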

Now I need to iterate the whole message list, and accumulate the reaction counts:
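One possible shape for that loop, as a sketch. It assumes each message carries a `sender_name` and an optional `reactions` array of `{ reaction, actor }` objects; only the `reaction` key is confirmed by the file excerpt in this post, the other field names and the sample data are assumptions.

```javascript
// Tally reactions *received*: each reaction is credited to the message's sender.
function countReactions(chat) {
  const byName = new Map();
  for (const p of chat.participants) {
    byName.set(p.name, { name: p.name, counts: {} });
  }
  for (const message of chat.messages) {
    const receiver = byName.get(message.sender_name);
    if (!receiver) continue; // sender is no longer listed as a participant
    for (const { reaction } of message.reactions || []) {
      receiver.counts[reaction] = (receiver.counts[reaction] || 0) + 1;
    }
  }
  return [...byName.values()];
}

// Hypothetical miniature chat; the reaction string is the raw form from the file.
const chat = {
  participants: [{ name: "Alice" }, { name: "Bob" }],
  messages: [
    { sender_name: "Alice", reactions: [{ reaction: "\u00f0\u009f\u0098\u00a2", actor: "Bob" }] },
    { sender_name: "Bob" }, // a message nobody reacted to
    { sender_name: "Alice", reactions: [{ reaction: "\u00f0\u009f\u0098\u00a2", actor: "Bob" }] },
  ],
};

const totals = countReactions(chat);
console.log(totals);
```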

This is what the logged output looks like:

I’m getting four weird symbols instead of emojis. What gives?

I grab one message as an example, and it only has one reaction: the crying emoji (😢). Checking the JSON file, this is what I find:

"reaction": "\u00f0\u009f\u0098\u00a2"

How does this character train relate to the crying emoji?

It may not look like it, but this string is four characters long:

So what are they?
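A quick way to inspect them, sketched here with hypothetical variable names `raw` and `hex`, is to print the string's length and the code of each character in hex:

```javascript
// The reaction exactly as it appears in the JSON file.
const raw = "\u00f0\u009f\u0098\u00a2";

console.log(raw.length); // 4

// Code of each character, written in hexadecimal.
const hex = Array.from(raw, (ch) => ch.charCodeAt(0).toString(16));
console.log(hex.join(" ")); // f0 9f 98 a2
```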

That’s pretty close! Turns out these are the bytes of the emoji’s UTF-8 encoding, in hex. But for some reason, each byte is written as a separate Unicode character, using JSON’s \uXXXX escapes (which are UTF-16 code units).

Knowing this, how do I go from \u00f0\u009f\u0098\u00a2 to \uD83D\uDE22?

I extract each character as a byte, and then merge the bytes back together as a UTF-8 string:
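A minimal sketch of that conversion, using the standard `TextDecoder`. The function name `fixMojibake` is made up for illustration, and the sketch assumes every character in the input has a code below 256, as in the example above.

```javascript
// Treat each character's code as one byte, then decode the bytes as UTF-8.
// Assumes every character code is below 256; larger codes would be truncated.
function fixMojibake(text) {
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const fixed = fixMojibake("\u00f0\u009f\u0098\u00a2");
console.log(fixed === "\uD83D\uDE22"); // true
```

Plain ASCII text passes through unchanged, since each ASCII byte is already a complete UTF-8 sequence.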

So now I have what I need to properly render the results:

I want to calculate a score based on the count of each type of reaction. I need some variables:

And for the received reactions, I made some categories:

The final equation is:

In JavaScript it would go something like this:

Displaying the information in table form makes it easier to parse:

Note: Due to privacy concerns I replaced my friends’ real names with their home addresses.

With a quick look at the table I can finally decide who I need to remove from my life.

Farewell, cousin Sam.
