The Facebook machine and the power of its algorithms

Facebook Timelines didn’t show information about the Ferguson shooting and riots in August 2014, analysts remarked. Time to explain Facebook’s algorithms, its filter bubble, the reality of Mark Zuckerberg and why he should share algorithm transparency and influence with us, his users.

*) This article is based on a chapter of the book The Power of Facebook.

First, Mark Zuckerberg once said: “A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa.” He was right, I guess, for most people on Facebook.

In Net Neutrality, Algorithmic Filtering and Ferguson, sociologist Zeynep Tufekci explained the influence of social media algorithms on the dissemination of - in this case - originally just local news:

“And then I switched to non net-neutral internet to see what was up. I mostly have a similar a composition of friends on Facebook as I do on Twitter…Nada, zip, nada. No Ferguson on Facebook last night. I scrolled. Refreshed.”

Canadian journalism teacher Mark Hamilton @gmarkham already remarked the same: “Twitter vs. Facebook: my tweetstream is almost wall-to-wall with news from Ferguson. Only two mentions of it in my Facebook news feed.”

Tufekci: “This isn’t about Facebook per se—maybe it will do a good job, maybe not—but the fact that algorithmic filtering, as a layer, controls what you see on the Internet. Net neutrality (or lack thereof) will be yet another layer determining this. This will come on top of existing inequalities in attention, coverage and control.”

Does Facebook deliberately influence its selections? No, it’s the ‘perfect machine’. I experienced this with a survey I published, What is your Facebook value. When it was featured on several news websites, 15,000 people suddenly used the gadget. Their results should have gone viral on Facebook, but they hardly appeared in the Timelines of their Facebook friends. Some filter probably blocked them.

How stupid could I be to try to do a critical viral campaign about Facebook on the platform itself? When I switched to Twitter to let people distribute their personal results, it was too late.

The filter bubble and Mark Zuckerberg

Back to the ‘discoveries’ of Zeynep Tufekci and Mark Hamilton. Since Facebook is a private company, with the ultimate goal of maximizing value for its shareholders, one shouldn’t be too surprised about the way Facebook selects and presents our social image of the world. Just as commercial television doesn’t earn its best revenue by showing much of ‘Ferguson’, neither does Facebook. And by the way, most of its users would rather know which new dress their friend bought than hear about some racial problem in a town in Missouri. Should we really blame Facebook?

And finally, we still have good old TV and newspapers; that’s why professional journalists shouldn’t lean too much on social media. Here’s how I described Eli Pariser’s The Filter Bubble. Pariser is critical of the ‘tunneling’ of news within social media groups; against that stands the right of Mark Zuckerberg to create his own views and algorithms on social news.

‘Safe’ in our cocoons

A peculiar type of severe criticism holds that social media contribute to prolific information sharing within increasingly smaller circles. This results in social compartmentalization. In 2000, the warning was given prominence by professor and prolific writer Cass Sunstein.

He described how online users themselves would avert otherwise-minded and unwelcome information and opinions, as a result of the selection of a limited number of sources.

A kind of follow-up book was written in 2011 by Eli Pariser, ‘The Filter Bubble - What the Internet Is Hiding from You.’ On Facebook, Pariser gives the example that like-minded Friends and groups provide you with information and opinions that you are likely to appreciate. This leads to a continuous spiral of consensus and conflict-avoidance.

Facebook’s selection for the News Feed is based on previously shown preferences, with EdgeRank as the malefactor of filtering: “EdgeRank demonstrates the paradox in the race for relevancy. To provide relevance, personalization algorithms need data. But the more data there is, the more sophisticated the filters must become to organize it. It’s a never-ending cycle.”

The ‘You-loop,’ as Pariser aptly calls this tendency, makes our world literally and figuratively smaller. It deprives us more and more of encounters with unexpected sites, opinions and people. Online life indeed becomes predictable.

Pariser’s appreciation of media cocooning is fascinating, but tendentious. Pew Research showed that Facebookers and other internet users certainly orient themselves towards online media as broadly as ‘offliners.’

A villainous question: who is interested in Pariser’s book? Perhaps those who believe that it fuels their own opinion and interests? Is the buying behavior of this book thus also the result of a ‘filter bubble’?

Pariser readily points to the following statement by Mark Zuckerberg: “A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa.”

The author rebukes this tunnel vision and disinterest. But what if Pariser is sitting watching the news one morning and his little boy shouts: “Daddy, a squirrel in the garden!”

Daddy Pariser then rushes to the window. His son asks: “Don’t you have to watch the news?” Pariser replies: “No, it’s only about Africa, after all.”

We have been filtering for a long time already. I see this narrowing of information not as a problem, but as a logical reaction. With the advent of the internet, the number of available sources has increased enormously, as well as the speed of news.

The Facebook algorithms

The Facebook selection method for its Timeline (before: News Feed) was called EdgeRank. The name has since been scrapped, as EdgeRank and the method behind it became outdated. The method has become much more complicated and gets continuous updates, just like the search algorithms of Google. My observations and explanation back in 2012:

It’s hard to imagine, of course, but the daily marketing possibilities of Facebook, based on complex algorithms, are endless. Your communication with Friends provides a steady stream of information for the marketing identity machine Facebook.

This machine behind the screen constantly evaluates your expressions and behavior en masse using numerous psychological, sociological, anthropological, and who knows what other variables from different sciences.

There is only one company that owns all this information about your preferences in relation to your friends combined with your real names and pictures; furthermore, this information is held alongside the equivalent information and preferences of another billion people. Just think, one billion people with a Facebook passport.

Only China and India have more identities registered in one governed space. Here arises the first virtual empire, unless Facebook keeps on making costly mistakes….

The algorithm formerly known as EdgeRank

Every day, more than 1.5 billion requests for attention (updates, comments, photos) are flooding over Facebook’s pages. Just within the Like economy, 2.5 billion likes assemble and fade away each day in the Facebook universe.

Needless to say, not every expression catches attention, and not all posts from Friends, or updates from organizations you are subscribed to, can make it into your News Feed (otherwise it would be an almost unreadable rapid stream of data).

So Facebook needs a mechanism to determine which posts should be sent to particular Friends’ News Feeds, which are more important or relevant to this Friend rather than that other Friend or Fan, and so on.

The algorithm programmed to determine this is called EdgeRank. It measures each post against three main criteria (time, weight and affinity) and gives the post a total score on these aspects, which then determines how prominently, and to whom among a Page's overall Friend or Fan base, the post will be displayed.

Every status update, Like, photo, video, score within Mafia Wars, and song listened to on Spotify gets an 'EdgeRank': a ranking of attractiveness in a specific relationship to users’ preferences, connections and behaviors.

For example: a recent photo published by Mary has a higher rank than an older one. It also gets a higher rank for John than for Suzy, because he often clicks on her activity, while Suzy is seldom interested in Mary’s posts, no matter what they are. EdgeRank enables the Facebook machine to determine what appears in everyone's News Feed, in which order and for how long.
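The Mary, John and Suzy example can be sketched in a few lines of Python. This is only an illustration: the real formula is secret, so the multiplication of the three criteria, the decay function and all numbers below are my assumptions, and names such as `KIND_WEIGHT`, `AFFINITY` and `rank_feed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    kind: str          # e.g. "photo" or "status"
    age_hours: float   # time since the post was published


# Hypothetical numbers, chosen only for illustration.
KIND_WEIGHT = {"photo": 1.5, "status": 1.0}
AFFINITY = {
    # (viewer, author) -> how strongly the viewer reacts to the author
    ("John", "Mary"): 0.9,
    ("Suzy", "Mary"): 0.1,
}


def time_decay(age_hours: float) -> float:
    """Newer posts score higher; the 1 / (1 + age) shape is an assumption."""
    return 1.0 / (1.0 + age_hours)


def score(viewer: str, post: Post) -> float:
    """Combine affinity, weight and time into one score for one viewer."""
    affinity = AFFINITY.get((viewer, post.author), 0.0)
    weight = KIND_WEIGHT.get(post.kind, 1.0)
    return affinity * weight * time_decay(post.age_hours)


def rank_feed(viewer: str, posts: list) -> list:
    """Order posts for one viewer, highest score first."""
    return sorted(posts, key=lambda p: score(viewer, p), reverse=True)


recent = Post("Mary", "photo", age_hours=1)
old = Post("Mary", "photo", age_hours=48)

print(rank_feed("John", [old, recent])[0].age_hours)  # the recent photo comes first
print(score("John", recent) > score("Suzy", recent))  # John ranks Mary higher than Suzy does
```

The same post gets a different score for every viewer, which is the point of the example: there is no single ranking, only a ranking per relationship.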

In this respect, EdgeRank works very much like Google’s PageRank, which determines the ranking of search results. Since Google’s results have been personalized, PageRank has come to look more and more like Facebook’s EdgeRank.

The required secrecy of the actual formula is also similar. Every marketer wants to understand exactly how such social algorithms work, in order to produce pages and posts which rank as high as possible in Google’s search results and users’ News Feeds. Free publicity saves advertising expenses.

But the exact codes are kept under wraps by Facebook and Google; otherwise advertisers would exploit and de-legitimize the information aspect of searches, however personalized.

Finally, EdgeRank decides that about 90 percent of Facebook users will never see free posts from companies they ‘liked’ in their News Feed. That’s not a bad thing for Facebook, since advertisers have to pay for more attention instead of getting free publicity.

Algorithm elements

Facebook scored each post according to the combination of three elements: Affinity, Weight and Freshness.

Affinity is a score based on the proximity of relationships. You might think Jenny is your closest friend, but Facebook’s machine knows from your actual behavior that you're much more curious about Jimmy and Sarah’s expressions. Facebook adds up your actions: clicking, liking, commenting, time spent with (or better still: on) every Friend and every kind of his or her expressions.

This aspect has a self-fulfilling prophecy: the more often someone appears in your Feed, the more likely affinity will grow. Another tricky aspect is that affinity is programmed in to be one-way. You might love or like another person (much) more than she or he loves you. Facebook is aware of this kind of discrepancy and even encourages it at the technical level.
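A minimal sketch of how such one-way affinity could be added up from a log of actions. The action weights and the log itself are hypothetical; Facebook has not published which actions count or how heavily.

```python
from collections import defaultdict

# Hypothetical action weights: a comment counts for more than a click.
ACTION_WEIGHT = {"click": 1.0, "like": 2.0, "comment": 4.0}


def affinity_scores(actions):
    """Sum weighted actions per (actor, target) pair.

    The pair is ordered, so the score is one-way by construction:
    ("you", "Jenny") and ("Jenny", "you") are counted separately.
    """
    scores = defaultdict(float)
    for actor, action, target in actions:
        scores[(actor, target)] += ACTION_WEIGHT.get(action, 0.0)
    return dict(scores)


log = [
    ("you", "click", "Jimmy"),
    ("you", "comment", "Jimmy"),
    ("you", "like", "Sarah"),
    ("you", "click", "Jenny"),
    ("Jenny", "comment", "you"),
    ("Jenny", "like", "you"),
]
scores = affinity_scores(log)

print(scores[("you", "Jimmy")])  # 5.0
print(scores[("you", "Jenny")])  # 1.0
print(scores[("Jenny", "you")])  # 6.0
```

In this toy log Jenny pays far more attention to you than you do to her, and the two scores simply differ; nothing in the bookkeeping forces them to match.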

‘Weight’ ranks the importance of several kinds of expressions per user, including the percentage of your clicks on photos, text, links and comments. Facebook changes the edge weights to reflect which type of stories they judge a specific user will find most engaging.

Freshness (or ‘time decay’): more recent postings will feature higher in the News Feed than older ones. The experience of this element differs according to the individual: a Facebook user checking his page every half an hour will experience a different rhythm of News Feed updates than a peer who logs in only twice a week.

Facebook’s top software engineer Phil Zigoris said in July 2012 that Facebook no longer refers to its algorithm as EdgeRank: “EdgeRank is a term that has been used in the past to describe how we optimize the content of news feeds based on what is most interesting to you. We don’t have a product or system called EdgeRank.” So the name is gone, but the algorithmic system itself is not.

The power of artificial intelligence

The longer you use Facebook, the better Facebook will get to ‘know’ you. I’m sure we are only at the beginning of analyzing preferences and online social media behavior, resulting in continuously richer marketing profiles and communication rankings like EdgeRank.

With one billion users and growing, Facebook can, for example, very easily compare your behavior and settings with those of other users with similar templates, and apply what it finds. If your behavior corresponds 99 percent of the time with that of some user in Alaska, then the remaining one percent will be deduced by the platform's algorithms, drawn from post, Page and Friend recommendations, and tailored advertising. Facebook will be able to analyze innumerable cross-references.
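The idea of deducing the missing one percent from a look-alike user can be sketched as a simple nearest-neighbor lookup. This is an illustration under my own assumptions, not Facebook's method; the attributes, the users and the function names `agreement` and `fill_gaps` are hypothetical.

```python
def agreement(a: dict, b: dict) -> float:
    """Share of attributes known for both users on which they agree."""
    shared = [key for key in a if key in b]
    if not shared:
        return 0.0
    return sum(a[key] == b[key] for key in shared) / len(shared)


def fill_gaps(user: dict, others: list) -> dict:
    """Copy missing attributes from the most similar other user."""
    best = max(others, key=lambda other: agreement(user, other))
    guessed = dict(user)
    for key, value in best.items():
        guessed.setdefault(key, value)  # only fills what is still unknown
    return guessed


you = {"music": "jazz", "team": "Ajax", "phone": "android"}
alaska = {"music": "jazz", "team": "Ajax", "phone": "android", "coffee": "espresso"}
other = {"music": "metal", "team": "PSV", "phone": "ios", "coffee": "latte"}

profile = fill_gaps(you, [other, alaska])
print(profile["coffee"])  # espresso
```

With a billion profiles to compare against, even such a crude rule would rarely lack a close match; real systems presumably weigh many neighbors instead of copying from one.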

In the future, this kind of machine-led behavior monitoring (analysis based on ‘big data’ stores) could easily be extended from surfing, searching and chat contents to behavioral characteristics such as eye tracking, finger tapping and mouse patterns.

Digitalized or machine-led behavior monitoring is still very young, and a promising field for marketers and police forces. Most of us have not the slightest idea what is happening in this field. One of the already not so futuristic fields of personal data is DNA information. While this conjures dark scenarios, DNA analysis could also be advantageous for worldwide health service provision.

What can Facebook know about us already, or in the near future? The profile categories and digitalized behaviors through which it collects data are already significant.

The Basics from profile information:
* Age, country, region, city
* Family background, family names
* Education
* Religion
* Occupation
* Hobbies
* Books, papers, magazines read
* Music preferences
* TV program preferences
* Political opinions
* Online spending
* Brand and product interests
* Languages you understand
* Sexual preferences
* Travel and holidays

Aggregated social data:
* Your status
* Knowledge and skills
* Home and car class
* Estimated income and wealth
* Social position and that of friends
* Living and family situation
* Use of PC, gadgets, software etc.
* When you are at home
* Places where you are out
* What times you are online and with what intentions
* Good and bad days
* When you’re horny

Also to consider:
* Pace of life
* Diseases and disorders
* Concentration cycle throughout the day
* Whether you are persistent or give up quickly
* Work behavior and effort
* Private browsing during work hours
* Preferences between text, photo and video
* Proceedings of contact
* Intensity of contact
* Attention to different relationships / friends
* Frequency and likelihood of new relationships
* Approach to individuals and businesses
* Choice of words and attitude
* Secret desires and fantasies
* Creativity
* Logical thinking
* Non-conscious and irrational behaviors
* Rational choices and weight given to these
* Emotions in experiences and exposures
* Behavior in different emotions
* Degree of happiness

The final three are the most fascinating. Of course you will deviate from your patterns, sometimes only slightly and other times completely. But the potential for Facebook to ‘know’ its people extends beyond ordinary imagination, and because of its basis in algorithms, can happen in real-time.

The existing social research I mention in other chapters, which shows how Facebook data, including photographs, might indicate degrees of future happiness and other social and psychological behaviors, is just the beginning. This is an overwhelming archive of potential knowledge.

You are what you like

Important research published in PNAS confirmed not only my suggestions given above about the data profiling possibilities of Facebook, but also showed how public Facebook Likes could already expose intimate details and personality traits of individuals.

Research by Michal Kosinski, David Stillwell, and Thore Graepel of the Psychometrics Centre of the University of Cambridge in the UK, together with Microsoft Research, showed how intimate personal attributes can be predicted with high levels of accuracy from ‘traces’ left by seemingly innocuous digital behavior, as with Facebook Likes.

After automated analysis of only the Facebook Likes of nearly 60,000 Facebook users in the US, who volunteered via an app, aspects like race, age, IQ, sexuality, personality, substance use and political views could be inferred. This is information currently publicly available by default. Not to mention what Facebook is able to do with its pile of closed information.

Models proved 95 percent accurate in distinguishing African Americans from Caucasian Americans and 85 percent accurate in differentiating Republicans from Democrats. Christians and Muslims were correctly classified in 82 percent of cases. Prediction accuracy for relationship status and substance abuse was between 65 and 73 percent.

The model’s accuracy was lowest (60 percent) when inferring whether users’ parents stayed together or separated before users were 21 years old. Sexual orientation was easier to distinguish among males (88 percent) than females (75 percent).

Few users clicked Likes explicitly revealing these attributes. For example, less than 5 percent of gay users clicked obvious Likes such as Gay Marriage. Accurate predictions relied on ‘inference’ - aggregating huge amounts of less informative but more popular Likes such as music and TV shows to produce incisive personal profiles.
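How many weak signals can add up to one confident guess is easy to show in a toy example. The sketch below is not the researchers' model; it is a simple log-odds tally of my own choosing, with made-up Likes and a made-up yes/no attribute, meant only to illustrate the principle of aggregation.

```python
import math
from collections import Counter


def train(profiles):
    """Learn per-Like evidence from (set_of_likes, label) pairs.

    Returns the log-odds of each Like for label True versus False,
    with add-one smoothing so unseen combinations do not divide by zero.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for likes, label in profiles:
        if label:
            n_pos += 1
            pos.update(likes)
        else:
            n_neg += 1
            neg.update(likes)
    weights = {}
    for like in set(pos) | set(neg):
        p = (pos[like] + 1) / (n_pos + 2)
        q = (neg[like] + 1) / (n_neg + 2)
        weights[like] = math.log(p / q)
    return weights


def predict(weights, likes):
    """Sum the evidence of all known Likes; a positive total means True."""
    return sum(weights.get(like, 0.0) for like in likes) > 0


# Hypothetical training data: no single Like reveals the attribute outright.
training = [
    ({"show A", "band X"}, True),
    ({"show A", "band Y"}, True),
    ({"band X", "show B"}, True),
    ({"show B", "band Y"}, False),
    ({"show B", "band Z"}, False),
    ({"show A", "band Z"}, False),
]
weights = train(training)

print(predict(weights, {"show A", "band X"}))  # True
print(predict(weights, {"show B", "band Z"}))  # False
```

Each Like on its own nudges the total only slightly; it is the sum over many Likes that tips the prediction, which is why popular but innocuous Likes can be so revealing in bulk.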

The researchers also tested for personality traits including intelligence, emotional stability, openness and extraversion. While such latent traits are far more difficult to gauge, the accuracy of the analysis was striking. Study of the openness trait – the spectrum of those who dislike change to those who welcome it – revealed that observation of Likes alone is roughly as informative as using an individual’s actual personality test score.

Some Likes had a strong but seemingly incongruous or random link with a personal attribute, such as Curly Fries with high IQ; That Spider is More Scared Than U Are with non-smokers; Motown and insomnia; Nepal and loving cats.

Liking Beyoncé and favoring your left leg

When taken as a whole, researchers believe that the varying estimations of personal attributes and personality traits gleaned from Facebook Like analysis alone can form surprisingly accurate personal portraits of potentially millions of users worldwide.

They say the results suggest a possible revolution in psychological assessment which – based on this research – could be carried out on an unprecedented scale without costly assessment centers and questionnaires.

“We believe that our results, while based on Facebook Likes, apply to a wider range of online behaviors,” said Michal Kosinski, Operations Director at the Psychometrics Centre.

The study raised important questions about personalized marketing and online privacy, as I do in this book. The researchers argue that many online consumers might feel such levels of digital exposure exceed acceptable limits - as corporations, governments, and even individuals could use predictive software to accurately infer highly sensitive information from Facebook Likes and other digital ‘traces’.

“I am a great fan and active user of new amazing technologies, including Facebook. I appreciate automated book recommendations, or Facebook selecting the most relevant stories for my newsfeed,” said Kosinski. “However, I can imagine situations in which the same data and technology is used to predict political views or sexual orientation, posing threats to freedom or even life.”

The researchers could use 170 Likes per person on average. Demonstration of personality prediction based on individuals’ Likes is available at

The largest market researcher

This book is concerned especially with shedding light on Facebook from the perspective of the individual. Facebook is an identity machine, yet it also possesses a gigantic collective knowledge. The opportunity to analyze its mountain of data alone is reason enough for me, as a technology and culture journalist, to fantasize about working at Facebook.

Facebook can draw an exact sketch of variations in behavior, economy, interests etc. from every district and street in Shanghai, London, Chicago or São Paulo to Barcelona supporters or Yale students. It could even fathom national characters of usage, should such exist, of France, Vietnam or Venezuela.

Secretly, Google and Facebook have the greatest powers of access to do behavioral science, far mightier than those of universities. We generate an unimaginable amount of information for these two parties, on numerous levels and in numerous areas.

Google presently has the edge in this field, given its far more extensive databank, filled with our searches and clicks on results. Google lets us share in this enjoyment with Google Zeitgeist (now Trends). You can also see the most searched-for terms per country and per month or year. You can find out a lot with it, especially if you supplement it with the information that Google makes available to advertisers.

In this field Facebook has even more opportunities, because of the registrations it has of real names, exact geographic information, personal preferences, social interactions, and other registrations of nearly every kind of important conscious and unconscious act that the individual user makes there.

This holds for behavior that seems solitary (browsing, purchasing), as well as for acts conducted as part of groups and spontaneous collectives (in responses to shared posts, for example). Facebook could build all sorts of indicators for collective behavior to make economic and therefore also political predictions.

Scientists are already tracking contagious diseases by analyzing messages about diseases spread socially worldwide on Facebook and Twitter. You can fantasize about the possibilities from almost every angle. Google and Facebook can already offer statistics on consumer confidence, which is now measured at monthly intervals by national statistics offices using expensive methods.

Through data collection, Facebook and Google are the most omniscient market research bureaus in the world at present. Of course this will only remain so if we remain faithful to them, searching and commenting in the relevant boxes, like good boys and girls.

Third parties also try to exploit this attachment to the machine. Thus an investment fund from the London-based Derwent Capital Markets operates on the basis of statements in social media, taken from a method used by the Belgian academic Johan Bollen, who developed ‘sentiment analysis’. In this way, the profits of new film and book titles, as well as election results, are being predicted at ever greater frequency.

Public and commercial social research

British journalist and graphic designer David McCandless compiled a chart from 10,000 Facebook status updates with the phrases “breakup” and “broken up” to show at what times of year people most often split up. He found big spikes after Valentine’s Day, April Fool’s Day, the weeks leading up to spring break, and just before Christmas. This might be interesting information for some advertisers. On the other hand, this is far from accurate: the decision to break up also comes only after a certain period. There are much more interesting patterns to show.

MIT Technology Review journalist Tom Simonite wrote about the activity of Facebook’s social scientists like Cameron Marlow and his Data Science Team. More than twenty researchers apply math, programming skills, econometrics, and social sciences to mine the vast amount of data for insights that advance advertising and strategic objectives. Within the first five months of connecting Spotify to Facebook, more than five billion instances of people listening to songs online were cataloged, revealing much about the preferences and feelings of users at different points in time.

The Data Science Team also developed a way to calculate a country’s “gross national happiness” from its Facebook activity by logging the occurrence of words and phrases that signal positive and negative emotion. Such studies can of course also be focused on a single individual’s use, which was not mentioned. Welcome to the world of “big data.”
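Counting emotion words is simple enough to sketch. The word lists and the formula below are my own assumptions for illustration; the actual lexicon and calculation used by Facebook's team are not given here.

```python
import re

# Tiny hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"happy", "great", "love", "awesome"}
NEGATIVE = {"sad", "awful", "hate", "tired"}


def happiness_index(updates):
    """(positive words - negative words) / all words, over all updates."""
    pos = neg = total = 0
    for text in updates:
        words = re.findall(r"[a-z']+", text.lower())
        total += len(words)
        pos += sum(word in POSITIVE for word in words)
        neg += sum(word in NEGATIVE for word in words)
    return (pos - neg) / total if total else 0.0


updates = ["So happy today, love this weather", "Tired and sad", "Great game"]
print(round(happiness_index(updates), 3))  # 0.091
```

Feed the function all updates from one country on one day and you get a national figure; feed it the updates of one person and you get a personal mood curve, with exactly the same code.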

Facebook’s scientists must obviously contribute to the long-term profitability of the network. However, their research can combine public and company goals; the former can also be publicly promoted.

Simonite emphasized the hallowed status of research within Facebook in this quote: “Whatever happens, Marlow says, the primary goal of his team is to support the well-being of the people who provide Facebook with their data, using it to make the service smarter. Along the way, he says, he and his colleagues will advance humanity’s understanding of itself.

That echoes Zuckerberg’s often doubted but seemingly genuine belief that Facebook’s job is to improve how the world communicates.” Simonite continued with: “Just don’t ask yet exactly what that will entail.”

Conclusion: should we direct our algorithms?

The algorithms of Facebook are themselves a secret formula, as is their execution. Specialized bureaus are attempting to get behind this on the basis of experience, in order to secure more prominent spaces in our News Feeds for their clients. For commercial reasons only. We should do much more to hack the Facebook principles if Mark Zuckerberg’s principle of ‘sharing’ is not applied to his company’s secrecy.

Facebook has little to lose. It could even proceed to make public the variables for the selection of Friends’ statuses in your overview. Or even downgrade the prominence of commercial parties in Timelines, so that companies are forced to advertise more.

The true kitchen secret cherished by Facebook is the way in which each person’s Timeline and advertising offer is constructed. The variables and terms of this remain concealed from our view.

If those calling for ‘transparency’ have their way, perhaps the ranking of news and advertising selection, as well as our big data profiling, will someday become public, enforced by lawmakers?

Even this transparency might make Mark Zuckerberg’s empire more powerful, since we’re going to trust him more. And we might even be able to create our own social media selection mechanisms or perhaps even algorithms: “please include ‘local racial conflicts’ in my Timeline (but not on Saturdays, not when I'm feeling sad, et cetera...)”
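Such a self-made selection rule would not need to be complicated. A minimal sketch, assuming a hypothetical `TopicRule` that a user could set, and assuming the platform somehow knows the topic of a post and the user's current mood:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TopicRule:
    topic: str
    blocked_weekdays: set = field(default_factory=set)  # 0 = Monday ... 6 = Sunday
    blocked_moods: set = field(default_factory=set)

    def allows(self, post_topic: str, today: date, mood: str) -> bool:
        """True if a post on this topic may be shown right now."""
        return (
            post_topic == self.topic
            and today.weekday() not in self.blocked_weekdays
            and mood not in self.blocked_moods
        )


# "Include local racial conflicts, but not on Saturdays, not when I'm sad."
rule = TopicRule("local racial conflicts", blocked_weekdays={5}, blocked_moods={"sad"})

print(rule.allows("local racial conflicts", date(2014, 8, 17), "fine"))  # True (a Sunday)
print(rule.allows("local racial conflicts", date(2014, 8, 16), "fine"))  # False (a Saturday)
print(rule.allows("local racial conflicts", date(2014, 8, 17), "sad"))   # False
```

The hard part is not the rule but the inputs: only Facebook knows the topic of every post and, increasingly, the mood of every user.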



17 aug 2014
Netkwesties is a web publication about the internet, ICT, media and society, with background articles, essays, columns and commentary from a panel of experts.