New machine learning algorithm can predict age and gender from just your Twitter profile



A new “demographic inference” tool developed by academics can predict an account holder’s age and gender based solely on the information in their social media profile (i.e. screen name, biography, profile photo, and name). The tool, which works in 32 languages, could pave the way for views expressed on social media to be factored into popular survey methods.

Researchers at the University of Oxford, University of Michigan, University of Massachusetts, GESIS – Leibniz Institute for the Social Sciences, the Max Planck Institute, and Stanford University have developed a method to infer information about a social media account owner from the details disclosed in their Twitter profile.

The machine learning system, unveiled at the Web Conference in San Francisco this week, learned the patterns that distinguish different ages and genders, and that separate organisations from individuals, from a data set of over four million Twitter accounts in 32 languages. These predictions were then combined with estimated locations and re-weighted against census data to produce more accurate population estimates for 1,101 statistical regions across the EU.
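The re-weighting step can be illustrated with simple post-stratification, in which each demographic group in the online sample is weighted by the ratio of its census share to its sample share. The sketch below is only an illustration under that assumption: the article does not spell out the researchers' exact weighting scheme, and the function name, group labels, and numbers are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, census_shares):
    """Weight each group by (census share / sample share).

    Hypothetical helper for illustration; not the researchers' code.
    Groups absent from the sample get no weight, since there is
    nothing to re-weight.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    weights = {}
    for group, census_share in census_shares.items():
        sample_share = counts.get(group, 0) / total
        if sample_share > 0:
            weights[group] = census_share / sample_share
    return weights


def weighted_share(sample_groups, weights, group):
    """Share of the re-weighted sample that belongs to `group`."""
    total_weight = sum(weights[g] for g in sample_groups)
    group_weight = sum(weights[g] for g in sample_groups if g == group)
    return group_weight / total_weight


# Made-up example: under-30 men are over-represented among the accounts.
sample = (
    ["male_under_30"] * 6
    + ["female_under_30"] * 2
    + ["male_30_plus"] * 1
    + ["female_30_plus"] * 1
)
# Made-up census shares for the same region.
census = {
    "male_under_30": 0.2,
    "female_under_30": 0.2,
    "male_30_plus": 0.3,
    "female_30_plus": 0.3,
}

weights = poststratification_weights(sample, census)
for group in census:
    print(group, round(weights[group], 2), round(weighted_share(sample, weights, group), 2))
```

In this toy sample, over-represented groups receive weights below one and under-represented groups receive weights above one, so that the weighted shares line up with the census figures.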

This could pave the way for a more representative understanding of people's views on key societal issues and topics, based on what they post on social media, with those views attributed to specific geographical locations and demographic groups.

Dr Scott Hale, Senior Research Fellow, Oxford Internet Institute, University of Oxford said: “Despite providing lots of data points, social media has long been an unreliable tool for understanding what issues are most important to a wider population given how people self-select into using any one platform.

“This first study of its kind performs demographic predictions about a social media account’s owner based purely on the account’s profile information in 32 languages and then re-weights the online sample to be more similar to an offline population.

“We see this as a significant step towards using social media to get a more accurate picture on the issues and topics that most interest the public and understanding which groups’ views are over- or under-represented.”

The information and data underpinning this research have been made available in an open source library, and you can test the inference tool at http://www.euagendas.org/m3demo.