Introduction
To inform a food justice project, this social listening effort asks what is part of the conversation, and what is missing, when the world discusses food. It asks questions like:
- What are the commonly discussed food challenges?
- What terms are used to describe common justice topics like food security?
- What terms come up across different parts of the world when people talk about "good" food, and what might that tell us about the priorities or beliefs of those regions?
- How do governance, policy, and justice feature in food conversations and how does that vary geographically?
- How can these data-driven insights inform and inspire qualitative research?
Specifically, using news media as a lens, this natural language processing effort uses machine learning / artificial intelligence to read thousands of articles from many countries and languages, measuring how often different topics appear and in which contexts. Distilling these insights into interactive data visualizations, this website provides research tools that let users explore this global dataset.
Tutorial
Before getting started, watch this short video tutorial, which walks through an example analysis and shows how to use the interactive web-based visualization.
Note: Some example articles are listed with URLs. These are provided as examples solely for the purpose of academic research. Rights are retained by their authors.
Web Application
This online interactive visualization allows users to explore the data in detail but may take a few moments to load the first time. Some users, such as those with slower internet connections, some mobile devices, or certain adaptive (accessibility) technologies, may prefer the express version. By continuing, you agree to the site's terms as described in the About section, which discusses open source, data license, privacy, and additional details.
Express version
This alternative version of the web application may work better for some mobile devices, slower internet connections, or certain adaptive (accessibility) technologies. Simply prepare a query below and indicate whether you want a visualization or a data export. By continuing, you agree to the site's terms as described in the About section, which discusses open source, data license, privacy, and additional details.
Method
This natural language processing project uses a number of technologies to source data and then produce an interactive visualization:
- News sources are queried through Newsdata.io using a filter for articles discussing food.
- Search parameters are translated into each article language using Amazon Translate, the same automated system used to translate all article information into a single common language (English, the most common original language among articles) before further processing.
- This application determines the topics of an article by having a machine read and categorize its contents, using either Method A or Method B.
- Method A: In an algorithm called LDA (Latent Dirichlet Allocation), the computer examines how often different words appear across articles and how often they appear together. This builds a sense of which words may belong to the same topic (like dog and cat) and, thus, which topics a document contains.
- Method B: After filtering out common words that may be less useful in determining an article's topics (TF-IDF), the words of an article are converted to numbers describing their meaning (Word2Vec), where dog and cat may have more similar numbers than dog and truck. Finally, the computer clusters similar words together or, put another way, groups words with similar numbers (HDBSCAN).
- The team double-checks and, when necessary, makes minor refinements to the resulting topics before performing "affinity diagramming", where topics are clustered further into a smaller number of "categories" as shown in the visualization.
- A visualization using country centroids is built using a technology called Sketchingpy and deployed to the web.
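The first two steps of the list above, querying Newsdata.io with search terms translated into each language, can be sketched as building one request URL per language. This is a minimal sketch under assumptions: the endpoint `https://newsdata.io/api/1/news` and the parameter names `apikey`, `q`, `language`, and `category` reflect our reading of Newsdata.io's public documentation and should be checked against it; `category=food` is only one possible way to express the food filter; and the helper `build_query_urls` and the example terms are hypothetical, not the project's actual search parameters. Translation (done with Amazon Translate in the project) is assumed to have already happened.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Newsdata.io documentation.
NEWSDATA_ENDPOINT = "https://newsdata.io/api/1/news"


def build_query_urls(api_key, terms_by_language, category="food"):
    """Build one request URL per language (hypothetical helper).

    terms_by_language maps a language code to a search term that has
    already been translated into that language.
    """
    urls = {}
    for language, term in terms_by_language.items():
        params = {
            "apikey": api_key,
            "q": term,
            "language": language,
            "category": category,  # assumed way to filter for food coverage
        }
        # urlencode percent-encodes non-ASCII characters as UTF-8.
        urls[language] = f"{NEWSDATA_ENDPOINT}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    # Illustrative terms only, not the project's real search parameters.
    terms = {
        "en": "food security",
        "es": "seguridad alimentaria",
        "fr": "sécurité alimentaire",
    }
    for language, url in build_query_urls("YOUR_API_KEY", terms).items():
        print(language, url)
```

Keeping URL construction separate from the HTTP request makes it easy to inspect or log each per-language query before spending API quota.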
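To make Method A more concrete, here is a toy Latent Dirichlet Allocation model fitted with collapsed Gibbs sampling in plain Python. It is a from-scratch illustration under assumptions, not the project's pipeline (which the page does not describe at this level): the function name `lda_gibbs`, the priors `alpha` and `beta`, and the tiny corpus are all hypothetical. Each word is repeatedly reassigned to a topic in proportion to how often its document uses that topic and how often that topic uses the word, which is how co-occurring words such as dog and cat tend to end up sharing a topic.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling (hypothetical helper).

    docs: list of documents, each a list of word tokens.
    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]  # tokens in doc d with topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # tokens assigned to each topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # P(topic t | all other assignments) is proportional to
                # (document's use of t) * (topic t's use of this word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


if __name__ == "__main__":
    docs = [
        "dog cat pet vet".split(),
        "cat dog pet food".split(),
        "truck road engine fuel".split(),
        "road truck fuel highway".split(),
    ]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    for k, counts in enumerate(topic_word):
        top_words = sorted(counts, key=counts.get, reverse=True)[:3]
        print("topic", k, top_words)
```

On a corpus this small the topics can shift with the random seed; real analyses typically rely on an established library implementation and far more documents.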
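The TF-IDF filtering step in Method B can be sketched in plain Python. This assumes the unsmoothed formula tf × log(N / df) and a hypothetical cutoff of zero, so a word that appears in every article scores zero and is dropped. The page does not state which TF-IDF variant or threshold the project uses (libraries such as scikit-learn smooth the IDF term by default), and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each word in each document as term frequency * inverse document frequency.

    docs: list of token lists. Returns one {word: score} dict per document.
    """
    num_docs = len(docs)
    # Number of documents that contain each word at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(num_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def filter_common_words(docs, threshold=0.0):
    """Keep only words whose TF-IDF score exceeds the (assumed) threshold."""
    return [
        [word for word in doc if score_map[word] > threshold]
        for doc, score_map in zip(docs, tf_idf(docs))
    ]


if __name__ == "__main__":
    docs = [
        "the harvest failed and the price rose".split(),
        "the market price of rice".split(),
        "the school meal program".split(),
    ]
    for doc in filter_common_words(docs):
        print(doc)
```

Here "the" appears in all three sentences, so its IDF is log(1) = 0 and it is removed, while rarer words such as "harvest" survive.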
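Method B's idea that dog and cat "may have more similar numbers than dog and truck" is commonly measured with cosine similarity between word vectors. The sketch uses hypothetical three-dimensional vectors chosen by hand for illustration; real Word2Vec vectors are learned from text and usually have hundreds of dimensions, and this sketch does not reproduce the HDBSCAN clustering step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-picked vectors; real embeddings are learned from text.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.2],
    "truck": [0.1, 0.2, 0.95],
}

if __name__ == "__main__":
    print("dog vs cat:", round(cosine_similarity(vectors["dog"], vectors["cat"]), 3))
    print("dog vs truck:", round(cosine_similarity(vectors["dog"], vectors["truck"]), 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing word embeddings.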
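The page does not say how its country centroids were obtained. One common approach, shown here as an assumption-laden sketch, is the area-weighted centroid of a polygon computed with the shoelace formula. It treats longitude and latitude as flat x and y coordinates, which is only an approximation, and the function name and example shapes are hypothetical.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Uses the shoelace formula; vertices may be listed clockwise or
    counterclockwise because the signed area cancels in the ratio.
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2).
    return cx / (3 * area2), cy / (3 * area2)


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(polygon_centroid(square))
```

A country made of several polygons (for example, one with islands) would combine per-polygon centroids weighted by area, and for strongly concave shapes the centroid can fall outside the border, so mapping projects sometimes substitute a hand-picked label point.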
About
This is an open source academic research project.
Data license and open source
Code is available under an open source license. The project makes the following repositories available:
The original publisher retains copyright to article content and some metadata, including titles. Please ensure you have rights or fair use to use the materials given the specifics of your derivative work. Data are available for download under the CC-BY-NC License. The full data download is intended only for academic research and, by downloading it, you agree to use it only for academic purposes. See the provider, Newsdata.io, for more information. By using these data, you agree that they are made available without any warranty of any kind.
Credits
A collaboration between the Eric and Wendy Schmidt Center for Data Science and Environment and the Global Alliance for the Future of Food. See humans.txt for more details and full credits. The open source libraries used are listed in the visualization README and the data pipeline README.
Privacy
This application records standard server access logs for security and stability reasons, including to prevent abuse. This is a common practice, employed by many websites, that is necessary for maintaining application function. This information includes:
- IP addresses, which describe the location and internet connection from which the application is accessed.
- User agent strings which provide basic information about the device and software ("browser") used to access the site.
- The requested URLs which are unique identifiers for data and other resources requested from our servers.
- Personally identifying information is never shared or sold.
- Anonymized data do not include specific IP addresses.
- IP addresses are not retained past 7 days, except for security or abuse prevention purposes if potentially unusual behavior is detected, such as a large number of requests.
- No information we collect is used for advertising.
- To respect user privacy, we do not ask users their age, but this webpage is intended for those 18 years and older.
- This website does not use cookies.
- Your device may "cache" parts of the application "locally" on your machine for performance reasons, as dictated by your browser settings.