AusReddit: A Reddit Dataset for Australia

Reddit is a vast and dynamic platform for public discussion, but finding relevant Australian conversations can be a major challenge for researchers. AusReddit addresses this problem by providing a curated, comprehensive, and historical research databank of Reddit posts and comments from Australia-related subreddits. The databank currently contains over 6 million submissions and nearly 120 million comments from 590 subreddits, making it a valuable resource for understanding public discourse and societal issues in Australia.

What is in AusReddit?

The core of AusReddit is its curated collection of data from subreddits identified as "Australian".

How is a subreddit identified as Australian?

Subreddits are identified through a multi-step process. A subreddit is included if it meets one or more of the following criteria:

  • Explicitly Australian Topic: The subreddit's name or topic clearly refers to Australia or one of its regions, cities, or states (e.g., r/australia, r/brisbane, r/AusFinance).

  • Australian Entities or Issues: The subreddit is dedicated to Australian organisations, cultural phenomena, or issues (e.g., r/AFL, r/NRL, r/ABCTV).

  • Substantial Australian Community: The subreddit is known to host a significant Australian community where discussions about Australian issues frequently occur.

It is important to note that this method focuses the data on an Australian context. It does not guarantee that every post was made by a user located in Australia.

What time period does AusReddit cover?

AusReddit's coverage is both historical and current. The data extends from 2005 to the most recent complete month. The databank is updated monthly, so researchers have access to recent and ongoing conversations.

How does AusReddit work?

  1. Explore with the Ngram Viewer: For those who want to explore trends without registering for full data access, the public Ngram Viewer is the starting point. The viewer lets you search for words or phrases and plots how often they appear over time, revealing when certain topics spiked in public conversation.

  2. Full Data Access via the Web Portal & API: For in-depth analysis, researchers can apply for full access. The secure web portal provides a search interface that supports keyword and semantic searches across all submissions and comments. A key advantage is the ability to reliably search and sort by date, which is difficult to do with Reddit’s own public API.

  3. Analyse with Jupyter Notebooks: To make analysis accessible to all researchers, we provide a suite of pre-built Jupyter Notebooks. These tools require minimal technical knowledge and allow you to perform complex tasks such as visualising conversation structures, identifying key topics, and analysing the emotions expressed in discussions.
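The core idea behind the Ngram Viewer in step 1, counting how often a word or phrase appears in each time period, can be illustrated with a short Python sketch. This is not AusReddit's implementation: the input format (a list of `(date, text)` pairs), the simple tokeniser, the hypothetical function names `tokenise` and `monthly_frequency`, and the per-1,000-words normalisation are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical tokeniser (an assumption): lowercase letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenise(text):
    return TOKEN_RE.findall(text.lower())


def monthly_frequency(posts, phrase):
    """Occurrences of `phrase` per 1,000 tokens, grouped by (year, month).

    `posts` is assumed to be an iterable of (date, text) pairs; this is an
    illustrative input format, not the AusReddit data schema.
    """
    target = tokenise(phrase)
    if not target:
        raise ValueError("phrase must contain at least one word")
    n = len(target)
    hits, totals = Counter(), Counter()
    for created, text in posts:
        month = (created.year, created.month)
        tokens = tokenise(text)
        totals[month] += len(tokens)
        # Slide an n-token window over the post and count exact matches.
        hits[month] += sum(
            tokens[i:i + n] == target for i in range(len(tokens) - n + 1)
        )
    # Divide by total tokens so busier months are not inflated simply by volume.
    return {m: 1000 * hits[m] / totals[m] for m in sorted(totals) if totals[m]}


posts = [
    (date(2020, 3, 1), "Lockdown starts today in Melbourne"),
    (date(2020, 3, 15), "Another lockdown update"),
    (date(2020, 4, 2), "Footy is back"),
]
print(monthly_frequency(posts, "lockdown"))
# {(2020, 3): 250.0, (2020, 4): 0.0}
```

Plotting the returned values against the months gives a trend line of the kind an n-gram viewer displays. Normalising by the total number of words is one common design choice; a real viewer may instead report raw counts or use a different baseline.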

What are the advantages of AusReddit?

AusReddit offers significant benefits over attempting to collect and analyse Reddit data manually or through the standard Reddit API.

  • Curated and Focused: The collection is pre-filtered to include 590 Australian-centric subreddits, saving you the effort of identifying relevant communities yourself.

  • Comprehensive Historical Data: The archive spans from 2005 to the most recent complete month, providing a long-term perspective on public discourse.

  • Powerful and Accessible Analysis: The combination of the search portal and the ready-to-use analysis notebooks lowers the technical barrier, empowering researchers from all disciplines to work with large-scale social media data.

  • Ideal for Cross-Referencing: The data can be used alongside datasets from other platforms to build a richer, multi-platform understanding of a particular event or topic.

What research can AusReddit support?

AusReddit is a versatile resource suitable for a wide array of research projects in the humanities and social sciences. It can be used to:

  • Track public opinion and sentiment on political events, social issues, or cultural trends over more than a decade.

  • Analyse the structure and dynamics of online communities and how conversations unfold.

  • Conduct linguistic analysis on the language used in specific Australian online spaces.

  • Serve as an excellent teaching tool for university students, providing a rich, real-world dataset for them to build critical data analysis skills.

How to Access AusReddit

Who can access AusReddit?

AusReddit is available to academic researchers at institutions across Australia.

What do I need to get access?

The Australian Internet Observatory (AIO) and its partners, as the data custodians, hold overarching ethics approval to maintain the databank. To download and use the data for your own research, you will need to provide evidence of ethics approval from your institution that specifically mentions the use of AusReddit.

How do I get started?

To request access, please create an account and follow the application process on the QUT Digital Observatory website. Researchers with access can search, view, and download data from AusReddit for their projects.