2024 Datasets

Note: Please check the data license every time you use others’ data. Even when a dataset is publicly accessible, its use may be restricted, or you may have to cite the authors/creators in your outputs.


Qualitative data repositories:

The Qualitative Data Repository (https://qdr.syr.edu)
An archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.

The UK Data Service (https://discover.ukdataservice.ac.uk/QualiBank)
The principal repository for economic, population, and social research data in the UK.


Data repositories where qualitative data is also deposited:

(enter your own search terms; add “qualitative”, “transcript”, etc. to find datasets)

Figshare (https://figshare.com)

Open Science Framework (https://osf.io/search)


Original data:

Narratives and photovoice data from CAPTIVATE (https://osf.io/wju6d/)
The Capturing Innovation in Learning and Teaching in Higher Education (CAPTIVATE) project explores the impact of innovative pedagogies on student learning experiences within a science and technology university setting. Using a participatory research methodology, the study aims to build a comprehensive understanding of how students interact with, and are affected by, emerging teaching methodologies.

Interviews from the “COVID in Cartoons project” (https://figshare.com/articles/dataset/CIC_Datasets_Focus_Group_Transcripts/19886647)
The project engaged 15- to 18-year-olds with political cartoons and cartoonists to foster meaning-making in relation to the pandemic; it involved young people in building critical narratives of the crisis and its impact on their lives.

Patient narratives on type II diabetes (https://figshare.com/articles/dataset/Transcripts_in-depth_patient_interviews/13606529)
Ten transcripts of in-depth interviews with patients diagnosed with type 2 diabetes mellitus and receiving diabetes care from both a general practitioner and a practice nurse.

Photovoice data from the “Community Solidarity Initiatives as Spaces of Connection, Resistance and Change” project (https://osf.io/yvcwk)
The project examined experiences and outcomes of displaced and resident/national participants in community solidarity initiatives in Ireland that bring people living in Direct Provision (DP) in contact with the wider community.

Interviews on Shared Medical Appointments (https://figshare.com/articles/dataset/Qualitative_Transcripts/19773664)
Project title: “Qualitative Exploration of the Psychological Dimensions of Telehealth Shared Medical Appointments (SMAs) for Buprenorphine Prescribing”. The study examined psychological components of telehealth SMAs for buprenorphine prescribing to learn about the benefits and drawbacks of this treatment model.

Interviews on Parental Mentalization (https://figshare.com/articles/dataset/Parental_Mentalization_Measures_Interview_Transcripts_5_docx/12869984)
Five transcripts of semi-structured interviews with expert clinicians and researchers in the field of mentalization. The interviews cover parental mentalization measures and the current strengths and weaknesses of existing measures. The aim of the interviews was to find out which elements are essential in clinically valid and usable measures of parental mentalization.

Interviews on Open Science practices (https://osf.io/e5t8x)
This project looks at how scholars in Communication and particularly Mass Communication approach, understand, and implement open science. They do this from a qualitative perspective, implementing many open science practices themselves.


Other ideas:

Speak My Language (https://speakmylanguage.com.au/accessible-interview-transcripts)
Transcriptions of interviews that share the lived experience of people with disabilities, providing insights into the resources, activities and places that help to build their fulfillment and wellbeing.

YouTube (make use of built-in automatic transcription)

Coded datasets by the Epistemic Analytics Lab (https://www.qehub.org/resources/datasets)

Also: News databases, ChatGPT-generated mock interviews or narratives, Instagram accounts

Coursera Dataset: Forum posts drawn from student discussions of online courses across 8 years in topics such as probability, vaccines, mythology, accounting, music, design, gamification, calculus, global trends and modern poetry.

Link: Reach out to Amanda Barany for data access (amanda.barany@gmail.com)

Format: de-identified .xls files and virtual visualizations using rONA

Access: digital access through online database

Main variables: forum post content, course name and date, forum post prompt and date, anonymized user

Existing automatic codes (set up with Codey): introductions, gratitude, evaluations, apologies, liking, course logistics, links, positive expressions, negative expressions

Note: This is a flexible, auto-coded, large-scale data source that can support a variety of explorations. Reach out to Amanda Barany (amanda.barany@gmail.com) to arrange access.
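
To give a rough sense of what automatically coded forum posts can look like, here is a minimal Python sketch of keyword-based coding. It is an illustration only: the patterns, the function names (code_post, build_table) and the example posts are all hypothetical, and they are not the rules or the tool used to code the Coursera dataset.

```python
import re

# Hypothetical keyword patterns for three of the code names listed above.
# These are illustrative assumptions, NOT the rules used for the real dataset.
CODE_PATTERNS = {
    "gratitude": re.compile(r"\b(thanks|thank you|grateful)\b", re.IGNORECASE),
    "apologies": re.compile(r"\b(sorry|apologi[sz]e|apologies)\b", re.IGNORECASE),
    "links": re.compile(r"https?://\S+", re.IGNORECASE),
}


def code_post(text):
    """Return a dict mapping each code name to 1 (present) or 0 (absent)."""
    return {
        code: int(bool(pattern.search(text)))
        for code, pattern in CODE_PATTERNS.items()
    }


def build_table(posts):
    """Turn a list of post strings into rows: the post plus one 0/1 column per code."""
    return [{"post": post, **code_post(post)} for post in posts]


# Made-up example posts (not drawn from the Coursera data).
posts = [
    "Thanks for the clear explanation of conditional probability!",
    "Sorry for the late reply, here is the reading: https://example.org/notes",
    "I found week 3 much harder than week 2.",
]

table = build_table(posts)
for row in table:
    print(row)
```

Each row holds the post text plus one binary column per code, which is the general shape of a coded qualitative data table. Simple keyword matching like this misses context and sarcasm, so automatic codes are best checked against human coding.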


Do you work with text files, want to code them, and want a qualitative data table at the end?
This video shows you how with the Reproducible Open Coding Kit in 7 easy steps.