Data Collection

Introduction and Motivation

We used the Open Data API from the Consumer Financial Protection Bureau (CFPB) to collect detailed information about consumer complaints related to financial products and services. This data is sourced from the Consumer Complaint Database, a publicly available resource provided by the CFPB.

The Consumer Complaint Database serves as a central repository for complaints about financial products and services. It includes issues ranging from credit cards and loans to mortgages and bank accounts. Once the CFPB receives a complaint, it forwards the details to the respective company for resolution. Complaints are published in the database after the company responds or 15 days after submission, whichever comes first. The database is generally updated daily, which makes it a timely resource for tracking consumer issues in the financial sector.

Dive into the Data

To better understand the data in the Consumer Complaint Database, it is essential to examine how the complaint process works. The CFPB outlines a structured procedure that ensures consumer issues are addressed effectively. The process consists of several key steps:

  1. Complaint Submitted
    Consumers can submit a complaint directly through the CFPB’s website or other channels. In some cases, other government agencies forward complaints to the CFPB. Once submitted, consumers receive email updates about the status of their complaint and can track its progress online.

  2. Route
    After receiving the complaint, the CFPB routes it to the appropriate company. The company is then responsible for reviewing the issues raised in the complaint and preparing a response.

  3. Company Response
    The company communicates directly with the consumer to address the issues raised. Companies are generally expected to respond within 15 days. In some cases, if the issue requires additional investigation, the company may notify the consumer that their response is in progress and provide a final response within 60 days.

  4. Complaint Published
    Once the company responds or the 15-day period elapses, the CFPB publishes the complaint in its public Consumer Complaint Database. Personal information that directly identifies the consumer is removed to ensure privacy. If consumers consent, their narrative descriptions of what happened are also published, offering valuable context to the complaint data.

  5. Consumer Review
    After the company submits its response, the CFPB notifies the consumer. The consumer can then review the company’s response and provide feedback. Consumers have 60 days to evaluate the response and share their input, ensuring transparency and accountability in the resolution process.

This structured complaint process ensures that consumer voices are heard and provides a mechanism for resolving disputes. The resulting data in the Consumer Complaint Database not only offers insights into the types of issues consumers face and how companies address them but also helps identify problems in the marketplace, supervise companies’ activities, and enforce federal consumer financial laws.
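
The publication timing in step 4 can be expressed as a small date calculation. The following is a minimal sketch, not CFPB code: the function name and parameters are hypothetical, and it assumes the 15-day window is counted in calendar days from a single start date.

```python
from datetime import date, timedelta
from typing import Optional

# 15-day window taken from the process description; counting calendar days
# is an assumption made for this illustration.
PUBLISH_WINDOW_DAYS = 15


def earliest_publication_date(start: date,
                              company_responded: Optional[date] = None) -> date:
    """Return the earlier of the company's response date and the end of the window."""
    window_end = start + timedelta(days=PUBLISH_WINDOW_DAYS)
    if company_responded is None:
        # No response yet: the complaint becomes publishable when the window ends
        return window_end
    return min(company_responded, window_end)
```

For example, a complaint that starts its window on 2023-01-01 and receives a response on 2023-01-05 would be publishable on 2023-01-05 under these assumptions, while a complaint with no response would be publishable on 2023-01-16.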

Field Reference

The Consumer Complaint Database includes the following fields:

Field name | Description
Date received | The date the CFPB received the complaint. For example, “05/25/2013.”
Product | The type of product the consumer identified in the complaint. For example, “Checking or savings account” or “Student loan.”
Sub-product | The type of sub-product the consumer identified in the complaint. For example, “Checking account” or “Private student loan.”
Issue | The issue the consumer identified in the complaint. For example, “Managing an account” or “Struggling to repay your loan.”
Sub-issue | The sub-issue the consumer identified in the complaint. For example, “Deposits and withdrawals” or “Problem lowering your monthly payments.”
Consumer complaint narrative | Consumer complaint narrative is the consumer-submitted description of “what happened” from the complaint. Consumers must opt-in to share their narrative. We will not publish the narrative unless the consumer consents, and consumers can opt-out at any time. The CFPB takes reasonable steps to scrub personal information from each complaint that could be used to identify the consumer.
Company public response | The company’s optional, public-facing response to a consumer’s complaint. Companies can choose to select a response from a pre-set list of options that will be posted on the public database. For example, “Company believes complaint is the result of an isolated error.”
Company | The complaint is about this company. For example, “ABC Bank.”
State | The state of the mailing address provided by the consumer.
ZIP code | The mailing ZIP code provided by the consumer. This field may: i) include the first five digits of a ZIP code; ii) include the first three digits of a ZIP code (if the consumer consented to publication of their complaint narrative); or iii) be blank (if ZIP codes have been submitted with non-numeric values, if there are less than 20,000 people in a given ZIP code, or if the complaint has an address outside of the United States).
Tags | Data that supports easier searching and sorting of complaints submitted by or on behalf of consumers. For example, complaints where the submitter reports the age of the consumer as 62 years or older are tagged “Older American.” Complaints submitted by or on behalf of a servicemember or the spouse or dependent of a servicemember are tagged “Servicemember.” Servicemember includes anyone who is active duty, National Guard, or Reservist, as well as anyone who previously served and is a veteran or retiree.
Consumer consent provided? | Identifies whether the consumer opted in to publish their complaint narrative. We do not publish the narrative unless the consumer consents, and consumers can opt-out at any time.
Submitted via | How the complaint was submitted to the CFPB. For example, “Web” or “Phone.”
Date sent to company | The date the CFPB sent the complaint to the company.
Company response to consumer | This is how the company responded. For example, “Closed with explanation.”
Timely response? | Whether the company gave a timely response. For example, “Yes” or “No.”
Consumer disputed? | Whether the consumer disputed the company’s response.
Complaint ID | The unique identification number for a complaint.
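
Because the ZIP code field can take three different forms, it may help to label each value before any geographic analysis. The sketch below is illustrative only: the function name is hypothetical, and it assumes that three-digit ZIP codes appear either as three digits alone or as three digits followed by "XX" (for example "200XX"), which should be checked against the downloaded data.

```python
import re


def classify_zip(zip_code):
    """Label a published ZIP code value as 'full', 'partial', 'blank', or 'other'."""
    if zip_code is None:
        return "blank"
    value = str(zip_code).strip().upper()
    if value == "":
        return "blank"
    if re.fullmatch(r"\d{5}", value):
        # Five digits: the full ZIP code was published
        return "full"
    if re.fullmatch(r"\d{3}(XX)?", value):
        # Assumed masked form: only the first three digits are published
        return "partial"
    # Anything else is unexpected and worth inspecting by hand
    return "other"
```

Values labeled "other" do not match any of the three documented forms and are best reviewed manually rather than dropped silently.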

Code

We used the CFPB Open Data API to access detailed information about consumer complaints. This API allows users to retrieve data from the Consumer Complaint Database for research and analysis.

For this report, we focused only on complaints where the state is the District of Columbia (state code “DC” in the API). The CFPB updated its product and issue options in April 2017 and August 2023. To keep the product and issue categories consistent across the dataset, we collected data within the date range of 2017-04-01 to 2023-08-31, the period between these two updates.

The CFPB Open Data API has a limit of 10,000 records per query, so it cannot return all results in a single request. To work within this limitation, we split our query into two parts. We used 2020-01-01 as the cutoff date, creating two time intervals: one from 2017-04-01 to 2019-12-31, and another from 2020-01-01 to 2023-08-31. This approach retrieves all relevant data within our specified date range as long as each interval contains fewer than 10,000 matching complaints and the API treats both date bounds as inclusive; if the upper bound is exclusive, complaints received on 2019-12-31 and 2023-08-31 would be missed.
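
The two intervals above were chosen by hand. If more cutoff dates were ever needed, the same splitting could be generated programmatically. The following is a minimal sketch under the assumption that both interval bounds are inclusive; the helper name split_date_range is hypothetical and is not part of the CFPB API.

```python
from datetime import date, timedelta


def split_date_range(start, end, cutoffs):
    """Split the inclusive range [start, end] into consecutive, non-overlapping
    inclusive intervals, with a new interval starting on each cutoff date."""
    # Ignore cutoffs outside the range and remove duplicates
    starts = [start] + sorted({c for c in cutoffs if start < c <= end})
    intervals = []
    for i, interval_start in enumerate(starts):
        if i + 1 < len(starts):
            # End one day before the next interval begins
            interval_end = starts[i + 1] - timedelta(days=1)
        else:
            interval_end = end
        intervals.append((interval_start.isoformat(), interval_end.isoformat()))
    return intervals


intervals = split_date_range(date(2017, 4, 1), date(2023, 8, 31), [date(2020, 1, 1)])
print(intervals)
```

With the single cutoff 2020-01-01, this reproduces the two intervals described above: 2017-04-01 to 2019-12-31 and 2020-01-01 to 2023-08-31.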

Below is the code used to implement this query process.

import requests
import pandas as pd

# Define the base URL for the API endpoint
url = "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"


def get_data(size=10000, state='DC', date_min=None, date_max=None):
    """Fetch complaints for one state and date range (dates as 'yyyy-mm-dd')."""
    # Build the query parameters; requests omits any parameter whose value is None
    params = {
        'size': size,
        'state': state,
        'date_received_min': date_min,
        'date_received_max': date_max,
    }

    # Make a GET request to the API; the 60-second timeout is an arbitrary safeguard
    response = requests.get(url, params=params, timeout=60)

    # Raise instead of returning None, so that a failed request cannot
    # silently break the later concatenation of results
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to fetch data. Status code: {response.status_code}, Error: {response.text}"
        )

    # Parse the JSON response and extract the list of complaints
    data = response.json()
    complaints = data.get('hits', {}).get('hits', [])

    # A full page suggests the record limit was reached and results may be truncated
    if len(complaints) >= size:
        print(f"Warning: {date_min} to {date_max} returned {len(complaints)} records; results may be truncated.")

    # Keep only the fields relevant to the analysis
    records = []
    for complaint in complaints:
        source = complaint.get('_source', {})
        records.append({
            'Complaint_ID': source.get('complaint_id'),
            'Date Received': source.get('date_received'),
            'Timely': source.get('timely'),
            'Submitted via': source.get('submitted_via'),
            'Product': source.get('product'),
            'Sub-product': source.get('sub_product'),
            'Issue': source.get('issue'),
            'Sub-issue': source.get('sub_issue'),
            'Has Narrative': source.get('has_narrative'),
            'Complaint': source.get('complaint_what_happened'),
            'State': source.get('state'),
            'ZIP Code': source.get('zip_code'),
            'Tags': source.get('tags'),
            'Company': source.get('company'),
            'Company Response': source.get('company_response'),
            'Company Public Response': source.get('company_public_response'),
        })

    return records

# Use 2020-01-01 as the cutoff date to split the query into two parts
records1 = get_data(size=10000, state='DC', date_min='2017-04-01', date_max='2019-12-31')
records2 = get_data(size=10000, state='DC', date_min='2020-01-01', date_max='2023-08-31')
records = records1 + records2

# Save the combined records as a CSV file
pd.DataFrame(records).to_csv("../../data/raw-data/customer_complaints.csv", index=False)
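
Before relying on the combined file, it can be useful to confirm that no interval reached the 10,000-record limit and that no complaint appears twice. The sketch below is one possible check, not part of the original pipeline: the helper name combine_chunks is hypothetical, and it assumes each record is a dictionary with the 'Complaint_ID' key used in the code above.

```python
def combine_chunks(chunks, cap=10000, key="Complaint_ID"):
    """Merge per-interval record lists, keeping the first copy of each ID.

    Returns (merged_records, capped_indexes), where capped_indexes lists the
    positions of chunks whose length reached the cap and may be truncated.
    """
    merged, seen, capped = [], set(), []
    for i, chunk in enumerate(chunks):
        if len(chunk) >= cap:
            # This interval may have been cut off by the per-query limit
            capped.append(i)
        for record in chunk:
            record_id = record.get(key)
            if record_id is not None and record_id in seen:
                continue  # skip a complaint that was already collected
            seen.add(record_id)
            merged.append(record)
    return merged, capped


# Small illustrative input with made-up IDs and a cap of 2
demo_chunks = [
    [{"Complaint_ID": "1"}, {"Complaint_ID": "2"}],
    [{"Complaint_ID": "2"}, {"Complaint_ID": "3"}],
]
demo_merged, demo_capped = combine_chunks(demo_chunks, cap=2)
```

If any index is reported as capped, the corresponding interval could be split further and queried again.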