import pandas as pd
import numpy as np
from tqdm import (
tqdm,
) # Adds progress bars to loops and other iterable processes for better visualization.
tqdm.pandas()  # Allows progress bars to appear during DataFrame operations.
Counterfactual Data Balancing
Introduction and Motivation
In this project, I set out to create a balanced dataset that would support supervised learning models for predicting the factors linked to exonerations. At the heart of this process is counterfactual balancing: building a dataset that includes exonerated individuals alongside a comparable group of non-exonerated individuals, drawn to reflect the broader incarcerated population in Illinois. This balance is critical—it allows the model to make fair and meaningful comparisons when identifying patterns and predictors of exoneration outcomes.
Why Use Counterfactual Data?
Counterfactual data is a necessity when access to complete prison population records is unavailable. Since I don't have access to a full dataset of all incarcerated individuals in Illinois and their exoneration statuses (exonerated or not exonerated), I relied on counterfactuals to bridge the gap and construct a balanced dataset.
Counterfactuals allow us to ask "what if?" questions. What if an exonerated person had not been exonerated? Would their characteristics look similar to those of non-exonerated individuals? Counterfactual data helps isolate these comparisons by holding everything else constant except the hypothetical condition, which in this case is exoneration.
As explained in this primer on counterfactuals, a counterfactual statement operates on an unrealized "if" condition. The "if" portion, also known as the antecedent, frames the comparison: exonerated individuals versus those who weren't. This approach is powerful because it helps reduce bias and ensures that the model is trained on balanced, representative data.[1]
Acknowledgments
The implementation of this counterfactual data balancing relied heavily on expert guidance and code contributions from Professor Jeff Jacobs. His insights and support were invaluable in refining the methodology and making this process possible.
Narrowing to Incarcerated Population
To focus on the incarcerated population in Illinois, the county-level dataset was filtered to Illinois rows and then to the columns capturing key demographic details: the total incarcerated population and its breakdown by race (White, Black, and Latino). This step keeps only the subset of data needed for balancing and lays the groundwork for simulating representative draws from the Illinois incarcerated population.
il_df = pd.read_csv("../../data/processed-data/representation_by_county.csv")
il_df = il_df[il_df["state"] == "Illinois"].copy()
il_df.head(3)
| | county | state | total_population | total_white_population | total_black_population | total_latino_population | incarcerated_population | incarcerated_white_population | incarcerated_black_population | incarcerated_latino_population | non-incarcerated_population | non-incarcerated_white_population | non-incarcerated_black_population | non-incarcerated_latino_population | ratio_of_overrepresentation_of_whites_incarcerated_compared_to_whites_non-incarcerated | ratio_of_overrepresentation_of_blacks_incarcerated_compared_to_blacks_non-incarcerated | ratio_of_overrepresentation_of_latinos_incarcerated_compared_to_latinos_non-incarcerated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adams | Illinois | 67103 | 62414 | 2331 | 776 | 110 | 73 | 36 | 0 | 66993 | 62341 | 2295 | 776 | 0.71 | 9.54 | 0.00 |
| 1 | Alexander | Illinois | 8238 | 4983 | 2915 | 155 | 411 | 89 | 242 | 79 | 7827 | 4894 | 2673 | 76 | 0.35 | 1.72 | 19.82 |
| 2 | Bond | Illinois | 17768 | 15797 | 1080 | 547 | 1542 | 500 | 657 | 304 | 16226 | 15297 | 423 | 243 | 0.34 | 16.32 | 13.14 |
Only the county, state, and incarcerated-population columns are kept, and they are renamed to shorter labels (Total, White, Black, Latino) to streamline the analysis.
rename_map = {
"county": "county",
"state": "state",
"incarcerated_population": "Total",
"incarcerated_white_population": "White",
"incarcerated_black_population": "Black",
"incarcerated_latino_population": "Latino",
}
# Keep only the cols in the rename_map
cols_to_keep = list(rename_map.keys())
il_df = il_df[cols_to_keep].copy()
# And do the renaming
il_df.rename(columns=rename_map, inplace=True)
il_df.head()
| | county | state | Total | White | Black | Latino |
|---|---|---|---|---|---|---|
| 0 | Adams | Illinois | 110 | 73 | 36 | 0 |
| 1 | Alexander | Illinois | 411 | 89 | 242 | 79 |
| 2 | Bond | Illinois | 1542 | 500 | 657 | 304 |
| 3 | Boone | Illinois | 71 | 38 | 12 | 21 |
| 4 | Brown | Illinois | 2059 | 419 | 1267 | 367 |
To align the data with the exoneration registry, the county names are cleaned up. The registry uses bare county names (like "Cook"), so any trailing " County" suffix (e.g., "Cook County") is stripped to keep naming consistent across the two datasets.
A state_prop column was then added to represent the proportion of all Illinois inmates contained in each county. It is calculated by dividing each county's incarcerated population (Total) by the sum of Total across all Illinois counties. Sorting the values in descending order highlights the counties with the largest share of the state's incarcerated population.
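# Since the Exoneree project uses just the county name (like "Cook"), we'll remove any trailing " County" (so, e.g., "Cook County" turns into just "Cook").
# The pattern is anchored to the end of the name and matched case-insensitively ("County" or "county"):
il_df["county"] = il_df["county"].str.replace(r"\s+county$", "", case=False, regex=True)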
# Compute a state_prop column representing the % of all Illinois inmates contained in each county:
il_df["state_prop"] = il_df["Total"] / il_df["Total"].sum()
il_df.sort_values(by="state_prop", ascending=False).head()
| | county | state | Total | White | Black | Latino | state_prop |
|---|---|---|---|---|---|---|---|
| 15 | Cook | Illinois | 11649 | 1769 | 8369 | 1468 | 0.164469 |
| 98 | Will | Illinois | 3902 | 811 | 2528 | 538 | 0.055091 |
| 78 | Randolph | Illinois | 3571 | 934 | 2250 | 377 | 0.050418 |
| 53 | Logan | Illinois | 3060 | 963 | 1705 | 389 | 0.043203 |
| 52 | Livingston | Illinois | 2798 | 905 | 1577 | 294 | 0.039504 |
From the output, Cook County stands out, containing roughly 16% of Illinois' incarcerated individuals, followed by Will, Randolph, Logan, and Livingston counties. Together these five counties hold about 35% of the total, which shows where the incarcerated population is most concentrated and which counties will dominate a population-weighted sample in the analysis.
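A small, self-contained sketch of the two steps above, run on made-up rows (the county names and counts are hypothetical, not taken from the Prison Policy Initiative file). It shows why the suffix strip should be case-insensitive and anchored to the end of the name, and that state_prop sums to 1 by construction.

```python
import pandas as pd

# Hypothetical toy rows: a capitalized suffix, a lowercase suffix, and an already-clean name
toy = pd.DataFrame(
    {
        "county": ["Cook County", "Will county", "Randolph"],
        "Total": [600, 300, 100],
    }
)

# Strip a trailing "County" (any capitalization) only at the end of the name
toy["county"] = toy["county"].str.replace(r"\s+county$", "", case=False, regex=True)

# Each county's share of the statewide incarcerated total
toy["state_prop"] = toy["Total"] / toy["Total"].sum()

print(toy)
```

A plain `str.replace(" county", "")` would leave "Cook County" untouched, because the default match is case-sensitive.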
# To avoid confusing the state_prop value with the sampled proportion that we compute below, we can drop state_prop now:
il_df = il_df.drop(columns=["state_prop"])
# The source only tracks three racial groups, so the sum of the three race counts need not equal the total incarcerated population. Let's check:
il_df["three_cat_total"] = il_df["Black"] + il_df["White"] + il_df["Latino"]
il_df.head()
| | county | state | Total | White | Black | Latino | three_cat_total |
|---|---|---|---|---|---|---|---|
| 0 | Adams | Illinois | 110 | 73 | 36 | 0 | 109 |
| 1 | Alexander | Illinois | 411 | 89 | 242 | 79 | 410 |
| 2 | Bond | Illinois | 1542 | 500 | 657 | 304 | 1461 |
| 3 | Boone | Illinois | 71 | 38 | 12 | 21 | 71 |
| 4 | Brown | Illinois | 2059 | 419 | 1267 | 367 | 2053 |
To ensure the sample accurately represents the county-by-county distributions, the difference between three_cat_total and Total was used to construct the “Other” category.
il_df["Other"] = il_df["Total"] - il_df["three_cat_total"]
il_df.head()
| | county | state | Total | White | Black | Latino | three_cat_total | Other |
|---|---|---|---|---|---|---|---|---|
| 0 | Adams | Illinois | 110 | 73 | 36 | 0 | 109 | 1 |
| 1 | Alexander | Illinois | 411 | 89 | 242 | 79 | 410 | 1 |
| 2 | Bond | Illinois | 1542 | 500 | 657 | 304 | 1461 | 81 |
| 3 | Boone | Illinois | 71 | 38 | 12 | 21 | 71 | 0 |
| 4 | Brown | Illinois | 2059 | 419 | 1267 | 367 | 2053 | 6 |
The data source doesn’t provide much documentation, but it seems like some counties might be double-counting individuals who report more than one race. This assumption comes from the fact that, in some cases, the three_cat_total values (sum of White, Black, and Latino counts) are higher than the overall Total population for those counties.
il_df[il_df["three_cat_total"] > il_df["Total"]]
| | county | state | Total | White | Black | Latino | three_cat_total | Other |
|---|---|---|---|---|---|---|---|---|
| 13 | Clinton | Illinois | 1599 | 486 | 917 | 199 | 1602 | -3 |
| 16 | Crawford | Illinois | 1230 | 310 | 782 | 141 | 1233 | -3 |
| 25 | Fayette | Illinois | 1527 | 467 | 933 | 129 | 1529 | -2 |
| 40 | Jefferson | Illinois | 1857 | 827 | 812 | 224 | 1863 | -6 |
| 50 | Lawrence | Illinois | 2358 | 486 | 1490 | 393 | 2369 | -11 |
| 59 | Madison | Illinois | 14 | 0 | 11 | 14 | 25 | -11 |
| 60 | Marion | Illinois | 114 | 69 | 37 | 10 | 116 | -2 |
| 91 | Vermilion | Illinois | 2084 | 536 | 1236 | 319 | 2091 | -7 |
| 95 | Wayne | Illinois | 2 | 0 | 2 | 2 | 4 | -2 |
| 96 | White | Illinois | 72 | 35 | 19 | 36 | 90 | -18 |
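In most of these counties the discrepancy is only a few people, which is small relative to the county total (for example, 3 out of 1,599 in Clinton). Madison and White counties stand out, with discrepancies of 11 and 18 people on totals of only 14 and 72; these anomalies are beyond the scope of what can be addressed without direct input from correctional facilities. In all of these cases, the "Other" value was set to 0.
# Floor negative remainders at 0 (equivalent to: 0 if x < 0 else x)
il_df["Other"] = il_df["Other"].clip(lower=0)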
# Drop three_cat_total, since we only needed that in order to form the other count:
il_df.drop(columns=["three_cat_total"], inplace=True, errors="ignore")
# Store these names in a list for future use (to ensure consistency in naming throughout):
race_category_names = ["White", "Black", "Latino", "Other"]
il_df.head()
| | county | state | Total | White | Black | Latino | Other |
|---|---|---|---|---|---|---|---|
| 0 | Adams | Illinois | 110 | 73 | 36 | 0 | 1 |
| 1 | Alexander | Illinois | 411 | 89 | 242 | 79 | 1 |
| 2 | Bond | Illinois | 1542 | 500 | 657 | 304 | 81 |
| 3 | Boone | Illinois | 71 | 38 | 12 | 21 | 0 |
| 4 | Brown | Illinois | 2059 | 419 | 1267 | 367 | 6 |
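The Other construction and the clipping step can be illustrated with two rows copied from the tables above (Adams and Madison). After clipping, White + Black + Latino + Other can exceed Total (Madison: 25 vs. 14), so a probability distribution over races should be normalized by the sum of the race counts rather than by Total. This is a sketch; the helper name race_probabilities is hypothetical.

```python
import pandas as pd

race_category_names = ["White", "Black", "Latino", "Other"]

# Two rows copied from the tables above
df = pd.DataFrame(
    {
        "county": ["Adams", "Madison"],
        "Total": [110, 14],
        "White": [73, 0],
        "Black": [36, 11],
        "Latino": [0, 14],
    }
)

# Other = whatever the three tracked groups do not account for, floored at 0
three_cat_total = df["White"] + df["Black"] + df["Latino"]
df["Other"] = (df["Total"] - three_cat_total).clip(lower=0)


def race_probabilities(row):
    """Hypothetical helper: normalize by the sum of race counts, not by Total."""
    counts = row[race_category_names].astype(float)
    return counts / counts.sum()


probs = df.apply(race_probabilities, axis=1)
print(df)
print(probs)
```

Each row of `probs` sums to 1, whereas dividing Madison's race counts by its Total would give values summing to more than 1.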
Illinois Exoneree Counts/Demographics
The Illinois exoneration data was loaded, and the total number of exonerated individuals was calculated by taking the length of the dataframe using len(exon_il_df). The result: 548 exonerations. This serves as the starting point for understanding the scope of exoneration cases in Illinois.
exon_il_df = pd.read_csv("../../data/processed-data/illinois_exoneration_data.csv")
exon_il_df.head(3)
| | last_name | first_name | age | race | sex | state | county | latitude | longitude | worst_crime_display | ... | child_welfare_worker_misconduct | withheld_exculpatory_evidence | misconduct_that_is_not_withholding_evidence | knowingly_permitting_perjury | witness_tampering_or_misconduct_interrogating_co_defendant | misconduct_in_interrogation_of_exoneree | perjury_by_official | prosecutor_lied_in_court | tag_sum | geocode_address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbott | Cinque | 19.0 | Black | male | Illinois | Cook | 41.819738 | -87.756525 | Drug Possession or Sale | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | Cook County, Illinois, United States |
| 1 | Abernathy | Christopher | 17.0 | White | male | Illinois | Cook | 41.819738 | -87.756525 | Murder | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 10 | Cook County, Illinois, United States |
| 2 | Abrego | Eruby | 20.0 | Hispanic | male | Illinois | Cook | 41.819738 | -87.756525 | Murder | ... | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 9 | Cook County, Illinois, United States |
3 rows × 49 columns
num_il = len(exon_il_df)
num_il
548
The value_counts() function was applied to the race column with normalize=True to calculate the proportion of exonerated individuals by race in Illinois. The results show a highly uneven distribution:
- Black individuals make up the large majority of exonerations at 76.3%.
- Hispanic individuals account for 14.8%, while White individuals represent only 8.6%.
- The remaining two categories, Asian and Native American, each account for about 0.2% of exonerations (one person each).
exon_il_df["race"].value_counts(normalize=True)
race
Black 0.762774
Hispanic 0.147810
White 0.085766
Asian 0.001825
Native American 0.001825
Name: proportion, dtype: float64
Since the Prison Policy Initiative demographic data only includes Black, White, Latino, and Other as race categories, “Hispanic” was first renamed to “Latino” for consistency. “Asian” and “Native American” were then combined into the “Other” category. To preserve the original race data, it was saved into a new column called Race_orig for future reference if needed.
recode_map = {
"Black": "Black",
"Hispanic": "Latino",
"White": "White",
"Asian": "Other",
"Native American": "Other",
}
exon_il_df["Race_orig"] = exon_il_df["race"]
exon_il_df["race"] = exon_il_df["race"].apply(lambda x: recode_map[x])
exon_il_df["race"].value_counts(normalize=True)
race
Black 0.762774
Latino 0.147810
White 0.085766
Other 0.003650
Name: proportion, dtype: float64
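One detail worth guarding against: Series.map() silently returns NaN for any value that is not a key in the mapping, whereas the dictionary lookup in the lambda above raises a KeyError. Once the race column has been recoded, any later mapping that only knows the original labels will turn "Latino" and "Other" into NaN. A minimal sketch on hypothetical labels, showing the check and an idempotent mapping (recoded labels map to themselves):

```python
import pandas as pd

recode_map = {
    "Black": "Black",
    "Hispanic": "Latino",
    "White": "White",
    "Asian": "Other",
    "Native American": "Other",
}

# Hypothetical labels; "Latino" has already been recoded, so it is not a key above
races = pd.Series(["Black", "Hispanic", "Asian", "Latino"])

mapped = races.map(recode_map)
unmapped = sorted(races[mapped.isna()].unique())
print(unmapped)  # labels that would have silently become NaN

# Making the mapping idempotent (every target label also maps to itself) fixes this
safe_map = {**recode_map, **{v: v for v in set(recode_map.values())}}
remapped = races.map(safe_map)
print(remapped.tolist())
```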
Sampling from the Incarcerated Population
Draw Representative Samples
The first step in the simulation is to draw a representative sample of 548 "people" from the Illinois prison population. To achieve this, a weighted random sample with replacement was drawn from the il_df dataset. The sampling weights are each county's total incarcerated population, so each county's expected share of the sample is proportional to its share of the state's incarcerated population.
A random seed (random_state=5000) was set to ensure the results are replicable. This step produces a valid population-weighted sample where the only known characteristic of each “person” is their county.
il_sample_df = il_df.sample(
num_il,
replace=True,
weights=il_df["Total"],
random_state=5000,
).copy()
il_sample_df.head()
| | county | state | Total | White | Black | Latino | Other |
|---|---|---|---|---|---|---|---|
| 15 | Cook | Illinois | 11649 | 1769 | 8369 | 1468 | 43 |
| 36 | Henry | Illinois | 301 | 172 | 108 | 21 | 0 |
| 72 | Perry | Illinois | 2323 | 561 | 1398 | 352 | 12 |
| 15 | Cook | Illinois | 11649 | 1769 | 8369 | 1468 | 43 |
| 53 | Logan | Illinois | 3060 | 963 | 1705 | 389 | 3 |
il_sample_df["county"].value_counts(normalize=True).head()
county
Cook 0.142336
Will 0.060219
Randolph 0.056569
Perry 0.040146
Logan 0.040146
Name: proportion, dtype: float64
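Sampling county rows with replacement, weighted by Total, is equivalent to picking individual inmates uniformly at random and recording only their county, so each county's expected share of the sample equals its state_prop. The sketch below checks this with a large draw on hypothetical toy counts (not the real data). With only 548 draws, as above, the shares fluctuate around these values; Cook, for instance, lands at 0.142 in the sample versus 0.164 statewide.

```python
import pandas as pd

# Hypothetical county totals (not the real PPI numbers)
toy = pd.DataFrame({"county": ["A", "B", "C"], "Total": [700, 200, 100]})
expected = toy.set_index("county")["Total"] / toy["Total"].sum()

# Same call pattern as above: weighted draw of rows with replacement
draws = toy.sample(200_000, replace=True, weights=toy["Total"], random_state=5000)
observed = draws["county"].value_counts(normalize=True).reindex(expected.index)

print(pd.DataFrame({"expected": expected, "observed": observed}))
```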
Simulating Racial Distribution
To replicate the racial makeup of the incarcerated population, racial counts for each county were used to create a probability distribution for race. For each row in il_sample_df (which represents a sampled county), a distribution was formed based on the race-specific counts, and a single “person” was drawn from that distribution.
This process was done row-by-row using the choice() method of a NumPy random Generator created with np.random.default_rng(). The generator was seeded (seed=5000) so that the results remain consistent and replicable across runs.
rng = np.random.default_rng(seed=5000)
def draw_race_sample(row):
    race_counts = [row[cur_val] for cur_val in race_category_names]
    # Normalize by the sum of the race counts (not Total), since the two can differ after clipping Other
    total_count = sum(race_counts)
    race_probs = [cur_count / total_count for cur_count in race_counts]
    # And now we have a probability distribution! We can use rng.choice() to sample from it
    sampled_vals = rng.choice(race_category_names, size=1, p=race_probs)
    # We only sampled 1 value here, so we use [0] to extract it
    sampled_val = list(sampled_vals)[0]
    return sampled_val
Before sampling, the function was tested by drawing multiple samples for a specific county (Cook County, in this case). To verify its accuracy, the expected proportions for sampling N inmates from Cook were first computed.
cook_row = il_df[il_df["county"] == "Cook"].iloc[0]
for cname in race_category_names:
    cook_row[f"{cname}_prop"] = cook_row[cname] / cook_row["Total"]
cook_row
county Cook
state Illinois
Total 11649
White 1769
Black 8369
Latino 1468
Other 43
White_prop 0.151859
Black_prop 0.718431
Latino_prop 0.126019
Other_prop 0.003691
Name: 15, dtype: object
This means that if the draw_race_sample() function is working correctly, it should generate “White” 15.2% of the time, “Black” 71.8% of the time, and so on. To confirm this, a sample of size N=5000 was generated from Cook County to check whether the proportions align with the expected values.
N = 5000
cook_samples = [draw_race_sample(cook_row) for _ in range(N)]
cook_sample_df = pd.DataFrame(cook_samples, columns=["Race"])
cook_sample_df["Race"].value_counts(normalize=True)
Race
Black 0.7186
White 0.1518
Latino 0.1260
Other 0.0036
Name: proportion, dtype: float64
The results look good and are very close to the expected proportions, which confirms that the draw_race_sample() function is working as intended. With this validation, the function can now be used to sample a race value for each row in il_sample_df.
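The visual comparison above can be made quantitative with a chi-square goodness-of-fit statistic, computed here with NumPy alone. As an assumption of this sketch, the observed counts are reconstructed by multiplying the reported proportions by N = 5000, and the expected counts come from the Cook row; 7.815 is the standard 5% critical value for 3 degrees of freedom (four categories minus one). This is an optional check, not part of the original pipeline.

```python
import numpy as np

# Cook County race counts (White, Black, Latino, Other) from il_df
cook_counts = np.array([1769, 8369, 1468, 43])
N = 5000

# Observed counts reconstructed from the reported sample proportions
observed = np.round(np.array([0.1518, 0.7186, 0.1260, 0.0036]) * N)
expected = N * cook_counts / cook_counts.sum()

chi2 = ((observed - expected) ** 2 / expected).sum()
critical_5pct_df3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

print(f"chi-square = {chi2:.4f}; consistent with Cook's distribution at 5%: {chi2 < critical_5pct_df3}")
```

A statistic far below the critical value means the simulated draws are statistically indistinguishable from Cook's racial distribution at this sample size.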
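This step uses the tqdm library imported at the top of the notebook: its pandas integration (enabled by tqdm.pandas()) adds a progress_apply() method that displays a progress bar while the function runs on each row. This makes it easy to monitor how long the simulation takes per row, which is useful when scaling it up.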
il_sample_df["Race"] = il_sample_df.progress_apply(draw_race_sample, axis=1)
100%|██████████| 548/548 [00:00<00:00, 8289.38it/s]
sample_cols_to_keep = ["county", "state", "Race"]
il_sample_df = il_sample_df[sample_cols_to_keep].copy()
il_sample_df
| | county | state | Race |
|---|---|---|---|
| 15 | Cook | Illinois | Black |
| 36 | Henry | Illinois | White |
| 72 | Perry | Illinois | Black |
| 15 | Cook | Illinois | Black |
| 53 | Logan | Illinois | Black |
| ... | ... | ... | ... |
| 51 | Lee | Illinois | White |
| 10 | Christian | Illinois | Black |
| 25 | Fayette | Illinois | Black |
| 44 | Kane | Illinois | White |
| 52 | Livingston | Illinois | Black |
548 rows × 3 columns
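Row-by-row apply is perfectly adequate for 548 rows. For much larger simulations, the same per-row draw could be vectorized with the inverse-CDF method: build a probability matrix (one row per sampled county), take cumulative sums, and find where one uniform draw per row lands. This is a sketch on hypothetical toy rows, and draw_races_vectorized is an illustrative name. It samples from the same per-row distributions but consumes the random stream differently, so it will not reproduce the exact draws of draw_race_sample() even with the same seed.

```python
import numpy as np
import pandas as pd

race_category_names = ["White", "Black", "Latino", "Other"]


def draw_races_vectorized(df, rng):
    """Hypothetical helper: one race draw per row, proportional to that row's race counts."""
    counts = df[race_category_names].to_numpy(dtype=float)
    probs = counts / counts.sum(axis=1, keepdims=True)
    cdf = probs.cumsum(axis=1)
    u = rng.random(len(df))[:, None]  # one uniform in [0, 1) per row, shaped for broadcasting
    # Number of cumulative probabilities <= u is the index of the first category whose CDF exceeds u
    idx = (u >= cdf).sum(axis=1)
    idx = np.minimum(idx, len(race_category_names) - 1)  # guard against round-off near 1.0
    return pd.Series(np.array(race_category_names)[idx], index=df.index)


# Toy rows: one county that is all Black, one all Latino, one split White/Black
toy = pd.DataFrame(
    {"White": [0, 0, 5], "Black": [10, 0, 5], "Latino": [0, 7, 0], "Other": [0, 0, 0]}
)
rng = np.random.default_rng(seed=5000)
print(draw_races_vectorized(toy, rng).tolist())
```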
Let’s take a look at the racial distribution of the Cook County subset from our sample to see how it turned out:
cook_sample_df = il_sample_df[il_sample_df["county"] == "Cook"].copy()
cook_sample_df["Race"].value_counts(normalize=True)
Race
Black 0.743590
Latino 0.166667
White 0.089744
Name: proportion, dtype: float64
The results show a slight oversample of Latinos (16.7% vs. 12.6% expected) and an undersample of Whites (9.0% vs. 15.2%). This is ordinary sampling variation rather than an error: only 78 of the 548 draws landed in Cook, and proportions based on that few draws fluctuate noticeably around the population values. It is also a feature of the design. The goal is to mirror the simplified model of the Exoneration Registry, in which the 548 exonerees are themselves one finite subset of Illinois inmates, so they are compared with another finite, size-548 subset of those still incarcerated rather than with exact population proportions.
With this step completed, the 548 rows from il_sample_df can now be combined with the 548 rows in exon_il_df, creating a balanced DataFrame with a total of 1,096 rows. Half of these rows represent exonerated individuals from Illinois, and the other half represent non-exonerated individuals, sampled to be statistically representative of Illinois’ incarcerated population as a whole.
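The size of the Cook deviations can be checked against ordinary sampling noise. Cook's share of the sample (0.142336) corresponds to 78 of the 548 draws, and the reported proportions correspond to 58 Black, 13 Latino, and 7 White draws. Expressing each deviation in units of the binomial standard error, sqrt(p(1 - p) / n), shows every category within two standard errors of its expected value, consistent with chance. A sketch using only numbers reported above; the normal approximation is rough for the tiny Other category.

```python
import numpy as np

races = ["White", "Black", "Latino", "Other"]
# Expected proportions from the Cook row (race count / Total)
p = np.array([1769, 8369, 1468, 43]) / 11649
# Observed counts in the 78 sampled Cook rows
observed_counts = np.array([7, 58, 13, 0])
n = observed_counts.sum()

p_hat = observed_counts / n
se = np.sqrt(p * (1 - p) / n)  # binomial standard error of a proportion
z = (p_hat - p) / se  # normal approximation; rough when n * p is small (Other)

for race, zi in zip(races, z):
    print(f"{race:>6}: z = {zi:+.2f}")
```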
Constructing the Final Balanced Dataset
To prepare the final balanced dataset, a new label column was added to distinguish between exonerated and non-exonerated individuals. Specifically:
- The Label column in exon_il_df was set to “Exonerated”.
- The Label column in il_sample_df was set to “Non-Exonerated”.
To avoid confusion when combining datasets, the county column in il_sample_df was renamed to County. With the labels in place and columns aligned, both datasets were combined into a single DataFrame using pd.concat().
Next, a race mapping was applied to standardize the race categories across datasets, so that every row uses the four categories in race_category_names:
- "Asian" and "Native American" map to "Other".
- "Hispanic" and the already-recoded "Latino" both map to "Latino"; "Black", "White", and "Other" are kept as-is. The recoded labels must be included because exon_il_df["race"] was recoded earlier and .map() returns NaN for any value missing from the mapping.
To clean up, the race and Race columns were combined, prioritizing non-NaN values so that each row keeps whichever of the two is filled in. The original race column was then dropped. Similarly, the county and County columns were merged, and the original county column was removed to streamline the final DataFrame.
Finally, the resulting Race and County columns were checked to confirm the expected values, and the first few rows of the balanced dataset were displayed to verify everything was in place. The County counts also reveal a few spelling mismatches between the two sources (for example, "DuPage" vs. "Dupage" and "La Salle" vs. "LaSalle"), which appear as separate categories and should be harmonized before County is used as a model feature.
# Construct our new label: exonerated vs. non-exonerated
exon_il_df["Label"] = "Exonerated"
il_sample_df["Label"] = "Non-Exonerated"
il_sample_df = il_sample_df.rename(
    columns={"county": "County"}
)  # Rename to distinguish when combining datasets
# And combine!
balanced_df = pd.concat([exon_il_df, il_sample_df], axis=0)
# Define the mapping for 'race'. exon_il_df["race"] was already recoded above,
# so the recoded labels ("Latino", "Other") must map to themselves as well;
# any value missing from this dict would become NaN.
race_mapping = {
    "Asian": "Other",
    "Native American": "Other",
    "Other": "Other",
    "Black": "Black",
    "White": "White",
    "Hispanic": "Latino",
    "Latino": "Latino",
}
# Map the 'race' column
balanced_df["race"] = balanced_df["race"].map(race_mapping)
# Combine 'race' and 'Race' columns, prioritizing non-NaN values
balanced_df["Race"] = balanced_df["race"].combine_first(balanced_df["Race"])
# Drop the old 'race' column
balanced_df.drop(columns=["race"], inplace=True)
# Combine 'county' and 'County' columns, prioritizing non-NaN values
balanced_df["County"] = balanced_df["county"].combine_first(balanced_df["County"])
# Drop the old 'county' column
balanced_df.drop(columns=["county"], inplace=True)
# Verify the final Race and County columns (dropna=False would surface any missing Race)
print(balanced_df["Race"].value_counts(dropna=False))
print(balanced_df["County"].value_counts())
balanced_df.head()
Race
Black 709
White 207
Latino 173
Other 7
Name: count, dtype: int64
County
Cook 552
Will 37
Randolph 31
Jefferson 23
Logan 22
Perry 22
Livingston 22
Fulton 21
Johnson 21
Tazewell 19
Lawrence 18
Montgomery 17
Bond 17
Vermilion 16
DuPage 15
Winnebago 15
Lake 14
St. Clair 14
Clinton 14
La Salle 14
Fayette 13
Lee 13
Brown 12
Kane 12
Knox 11
Peoria 10
Morgan 10
Rock Island 10
Macon 9
Crawford 9
McHenry 7
Christian 6
Williamson 6
Champaign 5
McLean 4
Sangamon 4
Henry 3
Kankakee 3
Stephenson 2
Woodford 2
Edgar 2
Effingham 2
Iroquois 2
Adams 2
Richland 1
Menard 1
Pope 1
Madison 1
Boone 1
Jackson 1
Cumberland 1
Washington 1
Dupage 1
Dekalb 1
Moultrie 1
LaSalle 1
De Witt 1
Name: count, dtype: int64
| | last_name | first_name | age | sex | state | latitude | longitude | worst_crime_display | sentence | sentence_in_years | ... | witness_tampering_or_misconduct_interrogating_co_defendant | misconduct_in_interrogation_of_exoneree | perjury_by_official | prosecutor_lied_in_court | tag_sum | geocode_address | Race_orig | Label | County | Race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbott | Cinque | 19.0 | male | Illinois | 41.819738 | -87.756525 | Drug Possession or Sale | Probation | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | Cook County, Illinois, United States | Black | Exonerated | Cook | Black |
| 1 | Abernathy | Christopher | 17.0 | male | Illinois | 41.819738 | -87.756525 | Murder | Life without parole | 100.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 10.0 | Cook County, Illinois, United States | White | Exonerated | Cook | White |
| 2 | Abrego | Eruby | 20.0 | male | Illinois | 41.819738 | -87.756525 | Murder | 90 years | 90.0 | ... | 1.0 | 1.0 | 1.0 | 0.0 | 9.0 | Cook County, Illinois, United States | Hispanic | Exonerated | Cook | Latino |
| 3 | Adams | Demetris | 22.0 | male | Illinois | 41.819738 | -87.756525 | Drug Possession or Sale | 1 year | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | Cook County, Illinois, United States | Black | Exonerated | Cook | Black |
| 4 | Adams | Kenneth | 22.0 | male | Illinois | 41.819738 | -87.756525 | Murder | 75 years | 75.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 11.0 | Cook County, Illinois, United States | Black | Exonerated | Cook | Black |
5 rows × 51 columns
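Before writing the CSV, a few assertions can confirm the properties the balanced dataset is supposed to have: equal class sizes, no missing Race or County, and only the four expected race categories. The helper name check_balanced is hypothetical, and the demo runs on a small made-up frame rather than the real data; on the real data it would be called as check_balanced(balanced_df).

```python
import pandas as pd

race_category_names = ["White", "Black", "Latino", "Other"]


def check_balanced(df):
    """Hypothetical sanity checks for the exonerated / non-exonerated dataset."""
    label_counts = df["Label"].value_counts()
    assert set(label_counts.index) == {"Exonerated", "Non-Exonerated"}, label_counts
    assert label_counts.nunique() == 1, f"classes are not balanced: {label_counts.to_dict()}"
    assert df["Race"].notna().all(), f"{df['Race'].isna().sum()} rows have no Race"
    assert df["County"].notna().all(), f"{df['County'].isna().sum()} rows have no County"
    unexpected = set(df["Race"]) - set(race_category_names)
    assert not unexpected, f"unexpected race labels: {unexpected}"
    return label_counts


# Made-up demo frame with two rows per class
demo = pd.DataFrame(
    {
        "Label": ["Exonerated", "Exonerated", "Non-Exonerated", "Non-Exonerated"],
        "Race": ["Black", "Latino", "White", "Black"],
        "County": ["Cook", "Cook", "Will", "Logan"],
    }
)
print(check_balanced(demo).to_dict())
```

Failing fast here is cheaper than discovering missing labels after a model has been trained on the exported file.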
balanced_df.to_csv("../../data/processed-data/exonerees_balanced.csv", index=False)
Summary and Next Steps
The final balanced dataset now consists of 1,096 rows, split evenly between exonerated and non-exonerated individuals. Key steps included creating consistent labels, standardizing race categories, and combining the datasets while ensuring no critical data was lost. The resulting DataFrame provides a clean and structured foundation for further analysis.
Next Steps
This balanced dataset can now be used for supervised learning tasks, such as:
- Predicting Exoneration Factors: Training machine learning models to identify the characteristics most associated with exoneration outcomes.
- Comparative Analysis: Exploring differences in demographics, geographic distribution, or other variables between exonerated and non-exonerated individuals.
- Visualization and Insights: Mapping trends or disparities across counties and racial groups to better understand systemic patterns in wrongful convictions.
With this dataset, models and analyses can provide deeper insights into the factors associated with exonerations, using comparison groups that are equal in size and drawn to reflect the Illinois incarcerated population.