Publications
Mandel, Hannah L.; Yoo, Yun J.; Allen, Andrea J.; Abedian, Sajjad; Verzani, Zoe; Karlson, Elizabeth W.; Kleinman, Lawrence C.; Mudumbi, Praveen C.; Oliveira, Carlos R.; Muszynski, Jennifer A.; Gross, Rachel S.; Carton, Thomas W.; Kim, C.; Taylor, Emily; Park, Heekyong; Divers, Jasmin; Kelly, J. Daniel; Arnold, Jonathan; Geary, Carol Reynolds; Zang, Chengxi; Tantisira, Kelan G.; Rhee, Kyung E.; Koropsak, Michael; Mohandas, Sindhu; Vasey, Andrew; Mosa, Abu S. M.; Haendel, Melissa; Chute, Christopher G.; Murphy, Shawn N.; O'Brien, Lisa; Szmuszkovicz, Jacqueline; Guthe, Nicholas; Santana, Jorge L.; De, Aliva; Bogie, Amanda L.; Halabi, Katia C.; Mohanraj, Lathika; Kinser, Patricia A.; Packard, Samuel E.; Tuttle, Katherine R.; Hirabayashi, Kathryn; Kaushal, Rainu; Pfaff, Emily; Weiner, Mark G.; Thorpe, Lorna E.; Moffitt, Richard A.
Long-COVID incidence proportion in adults and children between 2020 and 2024 Journal Article
In: Clinical Infectious Diseases, vol. 80, iss. 6, pp. 1247-1261, 2025.
Tags: COVID-19, electronic health records, long COVID, public health surveillance
@article{nokey,
title = {Long-COVID incidence proportion in adults and children between 2020 and 2024},
author = {Hannah L. Mandel and Yun J. Yoo and Andrea J. Allen and Sajjad Abedian and Zoe Verzani and Elizabeth W. Karlson and Lawrence C. Kleinman and Praveen C. Mudumbi and Carlos R. Oliveira and Jennifer A. Muszynski and Rachel S. Gross and Thomas W. Carton and C. Kim and Emily Taylor and Heekyong Park and Jasmin Divers and J. Daniel Kelly and Jonathan Arnold and Carol Reynolds Geary and Chengxi Zang and Kelan G. Tantisira and Kyung E. Rhee and Michael Koropsak and Sindhu Mohandas and Andrew Vasey and Abu S. M. Mosa and Melissa Haendel and Christopher G. Chute and Shawn N. Murphy and Lisa O'Brien and Jacqueline Szmuszkovicz and Nicholas Guthe and Jorge L. Santana and Aliva De and Amanda L. Bogie and Katia C. Halabi and Lathika Mohanraj and Patricia A. Kinser and Samuel E. Packard and Katherine R. Tuttle and Kathryn Hirabayashi and Rainu Kaushal and Emily Pfaff and Mark G. Weiner and Lorna E. Thorpe and Richard A. Moffitt},
doi = {10.1093/cid/ciaf046},
year = {2025},
date = {2025-07-18},
urldate = {2025-07-18},
journal = {Clinical Infectious Diseases},
volume = {80},
issue = {6},
pages = {1247-1261},
abstract = {Background: Incidence estimates of post-acute sequelae of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, also known as long COVID, have varied across studies and changed over time. We estimated long COVID incidence among adult and pediatric populations in 3 nationwide research networks of electronic health records (EHRs) participating in the RECOVER (Researching COVID to Enhance Recovery) Initiative using different classification algorithms (computable phenotypes).
Methods: This EHR-based retrospective cohort study included adult and pediatric patients with documented acute SARS-CoV-2 infection and 2 control groups: contemporary coronavirus disease 2019 (COVID-19)-negative and historical patients (2019). We examined the proportion of individuals identified as having symptoms or conditions consistent with probable long COVID within 30-180 days after COVID-19 infection (incidence proportion). Each network (the National COVID Cohort Collaborative [N3C], National Patient-Centered Clinical Research Network [PCORnet], and PEDSnet) implemented its own long COVID definition. We introduced a harmonized definition for adults in a supplementary analysis.
Results: Overall, 4% of children and 10%-26% of adults developed long COVID, depending on computable phenotype used. Excess incidence among SARS-CoV-2 patients was 1.5% in children and ranged from 5% to 6% among adults, representing a lower-bound incidence estimation based on our control groups. Temporal patterns were consistent across networks, with peaks associated with introduction of new viral variants.
Conclusions: Our findings indicate that preventing and mitigating long COVID remains a public health priority. Examining temporal patterns and risk factors for long COVID incidence informs our understanding of etiology and can improve prevention and management.},
keywords = {COVID-19, electronic health records, long COVID, public health surveillance},
pubstate = {published},
tppubtype = {article}
}
Shao, Hui; Thorpe, Lorna E.; Islam, Shahidul; Bian, Jiang; Guo, Yi; Li, Piaopiao; Bost, Sarah; Dabelea, Dana; Conway, Rebecca; Crume, Tessa; Schwartz, Brian S.; Hirsch, Annemarie G.; Allen, Katie S.; Dixon, Brian E.; Grannis, Shaun J.; Lustigova, Eva; Reynolds, Kristi; Rosenman, Marc; Zhong, Victor W.; Wong, Anthony; Rivera, Pedro; Le, Thuy; Akerman, Meredith; Conderino, Sarah; Rajan, Anand; Liese, Angela D.; Rudisill, Caroline; Obeid, Jihad S.; Ewing, Joseph A.; Bailey, Charles; Mendonca, Eneida A.; Zaganjor, Ibrahim; Rolka, Deborah; Imperatore, Giuseppina; Pavkov, Meda E.; Divers, Jasmin
Developing a Computable Phenotype for Identifying Children, Adolescents, and Young Adults With Diabetes Using Electronic Health Records in the DiCAYA Network Journal Article
In: Diabetes Care, vol. 48, iss. 6, pp. 914-921, 2025.
Tags: diabetes mellitus, electronic health records
@article{nokey,
title = {Developing a Computable Phenotype for Identifying Children, Adolescents, and Young Adults With Diabetes Using Electronic Health Records in the DiCAYA Network},
author = {Hui Shao and Lorna E. Thorpe and Shahidul Islam and Jiang Bian and Yi Guo and Piaopiao Li and Sarah Bost and Dana Dabelea and Rebecca Conway and Tessa Crume and Brian S. Schwartz and Annemarie G. Hirsch and Katie S. Allen and Brian E. Dixon and Shaun J. Grannis and Eva Lustigova and Kristi Reynolds and Marc Rosenman and Victor W. Zhong and Anthony Wong and Pedro Rivera and Thuy Le and Meredith Akerman and Sarah Conderino and Anand Rajan and Angela D. Liese and Caroline Rudisill and Jihad S. Obeid and Joseph A. Ewing and Charles Bailey and Eneida A. Mendonca and Ibrahim Zaganjor and Deborah Rolka and Giuseppina Imperatore and Meda E. Pavkov and Jasmin Divers},
doi = {10.2337/dc24-1972},
year = {2025},
date = {2025-06-01},
urldate = {2025-06-01},
journal = {Diabetes Care},
volume = {48},
issue = {6},
pages = {914-921},
abstract = {Objective: The Diabetes in Children, Adolescents, and Young Adults (DiCAYA) network seeks to create a nationwide electronic health record (EHR)-based diabetes surveillance system. This study aimed to develop a DiCAYA-wide EHR-based computable phenotype (CP) to identify prevalent cases of diabetes.
Research design and methods: We conducted network-wide chart reviews of 2,134 youth (aged <18 years) and 2,466 young adults (aged 18 to <45 years) among people with possible diabetes. Within this population, we compared the performance of three alternative CPs, using diabetes diagnoses determined by chart review as the gold standard. CPs were evaluated based on their accuracy in identifying diabetes and its subtype.
Results: The final DiCAYA CP requires at least one diabetes diagnosis code from clinical encounters. Subsequently, diabetes type classification was based on the ratio of type 1 diabetes (T1D) or type 2 diabetes (T2D) diagnosis codes in the EHR. For both youth and young adults, the sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively) in finding diabetes cases were >90%, except for the specificity and NPV in young adults, which were slightly lower at 83.8% and 80.6%, respectively. The final DiCAYA CP achieved >90% sensitivity, specificity, PPV, and NPV in classifying T1D, and demonstrated lower but robust performance in identifying T2D, consistently maintaining >80% across metrics.
Conclusions: The DiCAYA CP effectively identifies overall diabetes and T1D in youth and young adults, though T2D misclassification in youth highlights areas for refinement. The simplicity of the DiCAYA CP enables broad deployment across diverse EHR systems for diabetes surveillance.},
keywords = {diabetes mellitus, electronic health records},
pubstate = {published},
tppubtype = {article}
}
Conderino, Sarah; Divers, Jasmin; Dodson, John A.; Thorpe, Lorna E.; Weiner, Mark G.; Adhikari, Samrachana
Evaluating Methods for Imputing Race and Ethnicity in Electronic Health Record Data Journal Article
In: Health Services Research, vol. 60, iss. 5, pp. e14649, 2025.
Tags: electronic health records, methodology
@article{nokey,
title = {Evaluating Methods for Imputing Race and Ethnicity in Electronic Health Record Data},
author = {Sarah Conderino and Jasmin Divers and John A. Dodson and Lorna E. Thorpe and Mark G. Weiner and Samrachana Adhikari},
doi = {10.1111/1475-6773.14649},
year = {2025},
date = {2025-05-27},
journal = {Health Services Research},
volume = {60},
issue = {5},
pages = {e14649},
abstract = {Objective: To compare anonymized and non-anonymized approaches for imputing race and ethnicity in descriptive studies of chronic disease burden using electronic health record (EHR)-based datasets.
Study setting and design: In this New York City-based study, we first conducted simulation analyses under different missing data mechanisms to assess the performance of Bayesian Improved Surname Geocoding (BISG), single imputation using neighborhood majority information, random forest imputation, and multiple imputation with chained equations (MICE). Imputation performance was measured using sensitivity, precision, and overall accuracy; agreement with self-reported race and ethnicity was measured with Cohen's kappa (κ). We then applied these methods to impute race and ethnicity in two EHR-based data sources and compared chronic disease burden (95% CIs) by race and ethnicity across imputation approaches.
Data sources and analytic sample: Our data sources included EHR data from NYU Langone Health and the INSIGHT Clinical Research Network from 3/6/2016 to 3/7/2020 extracted for a parent study on older adults in NYC with multiple chronic conditions.
Principal findings: Under simulation analyses, the non-anonymized BISG imputation provided the most accurate classification of race and ethnicity, ranging from 66% to 73% across missing data mechanisms. Anonymized imputation methods were more sensitive to the missing data mechanism, with agreement dropping when race and ethnicity was missing not at random (MNAR) (κ single = 0.25, κ MICE = 0.25, κ randomforest = 0.33). When these methods were applied to the NYU and INSIGHT cohorts, however, racial and ethnic distributions and chronic disease burden were consistent across all imputation methods. Slight improvements in the precision of estimates were observed under all imputation approaches compared to a complete case analysis.
Conclusions: BISG imputation may provide a more accurate racial and ethnic classification than single or multiple imputation using anonymized covariates, particularly if the missing data mechanism is MNAR. Descriptive studies of disease burden may not be sensitive to methods for imputing missing data.},
keywords = {electronic health records, methodology},
pubstate = {published},
tppubtype = {article}
}
Mandel, Hannah L.; Shah, Shruti N.; Bailey, L. Charles; Carton, Thomas W.; Chen, Yu; Esquenazi-Karonika, Shari; Haendel, Melissa; Hornig, Mady; Kaushal, Rainu; Oliveira, Carlos R.; Perlowski, Alice A.; Pfaff, Emily; Rao, Suchitra; Razzaghi, Hanieh; Seibert, Elle; Thomas, Gelise L.; Weiner, Mark G.; Thorpe, Lorna E.; Divers, Jasmin
Opportunities and Challenges in Using Electronic Health Record Systems to Study Postacute Sequelae of SARS-CoV-2 Infection: Insights From the NIH RECOVER Initiative Journal Article
In: Journal of Medical Internet Research, vol. 27, pp. e59217, 2025.
Tags: COVID-19, electronic health records, long COVID
@article{nokey,
title = {Opportunities and Challenges in Using Electronic Health Record Systems to Study Postacute Sequelae of SARS-CoV-2 Infection: Insights From the NIH RECOVER Initiative},
author = {Hannah L. Mandel and Shruti N. Shah and L. Charles Bailey and Thomas W. Carton and Yu Chen and Shari Esquenazi-Karonika and Melissa Haendel and Mady Hornig and Rainu Kaushal and Carlos R. Oliveira and Alice A. Perlowski and Emily Pfaff and Suchitra Rao and Hanieh Razzaghi and Elle Seibert and Gelise L. Thomas and Mark G. Weiner and Lorna E. Thorpe and Jasmin Divers},
doi = {10.2196/59217},
year = {2025},
date = {2025-03-05},
urldate = {2025-03-05},
journal = {Journal of Medical Internet Research},
volume = {27},
pages = {e59217},
abstract = {The benefits and challenges of electronic health records (EHRs) as data sources for clinical and epidemiologic research have been well described. However, several factors are important to consider when using EHR data to study novel, emerging, and multifaceted conditions such as postacute sequelae of SARS-CoV-2 infection or long COVID. In this article, we present opportunities and challenges of using EHR data to improve our understanding of long COVID, based on lessons learned from the National Institutes of Health (NIH)-funded RECOVER (REsearching COVID to Enhance Recovery) Initiative, and suggest steps to maximize the usefulness of EHR data when performing long COVID research.},
keywords = {COVID-19, electronic health records, long COVID},
pubstate = {published},
tppubtype = {article}
}
Conderino, Sarah; Bendik, Stefanie; Richards, Thomas B.; Pulgarin, Claudia; Chan, Pui Ying; Townsend, Julie; Lim, Sungwoo; Roberts, Timothy R.; Thorpe, Lorna E.
The use of electronic health records to inform cancer surveillance efforts: a scoping review and test of indicators for public health surveillance of cancer prevention and control Journal Article
In: BMC Medical Informatics and Decision Making, vol. 22, iss. 1, pp. 91, 2022.
Tags: common data model, early detection of cancer, electronic health records, public health informatics, public health surveillance
@article{nokey,
title = {The use of electronic health records to inform cancer surveillance efforts: a scoping review and test of indicators for public health surveillance of cancer prevention and control},
author = {Sarah Conderino and Stefanie Bendik and Thomas B. Richards and Claudia Pulgarin and Pui Ying Chan and Julie Townsend and Sungwoo Lim and Timothy R. Roberts and Lorna E. Thorpe},
doi = {10.1186/s12911-022-01831-8},
year = {2022},
date = {2022-04-06},
urldate = {2022-04-06},
journal = {BMC Medical Informatics and Decision Making},
volume = {22},
issue = {1},
pages = {91},
abstract = {Introduction: State cancer prevention and control programs rely on public health surveillance data to set objectives to improve cancer prevention and control, plan interventions, and evaluate state-level progress towards achieving those objectives. The goal of this project was to evaluate the validity of using electronic health records (EHRs) based on common data model variables to generate indicators for surveillance of cancer prevention and control for these public health programs.
Methods: Following the methodological guidance from the PRISMA Extension for Scoping Reviews, we conducted a literature scoping review to assess how EHRs are used to inform cancer surveillance. We then developed 26 indicators along the continuum of the cascade of care, including cancer risk factors, immunizations to prevent cancer, cancer screenings, quality of initial care after abnormal screening results, and cancer burden. Indicators were calculated within a sample of patients from the New York City (NYC) INSIGHT Clinical Research Network using common data model EHR data and were weighted to the NYC population using post-stratification. We used prevalence ratios to compare these estimates to estimates from the raw EHR of NYU Langone Health to assess quality of information within INSIGHT, and we compared estimates to results from existing surveillance sources to assess validity.
Results: Of the 401 identified articles, 15% had a study purpose related to surveillance. Our indicator comparisons found that INSIGHT EHR-based measures for risk factor indicators were similar to estimates from external sources. In contrast, cancer screening and vaccination indicators were substantially underestimated as compared to estimates from external sources. Cancer screenings and vaccinations were often recorded in sections of the EHR that were not captured by the common data model. INSIGHT estimates for many quality-of-care indicators were higher than those calculated using a raw EHR.
Conclusion: Common data model EHR data can provide rich information for certain indicators related to the cascade of care but may have substantial biases for others that limit their use in informing surveillance efforts for cancer prevention and control programs.},
keywords = {common data model, early detection of cancer, electronic health records, public health informatics, public health surveillance},
pubstate = {published},
tppubtype = {article}
}
Brown, Jeffrey S.; Bastarache, Lisa; Weiner, Mark G.
Aggregating Electronic Health Record Data for COVID-19 Research—Caveat Emptor Journal Article
In: JAMA Network Open, vol. 4, iss. 7, pp. e2117175, 2021.
Tags: COVID-19, electronic health records
@article{nokey,
title = {Aggregating Electronic Health Record Data for COVID-19 Research—Caveat Emptor},
author = {Jeffrey S. Brown and Lisa Bastarache and Mark G. Weiner},
doi = {10.1001/jamanetworkopen.2021.17175},
year = {2021},
date = {2021-07-21},
journal = {JAMA Network Open},
volume = {4},
issue = {7},
pages = {e2117175},
keywords = {COVID-19, electronic health records},
pubstate = {published},
tppubtype = {article}
}
Xu, Zhenxing; Wang, Fei; Adekkanattu, Prakash; Bose, Budhaditya; Vekaria, Veer; Brandt, Pascal; Jiang, Guoqian; Kiefer, Richard C.; Luo, Yuan; Pacheco, Jennifer; Rasmussen, Luke V.; Xu, Jie; Alexopoulos, George; Pathak, Jyotishman
Subphenotyping depression using machine learning and electronic health records Journal Article
In: Learning Health Systems, vol. 4, iss. 4, pp. e10241, 2020.
Tags: depression, electronic health records, machine learning
@article{nokey,
title = {Subphenotyping depression using machine learning and electronic health records},
author = {Zhenxing Xu and Fei Wang and Prakash Adekkanattu and Budhaditya Bose and Veer Vekaria and Pascal Brandt and Guoqian Jiang and Richard C. Kiefer and Yuan Luo and Jennifer Pacheco and Luke V. Rasmussen and Jie Xu and George Alexopoulos and Jyotishman Pathak},
doi = {10.1002/lrh2.10241},
year = {2020},
date = {2020-08-03},
journal = {Learning Health Systems},
volume = {4},
issue = {4},
pages = {e10241},
abstract = {Objective: To identify depression subphenotypes from Electronic Health Records (EHRs) using machine learning methods, and analyze their characteristics with respect to patient demographics, comorbidities, and medications.
Materials and methods: Using EHRs from the INSIGHT Clinical Research Network (CRN) database, multiple machine learning (ML) algorithms were applied to analyze 11 275 patients with depression to discern depression subphenotypes with distinct characteristics.
Results: Using the computational approaches, we derived three depression subphenotypes: Phenotype_A (n = 2791; 31.35%) included patients who were the oldest (mean (SD) age, 72.55 (14.93) years), had the most comorbidities, and took the most medications. The most common comorbidities in this cluster of patients were hyperlipidemia, hypertension, and diabetes. Phenotype_B (mean (SD) age, 68.44 (19.09) years) was the largest cluster (n = 4687; 52.65%), and included patients suffering from moderate loss of body function. Asthma, fibromyalgia, and Chronic Pain and Fatigue (CPF) were common comorbidities in this subphenotype. Phenotype_C (n = 1452; 16.31%) included patients who were younger (mean (SD) age, 63.47 (18.81) years), had the fewest comorbidities, and took fewer medications. Anxiety and tobacco use were common comorbidities in this subphenotype.
Conclusion: Computationally deriving depression subtypes can provide meaningful insights and improve understanding of depression as a heterogeneous disorder. Further investigation is needed to assess the utility of these derived phenotypes to inform clinical trial design and interpretation in routine patient care.},
keywords = {depression, electronic health records, machine learning},
pubstate = {published},
tppubtype = {article}
}
Pressl, Christina; Jiang, Caroline S.; da Rosa, Joel Correa; Friedrich, Maximilian; Vaughan, Roger; Freiwald, Winrich A.; Tobin, Jonathan N.
Interrogating an ICD-Coded Electronic Health Records Database to Characterize the Epidemiology of Prosopagnosia Journal Article
In: Journal of Clinical and Translational Science, vol. 5, iss. 1, pp. e11, 2020.
Tags: electronic health records, epidemiology, prosopagnosia
@article{nokey,
title = {Interrogating an ICD-Coded Electronic Health Records Database to Characterize the Epidemiology of Prosopagnosia},
author = {Christina Pressl and Caroline S. Jiang and Joel Correa da Rosa and Maximilian Friedrich and Roger Vaughan and Winrich A. Freiwald and Jonathan N. Tobin},
doi = {10.1017/cts.2020.497},
year = {2020},
date = {2020-06-19},
journal = {Journal of Clinical and Translational Science},
volume = {5},
issue = {1},
pages = {e11},
abstract = {Introduction: Recognition of faces of family members, friends, and colleagues is an important skill essential for everyday life. Individuals affected by prosopagnosia (face blindness) have difficulty recognizing familiar individuals. The prevalence of prosopagnosia has been estimated to be as high as 3%. Prosopagnosia can severely impact the quality of life of those affected, and it has been suggested to co-occur with conditions such as depression and anxiety.
Methods: To determine real-world diagnostic frequency of prosopagnosia and the spectrum of its comorbidities, we utilized a large database of more than 7.5 million de-identified electronic health records (EHRs) from patients who received care at major academic health centers and Federally Qualified Health Centers in New York City. We designed a computable phenotype to search the database for diagnosed cases of prosopagnosia, revealing a total of n = 902 cases. In addition, data from a randomly sampled matched control population (n = 100,973) were drawn from the database for comparative analyses to study the condition's comorbidity landscape. Diagnostic frequency of prosopagnosia, epidemiological characteristics, and comorbidity landscape were assessed.
Results: We observed prosopagnosia diagnoses at a rate of 0.012% (12 per 100,000 individuals). We discovered elevated frequency of prosopagnosia diagnosis for individuals who carried certain comorbid conditions, such as personality disorder, depression, epilepsy, and anxiety. Moreover, prosopagnosia diagnoses increased with the number of comorbid conditions.
Conclusions: Results from this study show a wide range of comorbidities and suggest that prosopagnosia is vastly underdiagnosed. Findings imply important clinical consequences for the diagnosis and management of prosopagnosia as well as its comorbid conditions.},
keywords = {electronic health records, epidemiology, prosopagnosia},
pubstate = {published},
tppubtype = {article}
}
