PedsOncSKR

Symbolic Methods project: Domain Coverage and SemRep Evaluation for Knowledge Integration in Pediatric Oncology

This GitHub repository contains the code and data for our project.

Code

SemRep WebAPI

The Java-based API used to run SemRep is not included in this repo; it must be downloaded and installed separately.

Note that access to this tool requires a UMLS Terminology Services (UTS) account.

Scripts

preprocess_clintrials.py and preprocess_textbook.py

These Python scripts split the data into multiple files so that each one is compatible with SemRep (an input file cannot exceed 10,000 characters per request).
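The splitting step can be sketched as follows. This is a minimal illustration and not the code in preprocess_clintrials.py or preprocess_textbook.py: the function name split_into_chunks and the choice to break at line boundaries are assumptions; only the 10,000-character limit comes from the description above.

```python
def split_into_chunks(text, max_chars=10_000):
    """Split text into pieces of at most max_chars characters.

    Hypothetical helper: breaks at line boundaries where possible and
    falls back to hard slicing for a single over-long line.
    """
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size slices.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be written to its own file and submitted as a separate request.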

semrep_command.txt

This text file contains all the commands used to run the SemRep WebAPI. The steps in this process are:

Remove non-ASCII characters
Run the SemRep WebAPI (compile first if changes were made)
Triple extraction part 1: keep rows containing "relation"
Triple extraction part 2: run the get_triples.py script to get the triples
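The first and third steps above could look like this in Python. This is only a sketch under assumptions: semrep_command.txt may well use shell tools instead, and the function names strip_non_ascii and keep_relation_lines are hypothetical.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def keep_relation_lines(lines):
    """Keep only the output rows that contain the word "relation".

    Assumption: a plain substring test is enough. A sentence whose text
    happens to contain "relation" would also pass this filter.
    """
    return [line for line in lines if "relation" in line]
```

Removed characters are deleted rather than replaced, so accented words lose letters (for example "café" becomes "caf").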

get_triples.py

This Python script extracts triples from the files produced by running the blocks of code in semrep_command.txt. The input files must be specified manually inside the script.
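The general idea of pulling subject, predicate, and object out of delimited relation rows can be sketched as below. This is not the logic of get_triples.py: the function names, the "|" separator, and the field positions are all assumptions, and the indices must be checked against real SemRep output before use.

```python
import csv
import io


def extract_triples(lines, subj_idx, pred_idx, obj_idx, sep="|"):
    """Return (subject, predicate, object) tuples from delimited rows.

    The field positions are supplied by the caller because the layout
    assumed here is hypothetical, not SemRep's documented format.
    """
    triples = []
    needed = max(subj_idx, pred_idx, obj_idx)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if needed >= len(fields):
            continue  # skip rows that are too short to hold a triple
        triples.append((fields[subj_idx], fields[pred_idx], fields[obj_idx]))
    return triples


def triples_to_csv(triples):
    """Serialize triples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buffer.getvalue()


# Toy rows in a made-up layout: id | type | subject | predicate | object
toy_rows = ["1|relation|Vincristine|TREATS|Leukemia\n", "2|relation|short\n"]
toy_triples = extract_triples(toy_rows, subj_idx=2, pred_idx=3, obj_idx=4)
```

Passing the indices explicitly keeps the sketch independent of any particular output layout.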

plots_for_triples.py

This Python script creates all the plots reported in our final paper and final presentation, using the data provided in this GitHub repository. Running it generates every plot in the plots directory.

Results

Metamap folder

ClinicalTrialsMetaMapResults.pdf, PubMedMetaMapResults.pdf, TextbookMetaMapResults.pdf

These files show MetaMap output accuracy by highlighting mapped phrases with the following color scheme:

Yellow: correct match with best concept
Green: partial match
Pink: incorrect
Purple: not mapped

SemRep folder

SemRepGSComparison.xlsx

This file contains the SemRep and MetaMap evaluation results.

The worksheets in this file contain the following information:

PubMed MetaMap Matches, NCT MetaMap Matches, and Textbook MetaMap Matches: the parts of the gold standard concepts that were mapped by MetaMap (shown in bold)

MetaMap Comparison:

Summary statistics for MetaMap matches of gold standard concepts (from PubMed MetaMap Matches, NCT MetaMap Matches, and Textbook MetaMap Matches)
Summary statistics for the accuracy of MetaMap output (data from the Metamap folder)

SemRep Internal Eval: each automatically extracted SemRep triple is assigned to one of the following categories:

True and useful (T+)
True but not useful (T-)
False (F)

SemRep Comparison: summary statistics for SemRep vs. the gold standard (used to calculate precision and recall)
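For reference, precision and recall against a gold standard are conventionally computed as shown below. This is a sketch of the standard definitions only: the function names are hypothetical, and exact matching of whole triples is an assumption, since the matching criteria used in the spreadsheet are not described here.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions: P = TP / (TP + FP), R = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def compare_to_gold(extracted, gold):
    """Score extracted triples against gold triples by exact match (assumption)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)   # triples found in both sets
    fp = len(extracted - gold)   # extracted but not in the gold standard
    fn = len(gold - extracted)   # gold triples that were missed
    return precision_recall(tp, fp, fn)
```

With one shared triple, one spurious extraction, and two missed gold triples, this yields a precision of 0.5 and a recall of one third.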

semmeddb_combined_triples.csv, clintrials_combined_triples.csv, textbook_combined_triples.csv

These files contain the triples automatically extracted by the SemRep WebAPI from all the data.

subset_triples.csv

This file contains the triples automatically extracted by the SemRep WebAPI from the subset of data that was also used to create the gold standard triples.

PubMedGS_Triples.csv, ClinTrialGS_Triples.csv, TextBookGS_Triples.csv

These files contain the triples manually extracted from the subset of data.

IdealSemanticRelations_csv.csv

This file contains examples of desired ontologic predications for pediatric ALL used to make an ideal semantic network
