Introduction

CHARTextract variable extraction relies on regular-expression-based pattern matching and rule weighting. Expertise in regular expressions is not required, but it can help in writing more robust rules. Refer to the Regular Expressions primer for the basics of regular expressions and for additional resources.

Each label is defined by one or more primary rules. Each primary rule in CHARTextract corresponds to a series of nested pattern-matching rules (i.e., regular expressions) and a score (i.e., weight) assigned to each rule.

Variable extraction in CHARTextract follows the steps below:

1. If there are multiple text notes for an ID, the notes are concatenated (i.e., combined) into a single case.
2. Each case is then split into sentences.
3. For each sentence, CHARTextract checks whether any of the labels have primary rule matches.
4. If there is a rule match, the corresponding label’s score is updated with the assigned primary rule score.
5. Once all sentences have been checked, the scores for each label are tallied.
6. The case is assigned the label with the highest score. If all label scores are 0, the Negative label is assigned to the case (refer to the Classifier Settings documentation for details on the Negative label).
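
The steps above can be sketched in a few lines of Python. This is only an illustration of the scoring flow, not CHARTextract's actual implementation: the labels, patterns, scores, the `classify` and `split_sentences` helpers, and the "Negative" label name are all hypothetical. It also makes simplifying assumptions: sentences are split naively on end punctuation, each primary rule is a single flat regular expression (nested rules are not modelled), a rule counts at most once per sentence, and ties between non-zero scores go to the first label listed.

```python
import re

# Hypothetical rule table: label -> list of (regex pattern, score).
# Labels, patterns, and scores are invented for illustration only.
RULES = {
    "Smoker": [
        (r"\bcurrent(ly)? smok(er|es|ing)\b", 2),
        (r"\bsmokes\b", 1),
    ],
    "Non-smoker": [
        (r"\b(never|non)[- ]?smok(ed|er)\b", 2),
        (r"\bdenies smoking\b", 2),
    ],
}
NEGATIVE_LABEL = "Negative"  # assumed name for the fallback label


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(notes, rules=RULES, negative_label=NEGATIVE_LABEL):
    """Return (assigned label, per-label scores) for one ID's notes."""
    # Step 1: concatenate all notes for the ID into one case.
    case_text = " ".join(notes)
    scores = {label: 0 for label in rules}

    # Steps 2-4: check every sentence against every label's primary rules
    # and add the rule's score to the label whenever the rule matches.
    for sentence in split_sentences(case_text):
        for label, primary_rules in rules.items():
            for pattern, score in primary_rules:
                if re.search(pattern, sentence, flags=re.IGNORECASE):
                    scores[label] += score

    # Steps 5-6: pick the highest-scoring label, or the Negative label
    # when no rule matched anywhere in the case.
    if all(value == 0 for value in scores.values()):
        return negative_label, scores
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = classify(
        ["Patient is a current smoker.", "He smokes one pack daily."]
    )
    print(label, scores)
```

In this sketch the first note matches the "current smoker" rule and the second matches the "smokes" rule, so both scores accumulate on the same label. A case whose sentences match no rule at all falls through to the Negative label.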