Smoking Status Variable - writing rules from scratch
In this tutorial, we will be using the smoking status example data to write rules that will allow us to classify charts according to the following labels: Current smoker, Former smoker, Never smoked, or Not dictated.
- Download the smoking status example data: text_data.csv, labels_train.csv, and labels_valid.csv.
- Upload the text data file.
  - From the Settings view, press on the Project Settings tab.
  - Press on the Data File button to add the smoking status text data file (text_data.csv).
  - Set the Data ID column to be 0.
  - Set the Data First Row to be 1.
  - Set the Data Column to be 1.
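The three data settings above describe how rows of the CSV become charts. As a rough illustration only, assuming columns and rows are 0-indexed and that a Data First Row of 1 simply skips a header row (our reading of the settings, not confirmed CHARTextract behaviour), the equivalent in plain Python might look like this. `load_texts` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def load_texts(csv_file, id_col=0, first_row=1, data_col=1):
    """Hypothetical helper: map each chart ID to its text.

    Assumes 0-indexed columns and that rows before `first_row`
    (e.g. a header row) are skipped.
    """
    texts = {}
    for row_index, row in enumerate(csv.reader(csv_file)):
        if row_index < first_row:
            continue  # skip header row(s)
        texts[row[id_col]] = row[data_col]
    return texts


# Made-up in-memory sample, not the real text_data.csv
sample = io.StringIO('id,text\n101,"Patient smokes every day."\n102,"Never smoked."\n')
texts = load_texts(sample)
print(texts["101"])  # Patient smokes every day.
```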
- Upload the label files.
  - Still in the Project Settings tab, set the Create Train and Validation Set flag to be False. We don’t need to create train and validation sets, since we will be uploading our own train and validation labels.
  - Set Prediction Mode to be False.
  - Press on the folder icon next to Train Label File to upload the smoking status train data labels (labels_train.csv).
  - Press on the folder icon next to Valid Label File to upload the smoking status validation data labels (labels_valid.csv).
  - Set the Label ID Column to be 0.
  - Set the Label First Row to be 1.
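The label files are read the same way: each row pairs a chart ID with its ground truth label. A minimal sketch, assuming 0-indexed columns, a single header row, and the label in column 1 (the Label column you set later on the Variable Settings tab). `load_labels` and the sample rows are hypothetical.

```python
import csv
import io


def load_labels(csv_file, id_col=0, first_row=1, label_col=1):
    """Hypothetical helper: map chart ID -> ground truth label."""
    rows = list(csv.reader(csv_file))
    # Skip the header row(s), then pair each ID with its label
    return {row[id_col]: row[label_col] for row in rows[first_row:]}


# Made-up sample, not the real labels_train.csv
train_sample = io.StringIO("id,smoking_status\n101,Current smoker\n102,Never smoked\n")
train_labels = load_labels(train_sample)
print(train_labels)  # {'101': 'Current smoker', '102': 'Never smoked'}
```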
- Set the rules folder.
  - Still in the Project Settings tab, press on the folder icon next to Rules Folder. Choose a folder you can save files to. All of your variable rules files will be saved in the rules folder.
  - Press on the Save Project Settings button.
- Create a new variable.
  - Press on the Variable Settings tab.
  - Press on the green ‘+’ symbol. This will toggle a text input field next to Current Variable. Type in smoking_status. This will be the name of your variable.
  - Set the Label column to be 1.
  - Press on the Save Variable Settings button.
- Set the classifier settings.
  - Press on the Classifier Settings tab.
  - Set the Classifier Type to RegexClassifier.
  - Set the Negative Label to Not dictated.
- Set up your rules.
  - Go to the Rules view.
  - Press on the green ‘+’ symbol to add a new label.
  - Double-click on the new button and type in Current smoker. You have now added a label for Current smoker.
  - Add labels for Former smoker and Never smoked.
- Add rules for Current smoker.
  - Select the Current smoker label.
  - Press on the “Add Primary Rule” button.
  - In the text box, write “smok”.
  - Press on the green button next to the text box, and select “Replace Rule” from the dropdown menu.
  - A new field will appear. This is where you enter your secondary rule. In the secondary rule text box, type in “current”, then press Enter. Next, type in “or” and press Enter. Type “every day” and press Enter.
  - Set the secondary rule score to be 1.
  - You have just created your first set of rules for the Current smoker label! CHARTextract will search for sentences that contain the text “smok” (primary rule), which also matches words such as “smoker” and “smoking”. If the sentence also contains either “current” or “every day” (secondary rule), then the sentence’s score for the Current smoker label will be replaced with 1.
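To make the primary/secondary logic concrete, here is a small plain-Python sketch of what the Current smoker rules do to a single sentence. It is not CHARTextract’s code: the case-insensitive matching, the score of 0 kept when only the primary rule matches, and the `score_sentence` name are all assumptions made for illustration.

```python
import re

# Assumed encoding of the Current smoker rules described above
PRIMARY = re.compile(r"smok", re.IGNORECASE)                  # primary rule
SECONDARY = re.compile(r"current|every day", re.IGNORECASE)   # "current" OR "every day"
REPLACE_SCORE = 1
PRIMARY_SCORE = 0  # assumption: score kept when only the primary rule matches


def score_sentence(sentence):
    """Return the Current smoker score for one sentence, or None if the primary rule misses."""
    if not PRIMARY.search(sentence):
        return None  # primary rule did not fire, so the sentence is ignored
    if SECONDARY.search(sentence):
        return REPLACE_SCORE  # Replace rule: the secondary match overrides the score
    return PRIMARY_SCORE


print(score_sentence("He currently smokes a pack a day."))  # 1
print(score_sentence("Smoking status: unknown."))           # 0
print(score_sentence("No alcohol use."))                    # None
```

Because this sketch matches substrings, “currently” satisfies the “current” term, but so would a sentence like “does not currently smoke”. Negations like this are one reason the run-and-refine loop later in the tutorial matters.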
- Add rules for Former smoker.
  - Select the Former smoker label.
  - Add a primary rule with the text “smok”.
  - Add a Replace secondary rule. Set the text to be “history of” OR “used to”, and assign a score of 1.
- Add rules for Never smoked.
  - Select the Never smoked label.
  - Add a primary rule with the text “smok”.
  - Add a Replace secondary rule. Set the text to be “never” OR “no history of”, and assign a score of 1.
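With all three labels defined, the classifier has to turn sentence-level matches into one label per chart. The sketch below assumes one simple aggregation (sum the scores per label, the highest total wins, and the Negative Label, Not dictated, is used when nothing scores above 0); CHARTextract’s actual aggregation may differ. `RULES`, `classify_chart`, and the naive period-based sentence split are hypothetical simplifications.

```python
import re

# Hypothetical encoding: label -> (primary pattern, secondary pattern, replace score)
RULES = {
    "Current smoker": (r"smok", r"current|every day", 1),
    "Former smoker": (r"smok", r"history of|used to", 1),
    "Never smoked": (r"smok", r"never|no history of", 1),
}
NEGATIVE_LABEL = "Not dictated"


def classify_chart(text):
    """Assumed aggregation: sum sentence scores per label; highest total wins."""
    totals = {label: 0 for label in RULES}
    # Naive sentence split on ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for label, (primary, secondary, score) in RULES.items():
            if re.search(primary, sentence, re.IGNORECASE) and re.search(
                secondary, sentence, re.IGNORECASE
            ):
                totals[label] += score
    best_label = max(totals, key=totals.get)  # ties go to the first label in RULES
    return best_label if totals[best_label] > 0 else NEGATIVE_LABEL


print(classify_chart("Patient used to smoke. Quit in 2010."))  # Former smoker
print(classify_chart("Denies alcohol use."))                    # Not dictated
```

One thing the sketch exposes: “No history of smoking.” matches both Former smoker (“history of”) and Never smoked (“no history of”), and here the tie is broken by dictionary order in favour of Former smoker. Overlapping secondary terms like these are worth watching for when you inspect misclassified charts.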
- Run the tool.
  - Press on the Run button.
  - The top pane will display misclassified charts (i.e., charts for which the label predicted by CHARTextract differs from the ground truth label).
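Conceptually, the misclassified list is every chart ID whose predicted label disagrees with its ground truth label. A minimal sketch, assuming predictions and ground truth are both dictionaries keyed by chart ID; the `misclassified` function and the sample values are made up for illustration.

```python
def misclassified(predictions, ground_truth):
    """Return (chart ID, predicted, expected) for every disagreement, sorted by ID."""
    return [
        (chart_id, predictions[chart_id], expected)
        for chart_id, expected in sorted(ground_truth.items())
        if chart_id in predictions and predictions[chart_id] != expected
    ]


# Made-up example values
predictions = {"101": "Current smoker", "102": "Never smoked", "103": "Not dictated"}
ground_truth = {"101": "Current smoker", "102": "Former smoker", "103": "Not dictated"}
print(misclassified(predictions, ground_truth))
# [('102', 'Never smoked', 'Former smoker')]
```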

- Refine the rules.
  - Press on the misclassified chart with ID 1003.
  - CHARTextract classified this chart as Never smoked, but its ground truth label is Former smoker.
  - If you press on the yellow highlighted text, you will see that CHARTextract detected the text “smok” in the sentence.
  - The rules in their current state cannot classify this chart correctly. Let’s fix this.
  - Go back to the Rules view. Press on the Former smoker label.
  - Edit the secondary rule and append OR “former”, so that it reads “history of” OR “used to” OR “former”.
  - Re-run the tool by pressing on the Run button.
  - You’ll notice that chart 1003 no longer appears among the misclassified charts.
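Each OR in a secondary rule adds one more alternative. If you think of the secondary rule as a list of terms joined into a single regular expression (an assumption about the representation, used here only for illustration), adding “former” widens what the Former smoker rule can catch. `build_pattern` is a hypothetical helper, and the example sentence is invented, not the text of chart 1003.

```python
import re


def build_pattern(terms):
    """Join secondary-rule terms into one case-insensitive regex alternation."""
    return re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)


before = build_pattern(["history of", "used to"])
after = build_pattern(["history of", "used to", "former"])

sentence = "She is a former smoker."  # hypothetical sentence
print(bool(before.search(sentence)))  # False
print(bool(after.search(sentence)))   # True
```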

Further exercises:
- Instead of adding the rules specified in the tutorial, create your own rules.
- Try using the advanced view instead.
- Instead of writing rules from scratch, start from pre-existing rules and modify them.