The specific STs extracted, along with their Type Unique Identifiers (TUIs), include T019 (“Congenital Abnormality”), T020 (“Acquired Abnormality”), T033 (“Finding”), T034 (“Laboratory or Test Result”), T041 (“Mental Process”), T046 (“Pathologic Function”), T047 (“Disease or Syndrome”), T048 (“Mental or Behavioral Dysfunction”), T049 (“Cell or Molecular Dysfunction”), T050 (“Experimental Model of Disease”), T059 (“Laboratory Procedure”), T060 (“Diagnostic Procedure”), T063 (“Molecular Biology Research Technique”), T073 (“Manufactured Object”), T170 (“Intellectual Product”), T184 (“Sign or Symptom”), T190 (“Anatomical Abnormality”), T191 (“Neoplastic Process”), and T201 (“Clinical Attribute”).
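Restricting concepts to these semantic types amounts to a set-membership filter on TUIs. The sketch below shows one way to do this. It assumes a hypothetical input format in which each annotated concept is a dict with `cui`, `name`, and `tuis` keys. It illustrates the filtering idea and is not the authors' pipeline or the output format of any particular UMLS annotator.

```python
# Semantic types (STs) retained for phenotype extraction, keyed by UMLS TUI.
SELECTED_SEMANTIC_TYPES = {
    "T019": "Congenital Abnormality",
    "T020": "Acquired Abnormality",
    "T033": "Finding",
    "T034": "Laboratory or Test Result",
    "T041": "Mental Process",
    "T046": "Pathologic Function",
    "T047": "Disease or Syndrome",
    "T048": "Mental or Behavioral Dysfunction",
    "T049": "Cell or Molecular Dysfunction",
    "T050": "Experimental Model of Disease",
    "T059": "Laboratory Procedure",
    "T060": "Diagnostic Procedure",
    "T063": "Molecular Biology Research Technique",
    "T073": "Manufactured Object",
    "T170": "Intellectual Product",
    "T184": "Sign or Symptom",
    "T190": "Anatomical Abnormality",
    "T191": "Neoplastic Process",
    "T201": "Clinical Attribute",
}


def filter_concepts(concepts):
    """Keep concepts annotated with at least one selected TUI.

    Each concept is assumed (hypothetically) to be a dict with a "tuis" list;
    concepts without that key are treated as having no semantic types.
    """
    return [
        c for c in concepts
        if any(tui in SELECTED_SEMANTIC_TYPES for tui in c.get("tuis", []))
    ]


if __name__ == "__main__":
    # Illustrative concepts; the CUIs and TUIs here are placeholders.
    sample = [
        {"cui": "C0000001", "name": "fever", "tuis": ["T184"]},
        {"cui": "C0000002", "name": "aspirin", "tuis": ["T109", "T121"]},
        {"cui": "C0000003", "name": "seizure disorder", "tuis": ["T047"]},
    ]
    for concept in filter_concepts(sample):
        print(concept["name"])
```

Using a dict keyed by TUI keeps the human-readable ST name next to each identifier, while membership checks stay constant-time.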
## Prompt engineering
We used six main prompts, several of them in more than one variation. They are described below.
**Prompt 1: Zero-shot instruction**. It instructs the model to list observable characteristics classified by TUIs and STs. It emphasizes the term “phenotypic manifestations” so that the model captures a broad range of symptoms rather than only strict biomedical terms. A guided variation (**Zero-shot with guidance**) explicitly names the disease or condition discussed in the text, which improves precision when the target condition is ambiguous. In this variation, the condition names are extracted automatically and inserted into the prompt.
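The two zero-shot variants differ only in whether a condition name is injected into the instruction. The sketch below shows this under stated assumptions. The template wording and the function name `build_zero_shot_prompt` are hypothetical, not the prompts actually used. The condition is passed in as an argument because the text does not specify how the automatic extraction step works.

```python
from typing import Optional


def build_zero_shot_prompt(
    text: str,
    semantic_types: dict[str, str],
    condition: Optional[str] = None,
) -> str:
    """Build a zero-shot prompt; pass `condition` for the guided variant.

    The wording below is an illustrative template, not the original prompt.
    `condition` is assumed to come from an upstream extraction step not shown here.
    """
    # One line per allowed semantic type, e.g. "- T184: Sign or Symptom".
    type_lines = "\n".join(f"- {tui}: {name}" for tui, name in semantic_types.items())
    # Guided variant: name the condition explicitly to reduce ambiguity.
    focus = f" associated with {condition}" if condition else ""
    return (
        f"List all phenotypic manifestations{focus} described in the text below. "
        "Include any observable characteristic, not only strict biomedical terms. "
        "Classify each one by its UMLS semantic type (ST) and Type Unique "
        "Identifier (TUI), using only these types:\n"
        f"{type_lines}\n\n"
        f"Text:\n{text}"
    )


if __name__ == "__main__":
    types = {"T184": "Sign or Symptom", "T047": "Disease or Syndrome"}
    note = "The patient presented with recurrent fever and joint pain."
    print(build_zero_shot_prompt(note, types))                         # plain zero-shot
    print(build_zero_shot_prompt(note, types, condition="lupus"))      # with guidance
```

Keeping both variants in a single function with an optional argument makes the guided prompt a strict extension of the plain one, so any difference in model output can be attributed to the added condition name.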