Using LLMs for Downstream Classification: Prompt, Verbalize, Train

11 Jun 2025

Authors:

(1) Goran Muric, InferLink Corporation, Los Angeles, California ([email protected]);

(2) Ben Delay, InferLink Corporation, Los Angeles, California ([email protected]);

(3) Steven Minton, InferLink Corporation, Los Angeles, California ([email protected]).

3.2 Prompting LLM

The LLMs are prompted on two occasions. First, they are prompted to obtain the set of secondary questions Q, as described in Section 3.1. Second, for each document, we prompt the LLM with the document and its corresponding secondary questions.

Then, for each question q_i, the LLM's output a_i is collected, creating a set of outputs for each document. The textual outputs are then assigned numerical values and transformed into a feature vector v_i through the verbalization process explained in Section 3.3.
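
The second prompting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ask_llm` is a hypothetical callable standing in for whichever LLM client is used, and the prompt wording in `build_prompt` is an assumption rather than the template from the paper.

```python
from typing import Callable, List


def build_prompt(document: str, question: str) -> str:
    # Illustrative prompt template (an assumption, not the paper's wording).
    return (
        "Read the document and answer the question with Yes, No, or Unknown.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}\nAnswer:"
    )


def collect_answers(
    document: str,
    questions: List[str],
    ask_llm: Callable[[str], str],  # hypothetical wrapper around an LLM client
) -> List[str]:
    # One LLM call per question; each raw text output is kept, whitespace trimmed.
    return [ask_llm(build_prompt(document, q)).strip() for q in questions]


def fake_llm(prompt: str) -> str:
    # Stand-in "LLM" so the sketch runs offline: it only inspects the question.
    question = prompt.split("Question: ")[1]
    return " Yes\n" if "diabetes" in question else "Unknown"


answers = collect_answers(
    "Patient, 54, with type 2 diabetes.",
    ["Does the patient have diabetes?", "Is the patient over 65?"],
    fake_llm,
)
```

Passing the LLM call in as a function keeps the loop independent of any particular provider, so the same code works with a real client or a test stub.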

3.3 Verbalizing the answers

The output of the LLM in response to each prompt is limited to one of three possible values: Yes, No, or Unknown, depending on the answer to the question posed in the prompt. These responses are subsequently assigned numerical values for analysis, with “Yes” translating to 1, “No” to 0, and “Unknown” to 0.5.
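
The mapping above can be written as a small lookup. The sketch below is illustrative: the function name `verbalize` and the fallback to 0.5 for any answer outside the three allowed values are assumptions, not details given in the text.

```python
from typing import List

# Mapping stated in the text: Yes -> 1, No -> 0, Unknown -> 0.5.
ANSWER_VALUES = {"yes": 1.0, "no": 0.0, "unknown": 0.5}


def verbalize(answers: List[str]) -> List[float]:
    """Turn the textual answers for one document into a numerical vector."""
    vector = []
    for answer in answers:
        # Normalize case, surrounding whitespace, and a trailing period.
        key = answer.strip().strip(".").lower()
        # Assumption: unexpected outputs are treated like "Unknown".
        vector.append(ANSWER_VALUES.get(key, 0.5))
    return vector


features = verbalize(["Yes", "No", "Unknown"])
```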

3.4 Training a classifier

To train a classifier, we use a set V of low-dimensional numerical vectors, where each vector v_i has n + 1 components, together with corresponding labels X, where each vector v_i has a corresponding binary label x_i. The vectors in V are obtained from the training text data by prompting the LLM to generate n + 1 outputs per document, which are then assigned numerical values. A classifier is then trained using 5-fold cross-validation and a grid search for the best parameters. The choice of classification algorithm depends on the size of the training data, the distribution of values, and the desired performance on a specific classification metric.
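
A minimal sketch of this training step, assuming scikit-learn is available. Logistic regression, the parameter grid, the F1 scoring metric, and the synthetic data are all illustrative choices; the text leaves the algorithm and metric open, so treat them as placeholders.

```python
import itertools

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def train_classifier(V: np.ndarray, x: np.ndarray) -> GridSearchCV:
    # Logistic regression and this grid are assumptions made for illustration.
    search = GridSearchCV(
        estimator=LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=5,          # 5-fold cross-validation, as described in the text
        scoring="f1",  # assumed metric; replace with the one that matters
    )
    search.fit(V, x)  # refits the best model on all data by default
    return search


# Synthetic stand-in data: every vector with n + 1 = 4 values in {0, 0.5, 1}.
V = np.array(list(itertools.product([0.0, 0.5, 1.0], repeat=4)))
# Toy binary labels derived from the first two features.
x = (V[:, 0] + V[:, 1] > 1.0).astype(int)

search = train_classifier(V, x)
predictions = search.predict(V)
```

After fitting, `search.best_params_` holds the selected hyperparameters and `search.best_estimator_` the refitted model, which can be applied to vectors built from unseen documents.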

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.