Text classification using LLMs

Dario · July 24, 2024, 1:51pm

I want to categorize color descriptions into eight groups using LLMs. I tried matching them with cosine similarity over text embeddings of the descriptions, but the results were unsatisfactory. The model does respond correctly when I prompt it with instructions saying it's a color classifier. Is there an efficient way to use embeddings for this use case? Because of the size of the dataset, prompt engineering is not a practical solution.
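One embedding-based approach that may work better than comparing descriptions against short label names is nearest-centroid classification. You embed a small hand-labeled set of descriptions per category, average them into one centroid per class, and assign each new description to the class whose centroid has the highest cosine similarity. The sketch below is illustrative only. It assumes you already have embedding vectors from some model, and the toy 3-d vectors and labels are made up.

```python
import math
from collections import defaultdict


def cosine(a, b):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(examples):
    # examples: list of (embedding, label) pairs from a small hand-labeled set.
    sums = {}
    counts = defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # Average the summed vectors to get one centroid per label.
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(vec, centroids):
    # Pick the label whose centroid is most similar to the query embedding.
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for real text embeddings.
    labeled = [
        ([0.9, 0.1, 0.0], "red"),
        ([0.8, 0.2, 0.1], "red"),
        ([0.1, 0.9, 0.1], "green"),
        ([0.0, 0.8, 0.2], "green"),
    ]
    centroids = build_centroids(labeled)
    print(classify([0.7, 0.3, 0.0], centroids))  # red
```

Classifying each item then costs one embedding call plus eight similarity computations, which scales to large datasets. Accuracy still depends on how well the chosen embedding model separates your categories.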

Orson · July 24, 2024, 1:55pm

Why not have the LLM output the class directly? You can instruct it to respond with exactly one of your eight categories. Embeddings are not required.
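A minimal sketch of that idea: put the allowed categories in the prompt, then accept the reply only if it matches one of them. `call_llm` is a hypothetical placeholder for whatever client you use (prompt in, text out), and the eight color names are assumed for illustration, not taken from the original question.

```python
from typing import Callable, List, Optional

# Hypothetical category list; replace with your real eight groups (kept lowercase).
CATEGORIES = ["red", "orange", "yellow", "green", "blue", "purple", "brown", "gray"]


def build_prompt(description: str, categories: List[str]) -> str:
    # Ask for exactly one label from a fixed list and nothing else.
    options = ", ".join(categories)
    return (
        "You are a color classifier. Reply with exactly one word from this list: "
        f"{options}.\nColor description: {description}\nCategory:"
    )


def parse_label(reply: str, categories: List[str]) -> Optional[str]:
    # Normalize whitespace, a trailing period and case; reject anything off-list.
    cleaned = reply.strip().strip(".").lower()
    return cleaned if cleaned in categories else None


def classify(description: str, call_llm: Callable[[str], str],
             categories: List[str] = CATEGORIES) -> Optional[str]:
    # call_llm is a hypothetical stand-in for your actual model client.
    return parse_label(call_llm(build_prompt(description, categories)), categories)


if __name__ == "__main__":
    # Fake model for demonstration only; a real call would go to your LLM.
    fake_llm = lambda prompt: " Green.\n"
    print(classify("the color of fresh spring grass", fake_llm))  # green
```

Returning `None` for off-list replies lets you retry or queue those items for review instead of silently accepting a malformed label.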

felix · August 8, 2024, 4:11am

I think BERT does this fairly well. You can fine-tune your own model; there are plenty of example notebooks available online. Obtaining high-quality training data takes effort, but it has been well worth it for my multilabel classification project.
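For a multilabel setup like the one described here, a fine-tuned classifier usually has one output per label passed through an independent sigmoid, rather than a single softmax over all classes. In Hugging Face `transformers`, sequence-classification models can be configured this way with `problem_type="multi_label_classification"` (check the docs for your version). The sketch below covers only the label side: encoding training targets as multi-hot vectors and decoding model logits with a threshold. The label names and the 0.5 threshold are assumptions, and the model training itself is not shown.

```python
import math
from typing import List

# Hypothetical label set for illustration.
LABELS = ["warm", "cool", "pastel", "metallic"]


def multi_hot(active: List[str], labels: List[str]) -> List[float]:
    # Training target: 1.0 for every label that applies, 0.0 otherwise.
    return [1.0 if label in active else 0.0 for label in labels]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits: List[float], labels: List[str], threshold: float = 0.5) -> List[str]:
    # Each label is scored independently, so several labels (or none) can be active.
    # A single-label task would instead take the argmax over all logits.
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    print(multi_hot(["warm", "pastel"], LABELS))    # [1.0, 0.0, 1.0, 0.0]
    print(decode([2.0, -1.5, 0.3, -3.0], LABELS))   # ['warm', 'pastel']
```

For a single-label task such as the eight color groups in the original question, the same fine-tuning approach applies with one softmax output and an argmax instead of per-label thresholds.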