I want to classify color descriptions into eight categories using LLMs. I tried embedding the descriptions and matching them by cosine similarity, but the results were unsatisfactory. The model does answer correctly when I prompt it with instructions saying it is a color classifier. Is there an efficient way to use embeddings for this use case? Because of the size of the dataset, prompt engineering is not a practical solution.
Why not have the model output the class directly? You can instruct it to respond with exactly one of your eight categories. Embeddings are not required.
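A rough sketch of what that could look like: build a prompt that lists the allowed labels, then check the reply against that list. The category names and the `build_prompt` / `parse_label` helpers are made up for illustration (the thread never names the eight groups), and the actual model call is left out because it depends on which API you use.

```python
# Hypothetical label set; the thread does not say what the eight categories are.
CATEGORIES = ["red", "orange", "yellow", "green", "blue", "purple", "brown", "gray"]


def build_prompt(description: str) -> str:
    """Build a classification prompt that restricts the answer to one label."""
    options = ", ".join(CATEGORIES)
    return (
        "You are a color classifier. "
        f"Answer with exactly one of: {options}.\n"
        f"Description: {description}\n"
        "Category:"
    )


def parse_label(reply: str):
    """Map a raw model reply to a known category, or None if it does not match."""
    cleaned = reply.strip().lower().rstrip(".")
    return cleaned if cleaned in CATEGORIES else None
```

Returning `None` for anything outside the list makes it easy to spot replies that need a retry or a manual look, instead of silently storing a bad label.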
I think BERT does this fairly well. There are plenty of example notebooks online that show how to train your own model. Getting high-quality training data takes effort, but the results have been invaluable for my multilabel classification project.
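If you still want to try the embedding route from the original question, one thing that may help is comparing each description against a per-class centroid built from a few labeled examples, rather than against a single reference text. This is only a minimal sketch: the 3-dimensional vectors are toy stand-ins for real embeddings, the two labels are placeholders, and `centroids` / `classify` are illustrative names, not part of any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(labeled):
    """Average the example vectors of each label into one centroid per label."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in labeled.items()
    }


def classify(vec, cents):
    """Return the label whose centroid is most similar to vec."""
    return max(cents, key=lambda label: cosine(vec, cents[label]))


# Toy vectors standing in for embeddings of labeled color descriptions.
labeled = {
    "red": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "blue": [[0.1, 0.1, 0.9], [0.0, 0.2, 0.8]],
}
cents = centroids(labeled)
print(classify([0.7, 0.3, 0.1], cents))  # prints "red"
```

Each description is embedded once and then compared against only eight centroids, so this stays cheap on a large dataset. Whether it is accurate enough depends on the embedding model and on how good the labeled examples are.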