Instruction Embedding: New Concept, Benchmark and Method for Latent Representations of Instructions

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Instruction data is crucial for improving the capability of Large Language Models (LLMs) to align with human-level performance. Recent research (LIMA) demonstrates that alignment is essentially a process in which the model adapts to instructions' interaction style or format to solve various tasks, leveraging pre-trained knowledge and skills. Therefore, the most important aspect of instruction data is the task it represents, rather than its specific semantics and knowledge content. Latent representations of instructions play a role in several instruction-related tasks, such as data distillation for instruction tuning and prompt retrieval for in-context learning. However, these representations are typically derived from text embeddings, which encompass overall semantic information that obscures the representation of task categories. In this work, we introduce a new concept, instruction embedding, and construct the Instruction Embedding Benchmark (IEB) for its evaluation. We then propose a baseline method, prompt-based instruction embedding (PIE), which makes instruction embeddings attend more to the task than to the overall semantic information. The evaluation of PIE, alongside other embedding methods on IEB, demonstrates its superior performance in accurately identifying task categories. Moreover, the application of PIE to downstream tasks showcases its effectiveness and suitability for instruction-related tasks.
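The core idea of prompt-based instruction embedding, as the abstract describes it, can be sketched as wrapping each instruction in a task-focused prompt before encoding, so the resulting vector reflects the task category rather than surface semantics. The template and the toy bag-of-characters encoder below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of prompt-based instruction embedding (PIE).
# The prompt template steers the encoder toward the task category;
# the wording here is an assumption, not the paper's template.
PROMPT_TEMPLATE = (
    "Identify the task category of the following instruction, "
    "ignoring its specific topic.\nInstruction: {instruction}"
)

def pie_embed(instruction: str, encoder) -> list[float]:
    """Embed an instruction via a task-focused prompt wrapper.

    In practice `encoder` would be a sentence encoder; here any
    callable mapping text to a vector works.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return encoder(prompt)

# Toy encoder so the sketch runs without a model download:
# a normalized bag-of-characters vector over a-z.
def toy_encoder(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

emb = pie_embed("Summarize the following article.", toy_encoder)
print(len(emb))  # 26-dimensional toy embedding
```

In a realistic setup, `toy_encoder` would be replaced by a pre-trained sentence encoder, and embeddings of instructions from the same task category should then cluster together despite differing topics.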
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English