Instruction Embedding: New Concept, Benchmark and Method for Latent Representations of Instructions

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Instruction data is crucial for improving the capability of Large Language Models (LLMs) to align with human-level performance. Recent research (LIMA) demonstrates that alignment is essentially a process in which the model adapts to instructions' interaction style or format to solve various tasks, leveraging pre-trained knowledge and skills. Therefore, the most important aspect of instruction data is the task it represents, rather than its specific semantics and knowledge content. Latent representations of instructions play a role in several instruction-related tasks, such as data distillation for instruction tuning and prompt retrieval for in-context learning. However, these representations are typically derived from text embeddings, which encompass overall semantic information that obscures the representation of task categories. In this work, we introduce a new concept, instruction embedding, and construct the Instruction Embedding Benchmark (IEB) for its evaluation. We then propose a baseline method, prompt-based instruction embedding (PIE), which makes instruction embeddings attend more to the task than to the overall semantic information. The evaluation of PIE, alongside other embedding methods on IEB, demonstrates its superior performance in accurately identifying task categories. Moreover, the application of PIE to downstream tasks showcases its effectiveness and suitability for instruction-related tasks.
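The core idea of prompt-based instruction embedding, as the abstract describes it, can be sketched as wrapping each instruction in a task-focused prompt before encoding, so the resulting vector reflects the task category rather than surface semantics. The template and the toy bag-of-characters encoder below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of prompt-based instruction embedding (PIE).
# The prompt template steers the encoder toward the task category;
# the wording here is an assumption, not the paper's template.
PROMPT_TEMPLATE = (
    "Identify the task category of the following instruction, "
    "ignoring its specific topic.\nInstruction: {instruction}"
)

def pie_embed(instruction: str, encoder) -> list[float]:
    """Embed an instruction via a task-focused prompt wrapper.

    In practice `encoder` would be a sentence encoder; here any
    callable mapping text to a vector works.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return encoder(prompt)

# Toy encoder so the sketch runs without a model download:
# a normalized bag-of-characters vector over a-z.
def toy_encoder(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

emb = pie_embed("Summarize the following article.", toy_encoder)
print(len(emb))  # 26-dimensional toy embedding
```

In a realistic setup, `toy_encoder` would be replaced by a pre-trained sentence encoder, and embeddings of instructions from the same task category should then cluster together despite differing topics.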
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English