Parameter-Efficient Tuning with Information Carrier and Partially Unfrozen Component

ACL ARR 2024 April Submission 219 Authors

15 Apr 2024 (modified: 13 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract: Recent developments in Large Language Models (LLMs) have motivated widespread research interest in exploring their potential in downstream applications. Given the black-box nature of LLMs, prompt engineering has become the natural way to interact with them. Prompt tuning, covering both hard and soft prompts, combines manually or automatically composed text templates with adjustable vectors, aiming to improve parameter efficiency with only a small fraction of tunable parameters. However, training cost is not significantly reduced because backpropagation still traverses the entire network, and soft prompt initialization issues often cause moderate performance degradation. Late Prompt Tuning (LPT) reduces both training cost and the performance drop by inserting a parameterized vector at an intermediate layer of the model and applying an intrinsic-dimension reparameterization to the initial soft prompt. Despite substantial savings in training cost, LPT still underperforms other parameter-efficient methods such as adapters and LoRA (Low-Rank Adaptation). We argue that this gap stems from the limited capacity of soft prompts to carry complete downstream task information. To address this issue, we propose Back-and-Forth Tuning (BFT), a new parameter-efficient tuning method that achieves better results by combining hard prompts with task information. With a new information-carrier component and a partially unfrozen module, our model stores task-specific information using only a limited number of parameters. Both a soft prompt and an adapter are viable choices for the information carrier, and our results show that the latter performs better. Comprehensive experiments demonstrate that our method improves average accuracy over LPT by 2.27% across 10 tasks, while converging faster and incurring no additional training cost.
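To make the abstract's setup concrete, the following is a minimal PyTorch sketch of the general idea: an adapter-style information carrier injected at an intermediate layer, with only the layers above the insertion point left unfrozen so backpropagation never reaches the lower half of the network. The names (`InformationCarrier`, `ToyBackbone`), the insertion index, and all sizes are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class InformationCarrier(nn.Module):
    """Bottleneck adapter used as the task-information carrier.
    (The residual bottleneck layout and sizes are assumptions.)"""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transformation of the intermediate hidden states.
        return h + self.up(self.act(self.down(h)))


class ToyBackbone(nn.Module):
    """Stand-in for a pretrained LM: a stack of identical blocks."""
    def __init__(self, hidden_size: int = 768, num_layers: int = 12):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU())
            for _ in range(num_layers)
        )

    def forward(self, h, carrier=None, insert_at=6):
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if carrier is not None and i == insert_at - 1:
                h = carrier(h)  # inject task information mid-model
        return h


model = ToyBackbone()
carrier = InformationCarrier(hidden_size=768)

# Freeze the whole backbone, then unfreeze only the layers above the
# insertion point (the "partially unfrozen" part); gradients are computed
# only for the carrier and the upper layers.
insert_at = 6
for p in model.parameters():
    p.requires_grad = False
for layer in model.layers[insert_at:]:
    for p in layer.parameters():
        p.requires_grad = True

x = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
out = model(x, carrier=carrier, insert_at=insert_at)
out.mean().backward()        # backward pass stops at the carrier
```

Because the frozen lower layers neither require gradients nor receive any, the backward pass ends at the insertion point, which is the source of the training-cost savings the abstract attributes to mid-model insertion.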
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-compute settings / efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 219