Large language models have been fine-tuned to generate poetry via supervised learning on datasets of relevant examples. However, these models often produce low-quality output that does not respect the structure expected of a specific poem type. For instance, generated haikus may contain toxic language, be off-topic or incoherent, and fail to follow the typical 5-7-5 syllable structure. In this work, we investigate whether an objective function that quantifies haiku quality can be learned from human feedback, and whether this reward function can then be used to improve haiku generation through reinforcement learning.
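As an illustration of the kind of reward signal involved (not the implementation used in this work), the sketch below combines a hypothetical learned preference score with a crude 5-7-5 structure check; the function names, the `preference_score` input, and the syllable heuristic are all assumptions made for the example.

```python
# Illustrative sketch only: a hypothetical scalar reward for a generated haiku,
# combining a learned human-preference score with a 5-7-5 structure bonus.
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+", re.IGNORECASE)

def count_syllables(word: str) -> int:
    """Rough heuristic: count contiguous vowel groups, adjusting for a silent final 'e'."""
    n = len(VOWEL_GROUPS.findall(word))
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def follows_575(haiku: str) -> bool:
    """Check whether a three-line haiku matches the 5-7-5 syllable pattern."""
    lines = [line for line in haiku.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        return False
    counts = [sum(count_syllables(w) for w in line.split()) for line in lines]
    return counts == [5, 7, 5]

def haiku_reward(haiku: str, preference_score: float) -> float:
    """Hypothetical reward: learned preference score plus a structure bonus/penalty."""
    structure_bonus = 1.0 if follows_575(haiku) else -1.0
    return preference_score + structure_bonus

# Example usage; the preference score stands in for the output of a reward
# model trained on human feedback.
example = "An old silent pond\nA frog jumps into the pond\nSplash! Silence again"
print(haiku_reward(example, preference_score=0.7))
```

In a reinforcement learning setup, a scalar reward of this form would be assigned to each sampled haiku and used to update the generator's policy.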