ServiceNow Research

Leveraging Human Preferences to Master Poetry

Abstract

Large language models have been fine-tuned to generate poetry via supervised learning on datasets of relevant examples. However, these models often fail to produce high-quality output that respects the structure expected for a specific poem type. For instance, generated haikus may contain toxic language, be off-topic or incoherent, and fail to follow the typical 5-7-5 syllable meter. In this work, we investigate whether it is possible to learn an objective function, from human feedback, that quantifies the quality of a haiku, and whether this reward function can be used to improve haiku generation via reinforcement learning.
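The paper's actual reward model and features are not described on this page. As an illustrative sketch only, learning a reward from pairwise human preferences is commonly done with a Bradley-Terry style logistic loss; the example below fits such a loss over a hypothetical hand-crafted feature (deviation from the 5-7-5 meter, estimated with a crude vowel-group syllable heuristic), which is an assumption for illustration, not the authors' method.

```python
import math

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels (approximation only).
    vowels = "aeiouy"
    word = word.lower()
    count, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    return max(count, 1)

def meter_deviation(haiku):
    # Total distance of each line's syllable count from the 5-7-5 target.
    targets = [5, 7, 5]
    lines = haiku.strip().split("\n")
    return sum(
        abs(sum(count_syllables(w) for w in line.split()) - target)
        for line, target in zip(lines, targets)
    )

def features(haiku):
    # Hypothetical feature vector: a bias term and the negated meter deviation.
    return [1.0, -float(meter_deviation(haiku))]

def reward(w, haiku):
    # Linear reward model: dot product of weights and features.
    return sum(wi * fi for wi, fi in zip(w, features(haiku)))

def train(pairs, lr=0.1, epochs=100):
    # Bradley-Terry preference loss: -log sigmoid(r(preferred) - r(rejected)),
    # minimized by gradient descent on the linear weights.
    w = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = reward(w, preferred) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-diff))
            grad_scale = 1.0 - p  # gradient of -log p w.r.t. the reward gap
            fp, fr = features(preferred), features(rejected)
            for i in range(len(w)):
                w[i] += lr * grad_scale * (fp[i] - fr[i])
    return w
```

A learned reward of this kind could then score candidate haikus during reinforcement-learning fine-tuning, rewarding on-meter generations over off-meter ones.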

Publication
AAAI Workshops
Rafael Pardinas
Applied Research Scientist

Applied Research Scientist at Human Machine Interaction Through Language, located in London, UK.

Gabriel Huang
Research Scientist

Research Scientist at the AI Trust and Governance Lab, located in Montreal, QC, Canada.

David Vazquez
Director of Research Programs

Director of Research Programs at Research Management, located in Montreal, QC, Canada.

Alexandre Piche
Research Scientist

Research Scientist at Human Machine Interaction Through Language, located in Montreal, QC, Canada.