DylanASHillier's Collections: Learning from feedback dir
Suppressing Pink Elephants with Direct Principle Feedback
Paper
• 2402.07896
• Published • 11
Policy Improvement using Language Feedback Models
Paper
• 2402.07876
• Published • 9
Direct Language Model Alignment from Online AI Feedback
Paper
• 2402.04792
• Published • 35
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published • 69
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper
• 2402.11450
• Published • 23
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper
• 2402.10893
• Published • 12
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper
• 2402.14830
• Published • 24
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper
• 2406.11817
• Published • 13
Bootstrapping Language Models with DPO Implicit Rewards
Paper
• 2406.09760
• Published • 41
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
Paper
• 2406.00392
• Published • 14
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
• 2406.00888
• Published • 33
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Paper
• 2406.19227
• Published • 25
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper
• 2406.18629
• Published • 42
Can LLMs Learn by Teaching? A Preliminary Study
Paper
• 2406.14629
• Published • 21
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
Paper
• 2410.24218
• Published • 6
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
• 2412.05718
• Published • 4
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Paper
• 2412.04445
• Published • 22
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback
Paper
• 2506.11930
• Published • 53
Provably Learning from Language Feedback
Paper
• 2506.10341
• Published • 9