<aside>
Zero-Shot Prompting

A method in which a language model performs a task without being given any examples or demonstrations.
The model can infer what the task requires from its existing knowledge, because it has been pre-trained on a large amount of data. Since it needs only minimal information to carry out a wide range of tasks, it remains useful even when data is scarce.
Techniques that make models better at following instructions zero-shot:
Instruction-Tuning
RLHF (Reinforcement Learning from Human Feedback) </aside>
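A minimal sketch of what "no demonstrations" means in practice. It assumes a sentiment-classification task. The task wording and the helper names `build_zero_shot_prompt` / `build_few_shot_prompt` are hypothetical and chosen only for illustration. A zero-shot prompt carries just the instruction and the input. A few-shot prompt, shown for contrast, adds labeled demonstrations before the input. The sketch only builds the prompt string; no particular model API is assumed for sending it.

```python
from typing import Sequence, Tuple


def build_zero_shot_prompt(instruction: str, text: str) -> str:
    """Hypothetical helper: instruction + input only, no demonstrations."""
    return f"{instruction}\nText: {text}\nSentiment:"


def build_few_shot_prompt(
    instruction: str, demos: Sequence[Tuple[str, str]], text: str
) -> str:
    """Hypothetical helper, for contrast: same prompt with labeled demonstrations."""
    lines = [instruction]
    # Each demonstration pairs an example input with its expected label.
    for demo_text, label in demos:
        lines.append(f"Text: {demo_text}\nSentiment: {label}")
    # The real input comes last, with the label left for the model to fill in.
    lines.append(f"Text: {text}\nSentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    instruction = "Classify the text into neutral, negative, or positive."
    print(build_zero_shot_prompt(instruction, "I think the vacation is okay."))
```

The two prompts differ only in the demonstrations. In the zero-shot case the model must map the input to a label using what it learned during pre-training, plus whatever instruction-following ability tuning has added.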
ex)
<aside>
Paper: Language Models are Few-Shot Learners

</aside>
<aside>
Paper: Finetuned Language Models Are Zero-Shot Learners

</aside>
<aside>
Paper: Deep reinforcement learning from human preferences

</aside>