<aside> ๐Ÿ“„

๋…ผ๋ฌธ: Learning Transferable Visual Models From Natural Language Supervision

image.png

</aside>

CLIP ( Contrastive Language-Image pre-training )


<aside> ๐Ÿ“–

CLIP์€ ๊ธฐ์กด ๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต ๊ตฌ์กฐ๋ฅผ ์™„์ „ํžˆ ์žฌํ•ด์„ํ•ด์„œ ์ด๋ฏธ์ง€ ํ…์ŠคํŠธ ๋ถ„์•ผ์˜ ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„์„ ๋งŒ๋“  ๋ชจ๋ธ์ด๋‹ค.

๐Ÿงย ๊ธฐ์กด ์ด๋ฏธ์ง€ ๋ชจ๋ธ, CLIP(๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ) ๋ชจ๋ธ ๋น„๊ต

๋Œ€์กฐํ•™์Šต ( Contrastive learning )

CLIP ํ…์ŠคํŠธ ์ธ์ฝ”๋” (Text Encoder)

CLIP ์ด๋ฏธ์ง€ ์ธ์ฝ”๋” (Image Encoder)

CLIP ํ™œ์šฉ ์‚ฌ๋ก€

1. ์ œ๋กœ์ƒท ๋ถ„๋ฅ˜ (zero-shot classification)

2. ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰ (Image-Text Retrieval)

3. ์ด๋ฏธ์ง€ ์บก์…”๋‹ (๋ผ๋ฒจ๋ง)