BentoML-Extension (bentomlx)#



BentoML-Extensions (a.k.a. bentomlx) provides two additional components:

  • Intel Optimized interService (or Runner)

  • FeatureStore

pip install bentoml bentomlx
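After installing, it can help to confirm that both distributions are actually visible to the interpreter you plan to serve from. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are simply taken from the `pip install` command above.

```python
from importlib import metadata


def installed_versions(packages=("bentoml", "bentomlx")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry usually means the package was installed into a different environment than the one running the script.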

BentoML-Extensions' Goal#

As LLM development has accelerated, model serving frameworks have been adding and improving Nvidia GPU-based features to handle large compute workloads. BentoML is following the same trend, putting effort into GPU model serving improvements such as Nvidia GPU resource compatibility and integration with high-performance inference frameworks like vLLM. To handle the compute demands of the models now in the spotlight, namely Diffusion and LLM-family models, support for GPUs, and CUDA in particular, has to be a high priority.

As a result, most open-source inference engines and serving frameworks inevitably offer only thin support for CPU-related features. However, limited support for CPU inference optimization only means that it is not the top priority; it does not mean that it is the lowest priority. BentoML is an excellent model serving framework. It integrates existing ML frameworks and provides features that make them easy to build and deploy. BentoML also

The BentoML documentation provides detailed guidance on the project with hands-on tutorials and examples. If you are a first-time user of BentoML, we recommend that you read the following documents in order: