DeepSpeed

Learning about Distributed Inference with DeepSpeed ZeRO-3 and Docker Compose

Today, we’re going to test out DeepSpeed ZeRO-3 with Docker Compose. In a future blog post, I may cover DeepSpeed-FastGen or how to deploy this on a real multi-node, multi-GPU cluster. I also plan to compare this approach with multi-node inference using vLLM. If you’re setting up a local cluster, consider high-bandwidth networking with InfiniBand. It’s surprisingly affordable.