Learning about Distributed Inference with DeepSpeed ZeRO-3 and Docker Compose
Today, we’re going to test out DeepSpeed ZeRO-3 with Docker Compose. In a future blog post, I may cover DeepSpeed-FastGen or how to deploy this on a real multi-node/multi-GPU cluster. I also aim to compare this method against Multi-Node Inference with vLLM. If you’re setting up a local cluster, consider checking out high-bandwidth networking with InfiniBand; it’s surprisingly affordable.
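To make the setup concrete, here is a minimal sketch of what a Compose file for this kind of experiment might look like: two containers on one host standing in for two "nodes". The image name, script name, and paths are placeholders I've assumed for illustration, not a tested configuration from this post.

```yaml
# Hypothetical docker-compose.yml: two containers emulating a two-node
# DeepSpeed setup on a single host. Image, command, and volume paths
# are assumptions for illustration.
services:
  node0:
    image: deepspeed/deepspeed:latest        # assumed image name
    command: deepspeed --hostfile=/job/hostfile inference.py
    volumes:
      - ./:/job                              # shared job directory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  node1:
    image: deepspeed/deepspeed:latest
    command: sleep infinity                  # placeholder: in practice the
                                             # worker runs an SSH daemon so the
                                             # DeepSpeed launcher can reach it
    volumes:
      - ./:/job
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

The `deploy.resources.reservations.devices` block is Compose's standard way to request NVIDIA GPUs; the DeepSpeed launcher itself coordinates the ranks across containers via the hostfile.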