There are two ways to run a single model on multiple GPUs: data parallelism and model parallelism. In most cases, what you need is data parallelism.
1) Data parallelism
Data parallelism consists of replicating the target model once on each device and using each replica to process a different fraction of the input data. The best way to do data parallelism with Keras models is to use the tf.distribute API.
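A minimal sketch of this approach using `tf.distribute.MirroredStrategy` (the layer sizes and optimizer below are illustrative assumptions, not prescribed by the text; the strategy falls back to a single CPU replica when no GPU is visible):

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and keeps
# the replicas' variables in sync; with no GPUs it uses a single CPU replica.
strategy = tf.distribute.MirroredStrategy()

# Variables must be created inside the strategy scope so each replica
# gets a mirrored copy.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(...) then splits each global batch across the replicas,
# runs them in parallel, and aggregates the gradients.
```

With this setup, `model.fit` needs no further changes: each device processes its own shard of every batch, which is exactly the data-parallel pattern described above.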
2) Model parallelism
Model parallelism consists of running different parts of the same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches. This can be achieved by using TensorFlow device scopes.
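A short sketch of a two-branch model split across devices with `tf.device` scopes (the device names `/gpu:0` and `/gpu:1` are assumptions for a two-GPU machine; with soft device placement, TensorFlow falls back to the CPU when a device is unavailable):

```python
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(32,))

# Pin one branch of the model to the first GPU...
with tf.device("/gpu:0"):
    branch_a = keras.layers.Dense(16, activation="relu")(inputs)

# ...and the other branch to the second GPU, so the two halves
# of the same model run on different devices.
with tf.device("/gpu:1"):
    branch_b = keras.layers.Dense(16, activation="relu")(inputs)

# Merge the branches and produce the final output.
merged = keras.layers.concatenate([branch_a, branch_b])
outputs = keras.layers.Dense(1)(merged)
model = keras.Model(inputs, outputs)
```

Because the branches are independent until the concatenation, they can execute in parallel on their respective devices, which is why this pattern suits models with a branching architecture.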
You can also consult the tf.distribute documentation for more information.