You can split the audio file into chunks wherever there is silence. That way you can process the file sentence by sentence and concatenate the per-chunk results to get the full output. For that, you need to install the libraries below:
Pydub: sudo pip3 install pydub
SpeechRecognition: sudo pip3 install SpeechRecognition
First and foremost, make sure these libraries are installed. If you need code showing how to do the processing, just ask.
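In the meantime, here is a rough sketch of the idea, not a definitive implementation. It assumes pydub's split_on_silence for the chunking and SpeechRecognition's recognize_google (Google Web Speech API, needs an internet connection) for the transcription. The function names transcribe_by_silence and join_transcripts are made up for illustration, and the silence settings (500 ms, 14 dB below the file's average loudness) are guesses you will probably need to tune for your recordings. Reading anything other than WAV with pydub also requires ffmpeg.

```python
import io


def join_transcripts(pieces):
    """Concatenate per-chunk transcripts, skipping empty ones."""
    return " ".join(p.strip() for p in pieces if p and p.strip())


def transcribe_by_silence(path, min_silence_len=500, language="en-US"):
    """Split the audio at `path` on silence and transcribe chunk by chunk.

    Hypothetical helper: the defaults are assumptions, not recommended values.
    """
    # Third-party imports live here so join_transcripts can be used
    # (and tested) without the audio libraries installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence
    import speech_recognition as sr

    audio = AudioSegment.from_file(path)

    # Treat anything 14 dB quieter than the average loudness as silence
    # (a common rule of thumb, assumed here).
    chunks = split_on_silence(
        audio,
        min_silence_len=min_silence_len,  # milliseconds of silence to split on
        silence_thresh=audio.dBFS - 14,
        keep_silence=250,                 # keep some padding around each chunk
    )

    recognizer = sr.Recognizer()
    pieces = []
    for chunk in chunks:
        # Export the chunk as WAV into memory instead of a temporary file.
        buf = io.BytesIO()
        chunk.export(buf, format="wav")
        buf.seek(0)

        with sr.AudioFile(buf) as source:
            recorded = recognizer.record(source)

        try:
            pieces.append(recognizer.recognize_google(recorded, language=language))
        except sr.UnknownValueError:
            # Nothing intelligible in this chunk; skip it.
            continue

    return join_transcripts(pieces)
```

You would call it as print(transcribe_by_silence("speech.wav")), where speech.wav stands in for your own file. Network or quota problems surface as sr.RequestError, which this sketch deliberately does not catch.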