There is no definitive answer to this question: filter size is a hyperparameter that has to be tuned in most cases. There are, however, some rules of thumb that may help. Using smaller filters, and more of them, is frequently preferred.
Four 5x5 filters, for example, have 100 parameters (ignoring biases and assuming a single input channel), while ten 3x3 filters have 90. With the larger number of filters you can still capture a variety of features in the image, but with fewer parameters.
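The arithmetic above can be checked with a few lines of Python. This is only a sketch: `conv_weight_count` is a hypothetical helper written for illustration, not a function from any library, and it assumes a single input channel and no bias terms.

```python
def conv_weight_count(num_filters, kernel_h, kernel_w, in_channels=1):
    """Number of weights in a conv layer, ignoring biases (hypothetical helper)."""
    return num_filters * kernel_h * kernel_w * in_channels

four_5x5 = conv_weight_count(4, 5, 5)   # 4 * 25
ten_3x3 = conv_weight_count(10, 3, 3)   # 10 * 9

print(four_5x5, ten_3x3)  # prints: 100 90
```

With more than one input channel both counts scale by the same factor, so the comparison between the two options stays the same.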
Some modern CNNs take this idea a step further by applying 3x1 and 1x3 convolutional layers in succession in place of a single 3x3 layer. This further reduces the number of parameters, usually with little effect on accuracy.
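To make the saving concrete, here is a rough weight count for one 3x3 layer versus a 3x1 layer followed by a 1x3 layer. The channel count of 64 is an arbitrary assumption chosen for illustration, `conv_weights` is a hypothetical helper, and biases are ignored.

```python
def conv_weights(in_channels, out_channels, kernel_h, kernel_w):
    """Weights in one conv layer, ignoring biases (hypothetical helper)."""
    return in_channels * out_channels * kernel_h * kernel_w

channels = 64  # assumed: same number of channels in and out of every layer

full_3x3 = conv_weights(channels, channels, 3, 3)

# 3x1 followed by 1x3 covers the same 3x3 receptive field.
factorized = (conv_weights(channels, channels, 3, 1)
              + conv_weights(channels, channels, 1, 3))

print(full_3x3, factorized)  # prints: 36864 24576
```

Under these assumptions the factorized pair uses 6/9 of the weights of the full 3x3 layer, independent of the channel count.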
The choice of stride is similarly significant, because it affects the tensor shape after the convolution, and hence the rest of the network. The usual convention is to use stride=1 with padding in regular convolutions, which preserves the spatial size, and stride=2 when downsampling.
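The effect of stride and padding on the output shape follows the standard formula floor((n + 2p - k) / s) + 1 along each spatial dimension. A minimal sketch, where `conv_output_size` is a hypothetical helper and the 32-pixel input with a 3x3 kernel is an assumed example:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension (hypothetical helper).

    n: input length, kernel: filter length, padding: zeros added on each side.
    """
    return (n + 2 * padding - kernel) // stride + 1

# 3x3 filter on a 32-pixel-wide input (assumed sizes).
same_size = conv_output_size(32, kernel=3, stride=1, padding=1)
downsampled = conv_output_size(32, kernel=3, stride=2, padding=1)
no_padding = conv_output_size(32, kernel=3, stride=1, padding=0)

print(same_size, downsampled, no_padding)  # prints: 32 16 30
```

With stride=1 and padding of 1 the 3x3 convolution keeps the spatial size, stride=2 halves it, and omitting the padding shrinks the output by 2 pixels per layer.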