ND filters can also help keep you under your flash sync speed.
You can absolutely stack images to simulate one long exposure (with the limitations noted above). It starts to require a lot of images, though, since the count doubles for each stop of ND filter you're trying to simulate: a 1-stop ND filter can be replaced by 2^1=2 images, a 2-stop filter needs 2^2=4 images, a 3-stop filter needs 2^3=8 images, ... and a 10-stop filter needs 2^10=1024 images. I think it's the K-3 and later cameras that have a 2000-image limit for the multi-exposure mode and blend them on the spot into one, which would be way more practical than having to process hundreds of images later. My camera's limit is 9, so I can get just over a 3-stop ND filter; I've used it a few times for a little more blur in borderline situations. There are "Slow Shutter" apps for smartphones that I assume work the same way (averaging multiple exposures).
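If you want to check the arithmetic for your own camera, here is a rough Python sketch of the doubling rule. The function names are just made up for illustration, and it assumes every frame is shot at the same exposure and the frames are averaged with equal weight.

```python
import math


def frames_for_stops(stops: int) -> int:
    """Number of equal frames to average to mimic an ND filter of `stops` stops."""
    return 2 ** stops


def stops_for_frames(frames: int) -> float:
    """Equivalent ND strength, in stops, of averaging `frames` equal frames."""
    return math.log2(frames)


# Table of filter strength versus frame count.
for stops in (1, 2, 3, 6, 10):
    print(f"{stops}-stop ND: {frames_for_stops(stops)} frames")

# A 9-frame multi-exposure limit works out to a little over 3 stops.
print(f"9 frames is about {stops_for_frames(9):.2f} stops")
```

So a 9-frame limit lands at roughly 3.2 stops, which is where the "just over a 3-stop" figure comes from.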
If completely removing people from a crowded scenic location is the goal, you can probably get by with fewer exposures and a 'median' blend mode, plus some manual masking if some of the tourists are particularly lazy. A cattle prod can also help clear the area; be sure to get the image completed before you're arrested.
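To show why a median blend needs fewer frames than an average for this, here is a toy Python sketch using only the standard library. The `blend` helper and the tiny 1x3 grayscale "frames" are invented for the example; real editors work on full images, but the per-pixel idea should be the same.

```python
from statistics import mean, median


def blend(frames, combine):
    """Combine same-sized grayscale frames pixel by pixel using `combine`."""
    return [
        [combine(values) for values in zip(*rows)]  # values: one pixel across all frames
        for rows in zip(*frames)                    # rows: the same row from every frame
    ]


# Five frames of a 1x3 scene. Background brightness is 100, a tourist is 20.
# The tourist stands in the first pixel for two of the five frames.
frames = [
    [[100, 100, 100]],
    [[20, 100, 100]],
    [[100, 20, 100]],
    [[100, 100, 100]],
    [[20, 100, 100]],
]

median_result = blend(frames, median)
mean_result = blend(frames, mean)

print("median:", median_result)  # tourist removed wherever the background shows in most frames
print("mean:  ", mean_result)    # tourist still visible as a faint ghost
```

The median only needs the background to be visible in more than half the frames at each pixel, while the average just dilutes the tourist into a ghost. A tourist who stays put for most of the frames wins the median too, which is where the manual masking comes in.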