If you want to try shooting focus stacks handheld, stacking software is actually pretty good at lining up your shots, as long as your movement is reasonably linear and steady. You'll get edge artefacts in your stack from the frames not lining up perfectly, but you can usually either crop those out or sometimes even manually retouch them using single frames from the stack. For 2-20 frames, I find it works quite well.
That being said, some people really trip out on motorized rails to shoot super deep stacks of non-moving subjects. Not my style, but to each their own...
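To make the edge-artefact point concrete, here's a rough sketch of why handheld stacks need cropping: once the frames are aligned, only the area covered by every frame is fully usable. This assumes the alignment is a pure pixel shift per frame, which is a simplification (real stacking software typically also corrects rotation and scale). The function name `common_crop` and the offsets are made up for illustration and don't come from any particular stacking program.

```python
def common_crop(offsets, width, height):
    """Return (left, top, right, bottom) of the area covered by every frame.

    offsets: per-frame (dx, dy) shifts in pixels relative to the reference
    frame, as an alignment step might report them (hypothetical input).
    Assumes all frames share the same width and height.
    """
    xs = [dx for dx, _ in offsets]
    ys = [dy for _, dy in offsets]
    # The shared area starts at the largest shift and ends at the
    # smallest shift plus the frame size.
    left, top = max(xs), max(ys)
    right, bottom = min(xs) + width, min(ys) + height
    if right <= left or bottom <= top:
        raise ValueError("frames do not share a common area")
    return left, top, right, bottom


# Three made-up frames that drifted a few pixels during the burst.
offsets = [(0, 0), (12, -5), (-8, 9)]
print(common_crop(offsets, 6000, 4000))  # (12, 9, 5992, 3995)
```

The steadier your movement, the smaller the shifts and the less you lose to the crop.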
Here's a single frame at 2:1:
And here's the 22-frame handheld focus stack that the previous shot was pulled from (click through for higher resolution). (Note that the whole point is to not have the subject in the same plane as the sensor.)