Note that comparing 1:1 means comparing differently zoomed crops: one image might be magnified 5x while the other is magnified 8x. That's why "Print" on DxO is the only apples-to-apples comparison.
To the original question: comparing the same sensor size with the same lens and settings, you will see very little difference. As Tony Northrup says, 1/3 of a stop is a lot in this context. But if we shrink pixels far beyond today's m43, APS-C and FF pixel sizes, noise will rise steeply as the pixel pitch approaches the wavelength of light. Now we are talking hundreds of megapixels in those sensor sizes. BSI and Isocell techniques will help a lot (up to 0.5 - 1 stop?) at pixel sizes around 1 micron when using large-aperture lenses. Why only large-aperture lenses? Because large apertures make light arrive at the sensor at steep angles (from the edges of the wide rear element) rather than more or less normal to the sensor plane, and that is exactly where BSI and Isocell pixels have the advantage. And due to diffraction and the size of the Airy disc, only large apertures make sense with densely packed pixels. That's why mobile phones (with 0.95 - 2.0 micron pixels) always have a fixed aperture around f/1.9 - f/2.4: stopping down makes no sense resolution-wise.
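To see why stopping down makes no sense with micron-scale pixels, here is a minimal sketch of the standard Airy disc formula (diameter to the first minimum, d = 2.44 * lambda * N). The wavelength and the list of f-numbers are my own illustrative choices, not from any specific camera:

```python
# Airy disc diameter vs. f-number, to compare against ~1 micron pixel pitch.
# d = 2.44 * wavelength * N (diameter to the first diffraction minimum)

WAVELENGTH_UM = 0.55  # green light, roughly the middle of the visible band

def airy_disc_diameter_um(f_number, wavelength_um=WAVELENGTH_UM):
    """Airy disc diameter in microns for a given f-number."""
    return 2.44 * wavelength_um * f_number

for n in (1.9, 2.4, 4.0, 8.0):
    print(f"f/{n}: Airy disc ~{airy_disc_diameter_um(n):.1f} um")
```

Even at f/1.9 the disc is already ~2.5 um wide, several pixels across on a phone sensor; by f/8 it is over 10 um, so stopping down only throws resolution away.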
But there is more to consider: dynamic range. The pixel wells have an upper limit to how many electrons (converted from photons) they can hold, and that limit depends largely on the storage area per pixel. So if we replace one pixel with four smaller ones, each will typically hold about 1/4 the charge. At the bright end this causes no problem, since the sum of the four wells matches the one large well. The problem sits at the other end of the scale. Per pixel, the relative photon (shot) noise doubles: the signal drops to 1/4 while the noise only drops to 1/2, so each small pixel has half the SNR of the large one (and a little worse still as pixels approach 1 micron). Summing or averaging the 2x2 block recovers the shot-noise SNR, because the signals add linearly while the noise adds in quadrature. What averaging cannot undo is the read-noise penalty described below: four reads instead of one leave the binned pixel with twice the read noise, so smaller pixels give us less DR even after averaging. If you look through raw samples from a large range of sensor sizes you will see that this is correct. Just start with the extremes, raw from phone cameras versus CMOS medium format, to see how obvious the difference is, and note that it is consistent for the formats in between.
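The shot-noise arithmetic above can be checked with a short sketch. The full-well figure of 40,000 electrons is an assumed, illustrative number, not taken from any particular sensor:

```python
# Shot-noise comparison: one big pixel vs. a 2x2 block of quarter-size pixels.
import math

signal_big = 40000                         # electrons in the big pixel (assumed)
snr_big = signal_big / math.sqrt(signal_big)      # Poisson noise ~ sqrt(N)

signal_small = signal_big / 4              # each small pixel catches 1/4 the light
snr_small = signal_small / math.sqrt(signal_small)  # half the per-pixel SNR

# Sum the 2x2 block: signals add linearly, independent noise adds in quadrature.
signal_sum = 4 * signal_small
noise_sum = math.sqrt(4 * signal_small)
snr_sum = signal_sum / noise_sum           # shot-noise SNR is recovered

print(snr_big, snr_small, snr_sum)
```

So for shot noise alone, binning gets back what the smaller pixels lost; the lasting penalty comes from read noise, discussed next.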
That was purely about photon noise, but there are other noise sources. Read noise is normally quite small, but reading four pixels instead of one gives four instances of read noise instead of one. Since independent noise sources add in quadrature, averaging that component leaves you with twice the read noise from those four pixels compared to one larger pixel.
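Putting the two together shows where the dynamic-range loss actually comes from. Both numbers here (full well and per-read noise) are assumed for illustration, and I assume read noise per pixel does not shrink with pixel size:

```python
# Dynamic range = full well / read noise, for one big pixel vs. a binned 2x2 block.
import math

full_well_big = 40000   # electrons, illustrative
read_noise = 3.0        # electrons per read, assumed equal for both pixel sizes

# One big pixel: one read.
dr_big = full_well_big / read_noise

# 2x2 small pixels binned: the summed full well is unchanged, but four
# independent reads add in quadrature -> twice the read noise.
read_noise_binned = math.sqrt(4) * read_noise   # 2x the single-read noise
dr_binned = full_well_big / read_noise_binned   # half the DR, about 1 stop less
```

Under these assumptions the binned small-pixel sensor ends up roughly one stop behind in DR, which matches what the raw comparisons across formats show.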
---------- Post added 02-21-16 at 04:21 PM ----------
Originally posted by marcusBMG: "I would not regard T. Northrup as an authority on any of these matters (albeit his presentational skills are good). Check out his videos on crop factor and effective f-stop, where in extensive rabbiting on about this he completely fails to make clear that a different sensor size does not affect light intensity, and therefore the only significance in fact relates to depth of field."
That's nonsense. You should read more about the subject before jumping to popular-myth conclusions.
Here is a write-up on the subject by one of PF's own knowledgeable users.