You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?

Actually, it's worse than estimating the misclassification error of a single method: if $A$ and $B$ are the two sets of misclassified points, you are only interested in the points that are in one set but not the other.
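To make that concrete, here is a minimal sketch using made-up index sets (the names `mis_a`, `mis_b` and the indices themselves are hypothetical, not taken from the question). It shows how few points actually carry the comparison once the shared mistakes are removed.

```python
# Hypothetical indices of misclassified test points for the two methods.
# These values are invented purely for illustration.
mis_a = {3, 17, 42, 256, 511, 730}
mis_b = {17, 42, 99, 511, 845}

# Points that only one of the two methods gets wrong.
only_a = mis_a - mis_b
only_b = mis_b - mis_a

# Symmetric difference: the points that distinguish the two methods.
discordant = mis_a ^ mis_b

print("misclassified by A:", len(mis_a))
print("misclassified by B:", len(mis_b))
print("only A wrong:", sorted(only_a))
print("only B wrong:", sorted(only_b))
print("points that separate the methods:", len(discordant))
```

The comparison rests on the size of `discordant`, which is smaller than either misclassified set whenever the two methods share mistakes, so the estimate is correspondingly noisier.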

Note: if you are trying to estimate a fraction $p$ of a set and you use N sample points, it's not difficult to calculate the standard deviation of such an estimate, which gives you a good handle on how reliable your estimates and conclusions are.
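As a rough sketch of that calculation: writing p for the true fraction and N for the number of sample points, and assuming the points are drawn independently, the estimated fraction has standard deviation sqrt(p(1 - p)/N). The function name `proportion_std` and the example values of p below are illustrative assumptions.

```python
import math


def proportion_std(p: float, n: int) -> float:
    """Standard deviation of a fraction estimated from n independent samples.

    Assumes each sample point independently falls in the set with
    probability p (binomial model).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    n = 1000  # number of sample points, as in the question
    for p in (0.05, 0.10, 0.20):  # illustrative misclassification rates
        sd = proportion_std(p, n)
        print(f"p={p:.2f}  std of estimate={sd:.4f}  "
              f"expected count={p * n:.0f} +/- {sd * n:.1f}")
```

Comparing the gap between two estimated error rates against this standard deviation gives a quick sense of whether the gap could be sampling noise.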