I guess I've thrown a lot of stuff on the wall at once. let me try to sort it out a bit.
first, on the ear/brain working together: I don't know much beyond what the basic components are and their first-order transducing and processing jobs. there is a lot more subtle processing each does, and I roughly understand those as analogues of modern electronic systems (or the other way 'round). most of the time I go black-box simplistic and consider the aural antennas (sat dish elements? = outer ears), mechanical resonators and filters (head, sinuses), the sensing elements (inner ears), the neural net local to each inner ear (cochlear nerve cells and synaptic connections), the antenna array (two each of antennas, inner ears ... local neurals), and the aural processing parts of the brain alltogethernow as the ear system. it's a bit like dealing with a computer's HW, FW and SW. there are issues unique to each, but most of the time I just concentrate on the application (matching or measuring pitch) and forget about the actual implementation. that is, unless I believe there is some quasi-meaningful issue that involves some subpart of the whole system. so if I write "ear", it usually connotes the entire ear or hearing system. however, the facts that there are multiple aural paths to each inner ear, that there is a pair (a dipolar array) of outer/inner ears, that the cochlea senses sound in manifold, mysterious ways, and that the ear system is dynamically (not statically) perceptive in different dimensions (pitch, multi-pitch, dynamics, directivity due to the modified dipolar array) all seem significant to the discussion.
where was I going ....?
Nick: you seem to be focused on the singer's pitch matching -- you adding yourself to the cacophony -- while I took Scrybe's Q more generally, as one of mixing/mastering. so my last post pertains to the general problem of pitch estimation (during mixing/mastering, but not recording or performing). let me emphasize that, absent the complexity of lossy-codec-processed source material, most of the issues with absolute pitch perception are related to bad signal-to-noise -- lots of other, loud stuff around the pitch of interest making it difficult for the ear to pull out an accurate estimate of pitch -- coupled with the innate non-linearity of the ear system. situations such as those posited reveal the cracks in the way we perceive sound. it seems we are optimized for survival and communications, and humming a good tune or singing in harmony is not usually relevant to either. consider how few people have the ability to ID absolute pitch (within the 12-tone scale anyway). yet most of us do well enough at relative pitch comparison. the second seems more important to survival (doppler depends on this). the first, not so much.
Scrybe's first-part Q was the difference between cans and monitors -- I did NOT take that as a LOUD set of monitors, but something like near-fields at moderate level. and she is abs correct that there are differences in pitch perception (estimation) in those cases. the reasons I can list come down to (sorry to restate):
* headphones affecting the outer/inner ear: sound-isolating (sealed) headphones -- as opposed to "open air" -- couple bass more tightly into the ear and cause variations in air pressure which impinge upon the eardrum, small bones, and cochlear media. if you've ever had the experience of a jet landing with sinus congestion, you've probably also noticed the shifting of pitch caused by moderate but steady changes in air pressure. the changes caused by tightly coupled bass (cans) are more subtle and faster changing, but these do modulate the frequency components of other arriving sounds. this essentially smears the pitch of affected sounds. that can make small pitch errors in the native material less objectionable.
many people who use headphones prefer the "open-air" type, not because they can hear all the other sounds, but because it feels more comfortable to the ear. closed-back cans not only create a feeling of isolation (desired), but also of "pressure." that may be a real thing. unfortunately, for real studio work, isolating cans often are required. so if these are causing a shift in abs pitch, how do we use them effectively? (studio vocals often are recorded while listening to other tracks in the cans.) answer: the vocal also is fed to those cans. this method has worked for many years. possible lesson: feed all sources to the ears through the same path.
* the crossover of left and right source into right and left (outer/inner) ears that occurs for nearfield monitors definitely changes the problem as compared to isolating L=>L and R=>R. (this is particularly critical for lossy-codec-processed program material. it has quite possibly become THE dominant issue for cans v. monitors these days. more on that later -- probably a later post. but note this: never trust that pitches are accurately represented in either abs or rel terms in lossy codec [MP3, AAC, WMA, Ogg Vorbis ...] programming at low to medium coding rates. same for fundamentals and their harmonics. NEVER.)
* level differences: the ear (system) is non-linear. that means if you toss it a complex signal, it gets distorted. it's level dependent and has a memory -- recall the "cotton in the ears" effect after a loud concert? that is not clogged sinuses. it's your ear system re-establishing its range of operation. I've noted how amusical music sounds during this condition. it's as if all the critical processing now works best at concert levels, and everyday levels are now too low for the ear to process. that indirectly seems to support the argument that pitch estimation accuracy is a function of level.
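to make the "toss a non-linear system a complex signal and it gets distorted" point a bit more concrete, here is a toy sketch in Python. the squared term, the 0.2 factor and the 1000/1200 Hz test tones are all made up for illustration. this is NOT a model of the ear, just the simplest non-linearity I could write down. it shows that two clean tones come out with extra components (here a 200 Hz difference tone and a 2200 Hz sum tone) that were never in the input.

```python
import math

SAMPLE_RATE_HZ = 8000
NUM_SAMPLES = 800  # 0.1 s, so every frequency used below completes whole cycles


def two_tones(f1_hz, f2_hz):
    """Sum of two unit-amplitude sine waves."""
    return [
        math.sin(2 * math.pi * f1_hz * n / SAMPLE_RATE_HZ)
        + math.sin(2 * math.pi * f2_hz * n / SAMPLE_RATE_HZ)
        for n in range(NUM_SAMPLES)
    ]


def toy_nonlinearity(signal, k=0.2):
    """Hypothetical non-linear stage: linear term plus a small squared term."""
    return [x + k * x * x for x in signal]


def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component, found by correlating with cos and sin."""
    re = sum(
        x * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n, x in enumerate(signal)
    )
    im = sum(
        x * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n, x in enumerate(signal)
    )
    return 2 * math.hypot(re, im) / len(signal)


clean = two_tones(1000, 1200)
distorted = toy_nonlinearity(clean)

# Compare the original tones with the difference (200 Hz) and sum (2200 Hz) tones.
for f in (200, 1000, 1200, 2200):
    print(f, round(tone_amplitude(clean, f), 3), round(tone_amplitude(distorted, f), 3))
```

raise the input level and the squared term grows faster than the linear one, which is the "level dependent" part in miniature.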
For Nick's singing scenario: matching vocal pitch to an external source also seems to have a lot of moving parts. I really doubt the loud sounds from external monitors (performance now, not nearfield studio) find their way directly through the body or sinuses as much as through the ear canal. and what does make it through the sinuses is very distorted in frequency response -- predominantly low freqs, for which pitch recognition is not so critical -- we don't do it as well as for mids and highs. OTOH, your vocal cords clearly can couple a lot of sound energy through the head to the inner ear. is there a pitch change through that path as compared to through the air (vocal from monitors)? well, closing one ear to the outside will change ear canal air pressure, and that can change the pitch of sounds conducted through the sinuses to the inner ear. try it. go to a quiet room. stick a finger in your ear to seal it off and listen. now slowly vary the pressure of the finger on the ear. pitch change? yep. enough so that it is not mistaken for a volume (trem) effect. what does that mean for your pitch matching in a live situation? the other, unaltered outer/inner ear will get the "band", but you are sensing your voice pitch through your internal path to the pressure-shifted ear, and maybe even pressing that finger in pretty firmly. interesting, eh? might be interesting to compare this to the less sightly, but less invasive, cupped-hand-between-mouth-and-ear method. I notice a lot of singers simply touch a hand to the outer ear to do this. maybe they know something.
As far as a loud environment causing issues, I will point out that, yes, I can see a number of error-causing problems -- the whole non-linear/logarithmic ear problem -- but in reality, the ear (system) seems to get less discerning of pitch in the presence of high SPLs, at least until it adapts (cotton-in-ears state). is this really a problem?
it's well known that trad headphone designs do not support proper sound stage imaging from stereo program material created for speakers. one reason is the loss of the L=>R and R=>L leakage with its appropriate small time delay. if the right L/R crossover leakage and appropriate delay factors (how many ms wide is your head?) are added into a source signal intended for headphone listening, the headphone imaging improves. the SRS WOW software bundled with some MP3 and WMA players can do this.
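a rough answer to "how many ms wide is your head?", plus a bare-bones sketch of the leakage-with-delay idea, in Python. the 0.18 m head width and the 0.3 leak gain are my own assumed numbers, and the function names are made up. a real crossfeed would also filter the leaked signal (the head shadows the highs), which I leave out here, and I make no claim that this is how SRS WOW does it.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 C
HEAD_WIDTH_M = 0.18         # assumed ear-to-ear distance, adjust to taste


def head_delay_ms(head_width_m=HEAD_WIDTH_M):
    """Time for sound to travel straight across the head, in milliseconds."""
    return 1000.0 * head_width_m / SPEED_OF_SOUND_M_S


def head_delay_samples(sample_rate_hz, head_width_m=HEAD_WIDTH_M):
    """The same delay, rounded to a whole number of samples."""
    return round(sample_rate_hz * head_width_m / SPEED_OF_SOUND_M_S)


def crossfeed(left, right, sample_rate_hz, leak_gain=0.3):
    """Mix a delayed, attenuated copy of each channel into the other.

    left and right are equal-length lists of float samples.
    """
    delay = head_delay_samples(sample_rate_hz)
    out_left, out_right = [], []
    for n in range(len(left)):
        # The leaked signal is silent until the delay has elapsed.
        leaked_right = right[n - delay] if n >= delay else 0.0
        leaked_left = left[n - delay] if n >= delay else 0.0
        out_left.append(left[n] + leak_gain * leaked_right)
        out_right.append(right[n] + leak_gain * leaked_left)
    return out_left, out_right


# A single click in the left channel only.
click_left = [1.0] + [0.0] * 29
silence_right = [0.0] * 30
new_left, new_right = crossfeed(click_left, silence_right, 44100)
print(round(head_delay_ms(), 2), "ms,", head_delay_samples(44100), "samples")
```

with those assumptions the delay comes out at about half a millisecond, and the left-only click shows up in the right channel a couple dozen samples later at reduced level, which is roughly what a speaker in a room would give you for free.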
I've avoided the words "psycho-acoustics" in all that. better? lossy codecs ... brilliant technology with interesting accuracy problems ... later.