Why Do Headphones Affect Pitch Recognition?

17 Posts
11 Users
0 Likes
11.4 K Views
(@scrybe)
Famed Member
Joined: 17 years ago
Posts: 2241
Topic starter  

More so with closed-back headphones than open-backed ones (frequency build-up in the cans causing cancellation, presumably), but all headphones do this to some extent. A track can sound okay on cans, but switch to halfway decent monitors and you notice the pitching problems that had eluded you during the cans listen. Why is this?

As a related question... why/how does volume affect pitch recognition? I've heard that louder volumes limit it (perhaps because the loudness is painful enough that you figure all listening discomfort is volume related?), but I've noticed it at lower volumes too. If I try to transcribe a riff with my stereo very quiet, I find it more difficult to pin the pitches down accurately; I have to find a happy working volume to do this task at my best. I've also noticed that quieter volumes make transcribing riffs that use distortion a particular nightmare, but I don't know why.

Thoughts? Ideas? Explanations?

Thanks guys. :D

Ra Er Ga.

Ninjazz have SuperChops.

http://www.blipfoto.com/Scrybe


   
(@blueline)
Noble Member
Joined: 17 years ago
Posts: 1704
 

I don't have an answer, only support for your theory. I learn most songs by ear. When I try to learn them using cans or earbuds, I have a difficult time (a more difficult time, I should say :roll: ). I need to use my monitors if I want to get through it quickly.

Teamwork- A few harmless flakes working together can unleash an avalanche of destruction.


   
(@nicktorres)
Illustrious Member
Joined: 16 years ago
Posts: 5381
 

Because your sinuses resonate and headphones affect your ability to hear outside sound.


   
(@moonrider)
Noble Member
Joined: 20 years ago
Posts: 1305
 

Are you having trouble staying on pitch when recording vocals, or when tracking leads?
In both cases, keeping your cue mix to the absolute minimum you need will help. For vocals I like a cue mix consisting of drums and rhythm guitar, with a little reverb on the vocal monitor stream only. I avoid bass in that cue mix 'cause I'll tend to sing flat if there's too much bass in the mix.

For tracking leads I like drums, bass and vocal in the cue mix.
"I've also noticed that quieter volumes make transcribing riffs using distortion a particular nightmare, but I don't know why."

It's probably got a lot to do with the fact that the fundamental waveform is clipped, and then has odd or even order harmonics added in to muddy things even more.
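
If you want to see that harmonic pile-up in numbers and have Python/numpy handy, here's a rough sketch (the specific frequencies and gain are just for illustration): hard-clip a pure sine and look at which harmonics show up.

import numpy as np

fs = 44100                                   # sample rate in Hz
t = np.arange(fs) / fs                       # one second of samples, so FFT bins are 1 Hz apart
f0 = 220.0                                   # a pure fundamental
clean = np.sin(2 * np.pi * f0 * t)
clipped = np.clip(4.0 * clean, -1.0, 1.0)    # crude hard clipping, i.e. heavy "distortion"

spectrum = np.abs(np.fft.rfft(clipped * np.hanning(len(clipped))))

# level at the fundamental and its first few harmonics
for k in range(1, 6):
    print(f"{k * f0:6.0f} Hz : {spectrum[int(k * f0)]:8.1f}")
# the odd harmonics (660, 1100 Hz) come out strong while the even ones stay near zero:
# plenty of extra partials crowding the note you're trying to pin down by ear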

Playing guitar and never playing for others is like studying medicine and never working in a clinic.

Moondawgs on Reverbnation


   
(@gnease)
Illustrious Member
Joined: 20 years ago
Posts: 5038
 

there's no simple answer, as lots goes on in the ear. frequency is a characteristic that is, to a large degree, in the ear of the listener. different measuring systems yield different results -- probably better called estimates, as there are very few simple, sinusoidal tones in our world, and only those pure tones can be unequivocally assigned a single frequency. simple electronic tuners (most of them) measure frequency as zero crossings of the electronic waveform. the ear is much different: IIRC it uses a hybrid system, sensing both simple (zero-crossing) frequency and wave (interference) patterns as experienced by the vibrating cochlear hairs and their related neurons. the ear is actually sensing a time window of the sound wave traveling through it. that means individual freqs cannot be identified precisely without a lot of "interesting" post processing (neural net, brain). depending upon the exact conditions of loudness, frequency and signal complexity, that neural algorithm for frequency estimation probably changes -- I'd be willing to bet it's been adapted over eons to best sense and identify sounds related first to detection of predators and prey and, more recently, to the intelligibility of communications.

of particular interest is what happens when bass frequencies reach the ears: these lower-freq sound waves create relatively slow compression and rarefaction of the medium in which they travel. that changes the speed of sound in those regions, and this can be interpreted as a change in frequency, wavelength, or both, depending upon the way the frequency is measured. okay, cans vs. monitors:

cans tightly couple the bass freqs into the ear. the bass freqs are pumping (compressing and rarefying) the air in the ear canal, the small bones and the medium in the cochlea. relative to the quick (and lower amplitude) goings-on of the mid and high freqs, these low-freq pumpings produce slow changes in the speed, freq and wavelength of the mid-to-high-freq sounds traveling in the ear. effectively, that means the bass is modulating the mids and highs -- sort of smearing their frequency characteristics the way vibrato does (see the sketch just below). monitors may not couple the bass so intimately into the ear, so the effect may not be as pronounced. whether the modulating of the mids and highs (where we better perceive freq) is a good or bad thing probably depends on the music. certainly vibrato, chorus and similar frequency-smearing effects seem to take the edge off being out of tune.
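
if a toy example helps, here's a quick Python sketch of that smearing plus the zero-crossing idea from above (the pumping rate and depth are completely made-up magnitudes, just for illustration): a mid-range tone whose frequency is slowly pushed around, measured the way a simple tuner would measure it.

import numpy as np

fs = 44100
t = np.arange(int(fs * 2.0)) / fs            # two seconds of samples

f_mid = 1000.0                               # the tone we actually care about
f_pump = 2.0                                 # slow "pumping" rate, standing in for the bass
depth = 12.0                                 # +/- Hz of smear -- an invented magnitude

# instantaneous frequency wobbles around f_mid; integrate it to get the phase
inst_freq = f_mid + depth * np.sin(2 * np.pi * f_pump * t)
tone = np.sin(2 * np.pi * np.cumsum(inst_freq) / fs)

# naive tuner: count zero crossings in 100 ms windows
win = int(0.1 * fs)
for start in range(0, len(tone) - win, win * 4):
    chunk = tone[start:start + win]
    crossings = np.sum(chunk[:-1] * chunk[1:] < 0)       # sign changes
    est = crossings / 2.0 / (win / fs)                   # crossings per second / 2 ~ Hz
    print(f"t = {start / fs:4.2f} s   estimated pitch ~ {est:6.1f} Hz")
# the estimates wander above and below 1000 Hz as the slow "pump" pushes the tone around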

another difference between headphones and monitors is isolation of the L and R signals. headphones provide each ear with a simpler signal to analyze. there are at least two ways to look at the response to a simpler signal. the first is that the natural non-linearity (see below) of the ear will be presented with a simpler signal to "mess up" and then analyze. the second is probably far more interesting: there is a big difference in the psycho-acoustic masking effects between cans and monitors. psycho-acoustic masking is the tendency of the ear to select and process only certain parts of a complex signal -- usually the louder components (of course). but the masking is frequency, loudness and history dependent. monitors provide more masking, because they deliver a more complex signal to both ears. masking can have both positive and negative effects on the perceived aesthetic. concrete example: lossy compression algorithms rely heavily on psycho-acoustic masking. any signal components that fall below the target levels of perceptibility (nominal listener models are used here) can be tossed. a real difficulty in this is that not all persons have the same psycho-acoustic thresholds, and different listening conditions (cans v. monitors) change the nature of the masking and its thresholds. pitch can be estimated differently in different situations where psycho-acoustic masking occurs -- especially if lossy codecs are in use.
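
to make that "tossing" idea concrete, here's a cartoon of what a lossy codec does (a toy model only -- the 30 dB threshold and the +/-100 Hz band are invented for illustration, not any real codec's psychoacoustic model):

import numpy as np

fs = 44100
t = np.arange(fs) / fs
# one loud tone at 1000 Hz plus a much quieter one 60 Hz away
sig = np.sin(2 * np.pi * 1000 * t) + 0.02 * np.sin(2 * np.pi * 1060 * t)

spec = np.fft.rfft(sig)
mag_db = 20 * np.log10(np.abs(spec) + 1e-12)     # bins are 1 Hz apart for a 1 s signal

loudest = int(np.argmax(mag_db))                 # the masker (the 1000 Hz bin)
threshold = mag_db[loudest] - 30                 # toy masking threshold: 30 dB down
band = np.arange(loudest - 100, loudest + 100)   # +/-100 Hz around the masker

spec[band[mag_db[band] < threshold]] = 0         # "inaudible," says the model -- discard it
out = np.fft.irfft(spec, n=len(sig))
# the 1060 Hz partial is simply gone from `out`. a listener whose masking works a bit
# differently (or who is on monitors rather than cans) might have heard it --
# along with whatever pitch cue it carried.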

***************

another, more general point: from a black box PoV, the ear is a non-linear (logarithmic) dynamic detection device. and moreover, these non-linearities vary with level and frequency. very handy for detecting a wide dynamic range of sounds, but the ear does not capture and characterize sounds in the pure mathematical sense of what is propagating through the air. the ear interacts highly with many characteristics of the incoming sound (as described above).

the reason we hear beat notes is the natural non-linear dynamic characteristics of our ears (= all the gooey stuff in and around the cochlea, the localized neural processing, and the further-out brain audio center processing). the creation of a note at the "beat" frequencies (sum and difference) of two pure tones requires a multiplicative (e.g., a x b) or exponential power (e.g., a^2 = a x a or a^3 = a x a x a) process. sinusoids multiplied together produce additional sinusoids at the sums and differences of the original component freqs, and higher powers add components at integer combinations of those freqs. a logarithmic characteristic can be shown to contain the equivalent of multiplicative processing elements.*

*more accurately: reciprocal multiplicative products (nerdy types: see Taylor polynomial expansions), but the reciprocals of sinusoids produce more complicated sinusoids with more harmonics and phase offsets.
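
for anyone who wants the worked version, the square-law case for two pure tones (with $a = 2\pi f_1 t$ and $b = 2\pi f_2 t$) is just trig identities:

$\sin a \,\sin b = \tfrac{1}{2}\big[\cos(a-b) - \cos(a+b)\big]$

$(\sin a + \sin b)^2 = 1 - \tfrac{1}{2}\cos 2a - \tfrac{1}{2}\cos 2b + \cos(a-b) - \cos(a+b)$

so a squaring stage alone already yields the second harmonics of each tone plus the sum and difference ("beat") frequencies $f_1 + f_2$ and $f_1 - f_2$; the higher powers in a Taylor-type expansion of a log-like response add the higher-order products $m f_1 \pm n f_2$.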

-=tension & release=-


   
 KR2
(@kr2)
Famed Member
Joined: 17 years ago
Posts: 2717
 

I don't know . . . but I'd like to take this opportunity . . .to give Gnease a well deserved -1 . . . on his new avatar.

It's the rock that gives the stream its music . . . and the stream that gives the rock its roll.


   
(@gnease)
Illustrious Member
Joined: 20 years ago
Posts: 5038
 

it's from one of my guitar straps. originally bought for my daughter's guitar, but traded back to me for its exceptional kitsch-factor.

-=tension & release=-


   
 KR2
(@kr2)
Famed Member
Joined: 17 years ago
Posts: 2717
 

Awww so . . . kitsch = cheesy . . . she has very good taste then . . . from her mother's side?

I believe it was Nuno who posted a link to a video on Paul McCartney's production of a song where he played all the instruments. It showed step by step the process of making the tracks for each instrument. In each case, he wore the enclosed headphones . . . but only for the drum track . . . no, not even the drum track (IIRC) . . . just the metronome track . . . apparently to keep the beat.

Ummh . . . that's my feeble attempt at making this post relevant . . . to the topic.

It's the rock that gives the stream its music . . . and the stream that gives the rock its roll.


   
 Nuno
(@nuno)
Famed Member
Joined: 18 years ago
Posts: 3995
 

Yep!

http://www.youtube.com/watch?v=-fW8Tk1JJ3M

But he can do it without headphones:

http://www.youtube.com/watch?v=vRq1JtvaG78

(Just trying to complete Ken's post.)


   
(@joehempel)
Famed Member
Joined: 16 years ago
Posts: 2415
 

This has been an interesting topic to read, and the answers in it may be the reason I stopped listening on headphones when recording a guitar track. There's just too much going on in the headphones for me, and it distracts me from playing the piece when I record.

In Space, no one can hear me sing!


   
(@nicktorres)
Illustrious Member
Joined: 16 years ago
Posts: 5381
 

Interesting stuff Mr. Nease, and I bow to your obvious expertise.

I beg your opinion....

In loud environments singers will sometimes block one ear so they can hear what they are singing. That has got to include more sinus-based sound, no? I'm thinking, perhaps incorrectly, that the reason this may apply to the original question is that large speakers vibrate your body a heck of a lot more than cans, and may not so subtly influence what you are hearing through the sinuses, through actual vibration of the bones around the ear (heck, the bones around your ankle too), and even through the eardrum itself, independent of the hearing mechanism.

Also, would a speaker that is not blocking all air flow to the ear, unlike an earphone which is only providing its own compression and rarefaction, allow a much more significant sinus response? You know, like the reason a guitar has a sound hole: to allow air movement so the top can move. I know that when I've had sinus infections, my hearing is severely impacted. Although the can doesn't block movement through the ear to the sinus, it must limit that movement to its own compression/rarefaction.

One more thing: ears seem to be designed to identify the location of a sound, to help with that need not to be eaten by predators. I read somewhere that the ridges in your ears help create a kind of baffle to assist in that. So if you are wearing cans or earbuds, you must limit that effect. You won't hear the same way you hear without them. If you are listening to monitors, you will probably move your head, adjust the angle of your ears to pick up some sound, etc. If you have cans on, no matter the orientation of your head, the sound will come from the same location. So might this adaptation of our ears, and the processing in our brain that helps locate the source of a sound, screw things up when listening to an unnatural source like a sealed-system headphone? I mean, like pigeons bobbing their heads for binocular vision, we use our ears for stereo hearing. Duh, you say, but if one ear gets no response from the source the other ear is hearing, that must throw things out of whack. It seems a bit like driving using a backup camera: it's helpful, yes, but it isn't the same thing as seeing it with your eyes. Yes, I know this is a stretch, but as you alluded to in your opening paragraph, I wonder if, in the case of earphones vs. monitors, the brain plays just as large a part in the perceived pitch as the ear does.
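
(For a rough sense of scale on the location-finding point, purely back-of-envelope: with roughly 0.2 m between the ears and sound travelling at about 343 m/s, the largest possible arrival-time difference between the two ears is about 0.2 / 343 ≈ 0.0006 s, i.e. around half a millisecond -- and the brain resolves direction from differences a good deal smaller than that.)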

I like the guitar strap. It's very schoolhouse rock.


   
(@scrybe)
Famed Member
Joined: 17 years ago
Posts: 2241
Topic starter  

Interesting stuff Mr. Nease, and I bow to your obvious expertise.

<snip snip>

I like the guitar strap. It's very schoolhouse rock.

a definite +1 on both counts, and thanks Nick for posting further questions about this phenomenon. I'm finding it a really interesting topic.

Ra Er Ga.

Ninjazz have SuperChops.

http://www.blipfoto.com/Scrybe


   
(@gnease)
Illustrious Member
Joined: 20 years ago
Posts: 5038
 

I guess I've thrown a lot of stuff on the wall at once. let me try to sort it out a bit.

first, on the ear/brain working together: I don't know much beyond what the basic components are and their first-order transducing and processing jobs. there is a lot more subtle processing each does, and I roughly understand those as analogues of modern electronic systems (or the other way 'round). most of the time I go black-box simplistic and consider the aural antennas (sat dish elements? :wink: = outer ears), the mechanical resonators and filters (head, sinuses), the sensing elements (inner ears), the neural net local to each inner ear (cochlear nerve cells and synaptic connections), the antenna array (two each of antennas, inner ears ... local neurals), and the aural processing parts of the brain -- all-together-now -- as the ear system. it's a bit like dealing with a computer's HW, FW and SW: there are issues unique to each, but most of the time I just concentrate on the application (matching or measuring pitch) and forget about the actual implementation -- unless I believe there is some quasi-meaningful issue that involves some subpart of the whole system. so if I write "ear", it usually connotes the entire ear or hearing system. however, the facts that there are multiple aural paths to each inner ear, that there is a pair (a dipolar array) of outer/inner ears, that the cochlea senses sound in manifold, mysterious ways, and that the ear system is dynamically (not statically) perceptive in different dimensions -- pitch, multi-pitch, dynamics, directivity (due to that dipolar array) -- all seem significant to the discussion.

where was I going ....?

Nick: you seem to be focused on the singer's pitch matching -- you adding yourself to the cacophony -- while I took Scrybe's Q as more generally one of mixing/mastering. so my last post pertains to the general problem of pitch estimation (during mixing/mastering, not recording or performing). let me emphasize that, absent the complexity of lossy-codec-processed source material, most of the issues with absolute pitch perception are related to bad signal-to-noise -- lots of other, loud stuff around the pitch-of-interest making it difficult for the ear to pull out an accurate estimate of pitch -- coupled with the innate non-linearity of the ear system. situations such as those posited reveal the cracks in the way we perceive sound. it seems we are optimized for survival and communications, and humming a good tune or singing in harmony is not usually relevant to either. consider how few people have the ability to ID absolute pitch (within the 12-tone scale, anyway), yet most of us do well enough at relative pitch comparison. the second seems more important to survival (doppler depends on this). the first, not so much.

Scrybe's first Q was the difference between cans and monitors -- I did NOT take that as a LOUD set of monitors, but something like near-fields at moderate level. and she is abs correct: there are differences in pitch perception (estimation) in those cases. the reasons I can list come down to (sorry to restate):

* headphones affecting the outer/inner ear: sound-isolating (sealed) headphones -- as opposed to "open air" -- couple bass more tightly into the ear, causing variations in air pressure which impinge upon the eardrum, small bones, and cochlear media. if you've ever had the experience of a jet landing while you had sinus congestion, you've probably also noticed the shifting of pitch caused by moderate but steady changes in air pressure. the changes caused by tightly coupled bass (cans) are more subtle and faster changing, but they do modulate the frequency components of other arriving sounds. this essentially smears the pitch of the affected sounds. that can make small pitch errors in the native material less objectionable.

many people who use headphones prefer the "open-air" type, not because they can hear all the other sounds, but because they feel more comfortable to the ear. closed-back cans not only create a feeling of isolation (desired), but also one of "pressure." that may be a real thing. unfortunately, for real studio work, isolating cans are often required. so if these are causing a shift in abs pitch, how do we use them effectively? (studio vocals often are recorded while listening to other tracks in the cans.) answer: the vocal also is fed to those cans. this method has worked for many years. possible lesson: feed all sources to the ears through the same path.

* the crossover of left and right source into right and left (outer/inner) ears that occurs with nearfield monitors definitely changes the problem as compared to isolating L=>L and R=>R. (this is particularly critical for lossy-codec-processed program material. it has quite possibly become THE dominant issue for cans v. monitors these days. more on that later -- probably a later post. but note this: never trust that pitches are accurately represented, in either abs or rel terms, in lossy codec [MP3, AAC, WMA, Ogg Vorbis ...] programming at low to medium coding rates. same for fundamentals and their harmonics. NEVER.)

* level differences: the ear (system) is non-linear. that means if you toss it a complex signal, it gets distorted. it's level dependent and has a memory -- recall the "cotton in the ears" post-loud-concert effect? that is not clogged sinuses. it's your ear system re-establishing its range of operation. I've noted how amusical music sounds during this condition. it's as if all the critical processing now works best at concert levels, and everyday levels are too low for the ear to process. that indirectly seems to support the argument that pitch estimation accuracy is a function of level.

For Nick's singing scenario: matching vocal pitch to an external source also seems to have a lot of moving parts. I really doubt the loud sounds from external monitors (performance now, not nearfield studio) find their way directly through the body or sinuses as much as through the ear canal. and what does make it through the sinuses is very distorted in frequency response -- predominantly low freqs, for which pitch recognition is not so critical (we don't do it as well as for mids and highs). OTOH, your vocal cords clearly can couple a lot of sound energy through the head to the inner ear. is there a pitch change through that path as compared to through the air (vocal from monitors)? well, closing one ear to the outside will change ear canal air pressure, and that can change the pitch of sounds conducted through the sinuses to the inner ear. try it. go to a quiet room. stick a finger in your ear to seal it off and listen. now slowly vary the pressure of the finger on the ear. pitch change? yep. enough so it is not mistaken for a volume (trem) effect. what does that mean for your pitch matching in a live situation? the other, unaltered outer/inner ear will get the "band", but you are sensing your voice's pitch through your internal path to the pressure-shifted ear, and maybe even pressing that finger in pretty firmly. interesting, eh? it might be interesting to compare this to the less sightly, but less invasive, cupped-hand-between-mouth-and-ear method. I notice a lot of singers simply touch a hand to the outer ear to do this. maybe they know something.

As far as a loud environment causing issues: I will point out that, yes, I can see a number of error-causing mechanisms (the whole non-linear/logarithmic ear problem), but in reality the ear (system) seems to get less discerning of pitch in the presence of high SPLs -- at least until it adapts (the cotton-in-ears state). is this really a problem?

it's well known that traditional headphone designs do not support proper soundstage imaging from stereo program material created for speakers. one reason is the loss of the L=>R and R=>L leakage with its appropriate small time delay. if the right L/R crossover leakage and appropriate delay factors (how many ms wide is your head?) are added into a source signal intended for headphone listening, the headphone imaging improves (rough sketch below). the SRS WOW software bundled with some MP3 and WMA players can do this.
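
for the curious, a bare-bones crossfeed sketch in Python -- my own toy numbers (roughly 3 dB of attenuation, ~0.3 ms of delay, a gentle low-pass on the leaked copy), not SRS WOW's or anyone else's actual algorithm:

import numpy as np
from scipy.signal import butter, lfilter

def simple_crossfeed(left, right, fs=44100, delay_ms=0.3, gain=0.7, cutoff_hz=700):
    """feed a delayed, attenuated, low-passed copy of each channel into the other.
    all parameter values here are illustrative guesses, not a published design."""
    delay = int(fs * delay_ms / 1000)            # a fraction of the ear-to-ear travel time
    b, a = butter(1, cutoff_hz / (fs / 2))       # gentle low-pass for the leaked copy
    bleed_l = gain * lfilter(b, a, np.pad(left,  (delay, 0))[:len(left)])
    bleed_r = gain * lfilter(b, a, np.pad(right, (delay, 0))[:len(right)])
    return left + bleed_r, right + bleed_l       # each ear now hears a bit of the far channel

# usage: out_l, out_r = simple_crossfeed(in_l, in_r, fs=44100)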

I've avoided the words "psycho-acoustics" in all that. better? lossy codecs ... brilliant technology with interesting accuracy problems ... later.

-=tension & release=-


   
(@rahul)
Famed Member
Joined: 18 years ago
Posts: 2736
 

Ouch... ouch... my brain hurts... I never thought so much could be happening under the headphones I wear.


   
(@vocalthoughts)
New Member
Joined: 15 years ago
Posts: 2
 

This thread is fascinating for me. Here's something I find very interesting... I (and most singers, I find) hear best with 1/2 ear off. Not a whole side, just slide one cup of a closed headphone halfway off one ear. I'm not sure about the science behind why, but I do know it "orients" me acoustically to my voice and connects my voice to my ears (and the track), enabling me to sing with the most accurate pitch.

With the whole cup off one ear, it throws me off. I think that's because it messes with my breath control, like singing dry outside vs. in a reverberating stairwell. Yes, I can accomplish similar results with open cans, but they tend to feed back into the vocal mic... and even the better ones, made to limit that, just don't give me as much confidence as closed cans with one side half off.

Every once in a while I come upon a singer who truly hears better with both cans fully on (of course they are being fed their amplified voice in addition to the track). This makes me suspect that no matter what the science says, much of what we can accurately "hear" of pitch is a learned thing... what our brains get used to hearing and figuring out among signals. I know some older session singers who still miss the days when they sang with speakers instead of headphones. I actually had a large musical theater chorus record that way recently, and with the monitors strategically placed, it worked pretty well.

I will also vouch for the fact that too much bass will throw me off pitch because of its overtones, and that it's best to pare the mix down as sparsely as possible without taking out the instruments that give emotional support. It's different with each song type.

But I'd love to have some "feedback" from you all on why 1/2 ear is usually the best for doing studio vocals.

Judy Rodman... Power, Path & Performance vocal training
http://judyrodman.com


   