Journal of Neuroscience Methods 106 (2001) 29 – 38 www.elsevier.com/locate/jneumeth
Synthesizing spatially complex sound in virtual space: an accurate offline algorithm
Gilad Jacobson a,b,c,*, Iris Poganiatz d, Israel Nelken a,c
a Department of Physiology, Hebrew University-Hadassah Medical School, P.O. Box 12272, Jerusalem 91120, Israel
b Institute of Computer Science, Hebrew University, Jerusalem 91904, Israel
c The Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
d Institut für Biologie II, Lehrstuhl für Zoologie/Tierphysiologie, RWTH Aachen, Kopernikusstrasse 16, D-52074, Aachen, Germany
Received 9 October 2000; received in revised form 20 December 2000; accepted 22 December 2000
Abstract The study of spatial processing in the auditory system usually requires complex experimental setups, using arrays of speakers or speakers mounted on moving arms. These devices, while allowing precision in the presentation of the spatial attributes of sound, are complex, expensive and limited. Alternative approaches rely on virtual space sound delivery. In this paper, we describe a virtual space algorithm that enables accurate reconstruction of eardrum waveforms for arbitrary sound sources moving along arbitrary trajectories in space. A physical validation of the synthesis algorithm is performed by comparing waveforms recorded during real motion with waveforms synthesized by the algorithm. As a demonstration of possible applications of the algorithm, virtual motion stimuli are used to reproduce psychophysical results in humans and for studying responses of barn owls to auditory motion stimuli. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Virtual space; Auditory motion; Psychophysics; Physiology; Sound localization; Sound synthesis; Barn owl; HRTF
1. Introduction
The psychophysical study of auditory motion processing is still underdeveloped compared to the study of stationary spatial processing (Wightman and Kistler, 1993; Brown, 1994; Blauert, 1997). The physiological study of auditory motion processing is also sparse (Sovijärvi and Hyvärinen, 1974; Ahissar et al., 1992; Wagner et al., 1994; Jiang et al., 2000), and most of the physiological studies of spatial processing in the auditory system focus on sound localization of stationary objects (e.g. Middlebrooks and Knudsen, 1987; Imig et al., 1990; Rajan et al., 1990; Brugge et al., 1996 in cats; Moiseff and Konishi, 1983 in barn owls). To study auditory spatial processing, investigators usually use one of two sound delivery methods: free-field presentations using speakers, or headphone presentations of virtual space stimuli. Free-field
* Corresponding author. Tel.: + 972-2-6757087; fax: + 972-26439736. E-mail address: giladj@md.huji.ac.il (G. Jacobson).
presentation requires special mechanical setups and limits the number of possible spatial configurations that can be used within one experiment (Perrott and Tucker, 1988; Perrott and Marlborough, 1989; Saberi and Perrott, 1990; Grantham, 1997). Furthermore, complex spatial configurations, such as motion along curved trajectories or sources that change their velocity randomly over time, are difficult to achieve in free-field. As a result, compromises must be made, such as presenting apparent motion stimuli by activating an array of speakers in a volley (e.g. Wagner et al., 1994), simulating motion in free-field (Grantham, 1986) or using artificial sounds that mimic some aspects of auditory motion (Stumpf et al., 1992; Griffiths et al., 1998; Baumgart et al., 1999; Jiang et al., 2000). Virtual space methods employ earphone presentation to simulate sound sources from different positions in space (e.g. Wightman and Kistler, 1989a,b). The pinna and body introduce frequency-dependent level and phase distortions in sounds reaching the eardrum. These effects are quantified by a position-dependent function called the Head Related Transfer Function
0165-0270/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0165-0270(01)00327-2
G. Jacobson et al. / Journal of Neuroscience Methods 106 (2001) 29–38
(HRTF). Virtual space stimuli are generated by modifying the spectrum of a sound source using HRTFs. When subjects are presented over earphones with stimuli modified by their own HRTFs, they usually report that the sound appears to be externalized (Hartmann and Wittenberg, 1996). The HRTFs of different animal species have also been measured (e.g. Musicant et al., 1990; Rice et al., 1992 in cats; Keller et al., 1998 in barn owls; Spezio et al., 2000 in Rhesus monkeys) though there is no known behavioral correlate for the subjective feeling of externalization. In principle, knowing HRTFs from all positions in space should enable simulation of sound generated by sources moving along arbitrary trajectories in space. In this paper, we describe an algorithm for synthesizing auditory motion stimuli in virtual space. The algorithm makes direct use of measured HRTFs from any source (human, animal, artificial ears). The HRTFs are assumed to be sampled densely enough in space so that interpolation of HRTFs at non-measured directions will be valid. No other assumptions are made about the properties of the HRTFs. The algorithm described in this paper is physically validated by comparing the waveforms it generates with waveforms measured during actual motion. Two applications of the algorithm are presented to demonstrate its capabilities. In one application, human subjects perform a discrimination task of motion direction. The detection thresholds are congruent with thresholds previously reported in the literature. In a second application, barn owls are trained to turn their head in response to virtual motion stimuli. 2. Methods
2.1. The convolution equation
The propagation of sound from any position in space to the eardrum is described by a linear transfer function, the HRTF. The HRTF contains information about both the delay and attenuation due to the propagation, and the spectral distortion due to the angular position of the sound source with respect to the head and torso. The time-domain counterpart of the HRTF is the head-related impulse response (HRIR), which will be denoted by $h_x(t)$. When a sound source travels along a trajectory $x(t)$ in space emitting a waveform $s(t)$, the HRIR changes with the position of the sound source. Calculating the expected waveform at the eardrum, $y(t)$, is less trivial than in the stationary case, because the non-stationarity of $h_x(t)$ prevents the use of standard linear filtering methods. Nevertheless, the contributions of sounds emitted at different parts of the trajectory add up linearly at the eardrum. The waveform at the eardrum can therefore be described by
$$y(t) = \int_0^{\infty} d\tau \, s(t-\tau)\, h_{x(t-\tau)}(\tau). \qquad (1)$$
To synthesize virtual space stimuli digitally, a discretized version of Eq. (1) must be used:
$$y(l) = \sum_{n=0}^{\infty} s(l-n)\, h_{x(l-n)}(n), \qquad (2)$$
where it is assumed, for notational simplicity, that time is given in units of the sampling period. Our algorithm consists of a direct computation of this equation. To do so, an estimate of h at all positions along the trajectory of the sound source is required.
2.2. Virtual space algorithm
We assume that the algorithm receives a sampled waveform denoted by s(1), s(2), …, s(T), and a sampled motion trajectory x(1), x(2), …, x(T). Substituting $k = l - n$, Eq. (2) can be re-written in the form
$$y(l) = \sum_{k} s(k)\, h_{x(k)}(l-k). \qquad (3)$$
The advantage of this form is that y(l) can be calculated in time linear in T by iterating on k and treating l as a vector. Thus, at each time step, the effect of the source sample s(k) at all subsequent time steps is computed at once, and is used to update all future samples of the ear waveform. The synthesis algorithm can be summarized as follows:
1. Estimate the HRIRs $h_{x(1)}, h_{x(2)}, …, h_{x(T)}$ for all points along the trajectory, using any chosen interpolation method.
2. Initialize the output vector by setting y(1), y(2), …, y(T+L) to 0, where L is larger than the maximal length of the HRIRs, including the absolute time delay. This lengthening of the output signal is required to guarantee no loss of the edges of the signal.
3. For n = 1 to T, add $s(n) \cdot h_{x(n)}$ to y(n), y(n+1), …, y(n+L−1).
The above algorithm is applied twice, once for the waveform at each ear. The entire code was written in MATLAB 5.2 and 5.3 (©MathWorks Inc.) and was executed on Silicon Graphics workstations and PCs under Windows 95 or Linux.
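The three steps above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function name `synthesize_ear_waveform` and the list-based data layout are assumptions made for illustration, and the per-sample HRIRs are taken as already estimated (step 1).

```python
def synthesize_ear_waveform(source, hrirs):
    """Overlap-add form of Eq. (3): y(l) = sum_k s(k) * h_{x(k)}(l - k).

    source : list of T source samples s(k)
    hrirs  : list of T impulse responses; hrirs[k] holds the taps of the
             HRIR (absolute delay included) for the source position at
             sample k. How they were interpolated is left open here.
    """
    if len(source) != len(hrirs):
        raise ValueError("need exactly one HRIR per source sample")
    max_len = max((len(h) for h in hrirs), default=0)
    # The output is longer than the input so the tail of the last HRIR is kept.
    out = [0.0] * (len(source) + max_len)
    for k, (s_k, h_k) in enumerate(zip(source, hrirs)):
        # Source sample s(k) contributes s(k) * h_k(j) to output sample k + j.
        for j, tap in enumerate(h_k):
            out[k + j] += s_k * tap
    return out


# Toy check (hypothetical data): the source jumps from a 1-sample delay to a
# 2-sample delay with half the gain, halfway through a 4-sample signal.
near = [0.0, 1.0]
far = [0.0, 0.0, 0.5]
y = synthesize_ear_waveform([1.0, 2.0, 3.0, 4.0], [near, near, far, far])
```

When all entries of `hrirs` are identical, the loop reduces to an ordinary convolution, which is the stationary case. For a binaural stimulus the function would be called once per ear with that ear's HRIRs.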
2.3. Implementation details
2.3.1. HRIR sets
The algorithm assumes that there is a set of measured HRIRs from a sphere surrounding the head. The radius of the measurement sphere is denoted by $r_m$, and the set of measured HRIRs is denoted by $\{h_{x(i)}\}_{i=1}^{N}$, $x(i) = (\theta_i, \phi_i, r_m)$. An underlying assumption is that HRIRs are measured densely enough on the sphere, so that an appropriate interpolation method will yield good estimates of actual HRIRs in the non-measured positions.
2.3.2. Pre-processing: dissociating relative time delays from measured HRIRs
The dissociation of relative time delays from the measured HRIRs serves two purposes. First, it enables independent manipulation of the time delay as a cue when synthesizing virtual motion stimuli. An independent manipulation of localization and motion cues has earlier been used to study the relative importance of interaural time difference (ITD) and interaural level difference (ILD) in spatial processing (Rosenblum et al., 1987). Second, this dissociation improves the performance of many interpolation methods used to estimate HRIRs at non-measured positions. After this dissociation, a HRIR at a non-measured position is calculated by interpolating the time delay and the aligned HRIR independently, and recombining them to create an estimated HRIR with the correct time delay. In this work, spatially neighboring HRIRs were aligned by calculating the shift of their cross-correlation peak. The HRIRs were interpolated to achieve a sub-bin resolution of ≈ 0.1 ms prior to calculating the cross-correlation. This was necessary to avoid phase discontinuities in the motion synthesis, which can be detected by the ear. The time delay for position x, calculated via the above alignment procedure, is denoted by $\tau_x$. The aligned HRIR for the same position is denoted by $h^a_x$. The relationship between the true HRIR and the aligned one is described by $h_x = h^a_x * \delta(t - \tau_x)$.
2.3.3. Estimating HRIRs on the sphere
For non-measured positions having the same radius as the measurement sphere, i.e. positions for which $r = r_m$, HRIRs can be estimated by interpolation. The motion algorithm does not depend on the specific details of the interpolation method; it only assumes that the estimates are good approximations to the actual HRIRs along the motion trajectory. Appendix A describes the interpolation methods used in the examples shown below.
2.3.4. Estimating HRIRs beyond the sphere
Most existing HRIR sets are measured at a single radius $r_m$. To obtain an expression for HRIRs at different radii, an amplitude attenuation that is inversely proportional to the distance must be applied. Other environment-dependent distance effects such as non-uniform attenuation of the signal’s spectral envelope (‘coloring’) and reverberation are ignored by the algorithm as described here. Thus, for a given position $x = (\theta, \phi, r)$, an estimate of the HRIR is given by
$$h_x(t) = \frac{r_m}{r}\, h^a_{\theta,\phi,r_m}(t - \tau_x). \qquad (4)$$
Eq. (4) implies that the signal amplitude is proportional to $r^{-1}$, and therefore that the signal energy is proportional to $r^{-2}$. The absolute time delay for position x, $\tau_x$, is approximated by
$$\tau_x = \tau_{\theta,\phi} + \frac{r}{c}, \qquad (5)$$
where c is the propagation velocity of sound in air. This implies that the ITD is assumed to be constant for each direction in space. Parallax errors, which may be significant when the ratio between rm and the head radius rh is small, can be corrected. The parallax correction formulae for the horizontal plane are found in Appendix B.
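The distance scaling and delay of Eqs. (4) and (5) can be sketched as follows. The function name, the value of c and the rounding of the delay to whole samples are assumptions made for this illustration; the paper works with sub-sample delays, and the parallax correction is omitted here.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value, the text only names it c


def hrir_beyond_sphere(aligned_hrir, direction_delay, r, r_m, fs):
    """Scale and delay an aligned HRIR for a source at radius r.

    aligned_hrir    : taps of the aligned HRIR measured at radius r_m
    direction_delay : direction-dependent delay in seconds (first term of Eq. 5)
    r, r_m          : source and measurement radii in metres
    fs              : sampling rate in Hz
    Returns (taps, delay_in_seconds).
    """
    if r <= 0 or r_m <= 0 or fs <= 0:
        raise ValueError("radii and sampling rate must be positive")
    gain = r_m / r                                   # amplitude ~ 1/r, Eq. (4)
    delay = direction_delay + r / SPEED_OF_SOUND     # Eq. (5)
    shift = round(delay * fs)                        # whole-sample approximation
    taps = [0.0] * shift + [gain * tap for tap in aligned_hrir]
    return taps, delay


# Hypothetical two-tap HRIR, source at twice the measurement radius.
taps, delay = hrir_beyond_sphere([1.0, -0.5], 0.0, r=2.0, r_m=1.0, fs=100000.0)
```

Doubling the distance halves the amplitude, so the energy drops by a factor of four, as stated after Eq. (4).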
2.3.5. Modeling sound onset and offset
In most experimental setups, the right and left ear signals are gated synchronously with a gradually increasing amplitude envelope. This gating may distort spatial cues present at the onset of the signal. A way around this is to multiply the source waveform by an appropriate amplitude envelope prior to applying the virtual space algorithm. The virtual space algorithm will introduce both the onset disparity and the ongoing disparities automatically.
2.3.6. Presentation of stimuli
In many virtual space applications, stimuli are presented through delivery systems different from those used for estimating the HRTFs. In order to achieve accurate reproduction of free-field waveforms, the difference between the delivery systems must be taken into account. This amounts to a constant spectral correction of the signal, which takes into account the transfer function of both delivery systems (Hartmann and Wittenberg, 1996; Pralong and Carlile, 1996a,b).
3. Results
3.1. Analysis of the convolution equation
As an illustration of the properties of Eq. (1), it is now shown that it implicitly contains the Doppler effect. To demonstrate this, let h be reduced to a delta function depending on the radial distance from the center of the head alone, i.e.
$$h_{x(t-\tau)}(\tau) = \delta\left(\tau - \frac{r_{t-\tau}}{c}\right), \qquad (6)$$
where $r_t$ is the distance of $x(t)$ from the center of the head. $\delta(\tau)$ possesses the following property:
$$\int d\tau \, f(\tau)\, \delta\left(\tau - \frac{r_{t-\tau}}{c}\right) = f(\tau_0),$$
where $\tau_0$ solves $\tau - r_{t-\tau}/c = 0$. Substituting Eq. (6) in Eq. (1),
$$y(t) = \int_0^{\infty} d\tau \, s(t-\tau)\, \delta\left(\tau - \frac{r_{t-\tau}}{c}\right). \qquad (7)$$
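Consider a sound source traveling at a constant velocity on a radial trajectory away from the head, i.e. $r_t = vt$. Substituting in Eq. (7) yields
$$y(t) = \int_0^{\infty} d\tau \, s(t-\tau)\, \delta\left(\tau - \frac{v(t-\tau)}{c}\right) = s\left(t - \frac{vt}{v+c}\right) = s\left(\frac{1}{1+v/c}\, t\right), \qquad (8)$$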
which is the correct expression for the Doppler effect (Feynman et al., 1963).
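As a numerical illustration of Eq. (8), the sketch below computes the emission delay that solves the delta-function condition and the resulting time-compression factor for a receding source. The function names and the value of c are assumptions for this example only.

```python
def emission_delay(t, v, c=343.0):
    """Delay tau_0 solving tau - v*(t - tau)/c = 0 for radial motion r_t = v*t."""
    if c <= 0 or v < 0:
        raise ValueError("expects c > 0 and a receding source (v >= 0)")
    return v * t / (v + c)


def receding_frequency_ratio(v, c=343.0):
    """Received/emitted frequency ratio read off Eq. (8): y(t) = s(t / (1 + v/c))."""
    if c <= 0 or v < 0:
        raise ValueError("expects c > 0 and a receding source (v >= 0)")
    return 1.0 / (1.0 + v / c)


# Source receding at one tenth of the assumed speed of sound.
tau0 = emission_delay(1.0, 34.3)
ratio = receding_frequency_ratio(34.3)
```

Here `1.0 - tau0` equals `ratio`: the sample heard at t = 1 s was emitted at t / (1 + v/c), so every frequency component is scaled down by the same factor.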
3.2. An example of a synthesized signal
To illustrate typical characteristics of sound synthesized by the virtual space algorithm, a harmonic source signal with a fundamental frequency of 2 kHz and equal amplitude in all higher harmonics up to 20 kHz is used. The sound source moves along a straight trajectory at elevation 0° in front of the head, at a constant velocity of 40 m/s. The trajectory starts 10 m to the left of the midline and ends 10 m to its right, passing in front of the head at a distance of 1 m. A set of HRIRs, which is currently being used in electrophysiological experiments, is used for this illustration. This set was measured from a cat dummy, composed of a polystyrene-foam head and torso, with attached artificial ears. To generate the ears, imprints of real cat ears were made using dental impression material (Xantopren L, Bayer). The artificial ears were generated by covering the imprints with a thin coat of silicone and glass fiber dissolved in turpentine. In Fig. 1A the left and right ear waveforms are shown. In both ears, the amplitude is greater when the sound is close to the head (around the midline). Further away from the head, attenuation is greater at the ear contralateral to the sound source, which is a manifestation of ILD. Below, two enlarged portions of the waveform are shown. At first, when the sound source is to the left of the head, the waveform arrives earlier at the left ear and is less attenuated. After the sound source has crossed the midline, the opposite is true. Fig. 1B shows the spectrogram of the left ear waveform. The frequency shift of the partials (≈ +13.6% when approaching and ≈ −11.7% when receding) is in line with the shift expected of a sound source traveling at 40 m/s.
3.3. Physical validation of the algorithm
To validate the accuracy of the virtual space algorithm, a waveform recorded during actual motion was compared with the waveform calculated by the algorithm. A servo motor (ESA 2S20, Motor Power Company) was controlled by a DMC-1414 motion controller (Galil Motion Control, Inc.). The stimuli were generated digitally, converted to analog voltage (TDT DA3-4), attenuated (TDT PA4), switched (TDT SW2) and delivered through a standard tweeter speaker, positioned approximately 15 cm away from the motor. A probe microphone (Knowles) was mounted on the motor, approximately 8 mm away from its rotation axis. Attenuator and microphone signals were sampled (TDT AD2) at 100 kHz. The motor was programmed to perform a 360° turn at an angular velocity of 1440°/s. The motion controller reported its position when reaching the nominal velocity, and set a hardware trigger for the sound delivery and sound recording systems. A stimulus consisting of six equal amplitude frequency components at 12.5, 15, …, 25 kHz and of duration 200 ms was delivered during the constant velocity portion of the motion. Immediately after this phase, the motor was moved back along the same trajectory in 1.98° steps. After each step, directional impulse responses (DIRs) were estimated using Golay codes (Zhou et al., 1992). An estimate of the expected waveform during motion was calculated by the algorithm, using the DIRs. Direct linear interpolation of aligned DIRs was used to estimate DIRs at non-measured directions (see Appendix A). Fig. 2 depicts the results of the physical validation. On the top of the figure, the full 200 ms of the recorded waveform (lower trace in box) and the estimated waveform (upper trace in box) are shown. Both waveforms were high-pass filtered to remove the frequency range of the motor noise (<5 kHz). The coarse amplitude
Fig. 1. (A) The waveforms of a rightward moving harmonic sound source in the left and right ear. Below are two enlarged portions of the waveform. The arrows point from a peak in the signal at the leading ear to the corresponding peak in the signal at the lagging ear. The difference between the times of the two peaks is the instantaneous ITD. (B) The spectrogram of the left ear signal.
Fig. 2. The waveform estimated by the algorithm (upper trace in box) and the waveform recorded during actual motion (lower trace in box). Below, three 0.5 ms segments from the two waveforms are shown. Start times of the segments are 0.2, 100 and 180 ms after stimulus onset.
envelopes of the two waveforms are very similar. The fit of the estimate to the fine structure of the actual motion waveform can be seen in the enlarged 0.5 ms segments shown below. No delay accumulates between the measured waveform and the estimate, and they remain closely matched throughout the 200 ms stimulus. Thus, the overall delay and amplitude dynamics of the waveform are captured by the estimate. To further illustrate the ability of the algorithm to mimic real sounds, the amplitude envelope of each frequency component was extracted by taking the absolute value of the Hilbert transform of a band-pass version of the signal. The unwrapped phase of the Hilbert transform was differentiated and smoothed to yield the instantaneous frequency. Fig. 3 shows the actual and estimated amplitude envelopes and instantaneous frequencies for the six frequency components. The estimated traces closely follow the measured ones. The mean differences between the amplitude envelopes for each of the six components (from low frequency to high frequency) were 1, 0.8, 0.9, 1.7, 2.3 and 2.7 dB. Larger differences appeared in the lower amplitude components, probably due to the decreasing SNR.
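The envelope and instantaneous-frequency extraction described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the analytic signal is built with a naive O(N²) DFT, the band-pass filtering and smoothing steps are omitted, and the instantaneous frequency is taken from the phase increment between consecutive samples (equivalent to differentiating the unwrapped phase). All function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, computed with a plain DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue  # DC and Nyquist bins are kept unchanged
        # Double the positive-frequency bins, zero the negative-frequency bins.
        spectrum[k] *= 2.0 if k < (n + 1) // 2 else 0.0
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def envelope_and_frequency(x, fs):
    """Amplitude envelope and instantaneous frequency (Hz) of a narrow-band signal."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    # Phase advance per sample, wrapped to (-pi, pi], converted to Hz.
    freq = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return envelope, freq


# Toy input: 8 full cycles of a 125 Hz tone with amplitude 0.5, sampled at 1 kHz.
fs = 1000.0
tone = [0.5 * math.cos(2 * math.pi * 125.0 * t / fs) for t in range(64)]
env, inst_freq = envelope_and_frequency(tone, fs)
```

For a multi-component stimulus such as the one used here, each component would first have to be isolated by a band-pass filter before this analysis is meaningful.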
3.4. Psychophysical testing in humans
To demonstrate the applicability of the algorithm to psychophysical research, an experiment testing the ability of subjects to detect direction of motion was conducted. The experiment was based on a paradigm used by Grantham (1986), with some modifications. Grantham’s results are representative of the current psychophysical literature on motion detection (Grantham, 1986; Perrott and Tucker, 1988; Perrott and Marlborough, 1989; Saberi and Perrott, 1990; Grantham, 1997). The virtual sound source traveled along spherical, horizontal trajectories (elevation 0°) at constant angular velocities. The emitted sound was white noise, instead of the pure tones used by Grantham. HRIRs were taken from the ‘bru’ set in the Audis catalogue of human HRTFs (©HEAD acoustics GmbH, 1998); this set elicited the best (although rather poor) subjective externalization reports from the three subjects participating in this experiment. The rather poor externalization reports may be attributed to the fact that HRTFs were not the subjects’ own (Hartmann and Wittenberg, 1996). An adaptive 2-down, 1-up 2-alternative forced choice paradigm was used. The subjects were presented with two stimuli, one traveling from right to left and the other from left to right, in random order. The subjects had to report which interval contained the rightward traveling stimulus. Two different variations of the paradigm were used. In one, all stimuli traveled at the same velocity (45°/s), while the duration was changed adaptively to reach threshold. In the other paradigm, all stimuli had the same duration (300 ms), while the velocity was changed adaptively to reach threshold. The start azimuth of the sound source was chosen randomly for each interval, and ranged between −20° and 20° in 10° resolution. A track was terminated after 12 reversals. Detection thresholds are expressed as the displacement angle of the moving sound at which the subject correctly detected the direction of motion 71% of the time (Levitt, 1971). This point was calculated as the average of the last eight reversal points in the adaptive track. The results of three tracks were averaged for each subject. In Fig. 4, detection thresholds obtained from our experiments (n=3) are compared to detection thresholds as reported in Grantham (1986). The somewhat lower thresholds may be attributed to the use of broadband noise instead of pure tones. A broadband source improves the ability to localize stationary sound sources (Brown et al., 1980; Middlebrooks, 1992), and may affect motion detection too.
3.5. Behavioral studies in barn owls
The barn owl is a nocturnal predator that relies on sound localization for tracking prey. Barn owls can move their eyes only 2°–3° (Steinbach and Money, 1973) and cannot move their external ears. They respond to faint sounds by turning the head. The amplitude of the head turn increases with the eccentricity of the sound source. When a free-field sound is presented for a long duration, barn owls can turn their head to the true direction of the sound source. When a brief
sound is presented, the head turn starts after the end of the stimulus and the barn owls usually undershoot the true position of the sound source. This undershoot effect increases with the eccentricity of the sound source (Knudsen et al., 1979; Poganiatz et al., 2001). The mechanisms of auditory motion processing in barn owls have not been studied as extensively as stationary localization (Payne, 1971). An attempt to perform a behavioral study of motion processing in owls using simulated motion with a speaker array was not successful (Wagner, personal communication). In this attempt, a speaker array was activated in a volley to simulate motion. The barn owls responded by turning their head to the speaker activated first, disregarding the motion. This was part of the motivation to develop techniques that enable presentation of continuous motion stimuli in virtual space. Two barn owls (marked by their initials W and X) were trained to turn their heads in the direction of static sounds, first under free-field conditions and then under virtual space conditions. The full details of the
methods are found in Poganiatz et al. (2001). Briefly, barn owls were observed during the experiments using an infrared sensitive video system. A head tracker system was used to sample head position (Wagner, 1993) at 200 Hz. The virtual space stimuli were delivered through earphones (Knowles), and employed HRIR sets measured from the same animals. Both the earphones and the head tracker were attached to a metal post that was cemented to the skull. During the experiments the bird was sitting on a perch in a sound attenuated chamber (IAC double wall, 2.4 × 2.1 × 2.7 m). When the bird had its head pointing in a forward direction (0 ± 10° in azimuth and elevation), a 50 ms noise burst from a randomly chosen azimuth was presented in virtual space over the headphones. The bird had to turn its head to the direction of the stimulus in order to receive a meat reward. After a training period of about 2 weeks, the barn owls responded reliably to static stimuli and turned their heads in the correct direction. The mean head
Fig. 3. (A) The amplitude envelopes of the six frequency components of the stimulus, for measured (solid line) and estimated (broken line) signals. (B) The instantaneous frequencies of the six components of the stimulus, for measured (solid line) and estimated (broken line) signals. The computed trace is shifted up by a constant to enable comparison. Instantaneous frequency plots are scaled so that the frequency range is 0.75% of the carrier frequency.
Fig. 4. The angular displacement of sound sources at the threshold for detecting motion direction, in two paradigms: (A) constant (300 ms) duration with varying velocity and (B) constant (45°/s) velocity with varying duration. Mean results of three threshold estimates are shown with standard error bars. Dashed lines represent results from Grantham (1986).
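The adaptive 2-down, 1-up track behind the thresholds in Fig. 4 (12 reversals, threshold taken as the mean of the last eight) can be sketched as below. The function name, the fixed step size, the trial cap and the deterministic simulated observer are assumptions for illustration; the experiment adapted either duration or velocity and used human listeners.

```python
def two_down_one_up(respond, start, step, n_reversals=12, n_average=8,
                    max_trials=10000):
    """Run a 2-down, 1-up track and return the mean of the last reversals.

    respond : callable taking the current level (e.g. displacement angle)
              and returning True for a correct response
    """
    level = start
    correct_in_a_row = 0
    direction = 0            # -1 while descending, +1 while ascending
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue     # need two consecutive correct responses to step down
            new_direction = -1
        else:
            new_direction = +1   # a single error makes the task easier
        correct_in_a_row = 0
        if direction != 0 and new_direction != direction:
            reversals.append(level)   # level at which the track turned
        direction = new_direction
        level += new_direction * step
    if not reversals:
        raise RuntimeError("track produced no reversals")
    tail = reversals[-n_average:]
    return sum(tail) / len(tail)


# Idealized observer who is correct whenever the displacement is at least 5 degrees.
threshold = two_down_one_up(lambda angle: angle >= 5.0, start=20.0, step=2.0)
```

With a real, probabilistic observer this rule converges on the level giving about 71% correct responses (Levitt, 1971).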
turning latency in response to static stimuli was 130 ms, and all head turns occurred after the end of the 50 ms stimulus. A full account of the responses to static stimuli can be found in Poganiatz et al. (2001). The virtual motion stimuli were then randomly interspersed between the static stimuli, and comprised approximately 10% of the stimuli. All motion stimuli started at azimuth 0° and ended at ±50°, with a duration of 1.3 s (the angular velocity of motion stimuli was ±38.46°/s). When presented with motion stimuli, the barn owls responded with a head turn that was in the
Fig. 6. (A) The amplitude of the first head turn (in degrees) as a function of the latency to the response offset for owl W. Each trial is marked by a circle. The solid line represents the stimulus trajectory. The broken line represents the linear regression fit. (B) The same as (A), for owl X.
Fig. 5. (A), (B) Head turn responses of owl W. The thin broken line represents the motion of the virtual sound source. The thick solid line represents the head turn response. (C), (D) Head turn responses of owl X.
correct direction in more than 90% of the trials. Fig. 5 depicts four such responses. In each panel, the thin broken line represents the motion stimulus. The thick solid line indicates the head turning response of the owl. For example, Fig. 5A shows the response of owl W to a virtual motion stimulus moving to the right. The response is composed of two head turns to the right. The response latency varied significantly between trials and between owls. The mean response latency of owl W was 670 ms, and ranged between 240 and 1250 ms. The mean response latency of owl X was 420 ms, and ranged between 50 and 960 ms. To gain a better understanding of the responses to the virtual motion stimuli, Fig. 6 presents the relationship between the amplitude and latency of the head turns. Twelve trials of owl W and 13 trials of owl X were used. In Fig. 6A, each circle represents a first head turn of owl W (second head turns, when present, were not included in this analysis). Its position in time is the head turn offset latency, and the magnitude is the absolute value of the head turn in degrees. The solid line depicts the location of the virtual motion stimulus as a function of time. The broken line is the linear regression fit. The slope of the regression line is 37.4°/s ($R^2$ = 0.5, P = 0.01). Fig. 6B depicts the results for owl
X. The slope is 15.5°/s ($R^2$ = 0.56, P = 0.003). The same analysis was carried out using the onset latencies of the head turn, but the fit to the data was worse. The regression slope for owl W is close to the actual stimulus velocity (38.46°/s), while the slope for owl X is significantly lower. Taking the ratio between the regression slope and the stimulus velocity as a measure of undershoot, owls W and X exhibited undershoots of 0.97 and 0.4 respectively in the motion responses (with confidence intervals of [0.29, 1.66] and [0.16, 0.64], respectively, at α = 0.05). The undershoot values using static stimuli were 0.73 and 0.66 for the same two owls (Poganiatz et al., 2001). Thus, the undershoot of owl W did not differ statistically between static and motion stimuli, while that of owl X was marginally different.
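The undershoot measure used above is the regression slope divided by the stimulus velocity. The sketch below shows an ordinary least-squares slope and reproduces the two reported ratios from the slopes quoted in the text; the function names are hypothetical and the per-trial data are not reproduced here.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept term)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return sxy / sxx


def undershoot_ratio(slope, stimulus_velocity):
    """Ratio of head-turn regression slope to stimulus velocity (both in deg/s)."""
    return slope / stimulus_velocity


STIMULUS_VELOCITY = 50.0 / 1.3   # 50 degrees in 1.3 s, about 38.46 deg/s

ratio_w = undershoot_ratio(37.4, STIMULUS_VELOCITY)   # slope reported for owl W
ratio_x = undershoot_ratio(15.5, STIMULUS_VELOCITY)   # slope reported for owl X
```

Rounded to two decimals, the ratios match the values of 0.97 and 0.4 given in the text.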
the motor. This work was supported by a grant from the German –Israeli Foundation (GIF).
Appendix A. Estimation methods for non-measured HRIRs A number of methods have been proposed for reducing the dimensionality of measured HRIR sets, such as principal component analysis (PCA) (Kistler and Wightman, 1992; Jenison et al., 1998) and pole-zero modeling (Jenison, 1995). Different methods have also been proposed for estimating HRIRs at non-measured positions, using measured HRIRs or their reduced descriptions (e.g. Jenison, 1995; Jenison and Fissell, 1996; Hartung et al., 1999). In the present study, two different estimation methods were used. All estimations were performed after dissociating the absolute time delays from the HRIRs. The human and dummy cat HRIR sets were first approximated with a pole-zero model to reduce dimensionality (Jenison, 1995). In pole-zero modeling, each HRIR h(t) is approximated in the z-domain by a rational function of order K: H(z) = B(z) b0 + b1z − 1 + ···+bKz − K , = a0 + a1z − 1 + ···aKz − K A(z) (9)
4. Discussion We presented an accurate algorithm for synthesizing virtual space sound. While virtual space sound algorithms based on measured HRIR sets have been described previously (Jenison et al., 1998), our algorithm attempts to achieve accurate reconstruction of the eardrum waveform without extraneous assumptions. The validity of this equation is tested directly by comparing waveforms recorded during real motion with those estimated by the algorithm. The applicability of the algorithm for auditory research is demonstrated by a psychophysical experiment in humans and a behavioral study in barn owls. The basic approach is independent of the specifics of HRIR modeling and interpolation, which could be implemented in many different ways. In our implementation we emphasized a high level of fidelity to the spectral details of the HRTFs and a high temporal resolution in the extraction of the time delays during the alignment procedure. One direction for further development of the algorithm is to test the sensitivity to these parameters psychophysically and physiologically. It may well be that in both respects, lower fidelity would result in essentially indistinguishable stimuli (e.g. Kulkarni and Colburn, 1998). In this case, the computational burden on the algorithm might be eased.
where the numerator roots are the zeros of the transfer function, and the denominator roots are its poles (Papoulis, 1977). Following Jenison (1995), the HRIR sets were approximated by a set of poles common to the whole set (a0, a1, a2 …, aK ), and specific zero coefficients for each direction (bx,0, bx,1, bx,2, …, bx,K ). Therefore, for each position in space x, the approximated transfer function were written as Hx (z)= Bx (z) . A(z) (10)
Acknowledgements The authors thank Professor Hermann Wagner and Nachum Ulanovsky for comments on the manuscript, and Yehoshua Yehuda for help with programming
The common pole coefficients were calculated using the Prony method (see e.g. Lim and Oppenheim, 1987). The impulse response, $h(t)$, is the inverse z-transform of the transfer function $H(z)$. To estimate an HRIR at a non-measured position on the sphere, bicubic interpolation was performed independently for the time delay $\tau_x$ (extracted from the alignment procedure), and for each of the zero coefficients ($b_{x,0}, b_{x,1}, b_{x,2}, \ldots, b_{x,K}$), using values at spatially neighboring measurement points. In the physical validation procedure, DIRs at non-measured positions were estimated by a linear interpolation of the two closest measured DIRs. The time delays were estimated in a similar fashion. This was justified by the dense sampling of HRIRs in this application, which ensured that DIRs changed very little between sampling points.
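The linear interpolation used in the physical validation can be sketched as follows. This is a minimal illustration that assumes the two neighboring DIRs are already time-aligned and that position is described by a single angle. The function name `interpolate_dir` and the numerical values are hypothetical.

```python
import numpy as np

def interpolate_dir(angle, angle_lo, angle_hi, dir_lo, dir_hi, delay_lo, delay_hi):
    """Linearly interpolate an aligned DIR and its time delay between two measured positions."""
    # Weight of the upper neighbor: 0 at angle_lo, 1 at angle_hi.
    w = (angle - angle_lo) / (angle_hi - angle_lo)
    dir_est = (1.0 - w) * np.asarray(dir_lo, dtype=float) + w * np.asarray(dir_hi, dtype=float)
    # The delay is interpolated separately, in the same fashion.
    delay_est = (1.0 - w) * delay_lo + w * delay_hi
    return dir_est, delay_est

# Hypothetical neighbors measured at 0 and 10 degrees; estimate at 2.5 degrees.
dir_est, delay_est = interpolate_dir(
    2.5, 0.0, 10.0,
    dir_lo=[1.0, 0.0, 0.0], dir_hi=[0.0, 1.0, 0.0],
    delay_lo=100e-6, delay_hi=140e-6,
)
```

Interpolating the delay separately from the aligned waveform avoids averaging two impulse responses that are shifted in time relative to each other.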
Appendix B. Correcting parallax errors in the horizontal plane One problem that arises when generalizing HRIRs beyond the sphere is the parallax effect. The spectral properties of the HRIR are determined by the directional acoustic shadow of the head and torso. This shadow is determined by the angle between the sound source and the ear, while HRIRs are usually given in terms of the angle with respect to the center of the head, $\theta$. Fig. 7 illustrates the parallax problem in the horizontal plane schematically. The measured HRIR which has the same angle with respect to the ear as the sound source has an angle of $\theta' = \theta + \delta$ with respect to the center of the head. $\delta$ is given by

$$\delta = \pi - \beta - \gamma,$$

$$\alpha = \arctan\frac{r\cos\theta}{r\sin\theta - r_h},$$

$$\beta = \theta + \alpha - \frac{\pi}{2},$$

$$\gamma = \pi - \arcsin\frac{r\sin\beta}{r_m}.$$

When synthesizing sound sources which move along spherical trajectories, the constant stimulus radius, $r$, can be set to equal the measurement radius, $r_m$. In this case, it can be shown that $\delta = 0$, making parallax corrections unnecessary. For other trajectories, the size of the parallax error depends on the specific trajectory, but also on the ratio $r_h/r_m$. When this ratio is small, the parallax error is negligible. This motivates the use of a large $r_m$ when recording HRIRs. As an example of possible parallax errors, when $r_h = 0.1$ m (typical human head) and $r_m = 1$ m, the maximum parallax error (obtained at $\theta = 0^\circ$, $r = \infty$) is $\approx 6^\circ$.
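The parallax correction can be checked numerically with a short sketch. It assumes the geometry of Fig. 7: the ear lies at distance r_h from the center of the head on the interaural axis, the source is at distance r and angle theta from the forward direction, and the parallax angle (written delta here) is the offset added to theta to select the measured HRIR. The function name `parallax_correction_deg` is hypothetical, and `atan2` is used in place of a plain arctangent so that the intermediate angle stays in the correct quadrant when r sin(theta) is smaller than r_h.

```python
import math

def parallax_correction_deg(theta_deg, r, r_h, r_m):
    """Parallax angle (degrees) for a source at distance r and angle theta_deg."""
    theta = math.radians(theta_deg)
    # Direction of the source as seen from the ear, measured from the interaural axis.
    alpha = math.atan2(r * math.cos(theta), r * math.sin(theta) - r_h)
    # Angle at the source between the lines to the head center and to the ear.
    beta = theta + alpha - math.pi / 2
    # Angle at the measured HRIR position (law of sines in the triangle
    # head center / source / measurement point).
    gamma = math.pi - math.asin(r * math.sin(beta) / r_m)
    delta = math.pi - beta - gamma
    return math.degrees(delta)

# A very distant source straight ahead, with r_h = 0.1 m and r_m = 1 m.
far_ahead = parallax_correction_deg(0.0, r=1e6, r_h=0.1, r_m=1.0)
# A source on the measurement sphere itself (r equal to r_m).
on_sphere = parallax_correction_deg(30.0, r=1.0, r_h=0.1, r_m=1.0)
```

With these values, `far_ahead` comes out at roughly 5.7 degrees, in line with the approximately 6 degree maximum quoted in the text, and `on_sphere` is zero to within rounding error.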
References
Ahissar M, Ahissar E, Bergman H, Vaadia E. Encoding of sound-source location and movement: activity of single neurons and interactions between adjacent neurons in the monkey auditory cortex. J Neurophysiol 1992;67:203–15.
Baumgart F, Gaschler-Markefski B, Woldorff MG, Heinze HJ, Scheich H. A movement-sensitive area in auditory cortex. Nature 1999;400:724–6.
Blauert J. Spatial Hearing: The Psychophysics of Human Sound Localization, revised ed. Cambridge, MA: MIT Press, 1997.
Brown CH. Sound localization. In: Fay RR, Popper AN, editors. Comparative Hearing: Mammals. New York: Springer, 1994.
Brown CH, Beecher MD, Moody DB, Stebbins WC. Localization of noise bands by old world monkeys. J Acoust Soc Am 1980;68(1):127–32.
Brugge JF, Reale RA, Hind JE. The structure of spatial receptive fields of neurons in primary auditory cortex of the cat. J Neurosci 1996;16:4420–37.
Feynman RP, Leighton RB, Sands M. The Feynman Lectures on Physics, vol. I. Reading, MA: Addison-Wesley, 1963.
Grantham DW. Detection and discrimination of simulated motion of auditory targets in the horizontal plane. J Acoust Soc Am 1986;79(6):1939–49.
Grantham DW. Auditory motion perception: snapshots revisited. In: Gilkey RH, Anderson TR, editors. Binaural and Spatial Hearing in Real and Virtual Environments, 1997:295–313.
Griffiths TD, Rees G, Rees A, Green GG, Witton C, Rowe D, Büchel C, Turner R, Frackowiak RS. Right parietal cortex is involved in the perception of sound movement in humans. Nat Neurosci 1998;1(1):74–9.
Hartmann WM, Wittenberg A. On the externalization of sound images. J Acoust Soc Am 1996;99(6):3678–88.
Hartung K, Braasch J, Sterbing SJ. Comparison of different interpolation methods for the interpolation of head-related transfer functions. In: Proceedings of the AES 16th International Conference on Spatial Sound Reproduction. Rovaniemi, Finland, 1999.
Imig TJ, Irons WA, Samson FR. Single-unit selectivity to azimuthal direction and sound pressure level of noise bursts in cat high-frequency primary auditory cortex. J Neurophysiol 1990;63(6):1448–66.
Jenison RL. A Spherical Basis Function Neural Network for Pole-Zero Modeling of Head-Related Transfer Functions. IEEE Applications of Signal Processing to Audio and Acoustics, 1995. p. 92–95.
Jenison RL, Fissell K. A spherical basis function neural network for modeling auditory space. Neural Comp 1996;8(1):115–28.
Fig. 7. A schematic illustration of the parallax effect in the horizontal plane. The circle with the small radius represents the head. The HRIRs were measured at locations on the circle with the large radius. The angle of a sound source is measured with respect to the center of the head, and 0° is the forward direction. A sound source outside the measurement circle has an angle of $\theta^\circ$ with respect to the center of the head. The appropriate measured HRIR should have the same angle as the sound source with respect to the ear. This implies that the angle of the measured HRIR with respect to the center of the head should be $\theta + \delta$ degrees. Angles $\alpha$, $\beta$ and $\gamma$ are used in the derivation of the angle $\delta$.
Pralong D, Carlile S. Generation and validation of virtual auditory space. In: Carlile S, editor. Virtual Auditory Space: Generation and Applications. Austin: R.G. Landes, 1996a:109–52.
Pralong D, Carlile S. The role of individualized headphone calibration for the generation of high fidelity virtual auditory space. J Acoust Soc Am 1996b;100(6):3785–93.
Rajan R, Aitkin LM, Irvine DR, McKay J. Azimuthal sensitivity of neurons in primary auditory cortex of cats. I. Types of sensitivity and the effects of variations in stimulus parameters. J Neurophysiol 1990;64(3):872–87.
Rice JJ, May BJ, Spirou GA, Young ED. Pinna-based spectral cues for sound localization in cat. Hearing Res 1992;58:132–52.
Rosenblum LD, Carello C, Pastore RE. Relative effectiveness of three stimulus variables for locating a moving sound source. Perception 1987;16:175–86.
Saberi K, Perrott DR. Minimum audible movement angles as a function of sound source trajectory. J Acoust Soc Am 1990;88:2639–44 [published erratum appears in J Acoust Soc Am 1991;89(5):2464].
Sovijärvi AR, Hyvärinen J. Auditory cortical neurons in the cat sensitive to the direction of sound source movement. Brain Res 1974;73:455–71.
Spezio ML, Keller CH, Marrocco RT, Takahashi TT. Head-related transfer functions of the Rhesus monkey. Hearing Res 2000;144:73–88.
Steinbach MJ, Money KE. Eye movements of the owl. Vision Res 1973;13(4):889–91.
Stumpf E, Toronchuk JM, Cynader MS. Neurons in cat primary auditory cortex sensitive to correlates of auditory motion in three-dimensional space. Exp Brain Res 1992;88:158–68.
Wagner H. Sound-localization deficits induced by lesions in the barn owl’s auditory space map. J Neurosci 1993;13(1):371–86.
Wagner H, Trinath T, Kautz D. Influence of stimulus level on acoustic motion-direction sensitivity in barn owl midbrain neurons. J Neurophysiol 1994;71:1907–16.
Wightman FL, Kistler DJ. Headphone simulation of free-field listening. I: Stimulus synthesis. J Acoust Soc Am 1989a;85:858–67.
Wightman FL, Kistler DJ. Headphone simulation of free-field listening. II: Psychophysical validation. J Acoust Soc Am 1989b;85:868–78.
Wightman FL, Kistler DJ. Sound localization. In: Yost WA, Popper AN, Fay RR, editors. Human Psychophysics. New York: Springer, 1993.
Zhou B, Green DM, Middlebrooks JC. Characterization of external ear impulse responses using Golay codes. J Acoust Soc Am 1992;92(2):1169–71 Pt 1.
Jenison RL, Neelon MF, Reale RA, Brugge JF. Synthesis of virtual motion in 3D auditory space. IEEE Eng Med Biol 1998;20:1096–100.
Jiang H, Lepore F, Poirier P, Guillemot JP. Responses of cells to stationary and moving sound stimuli in the anterior ectosylvian cortex of cats. Hearing Res 2000;139:69–85.
Keller CH, Hartung K, Takahashi TT. Head-related transfer functions of the barn owl: measurement and neural responses. Hearing Res 1998;118(1–2):13–34.
Kistler DJ, Wightman FL. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J Acoust Soc Am 1992;91:1637–47.
Knudsen EI, Blasdel GG, Konishi M. Sound localization by the barn owl (Tyto alba) measured with the search coil technique. J Comp Phys 1979;133:1–11.
Kulkarni A, Colburn HS. Role of spectral detail in sound-source localization. Nature 1998;396(6713):747–9.
Levitt H. Transformed up-down methods in psychoacoustics. J Acoust Soc Am 1971;49(2):467–77 part 2.
Lim JS, Oppenheim AV. Advanced Topics in Signal Processing. New Jersey: Prentice-Hall, 1987.
Middlebrooks JC, Knudsen EI. Changes in external ear position modify the spatial tuning properties of auditory units in the cat’s superior colliculus. J Neurophysiol 1987;57(3):672–87.
Middlebrooks JC. Narrow-band sound localization related to external ear acoustics. J Acoust Soc Am 1992;92:2607–24.
Moiseff A, Konishi M. Binaural characteristics of units in the owl’s brainstem auditory pathway: precursors of restricted spatial receptive fields. J Neurosci 1983;3(12):2553–62.
Musicant AD, Chan JCK, Hind JE. Direction-dependent spectral properties of cat external ear: new data and cross-species comparisons. J Acoust Soc Am 1990;87(2):757–81.
Papoulis A. Signal Analysis. New York: McGraw-Hill, 1977.
Payne RS. Acoustic location of prey by barn owls (Tyto alba). J Exp Biol 1971;54(3):535–73.
Perrott DR, Tucker J. Minimum audible movement angle as a function of signal frequency and the velocity of the source. J Acoust Soc Am 1988;83:1522–7.
Perrott DR, Marlborough K. Minimum audible movement angle: marking the end points of the path traveled by a moving sound source. J Acoust Soc Am 1989;85:1773–5.
Poganiatz I, Nelken I, Wagner H. Sound-localization experiments with barn owls in virtual space: influence of interaural time difference on head-turning behaviour. J Assoc Res Otolaryngol 2001 (submitted).