Language and music

From Citizendium

Music and language share a number of neurobiological, evolutionary and formal similarities, while differing profoundly in other respects. Our understanding of each domain therefore profits greatly from advances in the other.

Formal aspects

Both language and music are based on a limited number of discrete elements which are combined into highly complex, hierarchically structured signals [1]. In linguistics these elements are mainly the phoneme and the morpheme. Phonemes are defined as distinct sounds that confer a significant difference in meaning, and they are language-specific: the German "r" can be pronounced either between the tongue and the hard palate (a "rolled" r) or deeper in the throat, which is mainly a difference of dialect, whereas in Arabic the same two sounds (ر [Rā] and غ [Ġain]) constitute two entirely different phonemes. At the same time, the use of allophones (different pronunciations of the same phoneme) may be rule-governed: while the English sounds [d] and [ð] signal the difference between the words "day" and "they" and are therefore distinct phonemes, in Spanish they are allophones of the letter d, with [d] occurring mainly next to the letter n and [ð] mainly between vowels [1]. Turning to music, the analogy of notes as the phonemes of music is striking. With few exceptions, all cultures use scales of discrete pitches arranged within the octave. Other widespread characteristics are unequal intervals between single pitch steps and the avoidance of steps much smaller than the Western semitone [2], all of which facilitate the recognition of specific notes. Even allophones are found in (Western) music: the so-called melodic minor scale raises or lowers certain scale degrees by a semitone depending on whether the melody ascends or descends, an example of context-sensitive allophony.

From a less formal, more bio-acoustic point of view, music and speech share, among other features, transposability: since melodies are defined by the relationships between notes rather than by absolute frequencies, a piece of music is considered "the same" when performed on a different starting note. Likewise, a sentence remains the same whether it is uttered by a woman or a man, although the pitch differs considerably. It is not entirely clear whether this transposability is unique to human music perception [3]. At the same time, music and speech differ in tonal and temporal discreteness: while pitch varies continuously in speech, virtually all music uses the discrete tonal scales mentioned above. The same holds in the temporal domain: in most of the world's musical styles an underlying periodic beat provides a reference for sound durations.
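Transposition invariance can be made concrete with a minimal sketch (the function and variable names here are illustrative, not part of any cited model): a melody encoded as a sequence of pitches is characterized by the differences between successive pitches, and that interval pattern is unchanged when every pitch is shifted by the same amount.

```python
# Sketch: a melody is "the same" under transposition because its interval
# pattern (differences between successive pitches) does not change.
def intervals(pitches):
    """Return the successive pitch differences of a melody."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

melody = [60, 62, 64, 65]             # MIDI note numbers: C4 D4 E4 F4
transposed = [p + 7 for p in melody]  # the same tune, a fifth higher

print(intervals(melody))                              # [2, 2, 1]
print(intervals(melody) == intervals(transposed))     # True
```

The same idea underlies why a spoken sentence keeps its identity across speakers with very different voice pitches: the contour, not the absolute frequency, carries the information.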

Perhaps the most striking, and at the same time most controversial, difference between language and music is meaning. Music in general is not referential in the way language is: it does not transport an unlimited number of arbitrary meanings and ideas. But as shown below, music is far from meaningless, and there is clearly semantic processing of tonal information. Fitch [3] calls this capacity of music to communicate hard-to-define, often subconscious "meanings" a-referentially expressive.

Similarities and differences in neural processing


Syntax

(See also: Musical syntax.)

Language as well as music consists of rule-governed combinations of basic elements. The formation of words, phrases and sentences in language, or of chords, chord progressions and keys in music, is highly structured, and violations of these rules (e.g. grammatical errors or "sour notes") are easily recognized by perceivers familiar with the combination principles. Syntactic operations allow the mind to transform the sequential input into hierarchical, meaningful patterns. In language these patterns usually follow the structure "who did what to whom", i.e. the syntactic concepts of subject, predicate and object. In music, syntax is described by patterns of tension and resolution over time.

While there are well-documented dissociations between syntactic processing in music and language, with amusic individuals showing no signs of aphasia (and occasional case reports of aphasic musicians), neuroimaging points to an overlap in the brain regions involved in processing the two. Both sentences and musical chord sequences with varying levels of incongruence or dissonance evoke similar EEG potentials, and fMRI studies have shown that musical processing activates Broca's and Wernicke's areas, normally involved in language perception [4].

Formally, both language and music processing involve two steps: structural storage and structural integration. Structural storage denotes holding predictions about elements still to come: when a noun is perceived, a verb is expected in order to form a complete clause; in a musical cadence, the tonic is expected as the final "resting point". Structural integration is the connection of each incoming word or pitch with the prior element it depends on.

One way to calculate the perceived complexity of grammatical sentences is Gibson's Dependency Locality Theory (DLT). It evaluates the integration cost for every word in a sentence depending on its location and integration distance; the main idea is that connecting distant elements requires more neural resources. A similar concept exists in the musical domain: Lerdahl's Tonal Pitch Space (TPS). Pitches (notes) are perceived as more or less stable depending on their position in the key. Chords, in turn, are also seen in a hierarchy of stability, with the tonic, dominant and subdominant being the most stable. Finally, musical keys themselves form a structured set in the "circle of fifths". Distance in the TPS is, among other factors, a function of stability and proximity in the circle of fifths, leading to a numerical prediction of the tension felt by listeners of a given pitch sequence.
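One ingredient of such distance measures, proximity in the circle of fifths, can be sketched very simply. The following is a toy illustration, not Lerdahl's actual TPS formula: it measures the distance between two major keys as the minimal number of steps around the twelve-key circle.

```python
# Toy sketch (not the full Tonal Pitch Space model): distance between two
# major keys as minimal steps around the circle of fifths.
CIRCLE = ["C", "G", "D", "A", "E", "B", "F#", "C#", "G#", "D#", "A#", "F"]

def fifths_distance(key_a, key_b):
    """Minimal number of fifth-steps between two major keys."""
    i, j = CIRCLE.index(key_a), CIRCLE.index(key_b)
    steps = abs(i - j)
    return min(steps, 12 - steps)  # the circle wraps around

print(fifths_distance("C", "G"))   # adjacent keys -> 1
print(fifths_distance("C", "F"))   # F is one step the other way -> 1
print(fifths_distance("C", "F#"))  # maximally distant (tritone) -> 6
```

In the full TPS model this key distance is combined with chord and pitch stability to predict the tension profile a listener reports for a given sequence.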

In both models the integration cost increases with the distance between an element and its integration partner in the abstract model space. Reading-time studies and subjective ratings of tension profiles in music confirm these theoretical predictions.

Patel's "shared syntactic integration resource hypothesis" (SSIRH) [4] tries to reconcile the apparent paradox between the clinical and neuroimaging observations outlined above. It states that there are distinct representation areas in the brain for linguistic syntactic elements (e.g. word classes) and for musical pitches or chords, but only one common integration center. This integration center activates the representations of an incoming element and of its (known or expected) integration partner. While predicted integration partners are primed in advance, activation levels decay over time. This explains why the integration cost is higher for unexpected partners, such as "distant" chords, as well as for distant words in a clause.

The SSIRH thus accounts both for occurrences of isolated amusia or aphasia (when the respective representation areas are damaged or underdeveloped) and for the neurophysiological signs of shared processing of language and music.


Semantics

(See also: Musical semantics.)

While it is fairly obvious that specific melodies can convey meaning through learned associations (e.g. national anthems, or certain motifs signifying a character or season in Western music), it is less clear whether the brain processes music semantically in a way similar to language. Even statements like "Clementia glicked the plag", which refer to no existing actions or objects, convey the meaning of Clementia doing something to something, semantic information that is encoded in the phrase's syntax. Music might be meaningful in a similar way without referring to anything outside itself, relying exclusively on its own syntactic structure.

Studies by Steinbeis et al. [5] have shown that, similar to the shared syntactic processing of language and music, there is a common semantic integration center. A certain EEG potential (N5), usually observed when musical sequences fail to fulfill harmonic expectations, is reduced only when subjects are simultaneously confronted with semantically improbable sentences. This interaction is not seen with linguistic syntax violations in semantically "correct" statements, pointing towards a distinct semantic integration process that is shared between language and music. The N5 potential can therefore be interpreted as reflecting the semantic processing of musical tension-resolution patterns.

Another study [6] shows that the meaning conveyed by music is mainly emotional, as has long been assumed. Using the N400 potential (an established marker for the processing of meaning) as an indicator of semantic integration, the authors showed that priming with harmonic or disharmonic chords followed by positive (e.g. "love") or negative (e.g. "hate") words leads to increased semantic processing in the case of a mismatch. They conclude that music is emotionally meaningful in a way comparable to language.


References

1. Bright W. (1963) Language and Music: Areas for Cooperation. Ethnomusicology 7:26-32

2. Ball P. (2008) Science & music: facing the music. Nature 453:160-162

3. Fitch WT. (2006) The biology and evolution of music: a comparative perspective. Cognition 100:173-215

4. Patel AD. (2003) Language, music, syntax and the brain. Nat Neurosci 6:674-681

5. Steinbeis N, Koelsch S. (2008) Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cereb Cortex 18:1169-1178

6. Steinbeis N, Koelsch S. (2008) Comparing the processing of music and language meaning using EEG and fMRI provides evidence for similar and distinct neural representations. PLoS ONE 3:e2226