CAAPR - A Combined Anglo-American Pronunciation Reference

Alan Beale
  17 April 2007

This page describes CAAPR, the Combined Anglo-American Pronunciation Reference.  CAAPR is a pronouncing dictionary for both British English (RP) and American English (GA).  It is written in a compact and easy-to-read notation which systematizes the differences between these two varieties of English, while still handling exceptions gracefully and accurately.  The CAAPR notation is itself called CAAPR, this time standing for Combined Anglo-American Pronunciation Representation.  It is in many ways a generalization of my FLOSS notation.  To some extent, the CAAPR list resembles the FEWL list, but it avoids some of the complexities of that list by concerning itself almost exclusively with phonological rather than morphological information.

One may reasonably ask the question: What is CAAPR good for?  It is possible that, like FEWL, CAAPR may in time come to be useful for computer generation of dictionaries for reformed English orthographies.  At this time, however, this seems premature, especially since most reformers are unwilling to take on the labor of trying to please both sides of the Atlantic at once.  I see CAAPR mostly as a useful tool for self-education.  In compiling it, I have learned a lot about the systematic differences between the two English varieties, and I recommend it to anyone else who feels the need for greater insight in this area.

CAAPR is based on two primary sources, the FEWL list for American English and the online EPD dictionary for British English.  This latter dictionary is of very high quality, and I wish I knew who collected it so I could offer them my extravagant thanks.  During the development of CAAPR, I transformed, rearranged and occasionally corrected the EPD, but I find it remarkable how few corrections were needed.  Whatever merits CAAPR may possess are in large measure derived from the labor of the unknown contributors to the EPD.

CAAPR actually consists of three word lists: CAAPR-A, CAAPR-B and CAAPR-C.  CAAPR-A(merican) is a list of words with their GA pronunciations, derived via reformatting from the FEWL list.  CAAPR-B(ritish) is a somewhat different list of words with their RP pronunciations, derived via subsetting and reformatting from the online EPD dictionary (plus some additional common words unaccountably omitted from that document).  CAAPR-C(ombined) is a combined list, comprising the words which are both in CAAPR-A and CAAPR-B, showing both pronunciations together in a single notation.  As will be described here, the CAAPR notation is slightly different for each document, in ways that are unlikely to give the user any difficulty.  The CAAPR-A and CAAPR-B lists each contain approximately 30,000 words.  The CAAPR-C list contains the approximately 28,000 words common to both the A and B lists.

The CAAPR lists show a single (RP or GA) pronunciation for each word on the list.  Of course, the world is much more complicated than this, because many words have multiple acceptable pronunciations, with conflicting data about which is preferable or most prevalent.  CAAPR-A and CAAPR-B both use the technique of lexicographic consensus to resolve such questions.  Note that differences between the GA and RP pronunciations listed in CAAPR may not reflect actual Anglo-American differences.  For instance, both forms may represent pronunciations which are common in both varieties, distinguished more by happenstance than by geography.

The pronunciations in CAAPR-A were determined by consensus of the following dictionaries: The Longman Pronunciation Dictionary, the Merriam-Webster Collegiate Dictionary CD-ROM, and the American Heritage Dictionary CD-ROM.  In difficult cases, the Random House Unabridged Dictionary CD-ROM was also consulted.  Similarly, the pronunciations of CAAPR-B were determined by consensus of the online EPD, the Longman Pronunciation Dictionary and the Shorter OED CD-ROM.  In difficult cases, the Cambridge Pronunciation Dictionary was also consulted.  The Longman Pronunciation Dictionary is the most precise of all these sources, and was often used to resolve issues which could not be easily settled using the other, less technical sources.

Each list has a similar format.  A typical entry from the CAAPR-A and CAAPR-B lists looks like this:

wordplay : w&'dpLE·

The entry is divided by the delimiting string " : " into the traditional spelling and the CAAPR representation of a word.  In some cases, like the following:

ill-use (n) : i'LyU's
ill-use (v) : i'LyU'z


different forms of the same word may be distinguished by a qualifying word or phrase in parentheses.

The format of the CAAPR-C list is similar.  Here are a few example lines:

combat (n) : kombat
combat (v) : k[ø|o]mbat *

Some entries may be followed by an asterisk, which indicates that the American and British stress patterns for the word are different.  The programming which determines this is still under development, and the absence of the asterisk cannot be trusted to mean that the two pronunciations are in fact similar in their stress.  (Note that the CAAPR combined notation does not directly indicate stress, due to technical difficulties.)
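Because every line of the lists has the same regular shape, it can be read mechanically.  The Python below is a hypothetical reader (parse_entry and Entry are invented names, not part of any CAAPR distribution).  It is a sketch that assumes one entry per line, exactly one " : " delimiter, and that the CAAPR-C asterisk is separated from the transcription by a space, as in the examples above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Spelling, then an optional parenthesized qualifier such as "(n)" or "(v)".
_HEAD = re.compile(r"^(?P<spelling>.+?)(?:\s+\((?P<qualifier>[^)]*)\))?$")


@dataclass
class Entry:
    spelling: str
    qualifier: Optional[str]
    pronunciation: str
    stress_differs: bool  # trailing " *", used only in CAAPR-C


def parse_entry(line: str) -> Entry:
    """Split one list line of the form 'spelling [(qualifier)] : CAAPR [*]'."""
    head, sep, tail = line.rstrip("\n").partition(" : ")
    if not sep:
        raise ValueError(f"missing ' : ' delimiter in {line!r}")
    tail = tail.strip()
    # '*' is also a CAAPR symbol, so only a space-separated trailing '*' is the flag.
    stress_differs = tail.endswith(" *")
    if stress_differs:
        tail = tail[:-2].rstrip()
    match = _HEAD.match(head.strip())
    if match is None:
        raise ValueError(f"empty spelling in {line!r}")
    return Entry(match.group("spelling"), match.group("qualifier"), tail, stress_differs)
```

For instance, parse_entry("combat (v) : k[ø|o]mbat *") yields the spelling "combat", the qualifier "v", the transcription "k[ø|o]mbat" and a set stress flag.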

The three CAAPR lists can be downloaded using the following links:

All the above files include a copy of this page for easy reference.

An Example of CAAPR

CAAPR is not intended to be used for transcription of continuous text.  Nevertheless, it is useful at this point to give you an idea of the overall appearance of CAAPR, and I know no better way to do this than to transcribe a short bit of prose.  Here is the first paragraph of H.G. Wells' "The Star" (see here for a plain English version, among others) written in CAAPR-C.  It may seem cryptic at first glance, but as one uses CAAPR, one rapidly becomes accustomed to its conventions, and soon such passages present little mystery.

it w[u/o]z on Dø f&ßst dE øv Dø n!U yïr Dat Dý ønWnsm°nt w[u/o]z mEd, ØLmOst s[Y/i]m°LtEnÿøsLý fr[u<o]m TrI øbz&ßvøtòrý$, Dat Dø mOX°n øv Dø pLanît nept!Un, Dý WtRmOst øv ØL Dø pLanît$ Dat µIL øbWt Dø sun, had bikum verý iratik. ø ritAßdEX°n in its v3Losêtý had b[i\I]n søspektîd in disembR. Den, ø fEnt, rimOt spek øv LYt w[u/o]z diskuvRþ in Dø rIj°n øv Dø pRt&ßbþ pLanît. at f&ßst Dis did not kØz ený verý grEt iksYtm°nt. sYøntifik pIp°L, hWevR, fWnd Dý inteLîj°ns rimAßkøb°L inuf, Iv°n bifØß it bikEm nOn Dat Dø n!U bodý w[u/o]z rapîdLý grOiG LAßjR and brYtR, and Dat its mOX°n w[u/o]z kwYt dif°r°nt fr[u<o]m Dý ØßdRLý pr[o|O]gr[ø<e]s øv Dø pLanît$.

Notations on this page

The rest of this page uses certain notations to add precision to the discussion.  English words, used as examples, are enclosed in angle brackets, like <this>.  CAAPR representations of words are enclosed in double brackets, like «Di's».  Individual CAAPR symbols or symbol sequences are enclosed in apostrophes, like 'sO'.  Sampa phonemic transcriptions are enclosed in slashes, like /soU/.  Individual letters or short sequences from traditional spelling are generally written without any punctuation, as in "the letter t" or "the sequence ng".


The CAAPR Notation

As used in the CAAPR-A and CAAPR-B lists, the CAAPR notation is mostly phonemic, with certain non-phonemic notations added.  Because the sound repertoires for GA and RP are different, the same symbol will sometimes have a distinct (but related) meaning for the two varieties.  For both varieties of English, not all speakers have exactly the same phonemes.  CAAPR-A and CAAPR-B target idealized speakers of GA and RP respectively.  The GA pronunciations are based on an idealized American who distinguishes <which> and <witch>, <marry> and <merry>, and <cot> and <caught>, and for whom the two vowels of <above> are distinct, as are the two vowels of <murder>.  Similarly, the RP pronunciations are based on an idealized Briton who distinguishes <candid> and <candied>, and for whom the two vowels of <murder> are distinct.  Speakers with fewer phonemes than the ideal can merge symbols as necessary to represent their own speech.

Note that CAAPR is not suitable for use as a spelling system.  Quite apart from its complexity, it often requires distinct spellings for a single sound, which cannot be resolved by referring to the speech of any particular speaker.  From the perspective of a learner rather than of a linguist, the distinctions would seem quite arbitrary.

CAAPR uses a large repertoire of symbols, including upper and lower case alphabetic characters, punctuation, and letters with diacritics.  The symbols are organized into groups so that the members of each group are somewhat similar, making it easier to master the entire system.  As with any complex system, there are occasional exceptions to this organization, as described below.

The symbol groups and their significance are as follows:

  1. The lower-case alphabetic letters.  Each symbol in this group is assigned its natural English phonemic meaning.  All the vowels are short.  (Note that some letters, notably c, q and x, are omitted.)

  2. Upper-case alphabetic letters, plus a few special symbols and punctuation characters.  The symbols in this group are assigned meanings that are usually related in some fashion to the corresponding letter (or, in the case of symbols and punctuation, a letter they resemble in shape).  Most of the English long vowels, diphthongs, and less common short vowels fall into this group.

  3. Letters with a dieresis (such as ë and Ü).  These generally indicate vowels or diphthongs that occur primarily before the letter r.  There is usually a resemblance in sound to the unaccented letter.

  4. Letters with a circumflex (such as ê and û).  These generally indicate sounds which are different between American and British English, except for ê and î, which indicate indistinct sounds within both American and British English.  There is usually a resemblance in sound to the unaccented letter.

  5. Letters with a grave accent (such as è and ò).  These indicate sounds which not only differ between American and British English but are also differently stressed.  When stressed, there is generally a resemblance in sound to the unaccented letter.

  6. The letters ý and Ý.  These ought to be written as ŷ and Ŷ, but as these are not in the standard Latin-1 character set, the acute accented y is used instead.

  7. The special characters ', ·, $ and þ.  The first two characters are stress marks, and the latter two serve the purpose of identifying plural and past tense inflections.

CAAPR Phonemic Symbols

The following table shows the phonemic symbols of the CAAPR notation.  In some cases, as noted, a symbol is phonemic only for one of the English varieties.  In such cases, the symbol may be used for the other variety with a similar but non-phonemic meaning.

Symbol  Sampa    Example                Applies to  Notes
a       {        ka't (cat)             Both
ã       A~       elã' (elan)            Both        (1)
A       A:       fA'Døß (father)        Both        (2)
b       b        bE'bý (baby)           Both
C       tS       Ce'LO (cello)          Both        (3)
d       d        de'd (dead)            Both        (4)
D       D        Da't (that)            Both
e       E, e     e'g (egg)              Both
ë       e@       bë'ß (bear)            Brit        (5)
E       eI       ka'nøpE (canape)       Both        (6)
f       f        fY'f (fife)            Both
g       g        ga'g (gag)             Both
G       N        si'GiG (singing)       Both        (7)
h       h        hO'm (home)            Both
H       ~        u'HuH (uh-uh)          Both        (8)
i       I        bi'g (big)             Both        (9)
ï       I@       pï'ßs (pierce)         Brit        (10)
I       i:       møXI'n (machine)       Both        (11)
j       dZ       ju'j (judge)           Both
J       Z        vi'J°n (vision)        Both        (12)
k       k        ki'k (kick)            Both
K       x        Lo'K (loch)            Both        (1)
L       l        Li'Lý (lily)           Both        (13)
m       m        me'mbøß (member)       Both
n       n        nu'n (none)            Both
o       Q        to'p (top)             Brit        (14)
õ       o~       kõ'nsýëßJ (concierge)  Both        (1)
ø       @        sO'fø (sofa)           Both        (15)
O       oU, @U   rO'd (road)            Both
Ø       O:       pØ'z (pause)           Both
p       p        po'p (pop)             Both
Q       OI       kQ'n (coin)            Both        (16)
r       r        rØ'riG (roaring)       Both        (17)
&       3`, 3    rif&'r°L (referral)    Both        (18)
s       s        sØ's (sauce)           Both        (19)
t       t        ti'Lt (tilt)           Both        (4)
T       T        Ti'k (thick)           Both
u       V        fu'z (fuzz)            Both        (20)
U       u:, u    sU'p (soup)            Both        (21)
Ü       U@       øbskyÜ'ß (obscure)     Brit
v       v        va'Lv (valve)          Both
V       U, u     gV'd (good)            Both        (20), (21)
w       w        wE'wøßd (wayward)      Both        (22)
µ       hw, W    µi'C (which)           Amer        (23)
W       aU       frW'n (frown)          Both
X       S        no'kXøs (noxious)      Both        (24)
y       j        yu'mý (yummy)          Both
Y       aI       Y's (ice)              Both
z       z        zi'gzag (zigzag)       Both        (19)

Notes:

  1. 'ã', 'õ' and 'K' represent non-English sounds that are used by some speakers in a few words.  Unlike the FEWL list, the online EPD shows pronunciations for these words that do not use these sounds.  I have resolved this difficulty by including in each CAAPR list two distinct pronunciations for these words, one using the unusual sounds, and one using only everyday English sounds.  While the dictionaries clearly prefer the foreign pronunciations for these words, I've encountered very few individuals who actually use them.  (But it must be admitted that most of these words are read far more commonly than they are spoken.)

  2. Most occurrences of the 'A' sound in American English are spelled with 'o' in CAAPR-A, as described here.

  3. Technically, the ch sound represented by 'C' is the juxtaposition of the two sounds 't' and 'X'.  When these sounds occur in different components of a compound word (such as <potshot>), they are spelled in CAAPR with 'tX' rather than with 'C': the CAAPR representation of <potshot> is «po'tXot».

  4. When the sound of /d/ or /t/ occurs as a past tense ending (as in <walked> and <crawled>), the symbol þ is used in its place («wØ'kþ» and «krØ'Lþ»), as described here.

  5. The 'ë' sound is generally found only in British English, though it may occur in American English for Americans who pronounce <Mary> differently from <merry>.  The 'ë' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ë', as described here.

  6. At first glance, use of the symbol 'E' for the English long a may seem highly unnatural.  I find it easy to get used to because of the similarity between the long a and the short e sound represented by 'e'.  Use of 'E' rather than 'A' for this sound is also unsurprising for those familiar with IPA or Romance languages.

  7. Most mixed-case phonemic notations use the symbol 'N' rather than 'G' for this sound.  I have chosen to use 'G' because I find the spelling «si'GiG» for <singing> easier to read correctly at first glance than «si'NiN».

  8. In my particular dialect of American English, there are three words with a nasalized short u: <huh>, <uh-huh> and <uh-uh>.  In CAAPR-A, these words are represented by «hu'H», «u'HhuH» and «u'HuH».  These words are not in the CAAPR-B list, but the same spellings could be used if these words are pronounced similarly in RP.

  9. In many words, the /I/ sound is represented in CAAPR by 'ê' rather than 'i', as described here.  In British English, /I/ may also be represented by 'ý', as noted below.

  10. The 'ï' sound is generally found only in British English.  The 'ï' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ï', as described here.

  11. At first glance, use of the symbol 'I' for the English long e may seem highly unnatural.  I find it easy to get used to because of the similarity between the long e and the short i sound represented by 'i'.  Use of 'I' rather than 'E' for this sound is also unsurprising for those familiar with IPA or Romance languages.  In American CAAPR, an unstressed 'I' sound will generally be written as 'ý', as noted below.

  12. Most mixed-case phonemic notations use the symbol 'Z' rather than 'J' for this sound.  This is mostly a question of taste; «kØrsA'J» is probably a better notation than «kØrsA'Z» for <corsage>, but «vi'Z°n» is probably superior to «vi'J°n» for <vision>.

  13. The symbol for /l/ really ought to be 'l', but in many fonts it is impossible to tell a lower-case l from an upper-case I.  Using 'L' for this sound represents a triumph of practicality over principle.

  14. The 'o' sound is found only in British English, as the standard short o sound.  The 'o' symbol is used in place of 'A' in American CAAPR in most words where the British pronunciation would be represented as 'o', as described below.

  15. The 'ø' symbol designates the schwa, treated by CAAPR as distinct from both the short u ('u') and the stressed vowel of <bird> ('&').  In many words, the schwa is represented in CAAPR by 'ê' rather than 'ø', as discussed later.

  16. Using 'Q' to represent a vowel does take a little bit of getting used to.  I find it helps to think of the tail of the 'Q' as an I attached to an O with an unusually placed ligature.

  17. The 'r' symbol is used for the consonant r.  In British English, it is used only when an /r/ is always pronounced.  Otherwise, the symbol 'ß' is used, as described here.

  18. The symbol '&' represents the sound /3`/ in American CAAPR, but /3/ in British CAAPR.  This means that <bird> is represented as «b&'d» in American CAAPR, but as «b&'ßd» in British CAAPR.  As will be seen, when CAAPR is used as a combined notation, the British usage prevails.  The character '&' was chosen for its visual resemblance to a capital R.

  19. When the sound of /s/ or /z/ occurs as a plural ending (as in <walks> and <crawls>), the symbol $ is used in its place («wØ'k$» and «krØ'L$»), as noted below.

  20. Those familiar with IPA and Sampa might expect 'u' to represent /U/ and 'V' to represent /V/.  While this would indeed be logical, I find 'u' for /V/ to be considerably more natural and readable, especially given how much more common this sound is than /U/.

  21. The Longman pronunciation dictionary uses the symbol /u/ as a "neutralization" of /u:/ in many words, such as <situation> and <regulate>.  In American CAAPR, I have transcribed this vowel as 'U', which is in agreement with the popular American dictionaries.  But for British English, I represent it as 'V', which is consistent with popular British dictionaries.  When CAAPR is used as a combined notation, the British usage prevails.

  22. Some occurrences of the 'w' sound in British English are spelled with 'µ' in CAAPR, as described here.

  23. The 'µ' sound is found only in American English, in association with the digraph wh.  (Even though most Americans do not use this sound, American CAAPR is oriented towards an ideal speaker who does.)  The 'µ' symbol is used in British CAAPR in most words where the American pronunciation would be represented as 'µ', as noted below.

  24. Most mixed-case phonemic notations use the symbol 'S' rather than 'X' for this sound.  I have chosen to use 'X' because I find spellings such as «pre'Xøs» rather than «pre'Søs» for <precious> to be easier to read correctly at first glance.


CAAPR Ortho-phonemic symbols

Both of the English varieties targeted by CAAPR have some unique sounds whose use is generally predictable from the spelling of the words which contain them.  For instance, the sound designated by CAAPR 'µ' does not occur in British English, but one can predict, in almost all cases, that a word pronounced with a /w/ in British English, but spelled with wh, will be pronounced as 'µ' in American English (at least by those Americans who use that sound).  I call the symbols with this property ortho-phonemic, as they have phonemic significance in one variety, but orthographic significance in the other variety.

The use of the ortho-phonemic symbols in CAAPR brings the American and British spellings closer to one another, in a way that makes sense even for speakers not familiar with the other variety.

The ortho-phonemic symbols are:


Symbol  Sampa  Example         Variety  Spelling                         Notes
ë       E      bë'r (bear)     Amer     air, ar, are, ear, eir           (1)
ï       I      pï'rs (pierce)  Amer     ear, eer, er, ere, ier           (2)
o       A:     to'p (top)      Amer     o, qua, wa, en                   (3)
ß       (r)    rØ'ß (roar)     Brit     r, final or before a consonant   (4)
µ       w      µi'C (which)    Brit     wh                               (5)


Notes:

  1. Most Americans pronounce words which in RP are pronounced with the /e@/ diphthong with a simple /E/, normally written in CAAPR as 'e'.  Examples of such words are <fair>, <Mary>, <spare>, <bear> and <their>.  American CAAPR represents the combination /Er/ with 'ër' when the regular spelling uses one of the forms listed in the table above.

  2. Most Americans pronounce words which in RP are pronounced with the /I@/ diphthong with a simple /I/, normally written in CAAPR as 'i'.  Examples of such words are <fear>, <beer>, <serious>, <here> and <pierce>.  American CAAPR represents the combination /Ir/ with 'ïr' when the regular spelling uses one of the forms listed in the table above.

  3. Most Americans pronounce words which in RP use the short o vowel /Q/ with /A:/, normally written in CAAPR as 'A'.  Examples of such words are <stop>, <qualify>, <wander> and <entree>.  American CAAPR represents /A:/ with 'o' when the regular spelling has one of the forms listed in the table above.

  4. RP is a non-rhotic form of English.  This means that the letter r is generally not pronounced when it is followed by a consonant, or at the end of a word.  (The r may, however, be pronounced at the end of a word when the next word begins with a vowel.)  For British English, these suppressed r's are represented in CAAPR by the letter 'ß', as in «fAß» (far) or «sØßd» (sword).  This symbol was chosen for its visual resemblance to a capital R.  (In fact, the symbol 'R' could have been used, but I think an unusual character is better at communicating the unique nature of the construct.)  Note that CAAPR indicates pronunciation of words, not of larger units, so the phrase "here and there" would be written in CAAPR as «hïß ønd Dëß», even though the sequence of sounds would be more like «hïrønDëß».

  5. Britons generally pronounce words which Americans might pronounce with the /hw/ sound with a simple /w/, normally written in CAAPR as 'w'.  Examples of such words are <where> and <awhile>.  British CAAPR represents the sound /w/ with 'µ' when the regular spelling uses wh.
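Since the ortho-phonemic symbols are defined by spelling rather than sound, they can be collapsed mechanically for a particular speaker.  The Python below is a hypothetical sketch: american_phonemic assumes an American who merges 'ë' with 'e' and 'ï' with 'i' (note 5 to the phonemic table allows that some Americans keep 'ë' distinct), and british_phrase applies note 4's linking r, restoring a word-final 'ß' as 'r' only when the next word begins with a vowel.  The set VOWELS is an assumption covering the vowel symbols introduced so far.

```python
from typing import List

# American ortho-phonemic symbols and the plain symbols a merging speaker uses.
AMERICAN_MERGERS = {"ë": "e", "ï": "i", "o": "A"}

# Assumed set of CAAPR vowel symbols that can begin a word.
VOWELS = set("aãAeëEiïIoõøOØQ&uUÜVWYêýÿ")


def american_phonemic(word: str) -> str:
    """Replace American ortho-phonemic symbols with phonemic ones."""
    return "".join(AMERICAN_MERGERS.get(ch, ch) for ch in word)


def british_phrase(words: List[str]) -> str:
    """Realize British 'µ' as 'w', and 'ß' as a linking 'r' or as silence."""
    out = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else ""
        word = word.replace("µ", "w")
        if word.endswith("ß") and following[:1] in VOWELS:
            # A final suppressed r is pronounced before a vowel-initial word.
            word = word[:-1].replace("ß", "") + "r"
        else:
            word = word.replace("ß", "")
        out.append(word)
    return " ".join(out)
```

With these assumptions, british_phrase(["hïß", "ønd", "Dëß"]) returns "hïr ønd Dë", matching the «hïrønDëß» sequence of note 4 with the final r left silent.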


CAAPR Special symbols

CAAPR uses a number of additional non-phonemic symbols for various purposes.  These symbols are listed in the table below, and explained in the following notes.


Symbol  Sampa          Type              Example                     Notes
ê       @, I, 1        Indefinite sound  ma'gnêt (magnet)            (1)
ý       I, i, i:       Indefinite sound  ha'pý (happy)               (2)
ÿ       I, i, j        Indefinite sound  prI'vÿøs (previous)         (3)
°       (@)            Optional sound    ma'jik°Lý (magically)       (4)
*       (@)            Optional sound    tY'*L (tile)                (5)
¹       (@), (I), (1)  Optional sound    kri'm¹n°L (criminal)        (6)
þ       d, t           Morpheme          dra'gþ (dragged)            (7)
$       s, z           Morpheme          dru'g$ (drugs)              (8)
'       " (or ')       Stress            øLY'v (alive)               (9)
·       % (or ,)       Stress            do·mênE'X°n (domination)    (9)


Notes:

  1. One of the features of both GA and RP is the "indistinct i".  Consider the word <magnet>.  Some people pronounce this as «ma'gnit», while others pronounce it as «ma'gnøt».  Some speakers may use either pronunciation at random.  In such cases, I use the term "indistinct i" to describe the vowel.  It is not a distinct sound: it is always pronounced as /@/ or /I/ (or according to some authorities as /1/).  The indistinct i is represented in CAAPR by the symbol 'ê'.

  2. Many English words are spelled with a final y used as a vowel, as in <many> and <quality>.  American dictionaries generally show the sound of this vowel as an unstressed long e.  British dictionaries, on the other hand, often show it as a short i.  The Longman dictionary uses the symbol /i/ to represent it.  For British English /i/ is distinguished from /i:/ by length as well as stress, while for American English, only the stress difference is apparent.  CAAPR uses the symbol 'ý' where Longman uses /i/.  One difference is that, for American English only, where an unstressed /i:/ occurs, it is spelled in CAAPR as 'ý' rather than 'I'.  This mostly affects words ending in /i:z/ such as <rabies>, which rhymes with <babies> for Americans.  <rabies> is spelled «rE'býz» in American CAAPR, but «rE'bIz» in British CAAPR.

  3. The symbol 'ÿ' is a variant of 'ý', representing a sound which may either be one of the vowel sounds of 'ý', or the consonant 'y'.  The Longman dictionary uses the symbol /i/, linked to the following sound by a tie, to represent it.  The sample word <previous> is typical of the words where it occurs, as the word might be pronounced either «prI'výøs» or «prI'vyøs».

  4. The symbol '°' (a superscript 0) is an alternate form of 'ø', indicating a schwa sound which may be omitted.  '°' will be followed by one of the sonorant consonants 'L', 'm', 'n', 'r' or 'ß'.  When '°' is the last vowel of a word, or when it is followed by two consonants, it indicates that the following consonant may be syllabic, as in «ba't°L» or American «pO'k°r».  The Longman dictionary uses a raised schwa symbol, indicating a sound ordinarily omitted but sometimes spoken, in this situation.  The '°' notation is unusual in that CAAPR uses this notation if any of its source dictionaries show the sound as optional rather than insisting on consensus.  The notation was chosen due to its suggestion of a tiny 'ø' (or of Longman's raised schwa).

  5. The '*' symbol is an alternate form of '°', occurring in certain situations where the possible schwa sound is often considered to be an interpolation, generally not indicated by the traditional spelling.  The main problem it addresses is that many speakers will insert a schwa sound between a long vowel and an 'L' or 'r', a phenomenon Longman calls "breaking".  For instance, I pronounce <boil> and <royal> as a rhyme, «bQ'°L» and «rQ'°L» respectively.  Others may pronounce them both without the schwa, or may pronounce only <royal> with the schwa.  I feel it is useful to somehow distinguish the CAAPR notations for these two words, and so use «bQ'*L» for <boil>, but «rQ'°L» for <royal>.  Use of '*' in this fashion after a vowel is the most common use, but it may also appear after a consonant, as in «k&'r*L» for <curl> (American) or «re's*LiG» for <wrestling>.  The notation was chosen for its similarity to '°'.  Distinguishing '*' from '°' is probably only useful when CAAPR is being used as the basis for a spelling system.

  6. The symbol '¹' (a superscript 1) is an alternate form of 'ê', indicating an indistinct i sound which may be omitted.  It is probably easiest to think of it as indicating a choice between '°' and 'i'.  As with '°', CAAPR will use this notation in place of 'ê' if justification is found in any of its source dictionaries.  The notation was chosen due to its suggestion of a tiny 'i'.

  7. The symbol 'þ' is used in CAAPR to represent a regular past-tense inflection, spelled as -d or -ed in standard spelling.  The pronunciation is either /t/ or /d/, /t/ if preceded by a voiceless consonant, or /d/ otherwise.  'þ' is also used for words derived from past tenses, such as <confusedly> («kønfyU'zêþLý»), and words where an -ed suffix is applied to a noun, as <jeweled> («jU'øLþ»).  The symbol 'þ' was chosen here because of its similarity in appearance to a capital D, and its phonetic association with t.

  8. The symbol '$' is used by CAAPR to represent a regular plural or possessive inflection, spelled as -s, -es or 's in standard spelling.  The pronunciation is either /s/ or /z/: /s/ if preceded by a voiceless consonant, or /z/ otherwise.  '$' is used with Latin or Greek plurals ending with the /z/ sound, even though the form is irregular, as in <diagnoses> («dY·øgnO'sI$»).  '$' is also used for words derived from plurals or possessives, such as <salesman> («sE'L$møn»).  The symbol '$' was chosen here because of its similarity in appearance to a capital S.

  9. In both CAAPR-A and CAAPR-B, stress is marked.  The marks appear after the vowel letter.  ''' indicates primary stress, and '·' indicates secondary stress.  For both CAAPR-A and CAAPR-B, the placement of stress represents a lexicographic consensus.  But there is an interesting issue here.  American dictionaries and British dictionaries use different and rather incompatible systems for representing stress.  Consider the words <specify>, <newspaperman>, <predisposition> and <everyday>.  The American and British pronunciations are essentially the same, but the consensus American stress is «spe'søfY·», «nU'zpE·p°rma·n», «prI·di·spøzi'X°n» and «e'vrýdE'», while British dictionaries assert «spe'sêfY», «nyU'zpEpøßman», «prI·dispøzi'X°n» and «e'vrýdE».  These systematic differences in representation make it more or less impossible to come up with an accurate picture of the stress differences between the two varieties, and for this reason, CAAPR-C omits stress marking altogether.

    Exactly how to place stress marks is controversial.  Apparently, the best regarded technique is to show stress before the start of the syllable.  I reject this for CAAPR simply because it makes things harder: it requires that syllable boundaries in words be established, which is not otherwise required.  Establishing them is no small task, and one about which various authorities disagree.  I believe that for CAAPR the only practical approach is to place the mark adjacent to the vowel which is stressed.  Even here, there is controversy as to whether the mark is best placed before or after the vowel.  I prefer to place it after, but don't regard my reasons as so compelling as to spend time justifying this decision.  Note that CAAPR shows stress in one-syllable words, except for weak forms of words like <the> and <of>.  This aids computer processing and transformation of CAAPR text.

    (Just as a curiosity, I observe that the actual definition of X-Sampa calls for the use of the characters /"/ and /%/ to represent stress.  I've never actually seen this done: in my experience, the symbols /'/ and /,/, which closely resemble the equivalent IPA symbols, are used instead.  This explains the strange notation in the Sampa column of the table above.)
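Notes 1 and 3 to 8 describe symbols whose concrete realization is either left to the speaker or determined by context.  The hypothetical helpers below (resolve_morphemes, expand_variants and their symbol tables are invented for illustration) sketch both ideas.  The first applies the voicing rule of notes 7 and 8, assuming the listed symbols are the only voiceless consonants that can precede an ending; the second enumerates the pronunciations permitted by 'ê', 'ÿ', '°', '*' and '¹'.  'ý' is left alone, since it remains a single CAAPR symbol in both varieties.

```python
from itertools import product

# Assumed voiceless consonant symbols (from the phonemic table) that can precede an ending.
VOICELESS = set("ptkfTsXCK")
STRESS_MARKS = {"'", "·"}

# Assumed realizations of the indefinite and optional symbols (notes 1, 3, 4, 5 and 6).
VARIANTS = {
    "ê": ("i", "ø"),
    "ÿ": ("ý", "y"),
    "°": ("ø", ""),
    "*": ("ø", ""),
    "¹": ("i", "ø", ""),
}


def resolve_morphemes(word: str) -> str:
    """Replace 'þ' with t/d and '$' with s/z according to the preceding sound."""
    out = []
    previous = ""
    for ch in word:
        if ch in ("þ", "$"):
            voiceless = previous in VOICELESS
            if ch == "þ":
                ch = "t" if voiceless else "d"
            else:
                ch = "s" if voiceless else "z"
        out.append(ch)
        if ch not in STRESS_MARKS:  # stress marks follow vowels; skip them
            previous = ch
    return "".join(out)


def expand_variants(word: str) -> list:
    """List every spelling obtained by choosing one realization per variable symbol."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in word]
    return ["".join(parts) for parts in product(*choices)]
```

For example, expand_variants("ma'gnêt") returns ["ma'gnit", "ma'gnøt"], the two pronunciations cited in note 1, and resolve_morphemes("wØ'kþ") returns "wØ'kt".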


CAAPR-C - Putting it all together

CAAPR-C is the combined CAAPR notation, which attempts to merge the American and British spelling for each word, producing a reasonable composite.  The process works as follows.

First, stress marks, which are not used in CAAPR-C, are dropped.  Next, if the CAAPR-A transcription uses the '&' symbol, it is replaced by '&r'.  Then, if the resulting transcriptions are identical (as for the word <soggy>, «sogý»), this is the CAAPR-C representation.  If they are not identical, then corresponding characters which differ are collected into a bracketed pair, American version first and British version second.  For instance, consider the word <forecast>.  The American «fØrkast» and the British «fØßkAst» are combined into «fØ[r,ß]k[a,A]st».  This may be the end of the process, but usually it is not.  In many cases, the combined transcription will contain pairs which are common enough that there are rules for replacing them with a single symbol.  For <forecast>, we have two pairs, [r,ß] and [a,A].  Almost always, a British 'ß' will be paired with an American 'r', and the symbol 'ß' will be used as the combined representation.  This reduces <forecast> to the string «fØßk[a,A]st».  But the combination [a,A] is also very frequent, occurring in words like <bath>, <class> and <shaft>.  For this reason, the combination is given the representation 'â' in CAAPR-C.  So the final CAAPR-C version of <forecast> is «fØßkâst».
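The merge procedure can be sketched in Python.  This is a minimal illustration under stated assumptions, not the program actually used to build CAAPR-C.  The text does not say how "corresponding characters" are found when the two transcriptions differ in length, so the sketch aligns them with the standard library's difflib.SequenceMatcher, a swapped-in technique.  The names combine and PAIR_RULES are hypothetical, and only a few of the single-symbol reductions from the synthetic-symbol table on this page are included ('ß', 'â', 'ô', and the context-dependent '!').

```python
from difflib import SequenceMatcher

# Illustrative subset of the synthetic-symbol reductions: (American, British) -> symbol.
PAIR_RULES = {
    ("r", "ß"): "ß",
    ("a", "A"): "â",
    ("Ø", "o"): "ô",
}

# '!' replaces an American gap paired with a British 'y', but only before these symbols.
BANG_CONTEXT = set("UVÜø")


def strip_stress(transcription: str) -> str:
    """Drop the primary (') and secondary (·) stress marks."""
    return transcription.replace("'", "").replace("·", "")


def combine(american: str, british: str) -> str:
    """Merge a CAAPR-A and a CAAPR-B transcription into a CAAPR-C string."""
    a = strip_stress(american).replace("&", "&r")
    b = strip_stress(british)
    if a == b:
        return a
    out = []
    # Assumption: difflib decides which characters correspond.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(a[i1:i2])
            continue
        pair = (a[i1:i2], b[j1:j2])  # either side may be empty
        following = a[i2:i2 + 1]
        if pair in PAIR_RULES:
            out.append(PAIR_RULES[pair])
        elif pair == ("", "y") and following in BANG_CONTEXT:
            out.append("!")
        else:
            out.append(f"[{pair[0]},{pair[1]}]")  # unreduced bracketed pair
    return "".join(out)


if __name__ == "__main__":
    print(combine("fØrkast", "fØßkAst"))  # fØßkâst
```

Pairs with no matching rule are left in bracketed form: with the illustrative inputs "kømba't" and "ko'mbat", combine returns "k[ø,o]mbat".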

If any bracketed pairs cannot be reduced to a single symbol in this fashion, CAAPR-C allows the comma between the symbols of the pair to be replaced by a character indicating whether one of the two pronunciations may be more generally recognizable than the other.  This process and the additional symbols it uses are described in a later section.

Stress information is dropped from CAAPR-C because of the incompatible systems used in CAAPR-A and CAAPR-B.  However, the process of determining the composite CAAPR-C representation will usually notice if the stress has changed in a significant way; these words are marked in the list with an asterisk.  About 1 in every 40 words is marked like this.

This process introduces a new class of CAAPR symbols, which I call "synthetic" symbols, as they represent a synthesis of an American and a British pronunciation.  Some of the symbols (such as 'ß') are extended in meaning in a natural way, while others, like 'â', are new symbols introduced explicitly to represent a common pair.

CAAPR Synthetic Symbols

The following table defines the CAAPR-C synthetic symbols in terms of the corresponding pairs of symbols they replace:

Symbol    Replaces                                     Example                                             Notes
â         [a,A]                                        kLâs (class)
3, ê, î   A mixture of i, ê and ø, unstressed          paL3t (palate), sIkrêt (secret), bLaGkît (blanket)  (1), (2)
è         [e or ë, ø or ° or {no sound}]               sekrêtèrý (secretary)                               (3)
¹ or ³    A mixture of i, ê, ¹, ø, ° or {no sound},    fert¹LYz (fertilize), kuz³n (cousin)                (1)
          unstressed
!         [,y] before U or V or Ü or ø                 d!Utý (duty)                                        (4)
ô         [Ø,o]                                        krôs (cross)
ò         [Ø,ø]                                        mandøtòrý (mandatory)                               (3)
° or *    A mixture of ø, ° (or *), or {no sound}      Epr°n (apron)                                       (1)
R         [°r,øß]                                      piCR (pitcher)                                      (5)
ß         [r,ß]                                        mAßk (mark)                                         (1)
ü         [&,u]                                        würý (worry)
û         [ø,V]                                        regyûLR (regular)
Ü         [V,Ü]                                        pyÜß (pure)                                         (1)
V         [U,V] before vowel                           v&ßCVøs (virtuous)                                  (6)
ÿ         A mixture of y, ÿ or ý                       yUnÿøn (union)                                      (1)
Ý         [ê or ø, Y or Y*]                            ØßgønÝzEX°n (organization)

Notes:

  1. For these symbols, the synthetic meaning is simply a generalization of the symbol's phonemic, ortho-phonemic or morphemic meaning as described above.

  2. In CAAPR-C, the symbol used for the indistinct i depends on how indistinct it is.  If one English variety (usually the British) uses 'i' and the other is indistinct, the synthetic symbol 'î' is used.  If one variety (again, usually the British) uses 'ø' and the other is indistinct, the synthetic symbol '3' is used.  In the remaining cases, where both varieties are indistinct, or one uses 'ø' and the other uses 'i', the symbol 'ê' is used, as in CAAPR-A and CAAPR-B.  Of course, it can be argued that one symbol, 'ê', would do for all of them, and there is something to this.  But one could also regard the 'î' as "almost an i", and '3' as "almost a schwa", and CAAPR allows you to choose whichever simplification you prefer.  I'm not thrilled with using a digit as a representation symbol here, as CAAPR has otherwise managed to avoid this, but it suggests the IPA "turned e" (ə), and looks better than any alternative I've come up with.  Similarly, the symbol ³ is used as an alternate form of ¹ when the schwa (or absent) pronunciation dominates the short i.

  3. The grave-accented symbols 'è' and 'ò' are noteworthy in that their use always implies a stress change.  In words where they occur, such as <secretary>, the syllable is stressed in GA but unstressed in RP, where the vowel sometimes disappears entirely.

  4. The '!' symbol was chosen for its resemblance to an upside-down i, because some authorities regard the 'yU' combination as actually being an /iu:/ diphthong.

  5. The 'R' represents a syllabic r ('°r') in American English and a schwa ('øß') in British English.  Though it appears in the CAAPR-C representations of most words pronounced in American English with a syllabic r, there are exceptions, such as <favorite> («fEv°rêt») and <modern> («mod°ßn»).

  6. As noted above, where Longman uses a representation of /u/, CAAPR-A uses the symbol 'U', while CAAPR-B uses 'V'.  In CAAPR-C the pair [U,V] is recombined as 'V' when it precedes a vowel.  Before a consonant, one or both of the vowels may differ from /u/, so the pair is not reduced.
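The rule in note 2 for the indistinct i is mechanical enough to write as a small decision function.  The following Python sketch is an illustration, not part of CAAPR itself: it takes one aligned pair of unstressed vowel symbols (American first) and assumes, as in the A and B lists, that 'ê' marks the indistinct i, 'i' a clear short i and 'ø' a schwa.  The function name indistinct_i_symbol is hypothetical, and the separate choice between '¹' and '³' is not modeled.

```python
def indistinct_i_symbol(american: str, british: str) -> str:
    """Choose the CAAPR-C symbol for one aligned unstressed vowel pair.

    Assumes 'ê' marks the indistinct i, 'i' a clear short i and 'ø' a schwa.
    """
    if american == british:
        # No difference to record; two indistinct vowels stay 'ê'.
        return american
    pair = {american, british}
    if pair == {"i", "ê"}:
        return "î"  # one side is a clear i: "almost an i"
    if pair == {"ø", "ê"}:
        return "3"  # one side is a schwa: "almost a schwa"
    if pair == {"i", "ø"}:
        return "ê"  # i against schwa: plain indistinct i
    raise ValueError(f"not an indistinct-i pair: [{american},{british}]")
```

Because the rule does not depend on which variety has which vowel, the sketch treats the pair as unordered: 'i' against 'ê' gives 'î' (as in «bLaGkît»), and 'ø' against 'ê' gives '3' (as in «paL3t»).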


About 1 in 14 words in the CAAPR-C list contains symbol pairs that cannot be reduced to synthetic symbols.  Without the synthetic symbols, the proportion of words showing differences would be very much higher.


CAAPR-C Embellished Pair Notation (Dominance and Equivalence)

Most English words have a CAAPR-C representation without any symbol pairs, meaning that their British and American pronunciations differ only in the typical ways cataloged by the synthetic symbols above.  A small number of words, however, have differences whose low frequency makes it impractical to define single symbols for them.  If one is seeking the holy grail of an orthography that will have a single workable spelling for all of English, then one naturally asks of such words whether there is additional information that would allow one to choose between the two incompatible pronunciations.  The answer, as it happens, is "Maybe".

In some of these words, one or both of the pronunciations may be recognized on both sides of the Atlantic.  Here are some simple words illustrating the possibilities:

  1. <byproduct> - written in CAAPR-C as «bYprod[ø,u]kt», that is, the consensus American pronunciation is «bYprodøkt», and the consensus British pronunciation is «bYprodukt».  However, the British Shorter OED gives the primary pronunciation «bYprodøkt», and the American Merriam-Webster gives the primary pronunciation «bYprodukt».  So it would appear that both pronunciations are commonly used on both sides of the Atlantic.  In some sense, the pronunciations are equally recognizable, or equivalent.

  2. <clerk> - written in CAAPR-C as «kL[&,A]ßk».  This is the exact opposite of the situation above: the American dictionaries do not list the British pronunciation, and vice versa.  One might call the alternatives incompatible.

  3. <version> - written in CAAPR-C as «v&ß[J,X]°n».  This is intermediate between the two previous cases.  «v&X°n» is shown as acceptable by some American dictionaries, and «v&ßJ°n» by some British dictionaries, but in neither case is it shown as the primary pronunciation.  In this situation, I refer to the pronunciations as "weakly equivalent".

  4. <mushroom> - written in CAAPR-C as «muXr[U,V]m».  The British EPD dictionary shows «muXrUm» as the primary pronunciation, but none of the American dictionaries I consulted do the same for «muXrVm».  In this case, the American form dominates the British one; that is, the American form appears to be more acceptable in British English than the British form is in American English.

  5. <from> - written in CAAPR-C as «fr[u,o]m».  This is a weaker form of the situation above.  Several American dictionaries recognize «from» as an alternate (but not primary) American pronunciation, but no British dictionary I've seen gives such acceptance to «frum».  «from» is an acceptable, but minority, American form.  I would say that the British pronunciation "weakly dominates" the American one in this case.

The CAAPR-C list embellishes the representations of words like these by using another symbol in place of the comma within pairs to indicate equivalence or dominance of the two pronunciations, as follows:

  1. The symbol '=' indicates equivalence; thus, the embellished CAAPR-C form of <byproduct> is «bYprod[ø=u]kt».

  2. The symbols '<' and '>' indicate dominance.  The symbols have their mathematical sense.  The word <mushroom> is represented as «muXr[U>V]m», showing the American pronunciation is dominant.

  3. The symbol '|' indicates weak equivalence.  The embellished representation of <version> is «v&ß[J|X]°n».

  4. The symbols '/' and '\' indicate weak dominance.  The symbol leans to the side which dominates.  Thus, the word <from> is represented as «fr[u/o]m», showing the weak dominance of the British pronunciation.

  5. The symbol '?' indicates incompatibility.  <clerk> is represented as «kL[&?A]ßk», reminding us that there's no pleasing everyone.
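Because each separator has a fixed meaning, reading the relations out of an embellished entry is a simple lookup.  A minimal Python sketch, assuming every pair has the form [American<separator>British] with one of the single-character separators listed above; the names RELATIONS, PAIR and pair_relations are hypothetical:

```python
import re

# Meaning of each separator inside a bracketed [American?British] pair.
RELATIONS = {
    ",": "unspecified",
    "=": "equivalent",
    "|": "weakly equivalent",
    "?": "incompatible",
    ">": "American dominates",
    "<": "British dominates",
    "\\": "American weakly dominates",
    "/": "British weakly dominates",
}

# One bracketed pair: American side, separator, British side.
PAIR = re.compile(r"\[([^\],=<>|/\\?]*)([,=<>|/\\?])([^\]]*)\]")


def pair_relations(word: str) -> list[tuple[str, str, str]]:
    """Return (American, British, relation) for every pair in a CAAPR-C word."""
    return [(m.group(1), m.group(3), RELATIONS[m.group(2)])
            for m in PAIR.finditer(word)]
```

For instance, pair_relations("muXr[U>V]m") returns [("U", "V", "American dominates")], and a word with no pairs, such as «fØßkâst», returns an empty list.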

Of course, these embellishments can and should be ignored by those uninterested in this additional distributional information, and in general when I cite CAAPR spellings, I use comma separators except in cases where the embellishments are of interest.
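Ignoring the embellishments, a CAAPR-C entry can be expanded back into approximate American and British strings.  The Python sketch below is hypothetical and deliberately partial.  It reads bracketed pairs with any separator and expands only those synthetic symbols that stand for a single fixed pair in the table of synthetic symbols.  Mixture symbols such as 'ê', '3', 'î', '¹', '°' and 'ÿ', and the context-dependent 'V', pass through unchanged.

```python
import re

# Hypothetical decoding table for synthetic symbols that stand for one
# fixed (American, British) pair. Mixture symbols and 'V' are left alone.
SIMPLE_SYNTHETIC = {
    "â": ("a", "A"),
    "ô": ("Ø", "o"),
    "ò": ("Ø", "ø"),
    "ß": ("r", "ß"),
    "R": ("°r", "øß"),
    "!": ("", "y"),
    "ü": ("&", "u"),
    "û": ("ø", "V"),
    "Ü": ("V", "Ü"),
}

# A bracketed pair with any separator: plain comma or an embellishment.
PAIR = re.compile(r"\[([^\],=<>|/\\?]*)[,=<>|/\\?]([^\]]*)\]")


def split_caapr_c(word: str) -> tuple[str, str]:
    """Split a CAAPR-C string into approximate American and British strings."""
    american, british = [], []
    pos = 0
    while pos < len(word):
        m = PAIR.match(word, pos)
        if m:  # explicit pair; the separator itself is ignored
            american.append(m.group(1))
            british.append(m.group(2))
            pos = m.end()
            continue
        # Synthetic symbol, or an ordinary symbol shared by both varieties.
        a, b = SIMPLE_SYNTHETIC.get(word[pos], (word[pos], word[pos]))
        american.append(a)
        british.append(b)
        pos += 1
    return "".join(american), "".join(british)
```

For example, split_caapr_c("kL[&?A]ßk") gives ("kL&rk", "kLAßk").  The American side keeps the explicit '&r' form used while combining, rather than the bare '&' of CAAPR-A.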


Changes from version 1

Version 2 of CAAPR differs from version 1 in two important regards.  The more important of the two is that the CAAPR-B list has been revised to mark the indistinct i and to use the plural and past tense symbols '$' and 'þ'.  This makes the A and B lists equivalent in the amount and style of information they present.

The other change consists of enhancements to the notation itself.  Most of them relate to finer classification of indistinct, unstressed sounds (the symbols '°', '*', '¹', '³', 'ÿ', '3', 'î' and 'R').  In addition, the embellished symbol pair notation was introduced, and the two symbols 'à' and 'Ÿ' were dropped: the former because it made no meaningful distinction from 'è', and the latter as a side effect of the introduction of 'R' («LYR» is a better representation of <liar> than «LŸß»).

It may be questioned whether the increased precision in the marking of indistinct sounds is really a good thing.  Does the difference between «E'prøn» and «E'pr°n» really matter?  One answer is that some folks think it does, and will argue with you for as long as you want about whether «je'nørøL» or «je'nrøL» is more correct.  But I think the real benefit of this degree of precision is an ironic one: it emphasizes how much uncertainty and variance there is in the pronunciation of unstressed sounds.  I developed CAAPR in the hopes it would be useful to spelling reformers.  One of the problems with many spelling reforms is that they end up reflecting the minutiae of their inventor's dialect, setting forth as certain that one of »rabit« or »rabut« is correct, and the other a blatant error.  CAAPR's rather thorough demonstration of the uncertainty of English pronunciation serves as a persuasive argument for abandoning the pure phonetic principle for the spelling of unstressed syllables.  I think this lesson is an important one, and that any practical reformed spelling system for world English must take it seriously.

Version 2 also broadens the range of spellings covered.  Despite the international character of the CAAPR-C notation, the version 1 CAAPR-C list included only traditional American spellings; that is, it had an entry for <color>, but not for <colour>.  This flaw has been remedied in version 2.

Note that as of March 2007, I have changed the format of the lists slightly, to make it easier to transfer their content into Microsoft Excel.

Final comments

Version 1 of the CAAPR list included a number of "signature words", whose CAAPR representation deviated from the rules, generally in order to give greater consistency to the spelling of related words.  Except as the result of errors, this version contains no such words.  CAAPR is not intended as a spelling system, and the inconsistencies of the language itself as well as of the sources of CAAPR should not be obscured.  There are good reasons to spell <princess> and <duchess> with the same ending in a practical orthography, but it seems best to leave the data alone, and represent them as «pri'nses» and «du'Cis» in CAAPR-B, which after all is a reference notation and not a spelling system.

Because this version of the CAAPR notation is more complicated than the previous version, I will continue to make the version 1 lists downloadable here.  I note that, in addition to its advantage of simplicity, the version 1 CAAPR-B list takes no account of the "indistinct i", which may be of use to those who doubt its existence.

Though this version of CAAPR has been thoroughly proofread, it is still likely to contain errors and other faults.  Please let me know when you encounter errors, whether isolated or systematic.  If you discover ways in which CAAPR could be changed to improve its usefulness, I'd also like to hear of them.  Sometimes I suspect that my work on this site is no more than talking to myself in public.  If this is not so, and there are ways I can make my forays into dictionary building more generally useful, it would be a shame if no one bothered to tell me.



To comment on this page, e-mail Alan at wyrdplay.org
