This page describes CAAPR, the Combined Anglo-American
Pronunciation Reference. CAAPR is a pronouncing dictionary
for both British English (RP) and American English (GA). It
is written in a compact and easy-to-read notation which
systematizes the differences between these two varieties of
English, while still handling exceptions gracefully and
accurately. The CAAPR notation is itself called CAAPR, this
time standing for Combined Anglo-American Pronunciation
Representation. It is in many ways a generalization of my FLOSS notation. To some extent,
the CAAPR list resembles the FEWL
list, but it avoids some of the complexities of that list by
concerning itself almost exclusively with phonological rather than
morphological information.
One may reasonably ask the question: What is CAAPR good
for? It is possible that, like FEWL, CAAPR may in time come
to be useful for computer generation of dictionaries for reformed
English orthographies. At this time, however, this seems
premature, especially since most reformers are unwilling to take
on the labor of trying to please both sides of the Atlantic
at once. I see CAAPR mostly as a useful tool for
self-education. In compiling it, I have learned a lot about
the systematic differences between the two English varieties, and
I recommend it to anyone else who feels the need for greater
insight in this area.
CAAPR is based on two primary sources, the FEWL
list for American English and the online EPD
dictionary for British English. This latter dictionary
is of very high quality, and I wish I knew who collected it so I
could offer them my extravagant thanks. During the
development of CAAPR, I transformed, rearranged and occasionally
corrected the EPD, but I find it remarkable how few corrections
were needed. Whatever merits CAAPR may possess are in large
measure derived from the labor of the unknown contributors to the
EPD.
CAAPR actually consists of three word lists: CAAPR-A,
CAAPR-B and CAAPR-C. CAAPR-A(merican) is a list of words
with their GA pronunciations, derived via reformatting from the
FEWL list. CAAPR-B(ritish) is a somewhat different list of
words with their RP pronunciations, derived via subsetting and
reformatting from the online EPD dictionary (plus some additional
common words unaccountably omitted from that document).
CAAPR-C(ombined) is a combined list, comprising the words that
appear in both CAAPR-A and CAAPR-B, showing both pronunciations
together in a single notation. As will be described here,
the CAAPR notation is slightly different for each document, in
ways that are unlikely to give the user any difficulty. The
CAAPR-A and CAAPR-B lists each contain approximately 30,000
words. The CAAPR-C list contains the approximately 28,000
words common to both the A and B lists. (All of these lists now
include a relatively small number of additional words from other
sources.)
The CAAPR lists show a single (RP or GA) pronunciation for
each word on the list. Of course, the world is much more
complicated than this, because many words have multiple acceptable
pronunciations, with conflicting data about which is preferable or
most prevalent. Both lists use the technique of
lexicographic consensus to resolve such questions. Note that
differences between the GA and RP pronunciations listed in CAAPR
may not reflect actual Anglo-American differences. For
instance, both forms may represent pronunciations which are common
in both varieties, distinguished more by happenstance than by
geography.
The pronunciations in CAAPR-A were determined by consensus
of the following dictionaries: The Longman Pronunciation
Dictionary, the Merriam-Webster Collegiate Dictionary CD-ROM, and
the American Heritage Dictionary CD-ROM. In difficult cases,
the Random House Unabridged Dictionary CD-ROM was also
consulted. Similarly, the pronunciations of CAAPR-B were
determined by consensus of the online EPD, the Longman
Pronunciation Dictionary and the Shorter OED CD-ROM. In
difficult cases, the Cambridge Pronunciation Dictionary was also
consulted. The Longman Pronunciation Dictionary is the most
precise of all these sources, and was often used to resolve issues
which could not be easily settled using the other, less technical
sources.
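The page does not spell out the consensus procedure in detail. One simple, hypothetical reading (not necessarily the method actually used for CAAPR) treats consensus among the three primary dictionaries as a strict majority vote. In this reading, the extra dictionary is consulted only when no majority emerges. The function name `consensus` and its parameters are illustrative.

```python
from collections import Counter
from typing import Optional


def consensus(primary: list[str], tiebreaker: Optional[str] = None) -> Optional[str]:
    """Hypothetical majority-vote reading of "lexicographic consensus".

    `primary` holds one transcription per primary dictionary (for CAAPR-A:
    Longman, Merriam-Webster and American Heritage). `tiebreaker` stands in
    for the dictionary consulted only in difficult cases.
    """
    counts = Counter(primary)
    best, votes = counts.most_common(1)[0]
    if votes * 2 > len(primary):
        return best  # a strict majority of the primary sources agree
    # No majority: accept the tiebreaker's reading only if it matches a candidate.
    if tiebreaker is not None and tiebreaker in counts:
        return tiebreaker
    return None  # unresolved; left for human judgment
```

Returning `None` leaves the hardest cases to a person. This fits the remark above that the more precise Longman dictionary was often used to settle them.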
Each list has a similar format. A typical entry from
the CAAPR-A and CAAPR-B lists looks like this:
wordplay : w&'dpLE·
The entry is divided by the delimiting string "
: " into the traditional spelling and the CAAPR
representation of a word. In some cases, like the following:
ill-use (n) : i'LyU's
ill-use (v) : i'LyU'z
different forms of the same word may be distinguished by a
qualifying word or phrase in parentheses.
The format of the CAAPR-C list is similar. Here are a
few example lines:
combat (n) : kombat
combat (v) : k[ø|o]mbat *
Some entries may be followed by an asterisk, which
indicates that the American and British stress patterns for the
word are different. The programming which determines this is
still under development, and the absence of the asterisk cannot be
trusted to mean that the two pronunciations are in fact similar in
their stress. (Note that the CAAPR combined notation does
not directly indicate stress, due to technical difficulties.)
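The entry formats above can be read with a short sketch. It assumes each line has the shape `spelling [(qualifier)] : transcription [*]`, with the trailing asterisk appearing only in CAAPR-C. The `Entry` type and `parse_entry` function are hypothetical names and are not part of any CAAPR distribution.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    qualifier: Optional[str]  # e.g. "n" or "v" in "ill-use (n)"
    transcription: str
    stress_differs: bool      # CAAPR-C only: trailing " *"


# Assumed layout: "<spelling> [(<qualifier>)] : <transcription> [*]"
ENTRY_RE = re.compile(
    r"^(?P<word>.+?)(?: \((?P<qual>[^)]*)\))? : (?P<trans>.+?)(?P<star> \*)?$"
)


def parse_entry(line: str) -> Entry:
    m = ENTRY_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a CAAPR entry: {line!r}")
    return Entry(m["word"], m["qual"], m["trans"], m["star"] is not None)
```

For example, `parse_entry("combat (v) : k[ø|o]mbat *")` yields the word `combat`, qualifier `v`, transcription `k[ø|o]mbat`, and a stress-difference flag of `True`.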
The three CAAPR lists are available for download.
CAAPR is not intended to be used for transcription of
continuous text. Nevertheless, it is useful at this point to
give you an idea of the overall appearance of CAAPR, and I know no
better way to do this than to transcribe a short bit of
prose. Here is the first paragraph of H.G. Wells' "The Star"
(see here for a
plain English version, among others) written in CAAPR-C. It
may seem cryptic at first glance, but as one uses CAAPR, one
rapidly becomes accustomed to its conventions, and soon such
passages present little mystery.
it w[u/o]z on Dø f&ßst dE øv Dø n!U yïr Dat Dý ønWnsm°nt
w[u/o]z mEd, ØLmOst s[Y/i]m°LtEnÿøsLý fr[u<o]m TrI
øbz&ßvøtòrý$, Dat Dø mOX°n øv Dø pLanît nept!Un, Dý
WtRmOst øv ØL Dø pLanît$ Dat µIL øbWt Dø sun, had bikum verý
iratik. ø ritAßdEX°n in its v3Losêtý had b[i\I]n søspektîd in
disembR. Den, ø fEnt, rimOt spek øv LYt w[u/o]z diskuvRþ in Dø
rIj°n øv Dø pRt&ßbþ pLanît. at f&ßst Dis did not kØz
ený verý grEt iksYtm°nt. sYøntifik pIp°L, hWevR, fWnd Dý
inteLîj°ns rimAßkøb°L inuf, Iv°n bifØß it bikEm nOn Dat Dø n!U
bodý w[u/o]z rapîdLý grOiG LAßjR and brYtR, and Dat its mOX°n
w[u/o]z kwYt dif°r°nt fr[u<o]m Dý ØßdRLý pr[o|O]gr[ø<e]s
øv Dø pLanît$.
The rest of this page uses certain notations to add
precision to the discussion. English words, used as
examples, are enclosed in angle brackets, like <this>.
CAAPR representations of words are enclosed in double brackets,
like «Di's». Individual CAAPR symbols or symbol sequences
are enclosed in apostrophes, like 'sO'. Sampa phonemic
transcriptions are enclosed in slashes, like /soU/.
Individual letters or short sequences from traditional spelling
are generally written without any punctuation, as in "the letter
t" or "the sequence ng".
As used in the CAAPR-A and CAAPR-B lists, the CAAPR
notation is mostly phonemic, with certain non-phonemic notations
added. Because the sound repertoires for GA and RP are
different, the same symbol will sometimes have a distinct (but
related) meaning for the two varieties. For both varieties
of English, not all speakers have exactly the same phonemes.
CAAPR-A and CAAPR-B target idealized speakers of GA and RP
respectively. The GA pronunciations are based on an
idealized American who distinguishes <which> and
<witch>, <marry> and <merry>, and <cot>
and <caught>, and for whom the two vowels of <above>
are distinct, as are the two vowels of <murder>.
Similarly, the RP pronunciations are based on an idealized Briton
who distinguishes <candid> and <candied>, and for whom
the two vowels of <murder> are distinct. Speakers with
fewer phonemes than the ideal can merge symbols as necessary to
represent their own speech.
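Merging symbols in this way is a plain symbol-for-symbol substitution. The sketch below assumes two example mergers for an American speaker. The first is a speaker who does not distinguish <which> from <witch>, so 'µ' becomes 'w'. The second is a speaker who pronounces <caught> like <cot>, so 'Ø' is folded into 'o', which is one possible choice. The table names and the transcription «kØ't» assumed here for <caught> are illustrative.

```python
# Hypothetical merger tables: each maps a CAAPR symbol to the symbol it merges into.
WINE_WHINE = {"µ": "w"}  # <which> pronounced like <witch>
COT_CAUGHT = {"Ø": "o"}  # <caught> pronounced like <cot>


def apply_mergers(transcription: str, *mergers: dict[str, str]) -> str:
    """Rewrite an idealized CAAPR-A transcription for a speaker with fewer phonemes."""
    table: dict[str, str] = {}
    for merger in mergers:
        table.update(merger)
    return transcription.translate(str.maketrans(table))
```

Because the substitution is purely mechanical, any combination of mergers can be applied in a single pass.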
Note that CAAPR is not suitable for use as a spelling system. Quite apart from its complexity, it often requires distinct spellings for a single sound, which cannot be resolved by referring to the speech of any particular speaker. From the perspective of a learner rather than of a linguist, the distinctions would seem quite arbitrary.
CAAPR uses a large repertoire of symbols, including upper
and lower case alphabetic characters, punctuation, and letters
with diacritics. The symbols are organized into groups so
that the members of each group are somewhat similar, making it
easier to master the entire system. As with any complex
system, there are occasional exceptions to this organization, as
described below.
The symbol groups and their significance are as follows:
The lower-case alphabetic letters. Each symbol in this group is assigned its natural English phonemic meaning. All the vowels are short. (Note that some letters, notably c, q and x, are omitted.)
Upper-case alphabetic letters, plus a few special symbols and punctuation characters. The symbols in this group are assigned meanings that are usually related in some fashion to the corresponding letter (or, in the case of symbols and punctuation, a letter they resemble in shape). Most of the English long vowels, diphthongs, and less common short vowels fall into this group.
Letters with a dieresis (such as ë and Ü). These
generally indicate vowels or diphthongs that occur primarily
before the letter r. There is usually a resemblance in
sound to the unaccented letter.
Letters with a circumflex (such as ê and û).
These generally indicate sounds which are different between
American and British English, except for ê and î, which
indicate indistinct sounds within both American and British
English. There is usually a resemblance in sound to the
unaccented letter.
Letters with a grave accent (such as è and ò).
These indicate sounds which not only differ between American
and British English but are also differently stressed.
When stressed, there is generally a resemblance in sound to
the unaccented letter.
The letters ý and Ý. These ought to be written as ŷ and Ŷ, but as these are not in the standard Latin-1 character set, the acute accented y is used instead.
The special characters ', ·, $ and þ. The first
two characters are stress marks, and the latter two serve the
purpose of identifying plural and past tense inflections.
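The accented groups can be recognized mechanically by decomposing a symbol into its base letter and a Unicode combining mark. This is a rough sketch with informal group labels. It does not handle the exceptions mentioned above. For example, 'ÿ' carries a dieresis but is a variant of 'ý', and the tilde letters 'ã' and 'õ' fall into the catch-all group.

```python
import string
import unicodedata

# Unicode combining marks that identify the accented CAAPR groups.
DIACRITIC_GROUPS = {
    "\u0308": "dieresis",    # e.g. ë, Ü: vowels found mostly before r
    "\u0302": "circumflex",  # e.g. ê, û: American/British differences
    "\u0300": "grave",       # e.g. è, ò: differ in sound and in stress
}


def symbol_group(ch: str) -> str:
    """Assign a single CAAPR symbol to one of the groups described above."""
    if ch in "ýÝ":
        return "y-acute"
    if ch in "'·$þ":
        return "special"
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 2 and decomposed[1] in DIACRITIC_GROUPS:
        return DIACRITIC_GROUPS[decomposed[1]]
    if ch in string.ascii_lowercase:
        return "lower-case"
    return "upper-case or other"  # upper-case letters, special symbols, punctuation
```
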
Symbol | Sampa | Example | Applies to | Notes
a | { | ka't (cat) | Both |
ã | A~ | elã' (elan) | Both | (1)
A | A: | fA'Døß (father) | Both | (2)
b | b | bE'bý (baby) | Both |
C | tS | Ce'LO (cello) | Both | (3)
d | d | de'd (dead) | Both | (4)
D | D | Da't (that) | Both |
e | E, e | e'g (egg) | Both |
ë | e@ | bë'ß (bear) | Brit | (5)
E | eI | ka'nøpE (canape) | Both | (6)
f | f | fY'f (fife) | Both |
g | g | ga'g (gag) | Both |
G | N | si'GiG (singing) | Both | (7)
h | h | hO'm (home) | Both |
H | ~ | u'HuH (uh-uh) | Both | (8)
i | I | bi'g (big) | Both | (9)
ï | I@ | pï'ßs (pierce) | Brit | (10)
I | i: | møXI'n (machine) | Both | (11)
j | dZ | ju'j (judge) | Both |
J | Z | vi'J°n (vision) | Both | (12)
k | k | ki'k (kick) | Both |
K | x | Lo'K (loch) | Both | (1)
L | l | Li'Lý (lily) | Both | (13)
m | m | me'mbøß (member) | Both |
n | n | nu'n (none) | Both |
o | Q | to'p (top) | Brit | (14)
õ | o~ | kõ'nsýëßJ (concierge) | Both | (1)
ø | @ | sO'fø (sofa) | Both | (15)
O | oU, @U | rO'd (road) | Both |
Ø | O: | pØ'z (pause) | Both |
p | p | po'p (pop) | Both |
Q | OI | kQ'n (coin) | Both | (16)
r | r | rØ'riG (roaring) | Both | (17)
& | 3`, 3 | rif&'r°L (referral) | Both | (18)
s | s | sØ's (sauce) | Both | (19)
t | t | ti'Lt (tilt) | Both | (4)
T | T | Ti'k (thick) | Both |
u | V | fu'z (fuzz) | Both | (20)
U | u:, u | sU'p (soup) | Both | (21)
Ü | U@ | øbskyÜ'ß (obscure) | Brit |
v | v | va'Lv (valve) | Both |
V | U, u | gV'd (good) | Both | (20), (21)
w | w | wE'wøßd (wayward) | Both | (22)
µ | hw, W | µi'C (which) | Amer | (23)
W | aU | frW'n (frown) | Both |
X | S | no'kXøs (noxious) | Both | (24)
y | j | yu'mý (yummy) | Both |
Y | aI | Y's (ice) | Both |
z | z | zi'gzag (zigzag) | Both | (19)
Notes:
(1) 'ã', 'õ' and 'K' represent non-English sounds that are used by some speakers in a few words. Unlike the FEWL list, the online EPD shows pronunciations for these words that do not use these sounds. I have resolved this difficulty by including in each CAAPR list two distinct pronunciations for these words, one using the unusual sounds, and one using only everyday English sounds. While the dictionaries clearly prefer the foreign pronunciations for these words, I've encountered very few individuals who actually use them. (But it must be admitted that most of these words are read far more commonly than they are spoken.)
(2) Most occurrences of the 'A' sound in
American English are spelled with 'o' in CAAPR-A, as described below.
(3) Technically, the ch sound represented by 'C' is the juxtaposition of the two sounds 't' and 'X'. When these sounds occur in different components of a compound word (such as <potshot>), they are spelled in CAAPR with 'tX' rather than with 'C': the CAAPR representation of <potshot> is «po'tXot».
(4) When the sound of /d/ or /t/ occurs as a past tense ending (as in <walked> and <crawled>), the symbol þ is used in its place («wØ'kþ» and «krØ'Lþ»), as described below.
(5) The 'ë' sound is generally found only in British English, though it may occur in American English for Americans who pronounce <Mary> differently from <merry>. The 'ë' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ë', as described below.
(6) At first glance, use of the symbol 'E' for the English long a may seem highly unnatural. I find it easy to get used to because of the similarity between the long a and the short e sound represented by 'e'. Use of 'E' rather than 'A' for this sound is also unsurprising for those familiar with IPA or Romance languages.
(7) Most mixed-case phonemic notations use the symbol 'N' rather than 'G' for this sound. I have chosen to use 'G' because I find the spelling «si'GiG» for <singing> easier to read correctly at first glance than «si'NiN».
(8) In my particular dialect of American English, there are three words with a nasalized short u: <huh>, <uh-huh> and <uh-uh>. In CAAPR-A, these words are represented by «hu'H», «u'HhuH» and «u'HuH». These words are not in the CAAPR-B list, but the same spellings could be used if these words are pronounced similarly in RP.
(9) In many words, the /I/ sound is
represented in CAAPR by 'ê' rather than 'i', as described below.
In British English, /I/ may also be represented by 'ý', as
noted below.
(10) The 'ï' sound is generally found only in British English. The 'ï' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ï', as described below.
(11) At first glance, use of the symbol 'I' for the English long e may seem highly unnatural. I find it easy to get used to because of the similarity between the long e and the short i sound represented by 'i'. Use of 'I' rather than 'E' for this sound is also unsurprising for those familiar with IPA or Romance languages. In American CAAPR, an unstressed 'I' sound will generally be written as 'ý', as noted below.
(12) Most mixed-case phonemic notations use the symbol 'Z' rather than 'J' for this sound. This is mostly a question of taste; «kØrsA'J» is probably a better notation than «kØrsA'Z» for <corsage>, but «vi'Z°n» is probably superior to «vi'J°n» for <vision>.
(13) The symbol for /l/ really ought to be 'l', but in many fonts it is impossible to tell a lower-case l from an upper-case I. Using 'L' for this sound represents a triumph of practicality over principle.
(14) The 'o' sound is found only in British English, as the standard short o sound. The 'o' symbol is used in place of 'A' in American CAAPR in most words where the British pronunciation would be represented as 'o', as described below.
(15) The 'ø' symbol designates the schwa, treated by CAAPR as distinct from both the short u ('u') and the stressed vowel of <bird> ('&'). In many words, the schwa is represented in CAAPR by 'ê' rather than 'ø', as discussed later.
(16) Using 'Q' to represent a vowel does
take a little bit of getting used to. I find it helps to
think of the tail of the 'Q' as an I attached to an O with an
unusually placed ligature.
(17) The 'r' symbol is used for the consonant r. In British English, it is used only when an /r/ is always pronounced. Otherwise, the symbol 'ß' is used, as described below.
(18) The symbol '&' represents the
sound /3`/ in American CAAPR, but /3/ in British CAAPR. This
means that <bird> is represented as «b&'d» in American
CAAPR, but as «b&'ßd» in British CAAPR. As will be seen,
when CAAPR is used as a combined notation, the British usage
prevails. The character '&' was chosen for its visual
resemblance to a capital R.
(19) When the sound of /s/ or /z/ occurs as a plural ending (as in <walks> and <crawls>), the symbol $ is used in its place («wØ'k$» and «krØ'L$»), as noted below.
(20) Those familiar with IPA and Sampa might expect 'u' to represent the /U/ sound, and 'V' to represent the /V/ sound. While this would indeed be logical, I find 'u' for /V/ to be considerably more natural and readable, especially given the prevalence of this sound compared to /U/.
(21) The Longman Pronunciation Dictionary uses the symbol /u/ as a "neutralization" of /u:/ in many words, such as <situation> and <regulate>. In American CAAPR, I have transcribed this vowel as 'U', which is in agreement with the popular American dictionaries. But for British English, I represent it as 'V', which is consistent with popular British dictionaries. When CAAPR is used as a combined notation, the British usage prevails.
(22) Some occurrences of the 'w' sound in
British English are spelled with 'µ' in CAAPR, as described below.
(23) The 'µ' sound is found only in American English, in association with the digraph wh. (Even though most Americans do not use this sound, American CAAPR is oriented towards an ideal speaker who does.) The 'µ' symbol is used in British CAAPR in most words where the American pronunciation would be represented as 'µ', as noted below.
(24) Most mixed-case phonemic notations use the symbol 'S' rather than 'X' for this sound. I have chosen to use 'X' because I find spellings such as «pre'Xøs» rather than «pre'Søs» for <precious> to be easier to read correctly at first glance.
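As a rough illustration of how the symbol table can be used, the sketch below converts a CAAPR-A or CAAPR-B transcription into a Sampa string. It rests on several assumptions:

- Where the table lists two values for 'e', 'O' and '&', the first is taken as American and the second as British.
- For 'U' and 'V', the second value (Longman's neutralized /u/) is ignored.
- The ortho-phonemic values for American 'ë', 'ï', 'o' and British 'ß', 'µ' come from the ortho-phonemic table later on this page.
- Stress marks are dropped, because Sampa places them before the syllable rather than after the vowel.

Non-phonemic symbols such as 'ê', 'ý', '°', 'þ' and '$' are rejected.

```python
# Values shared by both varieties, taken from the symbol table.
COMMON = {
    "a": "{", "ã": "A~", "A": "A:", "b": "b", "C": "tS", "d": "d", "D": "D",
    "E": "eI", "f": "f", "g": "g", "G": "N", "h": "h", "H": "~", "i": "I",
    "I": "i:", "j": "dZ", "J": "Z", "k": "k", "K": "x", "L": "l", "m": "m",
    "n": "n", "õ": "o~", "ø": "@", "Ø": "O:", "p": "p", "Q": "OI", "r": "r",
    "s": "s", "t": "t", "T": "T", "u": "V", "U": "u:", "v": "v", "V": "U",
    "w": "w", "W": "aU", "X": "S", "y": "j", "Y": "aI", "z": "z",
}
AMERICAN = {
    "e": "E", "O": "oU", "&": "3`", "µ": "hw",
    # ortho-phonemic in American CAAPR: ë, ï and o stand for /E/, /I/ and /A:/
    "ë": "E", "ï": "I", "o": "A:",
}
BRITISH = {
    "e": "e", "O": "@U", "&": "3", "ë": "e@", "ï": "I@", "o": "Q", "Ü": "U@",
    # ortho-phonemic in British CAAPR: ß is an unpronounced r, µ is plain /w/
    "ß": "", "µ": "w",
}
VARIETIES = {"amer": AMERICAN, "brit": BRITISH}
STRESS_MARKS = "'·"


def to_sampa(transcription: str, variety: str) -> str:
    table = {**COMMON, **VARIETIES[variety]}
    out = []
    for ch in transcription:
        if ch in STRESS_MARKS:
            continue  # stress placement differs between the notations, so drop it
        if ch not in table:
            raise ValueError(f"no phonemic value for {ch!r} in {variety} CAAPR")
        out.append(table[ch])
    return "".join(out)
```

For instance, «to'p» becomes `tA:p` when read as American and `tQp` when read as British.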
Both of the English varieties targeted by CAAPR have some
unique sounds whose use is generally predictable from the spelling of
the words that contain them. For instance, the sound designated
by CAAPR 'µ' does not occur in British English, but one can predict,
in almost all cases, that a word pronounced with a /w/ in British
English, but spelled with wh, will be pronounced as 'µ' in American
English (at least by those Americans who use that sound). I call
the symbols with this property ortho-phonemic,
as they have phonemic significance in one variety, but orthographic
significance in the other variety.
The use of the ortho-phonemic symbols in CAAPR brings the
American and British spellings closer to one another, in a way that
makes sense even for speakers not familiar with the other variety.
The ortho-phonemic symbols are:
Symbol | Sampa | Example | Variety | Spelling | Notes
ë | E | bë'r (bear) | Amer | air, ar, are, ear, eir | (1)
ï | I | pï'rs (pierce) | Amer | ear, eer, er, ere, ier | (2)
o | A: | to'p (top) | Amer | o, qua, wa, en | (3)
ß | (r) | rØ'ß (roar) | Brit | r, final or before consonant | (4)
µ | w | µi'C (which) | Brit | wh | (5)
Notes:
(1) Most Americans pronounce words which in RP are pronounced with the /e@/ diphthong with a simple /E/, normally written in CAAPR as 'e'. Examples of such words are <fair>, <Mary>, <spare>, <bear> and <their>. American CAAPR represents the combination /Er/ with 'ër' when the regular spelling uses one of the forms listed in the table above.
(2) Most Americans pronounce words which
in RP are pronounced with the /I@/ diphthong with a simple /I/,
normally written in CAAPR as 'i'. Examples of such words are
<fear>, <beer>, <serious>, <here> and
<pierce>. American CAAPR represents the combination
/Ir/ with 'ïr' when the regular spelling uses one of the forms
listed in the table above.
(3) Most Americans pronounce words which in RP use the short o vowel /Q/ with /A:/, normally written in CAAPR as 'A'. Examples of such words are <stop>, <qualify>, <wander> and <entree>. American CAAPR represents /A:/ with 'o' when the regular spelling has one of the forms listed in the table above.
(4) RP is a non-rhotic form of English. This means that the letter r is generally not pronounced when it is followed by a consonant, or at the end of a word. (The r may, however, be pronounced at the end of a word when the next word begins with a vowel.) For British English, these suppressed r's are represented in CAAPR by the symbol 'ß', as in «fAß» (far) or «sØßd» (sword). This symbol was chosen for its visual resemblance to a capital R. (In fact, the symbol 'R' could have been used, but I think an unusual character is better at communicating the unique nature of the construct.) Note that CAAPR indicates pronunciation of words, not of larger units, so the phrase "here and there" would be written in CAAPR as «hïß ønd Dëß», even though the sequence of sounds would be more like «hïrønDëß».
(5) Britons generally pronounce words
which Americans might pronounce with the /hw/ sound with a simple
/w/, normally written in CAAPR as 'w'. Examples of such
words are <where> and <awhile>. British CAAPR
represents the sound /w/ with 'µ' when the regular spelling uses
wh.
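Since CAAPR transcribes words in isolation, the linking r described in note (4) has to be supplied at the phrase level. The sketch below sounds a British word-final 'ß' as 'r' when the next word begins with a vowel. It assumes a hand-picked set of CAAPR symbols that begin with a vowel sound; the function name is illustrative.

```python
# Assumed set of CAAPR symbols that start with a vowel sound (illustrative only).
VOWELS = set("aãAeëEiïIoõøOØQ&uUÜVWYêý")


def link_rs(words: list[str]) -> list[str]:
    """Sound a British word-final 'ß' as 'r' when the next word starts with a vowel."""
    out = list(words)
    for i in range(len(out) - 1):
        nxt = out[i + 1]
        if out[i].endswith("ß") and nxt and nxt[0] in VOWELS:
            out[i] = out[i][:-1] + "r"
    return out
```

Applied to «hïß ønd Dëß», this gives «hïr ønd Dëß», which matches the «hïrønDëß» sequence described above once the word boundaries are ignored.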
CAAPR uses a number of additional non-phonemic symbols for
various purposes. These symbols are listed in the table below,
and explained in the following notes.
Symbol | Sampa | Type | Example | Notes
ê | @, I, 1 | Indef. sound | ma'gnêt (magnet) | (1)
ý | I, i, i: | Indef. sound | ha'pý (happy) | (2)
ÿ | I, i, j | Indef. sound | prI'vÿøs (previous) | (3)
° | (@) | Optional sound | ma'jik°Lý (magically) | (4)
* | (@) | Optional sound | tY'*L (tile) | (5)
¹ | (@), (I), (1) | Optional sound | kri'm¹n°L (criminal) | (6)
þ | d, t | Morpheme | dra'gþ (dragged) | (7)
$ | s, z | Morpheme | dru'g$ (drugs) | (8)
' | " (or ') | Stress | øLY'v (alive) | (9)
· | % (or ,) | Stress | do·mênE'X°n (domination) | (9)
Notes:
(1) One of the features of both GA and RP is the "indistinct i". Consider the word <magnet>. Some people pronounce this as «ma'gnit», while others pronounce it as «ma'gnøt». Some speakers may use either pronunciation at random. In such cases, I use the term "indistinct i" to describe the vowel. It is not a distinct sound: it is always pronounced as /@/ or /I/ (or according to some authorities as /1/). The indistinct i is represented in CAAPR by the symbol 'ê'.
(2) Many English words are spelled with a final y used as a vowel, as in <many> and <quality>. American dictionaries generally show the sound of this vowel as an unstressed long e. British dictionaries, on the other hand, often show it as a short i. The Longman dictionary uses the symbol /i/ to represent it. For British English, /i/ is distinguished from /i:/ by length as well as stress, while for American English, only the stress difference is apparent. CAAPR uses the symbol 'ý' where Longman uses /i/. One difference is that, for American English only, where an unstressed /i:/ occurs, it is spelled in CAAPR as 'ý' rather than 'I'. This mostly affects words ending in /i:z/ such as <rabies>, which rhymes with <babies> for Americans. <rabies> is spelled «rE'býz» in American CAAPR, but «rE'bIz» in British CAAPR.
(3) The symbol 'ÿ' is a variant of 'ý',
representing a sound which may either be one of the vowel sounds
of 'ý', or the consonant 'y'. The Longman dictionary uses
the symbol /i/, linked to the following sound by a tie, to
represent it. The sample word <previous> is typical of
the words where it occurs, as the word might be pronounced either
«prI'výøs» or «prI'vyøs».
(4) The symbol '°' (a superscript 0) is an alternate form of 'ø', indicating a schwa sound which may be omitted. '°' will be followed by one of the liquid or nasal symbols 'L', 'm', 'n', 'r' or 'ß'. When '°' is the last vowel of a word, or when it is followed by two consonants, it indicates that the following consonant may be syllabic, as in «ba't°L» or American «pO'k°r». In this situation, the Longman dictionary uses a raised schwa symbol, indicating a sound ordinarily omitted but sometimes spoken. The '°' notation is unusual in that CAAPR uses it whenever any one of its source dictionaries shows the sound as optional, rather than requiring consensus. The notation was chosen due to its suggestion of a tiny 'ø' (or of Longman's raised schwa).
(5) The '*' symbol is an alternate form
of '°', occurring in certain situations where the possible schwa
sound is often considered to be an interpolation, generally not
indicated by the traditional spelling. The main problem it
addresses is that many speakers will insert a schwa sound between
a long vowel and an 'L' or 'r', a phenomenon Longman calls
"breaking". For instance, I pronounce <boil> and
<royal> so that they rhyme, as «bQ'°L» and «rQ'°L» respectively.
Others may pronounce them both without the schwa, or may pronounce
only <royal> with the schwa. I feel it is useful to
somehow distinguish the CAAPR notations for these two words, and
so use «bQ'*L» for <boil>, but «rQ'°L» for
<royal>. Use of '*' in this fashion after a vowel is
the most common use, but it may also appear after a consonant, as
in «k&'r*L» for <curl> (American) or «re's*LiG» for
<wrestling>. The notation was chosen for its similarity to
'°'. Distinguishing '*' from '°' is probably only useful
when CAAPR is being used as the basis for a spelling system.
(6) The symbol '¹' (a superscript 1) is
an alternate form of 'ê', indicating an indistinct i sound which
may be omitted. It is probably easiest to think of it as
indicating a choice between '°' and 'i'. As with '°', CAAPR
will use this notation in place of 'ê' if justification is found
in any of its source dictionaries. The notation was chosen
due to its suggestion of a tiny 'i'.
(7) The symbol 'þ' is used in CAAPR to represent a regular past-tense inflection, spelled as -d or -ed in standard spelling. It is pronounced /t/ if preceded by a voiceless consonant, and /d/ otherwise. 'þ' is also used for words derived from past tenses, such as <confusedly> («kønfyU'zêþLý»), and for words where an -ed suffix is applied to a noun, as in <jeweled> («jU'øLþ»). The symbol 'þ' was chosen here because of its similarity in appearance to a capital D, and its phonetic association with t.
(8) The symbol '$' is used by CAAPR to represent a regular plural or possessive inflection, spelled as -s, -es or 's in standard spelling. It is pronounced /s/ if preceded by a voiceless consonant, and /z/ otherwise. '$' is also used with Latin or Greek plurals ending in the /z/ sound, even though the form is irregular, as in <diagnoses> («dY·øgnO'sI$»). Words derived from plurals or possessives also use '$', such as <salesman> («sE'L$møn»). The symbol '$' was chosen here because of its similarity in appearance to a capital S.
(9) In both CAAPR-A and CAAPR-B, stress
is marked. The marks appear after the vowel letter.
''' indicates primary stress, and '·' indicates secondary
stress. For both CAAPR-A and CAAPR-B, the placement of
stress represents a lexicographic consensus. But there is an
interesting issue here. American dictionaries and British
dictionaries use different and rather incompatible systems for
representing stress. Consider the words <specify>,
<newspaperman>, <predisposition> and
<everyday>. The American and British pronunciations
are essentially the same, but the consensus American stress is
«spe'søfY·», «nU'zpE·p°rma·n», «prI·di·spøzi'X°n» and «e'vrýdE'»,
while British dictionaries assert «spe'sêfY», «nyU'zpEpøßman»,
«prI·dispøzi'X°n» and «e'vrýdE». These systematic
differences in representation make it more or less impossible to
come up with an accurate picture of the stress differences between
the two varieties, and for this reason, CAAPR-C omits stress
marking altogether.
Exactly how to place stress marks is controversial.
Apparently, the best regarded technique is to show stress before
the start of the syllable. I reject this for CAAPR simply
because it makes things harder. It requires that syllable
boundaries in words be established, which is not otherwise
required. Further, it is no small task, and one about which
various authorities disagree. I believe that for CAAPR the
only practical approach is to place the mark adjacent to the vowel
which is stressed. Even here, there is controversy as
to whether the mark is best placed before or after the
vowel. I prefer to place it after, but don't regard my
reasons as compelling enough to spend time justifying this
decision. Note that CAAPR shows stress in one-syllable
words, except for weak forms of words like <the> and
<of>. This aids computer processing and transformation
of CAAPR text.
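Because a stress mark always follows the vowel it applies to, reading stress from a CAAPR-A or CAAPR-B transcription needs no syllabification. The small sketch below lists the stressed vowels and also strips the marks, as is done when building CAAPR-C. The function names are illustrative.

```python
STRESS_LEVELS = {"'": "primary", "·": "secondary"}


def stressed_vowels(transcription: str) -> list[tuple[str, str]]:
    """Return (vowel symbol, stress level) pairs; each mark follows its vowel."""
    result = []
    for i, ch in enumerate(transcription):
        if ch in STRESS_LEVELS and i > 0:
            result.append((transcription[i - 1], STRESS_LEVELS[ch]))
    return result


def strip_stress(transcription: str) -> str:
    """Drop all stress marks, as CAAPR-C does."""
    return transcription.replace("'", "").replace("·", "")
```

For «do·mênE'X°n», this yields a secondary stress on 'o' and a primary stress on 'E'.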
(Just as a curiosity, I observe that the actual definition of X-Sampa calls for the use of the characters /"/ and /%/ to represent stress. I've never actually seen this done: in my experience, the symbols /'/ and /,/, which closely resemble the equivalent IPA symbols, are used instead. This explains the strange notation in the Sampa column of the table above.)
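Notes (7) and (8) above give a voicing rule for 'þ' and '$'. The voiceless form follows a voiceless consonant, and the voiced form appears otherwise. The sketch below implements that rule under two assumptions. First, the voiceless CAAPR consonants are taken to be 'p', 't', 'k', 'f', 'T', 's', 'X', 'C', 'K' and 'h'. Second, stress marks are skipped when looking back at the preceding sound.

```python
# Assumed set of voiceless CAAPR consonant symbols.
VOICELESS = set("ptkfTsXCKh")
STRESS_MARKS = "'·"


def realize_inflections(transcription: str) -> str:
    """Replace 'þ' with t/d and '$' with s/z by the voicing of the preceding sound."""
    out: list[str] = []
    prev = ""  # last sound symbol seen, ignoring stress marks
    for ch in transcription:
        if ch in "þ$":
            voiceless = prev in VOICELESS
            if ch == "þ":
                ch = "t" if voiceless else "d"
            else:
                ch = "s" if voiceless else "z"
        out.append(ch)
        if ch not in STRESS_MARKS:
            prev = ch
    return "".join(out)
```

A vowel always counts as voiced here. As a result, forms such as «kønfyU'zêþLý» and «dY·øgnO'sI$» come out with /d/ and /z/.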
CAAPR-C is the combined CAAPR notation, which attempts to merge
the American and British spelling for each word, producing a
reasonable composite. The process works as follows.
First, stress marks, which are not used in CAAPR-C, are
dropped. Next, if the CAAPR-A transcription uses the '&'
symbol, it is replaced by '&r'. Then, if the remaining
transcriptions are identical (as for the word <soggy> - «sogý»),
this is the CAAPR-C representation. If the revised
transcriptions are not identical, then corresponding characters which
are different are collected into a bracketed pair, first the American
version, and then the British one. For instance, consider the
word <forecast>. The American «fØrkast» and the British
«fØßkAst» are combined into «fØ[r,ß]k[a,A]st». This may possibly
be the end of it, but usually it is not. In many cases, this
combined transcription will contain pairs which are common enough that
there are rules for replacing them with a single letter. For
<forecast>, we have two pairs, [r,ß] and [a,A]. Almost
always, a British 'ß' will be paired with an American 'r', and the
symbol 'ß' will be used as the combined representation. This
reduces <forecast> to the string «fØßk[a,A]st». But the
combination [a,A] is also very frequent, occurring in words like
<bath>, <class>, <shaft>, etc. For this
reason, the combination is given the representation 'â' in
CAAPR-C. So the final CAAPR-C version of <forecast> is
«fØßkâst».
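The first steps of this merge can be sketched as follows. This is a simplified illustration, not the program used to build CAAPR-C. It only handles transcriptions whose characters line up one-to-one after the '&' to '&r' substitution. It also knows only the two reductions worked through above: [r,ß] to 'ß' and [a,A] to 'â'.

```python
# Only the two reductions described above are modeled (a deliberately small subset).
REDUCTIONS = {("r", "ß"): "ß", ("a", "A"): "â"}


def strip_stress(t: str) -> str:
    return t.replace("'", "").replace("·", "")


def combine(american: str, british: str) -> str:
    am = strip_stress(american).replace("&", "&r")
    br = strip_stress(british)
    if am == br:
        return am
    if len(am) != len(br):
        # The real process needs proper alignment here; this sketch does not attempt it.
        raise ValueError("this sketch only aligns transcriptions of equal length")
    parts = []
    for a, b in zip(am, br):
        if a == b:
            parts.append(a)
        else:
            # Reduce a known pair to one symbol, else keep it bracketed (American first).
            parts.append(REDUCTIONS.get((a, b), f"[{a},{b}]"))
    return "".join(parts)
```

With these inputs, `combine("fØrkast", "fØßkAst")` gives «fØßkâst», and `combine("b&'d", "b&'ßd")` gives «b&ßd».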
If there are any bracketed pairs that cannot be reduced to a
single symbol in this fashion, CAAPR-C allows the comma between
the symbols of the pair to be replaced by a character indicating
whether one of the two pronunciations may be more generally
recognizable than the other. This process and the additional
symbols it uses are described in a later
section.
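Going the other way, a CAAPR-C string can be split back into its American and British readings by taking the first or second member of each bracketed pair. The sketch below assumes the pair separators are ',' together with '|', '/', '\' and '<', which appear in the sample passage earlier. It ignores what each separator says about recognizability and leaves single-symbol reductions such as 'â' as they are.

```python
import re

# A bracketed pair: American member, one separator character, British member.
# Assumed separators: ',' '|' '/' '\' '<'; either member may be empty.
PAIR_RE = re.compile(r"\[([^\[\],|/\\<]*)([,|/\\<])([^\[\]]*)\]")


def expand(combined: str, variety: str) -> str:
    """Keep the American (first) or British (second) member of each pair."""
    group = {"amer": 1, "brit": 3}[variety]
    return PAIR_RE.sub(lambda m: m.group(group), combined)
```

For example, "w[u/o]z" expands to "wuz" for American and "woz" for British.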
Stress information is dropped from CAAPR-C because of the
incompatible systems used in CAAPR-A and CAAPR-B. However, the
process of determining the composite CAAPR-C representation will
usually notice if the stress has changed in a significant way; these
words are marked in the list with an asterisk. About 1 in every
40 words is marked like this.
This process introduces a new class of CAAPR symbols, which I
call "synthetic" symbols, as they represent a synthesis of an American
and a British pronunciation. Some of the symbols (such as 'ß')
are extended in meaning in a natural way, while others, like 'â', are
new symbols introduced explicitly to represent a common pair.
The following table defines the CAAPR-C synthetic symbols in
terms of the corresponding pairs of symbols they replace:
Symbol | Replaces | Example | Notes
â | [a,A] | kLâs (class) |
3, ê, î | A mixture of i, ê and ø, unstressed | paL3t (palate), sIkrêt (secret), bLaGkît (blanket) | (1), (2)
è | [e or ë, ø or ° or {no sound}] | sekrêtèrý (secretary) | (3)
¹ or ³ | A mixture of i, ê, ¹, ø, ° or {no sound}, unstressed | fert¹LYz (fertilize), kuz³n (cousin) | (1)
! | [,y] before U or V or Ü or ø | d!Utý (duty) | (4)
ô | [Ø,o] | krôs (cross) |
ò | [Ø, ø or ° or {no sound}] | mandøtòrý (mandatory) | (3)
° or * | A mixture of ø, ° (or *), or {no sound} | Epr°n (apron) | (1)
R | [°r, øß] | piCR (pitcher) | (5)
ß | [r,ß] | mAßk (mark) | (1)
ü | [&,u] | würý (worry) |
û | [ø,V] | regyûLR (regular) |
Ü | [V,Ü] | pyÜß (pure) | (1)
V | [U,V] before vowel | v&ßCVøs (virtuous) | (6)
ÿ | A mixture of y, ÿ or ý | yUnÿøn (union) | (1)
Ý | [ê or ø, Y or Y*] | ØßgønÝzEX°n (organization) |
Notes:
(1) For these symbols, the synthetic meaning is simply a generalization of the symbol's phonemic, ortho-phonemic or morphemic meaning as described above.
(2) In CAAPR-C, the symbol used for the indistinct i depends on how indistinct it is. If one English variety (usually the British) uses 'i' and the other is indistinct, the synthetic symbol 'î' is used. If one variety (again, usually the British) uses 'ø' and the other is indistinct, the synthetic symbol '3' is used. In the remaining cases, where both varieties are indistinct, or one uses 'ø' and the other uses 'i', the symbol 'ê' is used, as in CAAPR-A and CAAPR-B. Of course, it can be argued that one symbol, 'ê', would do for all of them, and there is something to this. But one could also regard 'î' as "almost an i" and '3' as "almost a schwa", and CAAPR allows you to choose whichever simplification you prefer. I'm not thrilled with using a digit as a representation symbol here, as CAAPR has otherwise managed to avoid this, but '3' suggests the IPA "turned e" (ə) and looks better than any alternative I've come up with. Similarly, the symbol '³' is used as an alternate form of '¹' when the schwa (or absent) pronunciation dominates the short i.
(3) The grave-accented symbols 'è' and 'ò' are noteworthy in that their use always implies a stress change. In words where they occur, such as <secretary>, the syllable is stressed in GA but unstressed in RP; in RP the vowel sometimes disappears entirely. These symbols are used only before an 'r'.
(4) The '!' symbol was chosen for its resemblance to an upside-down i, because some authorities regard the 'yU' combination as actually being an /iu:/ diphthong.
(6) As noted above, where Longman uses a representation of /u/, CAAPR-A uses the symbol 'U', while CAAPR-B uses 'V'. In CAAPR-C they are recombined as 'V' when preceding a vowel. If [U,V] occurs before a consonant, one or both of the vowels may differ from /u/, so the pair is not reduced.
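The rule for choosing among 'î', '3' and 'ê' for the indistinct i amounts to a small decision table. The Python sketch below assumes each variety's vowel has already been classified as 'i', 'ø' or the indistinct 'ê'. The function name is hypothetical.

```python
INDISTINCT = "ê"  # the indistinct i, as written in CAAPR-A and CAAPR-B


def synthetic_indistinct_i(american: str, british: str) -> str:
    """Pick the CAAPR-C symbol for an unstressed vowel drawn from i, ø, ê.

    Hypothetical helper. If both varieties agree, no pair arises and the
    shared symbol is kept. The order of the arguments does not matter.
    """
    allowed = {"i", "ø", INDISTINCT}
    if american not in allowed or british not in allowed:
        raise ValueError("expected one of i, ø, ê for each variety")
    if american == british:
        return american  # both i, both ø, or both indistinct
    pair = {american, british}
    if pair == {"i", INDISTINCT}:
        return "î"  # "almost an i"
    if pair == {"ø", INDISTINCT}:
        return "3"  # "almost a schwa"
    return INDISTINCT  # one variety has i, the other ø
```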
About 1 word in 14 in the CAAPR-C list contains symbol pairs that cannot be reduced to synthetic symbols. Without the synthetic symbols, the proportion of words showing differences would be very much higher.
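The pair rules in the table can be collected into one reducer that also handles the two context-dependent entries, '!' and 'V'. This Python sketch makes several assumptions. The `VOWELS` set is a guess at which CAAPR letters count as vowels. The "mixture" symbols and the grave-accented vowels are left out, because they depend on stress and on degrees of indistinctness. The function name is hypothetical. Regex lookaheads leave [U,V] unreduced before a consonant, as in <mushroom>.

```python
import re

# Context-free pair rules taken from the synthetic-symbol table.
SIMPLE_RULES = {
    "[a,A]": "â",
    "[Ø,o]": "ô",
    "[°r,øß]": "R",
    "[r,ß]": "ß",
    "[&,u]": "ü",
    "[ø,V]": "û",
    "[V,Ü]": "Ü",
}

# Assumption: an illustrative, possibly incomplete set of CAAPR vowel symbols.
VOWELS = "aAeEiIouUVÜØø&°êîýY"

# [U,V] becomes V only when the next symbol is a vowel.
U_V_BEFORE_VOWEL = re.compile(r"\[U,V\](?=[" + re.escape(VOWELS) + "])")
# [,y] becomes ! before U, V, Ü or ø.
Y_BEFORE_U = re.compile(r"\[,y\](?=[UVÜø])")


def reduce_with_context(combined: str) -> str:
    """Apply the simple rules, then the two context-dependent ones."""
    for pair, symbol in SIMPLE_RULES.items():
        combined = combined.replace(pair, symbol)
    combined = U_V_BEFORE_VOWEL.sub("V", combined)
    combined = Y_BEFORE_U.sub("!", combined)
    return combined  # any pair without a rule is left as it was
```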
Most English words have a CAAPR-C representation without any
symbol pairs, meaning that their British and American pronunciations
differ only in the typical ways cataloged by the synthetic symbols
above. A small number of words, however, have differences whose
low frequency makes it impractical to define single symbols for
them. If one is seeking the holy grail of an orthography that
will have a single workable spelling for all of English, then one
naturally asks of such words whether there is additional information
that would allow one to choose between the two incompatible
pronunciations. The answer, as it happens, is "Maybe".
In such a word, one or both of the pronunciations may have some
international recognition. Here are some simple words illustrating
the possibilities:
<byproduct> - written in CAAPR-C as «bYprod[ø,u]kt»,
that is, the consensus American pronunciation is «bYprodøkt», and
the consensus British pronunciation is «bYprodukt». However,
the British Shorter OED gives the primary pronunciation
«bYprodøkt», and the American Merriam-Webster gives the primary
pronunciation «bYprodukt». So it would appear that both
pronunciations are commonly used on both sides of the
Atlantic. In some sense, the pronunciations are equally
recognizable, or equivalent.
<clerk> - written in CAAPR-C as
«kL[&,A]ßk». This is the exact opposite of the situation
above - the American dictionaries do not list the British
pronunciation, and vice versa. One might call the
alternatives incompatible.
<version> - written in CAAPR-C as
«v&ß[J,X]°n». This is intermediate between the two
previous cases. «v&X°n» is shown as acceptable by some
American dictionaries, and «v&ßJ°n» by some British
dictionaries, but in neither case is it shown as the primary
pronunciation. In this situation, I refer to the
pronunciations as "weakly equivalent".
<mushroom> - written in CAAPR-C as
«muXr[U,V]m». The British EPD dictionary shows «muXrUm» as
the primary pronunciation, but none of the American dictionaries I
consulted do the same for «muXrVm». In this case, the
American form dominates the British one, which is to say that it
appears that the American form is more acceptable in British
English than vice versa.
<from> - written in CAAPR-C as «fr[u,o]m». This is a weaker form of the situation above. Several American dictionaries recognize «from» as an alternate (but not primary) American pronunciation, but no British dictionary I've seen gives such acceptance to «frum». «from» is an acceptable, but minority, American form. I would say that the British pronunciation "weakly dominates" the American one in this case.
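The five cases above can be read as a classification based on how far each side's dictionaries accept the other side's form. The Python sketch below encodes one such reading. It assumes three levels of acceptance (absent, alternate, primary), and it assumes the examples generalize to every combination, which the text does not state. All names are hypothetical.

```python
from enum import IntEnum


class Acceptance(IntEnum):
    """How one side's dictionaries treat the other side's form."""
    ABSENT = 0     # not listed at all
    ALTERNATE = 1  # listed, but not as the primary pronunciation
    PRIMARY = 2    # listed as the primary pronunciation


def relation(british_on_american: Acceptance,
             american_on_british: Acceptance) -> str:
    """Classify a pair in the manner of the five examples above.

    british_on_american: how British dictionaries treat the American form.
    american_on_british: how American dictionaries treat the British form.
    """
    if british_on_american == american_on_british:
        return {
            Acceptance.PRIMARY: "equivalent",           # <byproduct>
            Acceptance.ALTERNATE: "weakly equivalent",  # <version>
            Acceptance.ABSENT: "incompatible",          # <clerk>
        }[british_on_american]
    # The form that is more accepted on the other side dominates.
    winner = ("American" if british_on_american > american_on_british
              else "British")
    strongest = max(british_on_american, american_on_british)
    if strongest == Acceptance.PRIMARY:
        return f"{winner} dominates"        # <mushroom>
    return f"{winner} weakly dominates"     # <from>
```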
The CAAPR-C list embellishes the representations of words like
these by using another symbol in place of the comma within pairs to
indicate equivalence or dominance of the two pronunciations, as
follows:
The symbol '=' indicates equivalence; thus, the embellished CAAPR-C form of <byproduct> is «bYprod[ø=u]kt».
The symbols '<' and '>' indicate dominance, in their mathematical sense: the pronunciation on the open side of the symbol dominates. The word <mushroom> is represented as «muXr[U>V]m», showing that the American pronunciation is dominant.
The symbol '|' indicates weak equivalence. The
embellished representation of <version> is «v&ß[J|X]°n».
The symbols '/' and '\' indicate weak dominance. The symbol leans to the side which dominates. Thus, the word <from> is represented as «fr[u/o]m», showing the weak dominance of the British pronunciation.
The symbol '?' indicates incompatibility. <clerk> is represented as «kL[&?A]ßk», reminding us that there's no pleasing everyone.
Of course, these embellishments can and should be ignored by those uninterested in this additional distributional information, and in general when I cite CAAPR spellings, I use comma separators except in cases where the embellishments are of interest.
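Because the separator is a single character inside the brackets, reading or discarding the embellishments is mechanical. This Python sketch assumes none of the separator characters ever occurs inside a pronunciation within a pair. The function names are hypothetical.

```python
import re

# Separator meanings from the list above; the first part is American,
# the second British.
SEPARATORS = {
    ",": "unspecified",
    "=": "equivalent",
    "|": "weakly equivalent",
    ">": "American dominates",
    "<": "British dominates",
    "\\": "American weakly dominates",
    "/": "British weakly dominates",
    "?": "incompatible",
}

# One bracketed pair: [American<separator>British].
PAIR = re.compile(r"\[([^\[\]]*?)([,=|<>\\/?])([^\[\]]*)\]")


def pair_relations(word: str) -> list[tuple[str, str, str]]:
    """List (American, British, relation) for every bracketed pair."""
    return [(m[1], m[3], SEPARATORS[m[2]]) for m in PAIR.finditer(word)]


def strip_embellishments(word: str) -> str:
    """Turn every separator back into a plain comma."""
    return PAIR.sub(lambda m: f"[{m[1]},{m[3]}]", word)
```

For example, `strip_embellishments` turns «kL[&?A]ßk» back into the plain comma form «kL[&,A]ßk».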
Version 2 of CAAPR differs from version 1 in two important
respects. The more important of the two is that the CAAPR-B list
has been revised to mark use of the indistinct i and to use the
plural and past tense symbols '$' and 'þ'. This makes the A list
and the B list equivalent in the amount and style of
information presented.
The other change consists of enhancements to the notation itself. Most of the enhancements relate to the finer classification of indistinct, unstressed sounds (the symbols '°', '*', '¹', '³', 'ÿ', '3', 'î' and 'R'). Also, the embellished symbol pair notation was introduced, and the two symbols 'à' and 'Ÿ' were dropped: the former because there was no meaningful distinction from 'è', and the latter as a side effect of the introduction of 'R' («LYR» is a better representation of <liar> than «LŸß»).
It may be questioned whether the increased precision in the marking of indistinct sounds is really a good thing. Does the difference between «E'prøn» and «E'pr°n» really matter? One answer is that some folks think it does, and will argue with you for as long as you like about whether «je'nørøL» or «je'nrøL» is more correct. But I think the real benefit of this degree of precision is an ironic one: it emphasizes how much uncertainty and variation there is in the pronunciation of unstressed sounds. I developed CAAPR in the hope that it would be useful to spelling reformers. One of the problems with many spelling reforms is that they end up reflecting the minutiae of their inventor's dialect, setting forth as certain that one of »rabit« or »rabut« is correct and the other a blatant error. CAAPR's rather thorough demonstration of the uncertainty of English pronunciation is a persuasive argument for abandoning the pure phonetic principle in the spelling of unstressed syllables. I think this lesson is an important one, and that any practical reformed spelling system for world English must take it seriously.
Version 2 also fills a gap in the word list. Despite the international character of the CAAPR-C notation, the version 1 CAAPR-C list included only traditional American spellings: it had an entry for <color>, but not for <colour>. This flaw has been remedied in version 2.
Note that as of March 2007, I have changed the format of the
lists slightly, to make it easier to transfer their content into
Microsoft Excel.
Version 1 of the CAAPR list included a number of "signature
words", whose CAAPR representation deviated from the rules, generally
in order to give greater consistency to the spelling of related
words. Except as the result of errors, this version contains no
such words. CAAPR is not intended as a spelling system, and the
inconsistencies of the language itself as well as of the sources of
CAAPR should not be obscured. There are good reasons to spell
<princess> and <duchess> with the same ending in a
practical orthography, but it seems best to leave the data alone, and
represent them as «pri'nses» and «du'Cis» in CAAPR-B, which after all
is a reference notation and not a spelling system.
Because this version of the CAAPR notation is more complicated than the previous version, I will continue to make the version 1 lists downloadable here. I note that, in addition to its advantage of simplicity, the version 1 CAAPR-B list takes no account of the "indistinct i", which may be of use to those who doubt its existence.
Note: The dictionaries were updated in 2019 by the addition of
a significant number of additional words, most of them frequently used
capitalized words, as well as the correction of a few errors. I am not
calling this update "version 3", as the CAAPR notation itself was not
changed.
Though this version of CAAPR has been thoroughly proofread, it
is still likely to contain errors and other faults, so please
let me know when you encounter errors, whether isolated or
systematic. If you discover ways in which CAAPR could be changed
to improve its usefulness, I'd also like to hear of them.
Sometimes I suspect that my work on this site is no more than talking
to myself in public. If this is not so, and there are ways I can
make my forays into dictionary building more generally useful, it
would be a shame if no one bothered to tell me.
To comment on this page, e-mail
Alan at wyrdplay.org