Traditional English spelling is complicated. Many
defenders of traditional spelling seem unaware of the full extent of
its complexity. With the help of the FEWL
dictionary and a good bit of programming, I've put together some tables
showing just how complicated the relationship between spelling and
pronunciation is. These tables are based on the 30,000-word FEWL
dictionary; using a larger word list would of course change the
results (and not by making them simpler).
There are four tables, described here. I recommend
you read the detailed explanations below before following the links.
A table of phonograms and their pronunciations, sorted by spelling.
A table of English sounds and their spellings, sorted by pronunciation.
A table of phonograms and their pronunciations, sorted by unweighted frequency, that is, the frequency of their occurrence in the set of words in the FEWL word list.
A table of phonograms and their pronunciations, sorted by weighted frequency, that is, the estimated frequency of their occurrence in English text.
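The four tables are four orderings of the same set of rows. As a rough sketch only (this is not the program that generated the tables), the idea can be expressed in Python. The Row type, the sample rows, the IPA-style sound labels (used here in place of real FLOSS spellings), all counts, and the choice of descending order for the frequency sorts are assumptions made purely for illustration.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One phonogram/sound pair. Hypothetical layout, not the author's."""
    spelling: str
    sound: str                # placeholder label, NOT a real FLOSS spelling
    unweighted: int           # occurrences in the word list
    weighted: Optional[int]   # None when the pair is absent from the corpus list


# Invented rows for illustration; the counts are not real data.
ROWS = [
    Row("ea", "/i:/", 120, 300),
    Row("ea", "/e/", 40, 150),
    Row("ee", "/i:/", 90, 200),
    Row("a", "/e/", 3, None),
]


def by_spelling(rows):
    return sorted(rows, key=lambda r: (r.spelling, r.sound))


def by_sound(rows):
    return sorted(rows, key=lambda r: (r.sound, r.spelling))


def by_unweighted(rows):
    # Assumption: most frequent pair first.
    return sorted(rows, key=lambda r: -r.unweighted)


def by_weighted(rows):
    # Assumption: pairs with no corpus frequency sort last.
    return sorted(rows, key=lambda r: -(r.weighted or 0))
```

Keeping one list of rows and sorting it four ways means the tables cannot drift out of step with one another.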
Because a number of compromises and shortcuts have been
made in building these tables, they should be regarded as quite
approximate.
The pronunciations represented in the table are based on
the consensus of three major dictionaries, as discussed on the FEWL page. (The
alternate pronunciations
for the FEWL signature words (listed here)
were not used.) The word
frequencies were taken from a word frequency list based on the British
National Corpus, as documented on this
site. This list has some flaws, such as not including
frequencies for contractions, but I
haven't yet found anything better. Especially annoying is the
fact that the pronunciations are American, while the word frequencies
are based on British sources. Even with these caveats, I believe
these tables to be accurate enough in their broad outlines.
Each table has five columns, arranged differently from one
table to the next.
The column headed "Trad Spelling" contains a phonogram from traditional English spelling, or the symbol ~ for the occasional cases where a pronounced sound is omitted from a word's spelling. Some phonograms end with the notation _e, which indicates that the symbol is followed by a single consonant (except when _e follows the letter r, in which case the r is the consonant) and a silent e.
The column headed "FLOSS (Phonemic)" contains a representation of a sound in the FLOSS spelling system. There are a few extensions to FLOSS for this purpose, described in the next paragraph. This column also uses the symbol ~, to indicate a phonogram with no corresponding sound, that is, one or more silent letters.
The column headed "Unweighted frequency" contains the number of times the phonogram/sound pair occurs in the FEWL word list.
The column headed "Weighted frequency" contains a scaled indication of the frequency of the pair in the BNC frequency list. The figure listed is the number of occurrences divided by 16 (which makes the weighted and unweighted frequencies somewhat comparable). Pairs whose frequencies display as - do not occur in the BNC list.
The column headed "Example" contains one or more sample words for the phonogram and sound, selected at random by the programs which generated the tables. Because the words were selected randomly, some of them may be unfamiliar or afflicted with multiple pronunciations. This is regrettable, but a more hands-on selection of example words didn't seem practical.
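To make the two frequency columns concrete, here is a minimal Python sketch of how they might be computed. It is not the code used to build the tables. The aligned word data, the sound labels, the corpus counts, and the rounding are all assumptions for illustration; only the divisor of 16 and the - display for pairs missing from the frequency list come from the description above.

```python
from collections import Counter

SCALE = 16  # divisor stated in the text

# Hypothetical alignment of words into (phonogram, sound) pairs.
# Sound labels are placeholders, not real FLOSS spellings.
ALIGNED = {
    "tea": [("t", "t"), ("ea", "ee")],
    "head": [("h", "h"), ("ea", "e"), ("d", "d")],
    "teat": [("t", "t"), ("ea", "ee"), ("t", "t")],
    "gnu": [("gn", "n"), ("u", "oo")],
}

# Invented corpus counts; "teat" and "gnu" are deliberately absent.
WORD_FREQ = {"tea": 160, "head": 320}


def unweighted(aligned):
    """Count each pair once per occurrence in the word list."""
    counts = Counter()
    for pairs in aligned.values():
        counts.update(pairs)
    return counts


def weighted(aligned, word_freq):
    """Count each pair once per occurrence in running text."""
    counts = Counter()
    for word, pairs in aligned.items():
        freq = word_freq.get(word)
        if freq is None:
            continue  # word not in the frequency list
        for pair in pairs:
            counts[pair] += freq
    return counts


def display(weighted_counts, pair):
    """Scaled figure as shown in a table cell, or "-" if the pair is absent."""
    if pair not in weighted_counts:
        return "-"
    # Rounding to the nearest integer is an assumption.
    return str(round(weighted_counts[pair] / SCALE))
```

With this sample data the pair ("ea", "e") has an unweighted count of 1 but the largest weighted count, because head is the most frequent word in the invented corpus counts.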
The FLOSS
spelling in the
tables is augmented in the following ways.
FLOSS uses the symbol $ to indicate the plural ending, regardless of its pronunciation. The tables use the spelling $ when pronounced /z/, and the spelling ß when pronounced /s/.
FLOSS uses the symbol þ to indicate the past tense ending, regardless of its pronunciation. The tables use the spelling þ when pronounced /t/, and the spelling ð when pronounced /d/.
The notation =#, where # represents any letter, indicates an initial letter, pronounced as the letter name, as in the words T-shirt and Xmas.
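The three augmentations above amount to a small lookup, sketched here in Python. The function names are hypothetical, and the sketch covers only the cases listed above: any other pronunciation of an ending is rejected rather than guessed at.

```python
# Table symbols for the plural ending, keyed by how it is pronounced.
PLURAL_SYMBOL = {"z": "$", "s": "ß"}

# Table symbols for the past tense ending, keyed by how it is pronounced.
PAST_SYMBOL = {"t": "þ", "d": "ð"}


def plural_symbol(sound):
    """Return the table symbol for a plural ending pronounced as `sound`."""
    try:
        return PLURAL_SYMBOL[sound]
    except KeyError:
        raise ValueError(f"no plural symbol listed for /{sound}/") from None


def past_symbol(sound):
    """Return the table symbol for a past tense ending pronounced as `sound`."""
    try:
        return PAST_SYMBOL[sound]
    except KeyError:
        raise ValueError(f"no past tense symbol listed for /{sound}/") from None


def initial_letter(letter):
    """Return the =# notation for an initial pronounced as its letter name."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single letter")
    # Keeping the letter's case as given is an assumption.
    return "=" + letter
```

For example, plural_symbol("s") gives ß, and initial_letter("T") gives =T, as for the T in T-shirt.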
Note that FLOSS makes some non-phonemic distinctions,
based
on stress, morphemics and the corresponding British (RP)
pronunciation. I believe this makes it more, not less, useful in
this context. A previous version of these tables also treated the
vowel sound of new as different
from that of crew, based on
the distinction in British pronunciation. I've concluded that
making this distinction was more confusing than helpful, and so it has
been removed.
Certain very common words (a, an, and, for, of, to, into) were
assumed to have their weak (unstressed) pronunciation. The word the is represented in both its
strong and weak forms. Because these words are so very common, the
effect is to substantially increase the frequency of schwa spellings
compared with what would result if the strong forms had been
assumed.
The word frequency data from which these tables were
compiled can be downloaded here. The
original BNC data has been rearranged and manipulated to a certain
extent, with presumably some loss in accuracy.
To comment on this page,
e-mail Alan at wyrdplay.org