How to crack the Voynich code, and how not to…

Here is a pastiche of the type of message I get almost every week:

“Dear Mr Bax,

I am writing to tell you that I saw your website and I have deciphered the Voynich manuscript. You won’t believe me but it took me only two hours! I haven’t read anything much about the manuscript, but I am 100% sure that it is written in Icelandic/Hottentot/Mayan/Greek. Personally, I don’t know anything of that language but I put some words into Google Translate and it came out with this text:

……..  What do you think?”

Since I get these messages so regularly, I thought it might be useful to sketch out why this approach to the script and language of the Voynich Manuscript (VM) is so wrong-headed, and so unlikely to lead to a successful decoding. I will then set out what I see as a potentially more fruitful, though far more difficult, direction.

Let’s return to my composite email above. Believe me, I am not exaggerating when I say that all elements of it occur repeatedly in the messages I receive. Some of them are obviously foolish – for example, no-one should be so arrogant as to work on the manuscript without at least doing some basic research into what others have tried to do for many years, starting with René Zandbergen’s site.

However, the aspect which I want to discuss here, and to discourage as strongly as I can, is what I will call the ‘word-by-word’ approach. This means looking at each word of the manuscript, trying to spot a resemblance in another language (say Greek), then moving to the next word and finding another word in Greek which might fit, and so on. Often this is done with the help of Google Translate, by putting in the words one by one, or even together.

An example of this approach is the work of Maurice and Anita Israel, which you can find here and which claims that the manuscript is written in Greek. The authors kindly corresponded with me a while ago and set out their methods, which involve a letter-by-letter then word-by-word match of the type I describe above. Unfortunately, I’m very sorry to say that the results are unintelligible, and the authors admit that they have no knowledge of Greek at all. However, these authors are by no means alone – we have seen other attempts to work word-by-word in a similar way. A common variant is to try to find anagrams to make the words fit better (see the ideas of Tom O’Neil here for example), which seems to me equally fruitless.

I know it is perhaps bad manners on my part to cite their work. After all, they are working hard and sincerely like everyone else to crack the code. However, I feel it is important at the same time to discuss different methods so as to help us to move towards a solution, so apologies to them if they disagree.

But why is this not a good strategy?


Why the word-by-word approach is not a good one

The reason why I feel that this approach is fruitless and (sorry to be so direct) a complete waste of time and energy, is because it fails to take account of some crucial elements of language, namely grammar and syntax. Dictionaries tend to give the ‘base words’ of a language, i.e. without any grammatical inflections. So for example a Latin dictionary will give you the base form for ‘girl’ as “puella”. However, it will not usually tell you that the other forms of the word, depending on the context, could be:

Case        Singular   Plural
Nominative  puella     puellae
Accusative  puellam    puellas
Dative      puellae    puellis
Genitive    puellae    puellarum
Ablative    puella     puellis

(See here for discussion of Latin declensions)

In other words, the ‘word-by-word’ approach fails to take account of grammatical endings (for example case endings). So if you wanted to investigate whether the Voynich language is Latin, it would be foolish to take each word one by one and look it up in a dictionary (or even treat it as an anagram), unless you could also show that the words you see in the manuscript match the expected grammatical case endings.

For example, if you want to show that it is Latin, you would need to demonstrate that a word which you think says ‘girl’ has the shape ‘puella‘ when it is the subject of the sentence (nominative), but the shape ‘puellam’ when it is the object (accusative). In other words, you could not claim that the language is Latin (or any other) until you can give us not only evidence about the word, but also evidence about the grammar of the language you are linking it with. No-one has successfully done this yet.
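As a rough illustration of this point, the following Python sketch generates the written forms of a first-declension noun and lists which grammatical slots a given form could fill. The names ENDINGS, decline and slots_for are invented for this example, and vowel length is ignored, so it is a toy model under those assumptions rather than a full account of Latin morphology.

```python
# First-declension endings keyed by (case, number). Vowel length is ignored,
# so nominative and ablative singular look identical in writing.
ENDINGS = {
    ("nominative", "singular"): "a",
    ("nominative", "plural"): "ae",
    ("accusative", "singular"): "am",
    ("accusative", "plural"): "as",
    ("dative", "singular"): "ae",
    ("dative", "plural"): "is",
    ("genitive", "singular"): "ae",
    ("genitive", "plural"): "arum",
    ("ablative", "singular"): "a",
    ("ablative", "plural"): "is",
}


def decline(stem):
    """Return a dict mapping (case, number) to the inflected form."""
    return {slot: stem + ending for slot, ending in ENDINGS.items()}


def slots_for(form, stem):
    """List every (case, number) slot that a written form could fill."""
    return [slot for slot, candidate in decline(stem).items() if candidate == form]


forms = decline("puell")
print(sorted(set(forms.values())))     # the distinct written forms
print(slots_for("puellam", "puell"))   # slots compatible with 'puellam'
print(slots_for("puellae", "puell"))   # an ambiguous form
```

A lookup that knows only the headword ‘puella’ would recognise just one of the six distinct written forms, which is the gap any proposed Latin reading would have to close.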

I have so far been talking about those grammatical elements which in many languages form part of the word (e.g. case endings) but we also need to take account of syntax (in simple terms the sequence of words in the language). In many languages the grammatical elements work together with the syntax in interesting ways. For example, in Latin a preposition (such as ‘in’) is typically followed by the noun in the accusative or ablative case. So if you want to argue that the Voynich is in Latin, you should be able to identify prepositions and then show us that the following word is a noun in one of those cases.
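The kind of syntactic check described here could be sketched as follows. This is a simplified illustration only: PREPOSITIONS is a small hand-picked subset, ACC_ABL_ENDINGS covers first-declension nouns alone, and the function name preposition_violations is hypothetical. It also does not distinguish which case each preposition requires.

```python
# Illustrative subset of Latin prepositions.
PREPOSITIONS = {"in", "ad", "cum"}

# Accusative and ablative endings of the first declension only.
ACC_ABL_ENDINGS = ("am", "as", "a", "is")


def preposition_violations(tokens):
    """Return (preposition, next_word) pairs where the next word lacks
    one of the expected accusative or ablative endings."""
    violations = []
    for word, following in zip(tokens, tokens[1:]):
        if word in PREPOSITIONS and not following.endswith(ACC_ABL_ENDINGS):
            violations.append((word, following))
    return violations


good = "puella in silvam ambulat".split()
bad = "puella in silvae ambulat".split()
print(preposition_violations(good))
print(preposition_violations(bad))
```

A proposed Latin reading of a Voynich passage that produced many such violations would be hard to defend, however plausible its individual word matches looked.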

My point here is to argue that a simple word-by-word approach is doomed to failure UNLESS it takes account of the grammar and syntax as well. Sadly, I have never seen a proposed decoding of the VM which even tries to do this.

(I could put this another way and say please do not send me any proposed translations which fail to take account of grammar! 🙂 )

A better way?

This brings me to the more positive part of the post, namely how I feel we could make progress on decoding the manuscript.

What I said above does not mean we shouldn’t work on the sounds and words of the manuscript. We should of course continue to work on trying to identify possible sound-letter correspondences, and also words, as best we can. But what we should also be doing is working on identifying grammatical and syntactical elements which might give us clues as to the underlying language. These might be case endings, for example, so we might find that a word occurs with different endings in different positions. Or we could try to identify particular word sequences which might be syntactic clues.
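One way to start looking for words that occur with different endings is sketched below. It is a minimal example under stated assumptions: the function candidate_paradigms and its thresholds are invented for illustration, and the sample data is Latin stand-in vocabulary, since no transcription is quoted in this post.

```python
from collections import defaultdict


def candidate_paradigms(words, min_stem=4, max_suffix=4, min_endings=3):
    """Group words by shared stem and keep stems seen with several endings."""
    endings_by_stem = defaultdict(set)
    for word in set(words):
        first_cut = max(min_stem, len(word) - max_suffix)
        for cut in range(first_cut, len(word) + 1):
            endings_by_stem[word[:cut]].add(word[cut:])
    return {stem: sorted(endings)
            for stem, endings in endings_by_stem.items()
            if len(endings) >= min_endings}


# Latin stand-in data; a real test would use a transcription of the manuscript.
sample = ["puella", "puellam", "puellae", "puellas", "rosa", "rosam", "in"]
paradigms = candidate_paradigms(sample)
for stem, endings in sorted(paradigms.items()):
    print(stem, endings)
```

The sketch reports several competing cut points for the same family of words, so it only proposes candidates. Deciding which cut reflects a real stem and ending still needs positional and contextual evidence.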

Let me offer three examples of what I see as possible grammatical/syntactic elements in the Voynich manuscript, to illustrate the kind of direction which, although it is far more difficult, might be more fruitful in the longer run than a word-by-word approach.

1. Grammatical elements – a possible conjunction

My first extended discussion  of the Voynich manuscript explored the possibility that the most common word in the manuscript, namely the word transcribed in EVA as ‘d a ii n’ might be a conjunction meaning something like ‘and’. Here is the word highlighted in folio 25v:


I revisited this argument in a post which you can find here.

I have suggested that this word could be read as something like ‘taur’ or ‘daur’ or ‘thaur’ (and see Derek Vogt’s transcription system here).  Since then, I have been on the lookout for languages which might use such a word as a conjunction. A few months ago I came across an interesting possibility, namely the Romani language, i.e. the language of the Roma or Gypsies (not to be confused with Romanian). You can find an authoritative site about the language here.

Many people have suggested Romani as the language of the Voynich, on various other grounds, including Derek Vogt on this site. Like Derek, I am especially interested  in the linguistic dimensions. In this light we can see that several dialects of Romani have a conjunction which is not dissimilar to ‘taur’, as you can see from this table from Yaron Matras’ book:



Matras, Y. (2002) Romani: A Linguistic Introduction, Cambridge University Press, p201

Matras also mentions variants such as ta / taj / thaj / te.  Now, this might not seem particularly close to ‘taur’, but we should not dismiss a connection too quickly, since there are many dialects of Romani with a lot of variation, and many changes could have occurred since the 15th century when the VM was probably written.

I was interested when Marco Ponzi independently noted the resemblance in his comment here, also discussed in Derek’s reply here. This does not in itself mean that we are correct, of course, but it is at least encouraging that others see similar connections.


2. The definite article

A second element, and one which attracted my attention when I first looked at the VM, is the frequent occurrence at the beginning of words of what looks like the Arabic attached definite article “al-”. This feature is common in the Voynich star names (see e.g. here) where they could well be derived from Arabic names (such as ‘The bear’, ‘The serpent’ and so on).

What is odd about the clusters at the start of many Voynich words, however, is that they show more variation than (say) the Arabic definite article, sometimes apparently realised as EVA:o, and in other places as EVA:ok, EVA:ol and EVA:of. You can see possible examples here:

If we are looking at a definite article, why would it vary from word to word? That is something which has never been satisfactorily explained, and has led many to doubt that they are definite articles at all.

However, I recently noticed that again Romani has just such an odd feature, namely it has a definite article which comes before the noun (i.e. it is preposed), which looks in places rather like the Voynich particles, and which varies depending on aspects of the noun with which it is associated – with respect to gender, number and case.  See this table, again from Matras:


Matras, Y. (2002) Romani: A Linguistic Introduction, Cambridge University Press, p97

Again, we need to remember the passage of time between these modern dialects and the Voynich manuscript, but the point is that Romani definite articles vary in unusual ways just as the (possible) Voynich definite articles appear to do. It is possible that this could help to explain the variety in (what look like) Voynich definite articles.  Most of the variation in the table above has to do with vowels and with /l/, but Matras also mentions a Romani indefinite article ‘ek’ which could also come into the picture (ibid p98).

In addition, the comprehensive Romani website at Manchester University  offers an interesting list of “Early Romani deictic and anaphoric expressions”. The authors note that (as in other languages) the definite article might ultimately be derived from these demonstratives, a link which Matras also highlights (2002:111), so it is interesting to see  ‘vowel + k’ and ‘vowel + l’ in these too:


In  summary, this shows that if the elements in the VM which look as if they might be definite articles (e.g. in the star names) are indeed definite articles, then Romani gives a precedent for a degree of systematic variation in their form.

3. Another conjunction or particle?

A third possible contender for a grammatical element can be seen here:


The second word in the first example (from Voynich f6r line one)  is the sign which resembles a 9 (EVA:y). In the second example, from Voynich f3v line 1, we see the first word (EVA:koaiin) repeated later in the line, this time also with the 9 (EVA:y) in front of it.

This second example is to my mind highly significant, since the chances of this word (EVA:koaiin) being repeated in the same line ‘accidentally’ are extremely slim. One reason for this is that the word itself is very rare in the manuscript, occurring in isolation only three times:

Furthermore, with a prefix it is also very rare (6 occurrences):

This makes it highly likely that the word is a lexical item (not a grammatical one), and possibly a noun. I have argued before that it is probably the name of the plant pictured in f3v. It further means that the 9 symbol (EVA:y) in front of the second occurrence of the word is very probably a grammatical or syntactic marker of some sort, a finding which we can state with unusual confidence. But what does it mean?
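Rarity counts of the kind cited above can be checked against any machine-readable transcription along the following lines. This is only a sketch: the function name isolated_and_prefixed is hypothetical, and the token list is invented for illustration rather than taken from the manuscript.

```python
from collections import Counter


def isolated_and_prefixed(tokens, target):
    """Count `target` as a whole word and as the ending of a longer word."""
    counts = Counter(tokens)
    isolated = counts[target]
    prefixed = sum(n for word, n in counts.items()
                   if word != target and word.endswith(target))
    return isolated, prefixed


# Invented token list, for illustration only; not a real line of the manuscript.
sample = ["koaiin", "daiin", "chol", "ykoaiin", "daiin"]
print(isolated_and_prefixed(sample, "koaiin"))
```

Run over a full transcription, the two numbers give the ‘in isolation’ and ‘with a prefix’ figures directly, although the result will depend on which transcription and word-splitting conventions are used.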

Derek Vogt recently posted an interesting discussion, suggesting that this particle, which might represent an /n/ sound (see my original 2014 article and Derek’s larger scheme) could be “something that goes between alternative names for the same thing, regardless of whether it’s attached or independent”. In other words he also sees it as a possible conjunction of some sort. He offers further examples of the same sign being used in an (apparently) similar way on folios 6r, 6v, and 17r:

^pawr n xaš hašar^
^kãwy n $wr hõokwr^
^twrwr ntwrhar^

(Transcription using Derek’s scheme)

If this is true, and we have some sort of conjunction expressed by the prefixed or detached sound /n/, then it would be useful to look for languages which use this kind of particle in this way.

I cannot find any examples of this in modern Romani dialects (which does not mean that they did not exist in the 15th century), but curiously, Romani does use a similar marker to indicate a negative. Matras (2002: 114) reports on a “negative particle na” in Romani and in fact in Indo-European languages this is not uncommon. Fortson, in his Indo-European Language and Culture: An Introduction (2011:148) gives the example of French:

“in the modern language, verbs are negated with a preceding ne (historically the true negator, from Latin non) and a following pas.”

This could mean that in the Voynich example above we are instead dealing with a negation of some sort. However, I feel that a positive, additive meaning would more suitably fit the context, if we can find one!


What does all this mean?

I have offered the examples above to illustrate my wider point, namely that we will make more progress in our attempt to decode the Voynich script and language if we can identify and then try to elucidate elements of the grammar and syntax, and not limit attention to the vocabulary alone.  I hope my examples illustrate how we can start to do this, even if it is difficult.


What does it NOT mean?

In addition, I have offered some examples from Romani  to show how we might then try to match what we find in the VM with a known language. (It is curious that Derek has also recently identified some possible Romani words in other parts of the manuscript, see here.)

However, I feel it is important to emphasise that I am NOT arguing that the language of the VM is in fact a form of Romani. The evidence is simply too thin to make any such deductions at this stage.

I am sadly aware from past experience that some people in the Voynich community jump on the smallest statement, twist it, exaggerate it, and then misquote it to further their own agendas. (Previous experience has been compelling 🙂 in this respect.)

So let me state it loud and clear – I am not stating that the manuscript is written in Romani, merely that we should treat this as an interesting possibility and keep investigating it carefully and with an open mind to other possibilities.



Fortson, B. (2011) Indo-European Language and Culture: An Introduction, John Wiley & Sons

Manchester University Romani site:

Matras, Y. (2002) Romani: A Linguistic Introduction, Cambridge University Press





  1. Dear Professor Bax,

    I am writing to tell you that I have deciphered the Voynich manuscript. You won’t believe me but it took me only six years.

    Here is how to deconstruct the almost 2,400 words found two or more times in the VMS text and label words into a pattern of six groups of smaller parts. This is the beginning of the presentation and verification of the decipherment process.

    Voynich Manuscript Solution, Proposed

    Method of Exactly How to Deconstruct the Voynich Manuscript Words, text or label, found twice or more in the Voynich Manuscript (10Feb17) (EVA Version)


    The Voynich Manuscript was manufactured/written somewhere around the year 1421 AD if the carbon-14 dating is to be accepted as fairly accurate, which I do. I believe it was probably manufactured or written within 25, at the most, 50, years of that date. It may have been a copy of an earlier manuscript.

    In early Fifteenth Century Europe, one of the best groups of ideas about how to write in a secret way which protected the secret was a group that had been set forth over a hundred years earlier by Friar Roger Bacon.

    I do not think the Voynich Manuscript was written by Friar Bacon, only by someone using several of his ideas about hiding secrets. His methods for hiding secrets consisted of seven techniques which he listed as:
    1. Magic figures and spells.
    2. Mysterious symbols and words, or what would more properly be termed gobbledygook, insider slang and doubletalk at the present time.
    3. Writing in different ways such as by consonants alone.
    4. Commingling letters of different kinds such as using Hebrew, Greek and Latin letters on the same line.
    5. Using letters not otherwise used, but arbitrarily invented by themselves. This may mean nulls? I understand this to also mean using letters which stand for sounds not normally considered as one letter in an alphabet such as “ng” or “dw”.
    6. Inventing not characters like letters, but geometrical figures with points and marks differently arranged which acquire the significance of letters.
    7. By using a shorthand method of noting and writing down as briefly as one pleases and as rapidly as one desires.

    Outline of the Proposed Method of Forming/Deconstructing VMS Words

    I think the VMS words may have been formed using several of the above techniques plus one other, that of joining several shorthand codes (from number 7 above) together into what look like words to the uninitiated, always using the same method of joining them together, but leaving out some codes (that would normally be included) when they are not necessary to the understanding of the meaning of the VMS word. I think the other techniques used in formulating the VMS were numbers 4, 5 and 7 above.

    I believe that a preponderance of the Voynich words can be formed by one method, using the same techniques, for each word, and that each different Voynich word will have the same meaning no matter where in the manuscript it is found.

    The method consists of joining together from two to five code groups in the same way for each word, the number of such codes required to form the word depending on the necessary elements needed to differentiate that word from the others. The elements are joined together in a strict sequence of events, starting with the glyphs in the middle, then moving to the glyphs on the left end and finishing with the glyphs on the right end of the VMS word. Deconstructing the words requires the steps be taken in the opposite order, starting at the right side of the word, moving to the left end and finishing with the middle.

    The Voynich words are not words in the general sense, rather they are groups of codes bundled together. They just look like words. What a way to hide the coded information – right out in the open but looking like something else. While some of the glyph code elements are found in almost all VMS words two or more glyphs in length, the rest of the glyph code elements are seemingly optional as to whether or not they are included.

    VMS words only one glyph in length are a special class which I believe each represents one of the required codes from Group I, discussed below.

    These code group positions are each represented by a different group of glyph character codes (each code from 1 to 5, maybe 6, glyph characters in length – most 1 or 2 glyph characters) and assembled using the same order of assembly. I am including six proposed groups and their possibly included glyph character codes. These groups, especially the one for the glyph character(s) at the left end of the Voynich words (Group I) are not yet complete, but account for most of the repeated VMS words.

    This proposed method of forming VMS words also works if followed in reverse to deconstruct each VMS word into its component elements. Thus, to decode a VMS word, one must start with the glyphs at the right end of the word, move next to the left end of the word and finish with the glyphs (if any remain) in the center of the word.

    The forming of words using groups of code elements made up of different glyphs or groups of glyphs can also help to explain why certain glyphs are always or almost always found only in certain places in the VMS words. The glyph resembling a 4 is usually found at the left end of the VMS words, with somewhere around 30 appearances in different VMS words as the second or third from the left, something that may be explained by it only being found in the first element group of glyph abbreviations/codes (Group I) for the left end of the VMS words. Since the glyph is not found in any of the other five groups, one would not expect to find that particular glyph anywhere else in a VMS word.

    The codes in each of the six code groups correspond to certain unvarying positions (if, indeed, the code tables are represented) in the VMS words with the Group I codes being found on the left end of the VMS words and the Group V and Group VI codes being found on the right end of the VMS words with the others being found in numerical order (if represented in the word) – see the simplified diagram below.

    The Group I codes are the prefix part of the prefix-midfix-suffix idea originally put forward by Professor Jorge Stolfi. The Groups II and III codes, if present, are the midfix part of the idea. The Group IV, Group V and Group VI codes, if present, are the suffix part of the idea. Only one or two of these last three code groups will be found in any individual Voynich word, never all three. Almost all of the words in the manuscript fit this pattern.

    Proposed Method of Exactly How to Deconstruct the Voynich Manuscript Words

    Glyph words that are only one glyph in length show a representative one-glyph code from Group I, the first group of codes.

    All Voynich words two or more glyphs in length have both a representative of the first group (Group I) and either a representative of Group V or Group VI, or both.

    The Voynich words are deconstructed by first understanding the right end of the words, then understanding the left end of the words followed by understanding any glyphs still remaining in the middle of the words. This sequence is mandatory if enough glyphs are present.

    After deconstructing the right end of a word, there MUST remain at least one or more glyphs at the left end of the word (to enable a Group I code to be present).

    As each word is deconstructed, the longest possible code from each group will be considered the correct one UNLESS it does not allow a successful deconstruction. When it does not, this is usually resolved by shortening the last code appearing before the impasse was reached by one glyph. This is usually found in Group IV and Group I codes as the deconstruction progresses. It may eliminate entirely the appearance of a Group IV code in some cases.

    Step 1

    At the right hand end of the VMS word, the last glyph or two glyphs will almost always be one of the following: chy, iiin, iin, in, n, ny, s, sy, y, or it will be one of the following: cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t.

    If the last glyph of the word is iiin, iin, in, n, d, please mark it as a code from Group VI-a.

    If the last glyph of the word is iiin, iin, in, n, d, and that glyph is preceded by any glyph(s) from among the group a, da, do, e, l, o, m, ra, ro, s, then please mark the a, da, do, e, l, o, m, ra, ro, s, glyph(s) as a code from Group IV.

    There must be one or more glyph(s) remaining at the left end of the VMS word after these operations (required). If the operation of identifying a Group IV representative does not leave the remaining left glyph(s), there may be only a one glyph representative of Group IV or maybe none. Otherwise, always use the longest possible Group IV code.

    If the last glyph(s) of the Voynich word are from among the group y, sy, ny, chy, please mark it/them as a member of Group VI-b.

    If the last glyph of the word is y, there may be a glyph from among the group cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t, preceding it. If so, please mark the cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t, glyph as a member of Group V.

    If the last glyph of the word is from among the group cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t, please mark the cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t, glyph as a member of Group V.

    (Note – Group V glyphs can, if present, either occupy the last position in a Voynich word or the next to last position, but not both.)

    If the last glyph of the word is from among the group cfh, ckh, cph, cth, d, f, iir, im, ir, k, l, m, o, p, r, t, it may be preceded by a representative of the group a, da, do, e, l, o, m, ra, ro, s. If so, the a, da, do, e, l, o, m, ra, ro, s, glyph(s) should be marked a member of Group IV.

    As noted above, at the end of these operations there must still be at least one unidentified glyph at the left end of the word. There may be more. If there is no glyph there, the steps above must be redone until there is at least one as yet unidentified glyph remaining at the left end of the word.

    These groups of successfully identified glyphs and code groups will not be revisited unless no solution to understanding the word is possible otherwise. The other, remaining, glyphs will next be identified by their groups.
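Step 1 as described above might be sketched in Python as follows. This is one possible reading of the rules, not the commenter's own implementation. The function names take_suffix and strip_right_end are hypothetical, the repeated ‘iiin’ in the Group VI-a rule is read as ‘iin’ (as in the list that opens Step 1), a final ‘d’ is treated as Group VI-a rather than Group V, and the only backtracking modelled is the rule that at least one glyph must remain on the left.

```python
# Code lists copied from Step 1. Assumptions: the repeated "iiin" is read as
# "iin", and a final "d" is treated as Group VI-a rather than Group V.
GROUP_VI_A = ["iiin", "iin", "in", "n", "d"]
GROUP_VI_B = ["chy", "sy", "ny", "y"]
GROUP_V = ["cfh", "ckh", "cph", "cth", "d", "f", "iir", "im", "ir",
           "k", "l", "m", "o", "p", "r", "t"]
GROUP_IV = ["a", "da", "do", "e", "l", "o", "m", "ra", "ro", "s"]


def take_suffix(text, codes):
    """Strip the longest code that ends `text` while leaving at least one glyph."""
    for code in sorted(codes, key=len, reverse=True):
        if text.endswith(code) and len(text) > len(code):
            return text[:-len(code)], code
    return text, ""


def strip_right_end(word):
    """Return (remainder, group_iv, group_v, group_vi) for one EVA word."""
    rest, vi = take_suffix(word, GROUP_VI_A)
    if vi:  # Group VI-a may be preceded by a Group IV code
        rest, iv = take_suffix(rest, GROUP_IV)
        return rest, iv, "", vi
    rest, vi = take_suffix(word, GROUP_VI_B)
    if vi == "y":  # a bare final y may be preceded by a Group V code
        rest, v = take_suffix(rest, GROUP_V)
        return rest, "", v, vi
    if vi:
        return rest, "", "", vi
    rest, v = take_suffix(word, GROUP_V)
    if v:  # a final Group V code may be preceded by a Group IV code
        rest, iv = take_suffix(rest, GROUP_IV)
        return rest, iv, v, ""
    return word, "", "", ""


print(strip_right_end("chedy"))
print(strip_right_end("daiin"))
```

The remainder returned in the first position is what Steps 2 and 3 would then have to account for. Note that no branch ever fills Groups IV, V and VI together, matching the statement that only one or two of these three appear in any word.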

    Step 2

    After understanding the right end of the Voynich word, we move to the left end.

    Starting at the leftmost glyph of those remaining, find the longest string of glyphs that will fit sequentially in the word from among the following glyph/glyph groups:
    a, ai, aiin, air, ak, al, alch, alk, alke, aloe, ar, ara, aro,
    cfh, cfhe,
    ch, cha, chcfh, chcfhe, chckh, chckhe, chcph, chcphe, chcth, chcthe, chd, chde, che, checkh, checkhe, checphe, checthe, chee, cheek, cheeke, chef, chek, cheke, cheo, cheok, cheoke, cheol, cheolk, cheot, cheote, chep, chet, chete, chey, chk, chke, chkol, chl, cho, chockhe, chocthe, chod, choe, choee, chok, choke, chol, cholk, cholke, chop, chor, chot, chote, choto, chp, chs, chse, chsh, cht, chte, chy, chyk, chyke,
    ckh, ckhe, ckheo,
    cph, cphe, cphol,
    cth, cthe, cthee, ctheo, ctho, cthy,
    d, da, daiin, dair, dal, dalke, dara, dch, dche, dchee, dcho, de, dee, dk, dke, dl, do, dol, dsh, dshe, dy, dyk, dyke, dyt, dyte,
    e, ee, ep,
    f, fch, fche, fchee, fchol, fo, fshe,
    k, ka, kai, kal, kalk, kar, kch, kche, ke, kee, keo, ko, kol, ksh, kshe, kshee,
    l, lch, lche, lchee, lf, lk, lke, lkee, lo, loee, lok, lol, lolk, lpch, lsh, lshe, lte,
    o, och, oche, ockh, ockhe, octh, octhe, od, ode, oe, oee, oek, oeke, of, ok, oka, okal, okch, oke, okee, okeo, oko, okol, ol, olch, olche, ole, olk, olke, oll, olo, olp, olsh, olt, olte, op, opalk, opch, opchor, opy, oq, oqok, or, ora, orch, ore, os, osh, ot, ota, otair, otal, otar, otch, otche, otchee, ote, otee, oteeo, oteo, oteol, oto, otol, otsh, oty, oykee,
    p, pch, pche, po, pol, psh, pshe, psho, py,
    q, qckhe, qcthe, qe, qek, qke, qo, qoch, qoche, qockh, qockhe, qocph, qocth, qocthe, qod, qoe, qoee, qof, qok, qokal, qokch, qocth, qoek, qoke, qoky, qol, qolch, qolk, qolsh, qop, qopch, qot, qotch, qote, qoto, qoty,
    r, ra, rche, rchee, ro, rsh,
    s, sa, sal, sch, sche, scho, see, sk, so, sok, sol, solk, solke, sote,
    sh, sha, shckh, shckhe, shcthe, she, sheckh, sheckhe, shecthe, shee, sheek, shek, sheke, sheo, sheockhe, sheoke, sheol, shep, shk, shke, shl, sho, shockhe, shok, shoke, shol, sholk, shop, shot, shote, shs, shy,
    t, ta, tch, tche, tchee, tcho, te, tee, teo, to, toe, tok, tol, tor, tsh, ty,
    y, ych, yche, ychee, yckhe, yfol, yk, yka, ykch, yke, ykee, ykeeo, yko, ylk, yp, ypch, ys, ysh, yshe, yshee, yt, ytch, yte,

    Yeah, I know it’s a huge list. This group of codes is Group I.

    This concludes step 2. The next step will be for us to understand the groups of possibilities for the remaining glyphs in the middle of the word, if any.
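The longest-match rule of Step 2 could be sketched like this. GROUP_I_SAMPLE is a small subset of the Group I list above, chosen only to keep the example short, and longest_group_i_prefix is a hypothetical name. The input is assumed to be whatever glyphs remain after the right end has been handled.

```python
# A small subset of the Group I list; the full list has several hundred codes.
GROUP_I_SAMPLE = {"o", "ok", "oka", "okal", "ch", "che", "d", "da", "q", "qo", "qok"}


def longest_group_i_prefix(remaining, codes=GROUP_I_SAMPLE):
    """Return (code, leftover) using the longest code that starts `remaining`."""
    for end in range(len(remaining), 0, -1):
        if remaining[:end] in codes:
            return remaining[:end], remaining[end:]
    return "", remaining


print(longest_group_i_prefix("okalk"))
print(longest_group_i_prefix("che"))
```

An empty first element means no Group I code fits. Under the rules given earlier, that is the signal to shorten a previously chosen code and try again.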

    Step 3

    Look to see if there are any remaining glyphs (in the middle of the word) that haven’t already been placed into Groups I, IV, V, or VI, as shown above.

    If there are, they should belong to either the Group II codes of d, f, i, ir, k, s, t, or they should be found in the Group III codes of a, at, ch, che, co, ee, eo, eote, o, ok, ol, sh, she, y.

    This should be the end of the deconstruction process. All glyphs in each word should be assigned to one of the groups.
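A minimal sketch of Step 3, assuming the middle glyphs must be covered exactly by at most one Group II code followed by at most one Group III code. The function name split_middle is hypothetical, and preferring the longest Group II code first is an assumption carried over from the longest-match rule stated earlier.

```python
GROUP_II = {"d", "f", "i", "ir", "k", "s", "t"}
GROUP_III = {"a", "at", "ch", "che", "co", "ee", "eo", "eote",
             "o", "ok", "ol", "sh", "she", "y"}


def split_middle(middle):
    """Return (group_ii, group_iii) covering `middle` exactly, or None."""
    for cut in range(len(middle), -1, -1):  # try the longest Group II code first
        ii, iii = middle[:cut], middle[cut:]
        if (ii == "" or ii in GROUP_II) and (iii == "" or iii in GROUP_III):
            return ii, iii
    return None


print(split_middle("kee"))
print(split_middle("sh"))
print(split_middle("x"))
```

A result of None marks a word that does not deconstruct under these two lists.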

    In Conclusion

    Each Voynich word should only have one or no code representatives from any one group of codes, never two or more from the same group.

    If the deconstructed group codes are reassembled in numerical order (Groups I through VI), the glyphs and order should be identical to the glyphs in the original word.
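The two conditions just stated (at most one code per group, and reassembly in numerical order reproducing the word) can be checked mechanically. The sketch below assumes a deconstruction is stored as a dictionary keyed by group name, which enforces the one-code-per-group rule by construction. The names GROUP_ORDER, reassemble and is_valid_deconstruction are invented for this example.

```python
GROUP_ORDER = ("I", "II", "III", "IV", "V", "VI")


def reassemble(parts):
    """Join the codes in numerical group order, skipping absent groups."""
    return "".join(parts.get(group, "") for group in GROUP_ORDER)


def is_valid_deconstruction(word, parts):
    """True if `parts` uses only known groups and rebuilds `word` exactly."""
    unknown_groups = set(parts) - set(GROUP_ORDER)
    return not unknown_groups and reassemble(parts) == word


print(is_valid_deconstruction("chedy", {"I": "che", "V": "d", "VI": "y"}))
print(is_valid_deconstruction("chedy", {"I": "ch", "V": "d", "VI": "y"}))
```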

    The repetitions of these almost 2,400 words comprise more than 80% of all the words in the Voynich Manuscript. Many of the remaining single-repetition words in the manuscript will also successfully deconstruct using only these codes. Most of the rest will also successfully deconstruct if more codes are added, mostly to Group I. If this method of deconstructing the VMS text words is correct, only 12 of the almost 2,400 words duplicated two or more times in the text words or labels will not deconstruct in accordance with these rules.

    It also means that, although the first group of codes (Group I) is huge, a total of only 55 other codes is needed to modify Group I codes to construct/deconstruct 99.5 percent of the duplicated words and labels (as noted above, 12 words do not deconstruct successfully).
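The 99.5 percent figure follows directly from the two counts given. The short check below takes ‘almost 2,400’ as exactly 2,400, which is an assumption, so the true rate would be marginally lower.

```python
total_words = 2400   # "almost 2,400" in the text, so treated here as an upper bound
failures = 12        # words reported as not deconstructing

success_rate = 100 * (total_words - failures) / total_words
print(f"{success_rate:.1f}% success, {100 - success_rate:.1f}% failure")
```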

    Only a total of 55 modifiers in 5 sequenced groups is far too few for any but a very restrictedly constructed and constrained language, definitely not any natural one.

    Individual ingredient components in recipes are one of the few possibilities that I can visualize for the construction idiosyncrasies of the VMS words (from so few possible coded parts in such a rigidly structured form and from a limited sequence of code group possibilities).

    Group I is very large because it gives a code group for each of the hundreds of herbs being used in the manuscript in recipes. The other codes are modifiers used to construct exact herbal ingredients showing plant parts used, type of preparation, the amount, and some rarely used special modifiers. Each VMS word is an individual ingredient component in an herbal medicine recipe.

    The almost 2,400 words repeated two or more times in the VMS text and labels have been deconstructed in accordance with the above and may be viewed at my website – except for twelve words which do not deconstruct successfully – a rate of about 99.5% success.

    Even though the proposed deconstruction process consistently works (as I show), it doesn’t give results that most other Voynicheroes seem to want to accept. It may be too hard for some to understand.

    Thank you.

    Don of Tallahassee

  2. curious traveller

    Regarding the question “If we are looking at a definite article, why would it vary from word to word? ”

    This might be unrelated, but that is exactly what happens in Italian: 2 definite articles for masculine (“il” and “lo”: selection of the proper article depends on the following word) + one for feminine (“la”) -> each of them with its respective plural -> a total of 6 definite articles.

    So, in Italian the article depends on gender (masculine vs feminine), number (singular vs plural) and the actual, specific word following (masculine “il” article vs masculine “lo” article).
    No intention to suggest any relationship with Italian. I actually see none.
    German also depends heavily on case, number and gender (3 genders).
    Nonetheless, might a similar mechanism be at work here?
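
As a rough illustration of the dependency described above, the sketch below picks an Italian definite article from the gender, the number and the opening sounds of the following noun. The function names and the simplified onset rules are assumptions made for this example, and the elided form "l'" before vowels is included even though the comment above does not count it. It is not a complete grammar.

```python
# Simplified sketch of Italian definite-article selection.
# The rule set is an illustrative assumption and ignores rarer cases.

VOWELS = "aeiou"
# Masculine onsets that take "lo"/"gli" instead of "il"/"i".
SPECIAL_ONSETS = ("z", "gn", "ps", "pn", "x", "y")


def takes_lo(noun: str) -> bool:
    """True if the noun starts with z, gn, ps, pn, x, y or s + consonant."""
    word = noun.lower()
    if word.startswith(SPECIAL_ONSETS):
        return True
    return len(word) > 1 and word[0] == "s" and word[1] not in VOWELS


def definite_article(noun: str, gender: str, plural: bool = False) -> str:
    """Pick the article from gender ('m' or 'f'), number and the noun's onset."""
    starts_with_vowel = noun[0].lower() in VOWELS
    if gender == "f":
        if plural:
            return "le"
        return "l'" if starts_with_vowel else "la"
    if plural:
        return "gli" if takes_lo(noun) or starts_with_vowel else "i"
    if starts_with_vowel:
        return "l'"
    return "lo" if takes_lo(noun) else "il"


for noun, gender, plural in [("libro", "m", False), ("studente", "m", False),
                             ("casa", "f", False), ("studenti", "m", True)]:
    print(definite_article(noun, gender, plural), noun)
```

The point of the sketch is only that the same grammatical word can surface in several forms depending on what follows it, which is the kind of variation the question about a definite article raises.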

  3. Bertin

    Mister Bax,
    I broadly agree with you, but if you find the same words, I mean exactly the same according to the phonetics, letter by letter, with the same mutations, the same radicals, the same prefixes and suffixes, there is a high probability you have found at least the main language family, right?
    Your method is certainly the best, but with all due respect, don’t forget the author is an artist (as I am) with his own personality and probably tricks…

  4. Darren Worley

    I’ve been intrigued by the identification of EVA:d with the sound t or “th”. What I find curious is that the first character looks like a cursive “d” (or Greek delta). This way of writing “d” often occurs in medieval manuscripts.

    A good example is EVA:doaro, from f68r3, that Stephen identified as a label for Taurus in his 2014 paper (see page 19). It would be pronounced as T/A/U/R/N.

    I’ve noticed that “t” and “d” are often interchanged, notably in the evolution of Germanic languages.

    For example the evolution of the word meaning “Day”

    “tag” (Old High German) -> dagr (Old Norse) > dag (Modern Swedish)
    “tag” (Old High German) -> dagr (Old Norse) > dæg (Old English) > day (Modern English)

    Another example is the word for “door”:

    thura (Greek) > turi (Old High German) > dyrr (Old Norse) > duru (Old English) > door (Modern English)

    I seem to have independently noticed the High German consonant shift.

    I was hoping someone with more linguistic knowledge might be able to explain whether Voynichese is also observing this “law”. [i.e. Taurus (Greek/Latin) > intermediate language(s) > (t/d)aur(n/us) (Voynichese)]

    There seems to have been a lot of iconographic evidence suggesting a German origin for the VMS, but I was curious to know if this “consonant shift” (if present in VMS) might indicate a Germanic influence in Voynichese too.

    I’m not sure this is the only explanation, since I’ve also observed (d>t) when Greek is translated to Indian languages, and presumably the reverse (t>d) also holds when translating from an Indian language.

    • Derek Vogt

      Voicing and devoicing, along with dephonemicizing of the difference between voiced & voiceless (meaning the difference is not phonemic; they’d count as the same phoneme) are certainly things that can happen in any language at any time. Proto-Germanic and certain later Germanic groups have had at least two separate rounds, with yet another round of it happening now in some flavors of English and another separate one in German that happened recently enough to not be reflected in spellings. So English-German pairs like “day-Tag” and “dance-Tanz” can actually exist for separate reasons with different histories (in this case, one PIE word began with “t” and the other began with “dʰ”, and one went through a phase as a fricative for a while before becoming plosive again). And it’s not just the Germanic languages. PIE had a set of voiced aspirated plosives which are preserved that way in most Indic languages (घ, ध, भ; “gh”, “dh”, “bh”) but ended up as Greek’s voiceless aspirated plosives (θ, φ, χ; “th”, “ph”, “ch/kh”) and were also devoiced & fricatized in Proto-Italic before merging on /h/ and then shifting to /f/ in Latin.

      But there are a couple of different ways that can work out, and another couple of phenomena that can be mixed up with one of them:
      1. One sound in the voiced-voiceless pair switches sides in only some cases and not others, so the language still has both sounds.
      2. One sound in the voiced-voiceless pair switches sides in every case, so the two phonemes merge into one.
      3. For any potential voiced-unvoiced pair, a language can have just one sound instead of both, for some other reason instead of a merger of the original pair: a different kind of sound shift, or just that. For example, in Arabic, /p/ became /f/ but /b/ didn’t become /v/, so now it has both /f/ and /b/ unpaired (no /v/ or /p/), and there’s no sign that there was ever a counterpart for /q/ (which has actually become voiced in some dialects).

      In case 1, you would normally expect the alphabet to have two separate letters. The very oldest Latin writing breaks that rule with the letter [c] being all they had for both /k/ and /g/, but not for long before they invented [g] for the specific purpose of fixing that problem, and that’s the only time the Romans ever felt compelled to invent a completely new letter like that. (They seem to have gotten the Greek alphabet indirectly from somebody else in Italy with a non-Indo-European language which had no difference between /k/ and /g/ or was missing one of them, so kappa had been dropped.) But if you were figuring out such an alphabet by comparison with others through cognates, and a partial voicing/devoicing shift had happened, it would look like one letter sometimes encroached on the other one’s sound (like a “p” where you’d think a “b” belonged, or a “b” where you’d think a “p” belonged). Those crossed-over examples would be the ones where the sound shift had happened. There is also a chance of finding two separate letters in a language where they had merged (case 2) but old spellings had been retained. In that case, when you drew the correlations for two separate sounds, you’d be recovering the pronunciation that was used when the spellings became fixed, not when the document you’re working with was written.

      If you could only find one letter for such a pair of sounds, not two, you’d be in either case 2 or case 3, but the missing letter alone would not be enough to tell you which. If both sounds in foreign words consistently correlated with the same single letter in the mystery language, then you could figure that the mystery language didn’t distinguish between the sounds. But that would still leave open the question of what that one letter’s sound really was and how it had gotten that way (case 2 or 3b; a merger of the pair of sounds or just that it had “always” been that way). You could identify case 3a, though, if that letter in the mystery language consistently correlated with one sound in foreign words, but that sound’s voiced-voiceless counterpart tended to correlate with one or more other letters in the mystery language instead. (For example, if EVA-d represented /d/ and the language had lost its /t/ by some means other than a merger with /d/, then /d/ in foreign words would correlate with EVA-d, but /t/ in foreign words would not. It would correspond with some other letter or letters in the mystery language, like maybe one that normally represented /s/ or /θ/ or “č”.)

      Switching from general to specific…

      Under the theory that the letter EVA-d originated as a lowercase [d] or delta representing the sound /d/, and that the word that seems equivalent to “Taurus” is correct, we have three possibilities:

      A. It still represented /d/ but the pronunciation of some or all cases of /t/ in foreign words had been converted to /d/ in spoken Voynichese
      B. The letter’s use had shifted from representing /d/ to representing /t/
      C. The sounds /d/ and /t/ were not spelled differently and thus probably didn’t count as two separate phonemes.

      In all three cases, the only way to narrow it down would be to find more example words where foreign cognates have /d/ or /t/, and see which Voynich letters correlate most often with each one. That one word, plus the theory of the letter’s derivation, just leaves the possibilities open.
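
The tallying step described above can be sketched as a small correspondence table. The pairs below are hypothetical placeholders, not readings from the manuscript, and the function names and the mapping of outcomes onto possibilities A, B and C are assumptions made for this illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical (foreign sound, EVA letter) pairs; placeholders only,
# standing in for cognates that would have to be identified first.
pairs = [("t", "d"), ("t", "d"), ("d", "d"), ("t", "k"), ("d", "d")]


def tally(pairs):
    """Count, for each foreign sound, which EVA letters it lines up with."""
    table = defaultdict(Counter)
    for sound, letter in pairs:
        table[sound][letter] += 1
    return table


def classify(table, letter="d"):
    """Rough reading of the evidence for one EVA letter against /t/ and /d/."""
    t_hits = table["t"][letter]
    d_hits = table["d"][letter]
    if t_hits and d_hits:
        return "both /t/ and /d/ map to the letter (consistent with A or C)"
    if t_hits:
        return "only /t/ maps to the letter (consistent with B)"
    if d_hits:
        return "only /d/ maps to the letter"
    return "no evidence"


table = tally(pairs)
print(dict(table["t"]), dict(table["d"]))
print(classify(table))
```

With only a handful of pairs any classification stays open, which matches the point above that a single word cannot settle the question.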

      • Derek Vogt

        editing error above: the sentence that ended with “just that” was supposed to be “just that there’s been only one unpaired sound all along”.

      • Derek, but wouldn’t we expect a medieval person to write the way he speaks, especially in the vernacular? So if a d-sound had become devoiced they would just write a t? Standard spelling didn’t exist yet, and I doubt mental images of words lasted long enough to bridge a major sound change. An exception would perhaps be a case where the original sound is still apparent in some contexts, like Dutch woord, plural woorden. The -d at the end of woord is pronounced as -t, but we still know it’s a -d because of the plural.

        • Peter

          And not to forget, the classic spelling as you know it today came after 1500 with Gutenberg. Before it was written as it was spoken. Each dialect has its own grammar.

      • Derek Vogt,
        May I say I find your comment of October 5, 2016 – 6:43 am – just awesome.


        • Stephen Bax

          Me too 🙂

      • Thanks Derek for the comprehensive reply.

        Actually, Stephen provided several other examples of this t>d mapping in his 2014 paper, namely Coriander [KOORATU] and Centaurea [KNT/?/IRN]. Although he suggested that the shape of EVA:d possibly derived from Greek theta.

        My purpose in raising this topic was not to provide more evidence, since Stephen has already provided other examples, but to try and interpret his results.

        For understandable reasons you’ve chosen a purely linguistic approach to deciphering the VMS, however, I think a lot of iconographic evidence hints to a South German/North Italian origin, so I was intrigued to find this t>d pattern described by the High German consonant shift, having first noticed it whilst looking through medieval German manuscripts.

        I think Koen also makes a good case in the Dutch example he describes. The Wikipedia page on this topic specifically mentions Dutch, along with Norwegian, Danish, and Swedish as displaying this consonant shift characteristic.

        I think it would be interesting to see if any of the other High German consonant shift rules (of which t>d, is only one of several) can be detected in Voynichese (from either Derek’s or Stephen’s sound/letter mappings).

        Below is a key section from the Wikipedia page describing the t>d mapping. I’m unsure precisely when this fourth phase is thought to have begun, but I have seen examples in early printed books from the late 15th century.

        Of the other changes that sometimes are bracketed within the High German consonant shift, the most important (sometimes thought of as the fourth phase) is:

        4. /θ/ (and its allophone [ð]) became /d/ (this /ðɪs/ : dies [diːs]). However, this also applies to Dutch (this : dit [dɪt]) Norwegian, Danish, and Swedish, but not Icelandic (this : dette [dɛte] / detta [dɛta] , but þetta [θe:ʱta]).

        This phenomenon is known as the High German consonant shift, because it affects the High German dialects in the mountainous south,[4] principally the Upper German dialects, though in part it also affects the Central German dialects. However the fourth phase also included Low German and Dutch. It is also known as the “second Germanic” consonant shift to distinguish it from the “(first) Germanic consonant shift” as defined by Grimm’s law and its refinement, Verner’s law.

        The High German consonant shift did not occur in a single movement, but rather as a series of waves over several centuries. The geographical extent of these waves varies. They all appear in the southernmost dialects, and spread northwards to differing degrees, giving the impression of a series of pulses of varying force emanating from what is now Austria and Switzerland. Whereas some are found only in the southern parts of Alemannic (which includes Swiss German) or Bavarian (which includes Austrian), most are found throughout the Upper German area, and some spread on into the Central German dialects. Indeed, Central German is often defined as the area between the Appel/Apfel and the Schip/Schiff boundaries, thus between complete shift of Germanic /p/ (Upper German) and complete lack thereof (Low German). The shift /θ/ > /d/ was more successful; it spread all the way to the North Sea and affected Dutch as well as German. Most, but not all of these changes have become part of modern Standard German.[5]

        The High German consonant shift is a good example of a chain shift, as was its predecessor, the first Germanic consonant shift. For example, phases 1 and 2 left the language without a /t/ phoneme, as this had shifted to /s/ or /t͡s/. Phase 3 filled this gap (/d/ > /t/), but left a new gap at /d/, which phase 4 then filled (/θ/ > /d/).

        • Derek Vogt

          The only way for a language to experience a Germanic sound shift series is to be a Germanic language. For the Voynich language to be a Germanic one would present a large problem. Germanic-speaking people at that time already had a writing system, so they had no incentive to create a new one, especially not one that’s so bad at representing a Germanic language.

          If the language isn’t Germanic, then it didn’t experience the HGCS. But that doesn’t rule out having its own independent shifts that coincidentally happen to individually match parts of the HGCS, like d→t. But not-ruled-out doesn’t mean it did happen, either, and the default presumption for every possible sound shift should be that it didn’t happen until there’s a real phonetic sign that it did.

          “Coriander/cilantro” has had a strange duality over the /d/ or /t/ since the beginning; the oldest language it can be traced back to, Greek, already had versions with both [δ] and [τ] and even another with no plosive there at all. So that word doesn’t help us figure out which sound EVA-d had. But “Taurus” and “centaurea” have always been voiceless; there’s no language that had “Daurus” or “cendaurea” for the Voynich language to have borrowed them from with a /d/ and then converted them back to /t/. So the use of EVA-d in those words looks like it’s just a letter for /t/ representing /t/.

          That doesn’t rule out the possibility that the same letter represented /d/ sometime before the Voynich Manuscript was written and shifted to represent /t/ along with a shift in the spoken sounds, but it doesn’t really indicate it either. It puts us in the same situation for /d/ as for anything else reasonably close to /t/ like /s/ or /θ/. We have nothing to indicate a prior state representing /d/ but its resemblance to the letters [d] and [δ], not actual sound correlations, and that tells us nothing without a bigger alphabetic picture to fit that symbol into.

          Sound shifts that aren’t associated with changes in spelling are very tricky to spot, and tend to rely on not only familiarity with the languages they happen in but also interactions with other languages. For example, for Greek [φ], the shift from /pʰ/ to /f/ is indicated primarily by a change in its transliteration to the Latin alphabet, from [ph] to [f], because the Greeks were using the same letter for the new sound as for the old one. Our current understanding of the Voynich Manuscript and any back-&-forth transfers like that with anything else is just nowhere near ready for that kind of precision.

          It might help to think of phonetic reconstructions as sort of like variables in algebra. When historical linguists write a particular phonetic symbol that’s inferred rather than attested for the given context, they’re not necessarily asserting that that’s exactly the precise sound. They’re using that symbol as a stand-in for “this sound or whatever other sound could fit the evidence here”. If they use a [t] but the evidence they’re talking about also allows that it could be either /t/ or /d/ or /θ/, then that “t”, in context of a reconstruction, is actually a shorter way of saying “possibly either /t/ or /d/ or /θ/… something like that”.

          • Derek Vogt

            On the subject of that final paragraph, about reconstructed past sounds possibly being only rough indicators of the type of sound that’s needed rather than necessarily exactly the sound that was actually pronounced in the past (so one could make a case that a hypothetical “t”, for example, was really “d”, or the other way around)…

            I was thinking through how to give known examples and adequately explain them, including how historical linguists can still be debating what the actual sounds were when they’re represented in writing with symbols that look as if the pronunciation were already known and settled, without the post getting too absurdly long, when I noticed that I’ve already done something rather unconventional with uncertain sounds myself.

            The usual way historical linguists indicate that something they’re writing is only a reconstruction rather than a known attested sound is to begin the word/morpheme/phoneme with an asterisk. (There is no symbol to mark the end, unlike with quotation marks or brackets, just the one at the beginning.) At one time, when I was looking for a way to indicate possible Voynich reconstructions and figured that conventional brackets and slashes and quotation marks wouldn’t be good enough for various reasons, I invented my own new convention, using ^^. If I had thought of what I was doing in its larger context, as just another example of talking about theoretical past spoken sounds as linguists have been doing since long before I was born, then I would have used the standard linguistic symbol all along instead of inventing my own. So, for example, ^nkhtn^ and ^agntn^ would have been *nkhtn and *agntn.

  5. MarcoP

    On the forum, Emma May Smith recently suggested that the Voynichese prefix EVA:q- might be related to “Arabic waw. In Arabic, the conjunction ‘and’ is written with the letter waw joined to the beginning of a word”.

    Above in this page (1. Grammatical elements – a possible conjunction), the idea that the most common Voynichese words (EVA:daiin / aiin) represent the conjunction “and” is discussed.

    Is there anything we can deduce from the distribution of EVA:qo- vs EVA:(d)aiin?
    Does one of the two behave more conjunction-like than the other?

    • Derek Vogt

      It looks to me like it’s used too often in general, too often in multiple words consecutively, and too often too close to the beginning or end of a paragraph, to be used to connect sentences or clauses or phrases. But that doesn’t mean it couldn’t be used to connect words and make a list of single-word items. The catch is that we would also need to look for evidence that it was not used that way (so if it was then we’d fail to find that counter-evidence), and I don’t know exactly what that would mean we were looking for, without first having some idea of the meanings of the words it’s attached to and the words before it.

      Also, there’s a general issue with trying to gain information from words’ “distribution” in the paragraphs of the Voynich Manuscript: remember that there are some very strange statistical issues with the whole manuscript, which don’t match any real language overall and have led some analysts to conclude that it isn’t written in a real language. So, if it is any real language, it’s a very odd and unusual way of using it, which even native speakers would probably find awkward and possibly confusing. Think of the most nonsensical, language-abusing, borderline gibberish “poem” you’ve ever seen or heard, and try to imagine something even worse, to such an extent that it would not only be hard for (perfectly literate) native speakers to read but also statistically look like it wasn’t even in an actual language. The usual patterns in word distribution might just not apply when what we’re looking at are not normal sentences.

      Add to that the fact that it’s been at least six centuries since the manuscript’s separation from its closest linguistic relative, and the unfortunate result is that, even if it is just a bizarre application of a real language, even someone who knows its nearest relative perfectly well might not recognize it as anything related to his/her language anyway, with neither the individual words nor the sentence/paragraph structure being recognizable enough to tell him/her how to interpret the other.

    • Darren Worley

      Hi Marco – do you think that the Sanskrit conjunction “ca” could be a candidate?

      Besides being the origin of similar terms in Avestan and Old Persian, it also appears to be related to the Latin conjunction “-que”, which in turn, has gone on to influence Latin-derived languages.

      Is there a European language where “q-” is used as a conjunction? (Curious that EVA:q- appears to be used as a conjunction and it is written like a q too.)

      In some cases Q and C can represent similar sounds (q = cu = qu) could the Voynich author be using q in a similar way, to phonetically transcribe a “cu” sound?

      Edit : I just saw that “qu'” is used as a conjunction in French.

      • Derek Vogt

        That “c” in Sanskrit comes from the same Proto-Indo-European K-sound as Latin “que”, but actually represents the sound that’s usually “ch” in English. That’s what a lot of those PIE K-sounds evolved into in Proto-Indic. It’s represented by the letter [c] alone when we write Indic words in the Latin alphabet, not only because of a preference for a 1:1 relationship between letters & sounds and the fact that the letter [c] isn’t really needed for anything else anyway, but also because an [h] after a plosive is how aspirated sounds are indicated, which would mean if the plain sound were [ch] then its aspirated counterpart would be [chh], which would be silly. As it is, with Indic [c] as a plain unaspirated sound like English “ch”, Indic [ch] is its aspirated counterpart, essentially “chʰ”… at least, under the current standard transliteration scheme. The words that were imported into English as “chakra” and “chakram” break this rule; they would indicate an aspirated sound as they’re spelled, but the original is not aspirated, so they would have been spelled “cakra” and “cakram”.

        This issue has no effect on the relationship between Latin “que” and Indic “ca”, but it’s something to keep in mind if one wants to read foreign words for the sounds that they actually have or originally had. This kind of stuff is also why I prefer to use [č] for this sound, so it looks obviously & clearly different from a plain [c] without adding a superfluous and potentially misleading [h] after it. It’s also why I prefer to see things in their original alphabets.

        (Another issue like this that recently came up here was “Kalde-” as an alternative to “Chalde-”. In the original spelling in a Semitic alphabet, one can clearly see that [k] is the more direct transliteration to the Latin alphabet, but it also ended up as [ch] after a detour through Greek using the letter chi, which normally gets rendered as [ch]. Apparently some Greek writer at some point heard it as sounding like a chi instead of a kappa, so the standard spelling was and has remained “Χαλδ-”, not “Καλδ-”.)


        The Devanagari letters for the sounds “č” and “čʰ” are च and छ. The one you’re dealing with in this case is the unaspirated one, च. It doesn’t seem visually different enough from EVA-q to rule out a connection, or similar enough to really support the idea either. I’m more concerned about the historical mechanism or process that such a connection would represent, than the graphics: I don’t see a way for a single symbol to get transplanted to a new alphabet where the original one is nowhere in sight.

        Also, even if EVA-q does mean what we’re speculating it might mean, that doesn’t mean it necessarily has a counterpart that will ever be identified from anywhere else with the same meaning. “And” is just one of those meanings that easily drifts from one word/prefix/suffix to another. Latin “et” didn’t mean “and” in PIE. English’s “and” didn’t either. PIE *-kʷe lasted long enough to become Latin “-que” but then that fell out of use. Sanskrit had both “ča” and “tu” for “and”, but no modern Indic language that’s available at Google Translate has anything like either of them. The only way to tell whether EVA-q was a conjunction or not is its use in the actual manuscript; comparisons with conjunctions in other languages just won’t help.

        • MarcoP

          Thank you for your comments, Derek and Darren!
          Here are some data that I think could be relevant, but I currently have no idea of how to interpret them.

          Emma May Smith has posted about the very interesting LAAFU (line as a functional unit) phenomenon.

          Interestingly, EVA:daiin and the EVA:q- prefix exhibit very different behaviors when examined with respect to their position in lines. Considering the initial herbal section (f1r-f57r, 9507 lines), daiin occurs in 403 lines, q- in 635: the numbers are close enough to be meaningfully compared.

          daiin appears at the beginning of lines 12.2% of the time (49 times) and at the end of lines 15.6% of the time (63 times). These frequencies are compatible with what is statistically expected if the word were unaffected by LAAFU.

          On the other hand, q- appears in the first word of lines 29.9% of the time (190 times) and at the end of lines 4.7% of the time (30 times). The q- prefix is strongly affected by position in the line, with a clear preference for the beginning of lines: in that position it is more than 6 times as frequent as at the end of lines.

          • MarcoP

            I spotted a significant error in the q- and daiin numbers I posted yesterday: I am sorry.
            These are the corrected numbers for the herbal section:

            daiin tot:404 | initial:50 (12.4%) | final:99 (24.5%)
            q- tot: 655 | initial:191 (30.1%) | final:59 (9.3%)

            daiin has a preference for appearing at the end of lines
            q- has a preference for appearing at the beginning of lines

            f37v exemplifies both cases (but the overall statistics are not as blatant as this specific case):

            Please also note that in f37v daiin appears twice as the last word in a paragraph.
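
The shares in the corrected figures above can be recomputed with a few lines of Python. This is only a sketch using the counts as posted; the function name `position_shares` is made up for the example. With the posted q- total of 655 the shares come out slightly lower than the posted 30.1% and 9.3%, which instead correspond to a total of 635, the figure given in the earlier comment. Which value was intended is not clear from the thread.

```python
# Counts copied from the corrected figures above (herbal section).
counts = {
    "daiin": {"total": 404, "initial": 50, "final": 99},
    "q-": {"total": 655, "initial": 191, "final": 59},
}


def position_shares(total, initial, final):
    """Return line-initial share, line-final share and their ratio."""
    return initial / total, final / total, initial / final


for item, c in counts.items():
    init_share, final_share, ratio = position_shares(**c)
    print(f"{item}: initial {init_share:.1%}, final {final_share:.1%}, "
          f"initial/final ratio {ratio:.2f}")
```

Either way the direction of the preference is unchanged: daiin is roughly twice as common line-finally as line-initially, while q- is roughly three times as common line-initially as line-finally.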

  6. He said “compelling” with a smiley, that’s funny.

    • Stephen Bax

      Hey Nick! It took you only 5 months to crack my extremely complex Bax cipher code and get the joke. Well done! 🙂

      If only the Voynich manuscript had large smileys too, to signal each cipher clue.

      Then no doubt you super cipher sleuths would have cracked it by now, non? 🙂

      • Stephen: when your Voynich arguments become even half as good as your sarcasm, we’ll be able to start a proper debate.

        But you’re not there yet.

        • Stephen Bax

          Just to inform new readers: this post is from Nick Pelling, who wrote a book arguing that the Voynich manuscript is written by an Italian traveller. You might not have heard of it because – so far as I know – no-one else has bought into it. If I am wrong, and you think Nick’s theory is good, please feel free to post your views here.

          When I tried on Nick’s blog to debate some elements of the book which seemed to me to be weak and unsubstantiated, Mr. Pelling banned me from his website completely. I haven’t looked at his site for months but I suppose I am still banned?

          So, dear readers, you decide: who is more committed to the ‘proper debate’ which Mr Pelling mentions?

          Personally, I would rather debate the merits of the ‘cipher theory’ which Nick used to hold. In my view the manuscript COULD be in cipher, but as far as I can see no-one has made any progress on that idea in 20 years, or am I wrong?

          • Peter

            It is just a theory. Whether it is right or wrong is for me only a pure side issue. If it was a traveler, where did he come from and in which language did he write? Where the second is important, I know the language, then I can also say where he came from.
            Sure for me is that he visited the southern Alps, and was written in a code. In my view the basis is Latin. But just my opinion.

            It would not surprise me if the book has improved something with color and sold Rudolf his own book for 600 guilders, which may have brought his mother from Spain.
            Even a theory.

  7. Maryam Al-shhehi

    I honestly have a new opinion about Voynich, that is, Wilfrid Voynich, the man who claimed that he found the manuscript in his library. Since he had a full library of all kinds of rare books, it would have been easy for him to write that book, especially with tons of old books around him. Also, the manuscript uses English numbers, and if it had been written around the 14th-15th centuries they would have used Latin numbers, not English. Anyway, if he worked all his life to collect rare books, why would he give away one of his collection so easily? That man Wilfrid has something to do with the manuscript.

    • Maryam,
      Your theory about Wilfrid Voynich faking it is now associated mainly with Richard Santacoloma, who has pursued the idea energetically for some years.

      His blog can be read online.

      To Stephen,
      This may seem a random notion but has anyone considered Arvanitika?

      • Stephen Bax

        I have never heard anyone suggesting Arvanitika, but as always it would need extensive analysis to investigate it! Any volunteers?

  8. eric chapuzot

    I’m happy to see you are coming round to my way of decoding, considering that a lot of signs are linked more to the morphology of the drawings than to linguistic words… and I’m not sure that, when the morphological information is removed from the code, anything linguistic remains to be seen. Too many signs begin and end with the lines of the graphic on the reverse… too many signs follow the edges of the drawings… but, for example, when you take the drawing of the baths, it’s easy to imagine that what is drawn above the head describes something like a name or a blazon… Another fact could make your centauri analysis true: if the lines are a code added on top of the grammar, deforming the original sense only a little… or more strongly. Today, I can’t say it at 100%…

    • eric chapuzot

      I can’t stop thinking, when I compare my morphological schemes with your word “centauri”, that I’m missing a little something… to my mind both approaches contain good and wrong parts… it’s natural that I think I’m closest to the truth, but I want to prove you are not wrong too… it will be hard…

      • eric chapuzot

        In other words, I think your remarks on the centauri page can be a key to a next level of decoding…

        • eric chapuzot

          I’ve made a first short video to show my first ways of decoding the Voynich; a lot of explanations may follow, but I have to begin somewhere…

  9. Louis Woolley

    This also goes as the first picture with the comment

  10. Louis Woolley

    Voynich Manuscript
    I am excited about the work you have done to date concerning the Voynich Manuscript. I am certain you are on the cusp of opening this mystery completely. I have been interested in it for a number of years. Though I have a MATS Degree in Theology and an M.DIV, I lack the formal education to completely explore its origin. However, I did notice something rather interesting which may be of interest to you in your research, if it has not been explored already.
    I would like to share with you some of my thoughts. Perhaps you may consider them ridiculous, and I am able to accept that, for you are the expert. However, if they do have any relevance, perhaps they will help in knowing where to look in order to translate more words.
    I found the pictures quite interesting:
    This seems to indicate on the left a fertile woman ready for conception and on the right a woman coming into her own. The woman in the center would be the distributor of the soul of a person, and the entire construct is similar to that of the menstrual cycle and the fallopian tube.

    This next picture seems to be the placement of the seed into the male reproductive organs for the birth of a person.

    In following with your thinking, this could be a religious based science book of its time. This could be a book which explains the solar system, the value of flora and fauna, and the formation of human birth, and how they all interact. Considering the part of the world the language might be from, it could be from a matriarchal society, perhaps one which may even lead back to the ancient teaching of Lilith.

    Thank you for your consideration,
    Rev. L.B.Woolley MATS, M.DIV

  11. MarcoP

    Thanks to Job’s “” application, I examined the distribution of two common Voynichese words: EVA:shedy (blue) and EVA:chedy (red).

    Each of the two words appears more than 400 times. The two words are relatively rare in the Herbal and Pharma sections, but in a few plant descriptions they appear several times. They are very frequent in other sections.

    I noticed an alternating pattern in the occurrences of the two words in the zodiac pages:
    shedy: Libra, Leo, Sagittarius
    chedy: Pisces, Cancer

    I tried a more generic search for the two prefixes shed- and ched-

    Considering the 12 zodiac pages, there are 15 matches, on 7 different pages:
    shed- Libra, Leo, Sagittarius
    ched- Pisces, Taurus (dark), Cancer, Scorpio
    All occurrences appear in the rings of text, not in the labels of the “nymphs”.

    According to an ancient tradition (Tetrabiblos I,12), masculine (or diurnal) and feminine (or nocturnal) signs alternate in the zodiac. Other sources (e.g. Manilius II, 150 or the Pseudo-Ptolemy’s Judicia) only classify the signs as masculine and feminine (not diurnal and nocturnal).

    In the following list, feminine signs are marked with (f):
    Aries, Taurus (f), Gemini, Cancer (f), Leo, Virgo (f)
    Libra, Scorpio (f), Sagittarius, Capricornus (f), Aquarius, Pisces (f)

    All the 15 occurrences of the two prefixes give a consistent match on 7 different zodiac signs. A possible hypothesis is the existence of some kind of equivalence:
    EVA:shed- masculine (diurnal)
    EVA:ched- feminine (nocturnal)

    In Sanskrit, the grammatical “masculine” and “feminine” are similar terms (the word for masculine actually is “non-feminine”):
    astrī f. (with lexicographers) “not feminine” id est the masculine and neuter genders.
    strī f. (in gram.) the feminine gender etc.

    On the basis of Stephen and Derek’s phonetic analyses, a direct connection with the Voynichese words seems unlikely. But it is possible that between EVA:shedy and chedy there is a close and simple relationship, similar to that between astrī and strī.
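    The tally above can be checked mechanically. The following is only a minimal sketch: the sign lists are copied from the prefix search reported in this comment, the masculine/feminine alternation follows the Tetrabiblos scheme mentioned above, and the chance estimate assumes (purely for illustration) that each of the 7 pages would otherwise carry either prefix with equal probability. The function names are made up for this example.

```python
# Signs in zodiac order. Under the alternating scheme, signs at even
# list positions (Aries, Gemini, ...) are masculine, the rest feminine.
ZODIAC = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]

# Pages on which each prefix was reported in the comment above.
OBSERVED = {
    "shed-": ["Libra", "Leo", "Sagittarius"],
    "ched-": ["Pisces", "Taurus", "Cancer", "Scorpio"],
}


def gender(sign):
    """Return the traditional gender of a sign under the alternating scheme."""
    return "masculine" if ZODIAC.index(sign) % 2 == 0 else "feminine"


def genders_by_prefix(observed):
    """Collect the set of genders on which each prefix occurs."""
    return {prefix: {gender(s) for s in signs} for prefix, signs in observed.items()}


def chance_of_perfect_split(n_pages):
    """Probability of a perfect two-way split in either direction,
    assuming each page independently shows either prefix with p = 1/2
    (an illustrative assumption, not a property of the manuscript)."""
    return 2 / 2 ** n_pages


result = genders_by_prefix(OBSERVED)
print(result)
print(chance_of_perfect_split(7))
```

    Each prefix lands on a single gender, which is what "consistent match" means here. Seven pages is a small sample, so the chance figure should be read as a rough guide only.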

    • MarcoP

      I realized that there is another possible interpretation, based on the classical association of the zodiac signs with the four elements:
      Aries, Leo, Sagittarius correspond to Fire (hot and dry)
      Taurus, Virgo, Capricorn correspond to Earth (cold and dry)
      Gemini, Libra, Aquarius correspond to Air (hot and wet)
      Cancer, Scorpio, Pisces correspond to Water (cold and wet)

      EVA:shed- could correspond to “hot”
      EVA:ched- could correspond to “cold”

      Considering the high frequency of the two Voynichese words, possibly “hot” and “cold” are better candidates.
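      A short check, sketched under the element table given in this comment, suggests that the "hot/cold" reading and the "masculine/feminine" reading divide the twelve signs in exactly the same way. If so, the zodiac pages alone cannot separate the two hypotheses, and other evidence (such as overall word frequency) has to decide. The variable names are illustrative only.

```python
ZODIAC = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]

# Classical sign-to-element association as listed in the comment.
ELEMENTS = {
    "Fire": ["Aries", "Leo", "Sagittarius"],       # hot and dry
    "Earth": ["Taurus", "Virgo", "Capricorn"],     # cold and dry
    "Air": ["Gemini", "Libra", "Aquarius"],        # hot and wet
    "Water": ["Cancer", "Scorpio", "Pisces"],      # cold and wet
}
HOT_ELEMENTS = {"Fire", "Air"}

# Signs whose element is hot.
hot_signs = {
    sign
    for element, signs in ELEMENTS.items()
    if element in HOT_ELEMENTS
    for sign in signs
}

# Masculine signs under the alternating scheme (Aries, Gemini, Leo, ...).
masculine_signs = set(ZODIAC[0::2])

same_partition = hot_signs == masculine_signs
print(same_partition)
```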

      • Marco,
        Are your sources speaking about the constellations which were visible in, say, the fourteenth century during the month nominated by the inscriptions, or are you positing an implied adjustment to suit astrological calculations? What are your thoughts on the inclusion in this section of two goats but no sheep, and an antelope but no Bull with its distinctive straight spine etc.?

      • It would support this idea if the words appeared frequently on the botanical pages. Medieval herbals typically mention the humoral aspects (hot, dry, cold, wet) of plants.

        To test it we can get a list of plants, look up the humoral qualities in a source such as Gerard’s Herbal, and see if corresponding words can be identified in the manuscript.

        • I wasn’t clear, sorry. By “plants” I meant “plants identified in the manuscript with reasonable certainty”.

        • MarcoP

          Thank you for the useful suggestion, Daniel!
          The voynichese search I linked above shows that EVA:chedy and shedy only appear on a few plant pages, in which the two words usually appear together, often more than once.
          Some ancient herbals (e.g. the Tractatus De Herbis) mention the Galenic properties (hot/cold, dry/wet) of most (all?) plants. This would have generated a completely different pattern (only one of the two words appearing in most pages) so your observation suggests that the tentative identification of those words is not likely.

          • The humoral properties of plants in medieval herbals usually include a pair of descriptors. For example, Gerard’s Herbal describes radishes as being hot in the third degree and dry in the second. A small number of items are listed with only a single descriptor (e.g. milk, which is listed as hot in the first degree).

            I’ve got a list of humoral values for foods from Gerard’s and the combination of hot and dry appears about 45% of the time. However, many of the foods in the list are things like almond milk or pears, and (if I remember correctly) most of the plants in the manuscript are leafy ones. If I restrict my list to just leafy plants then it goes up to 68%.

            So really the distribution is not even and would depend on what plants are included.

  12. Tahqiq

    Dear Dr. Bax,

    Why the word-by-word approach is a good thing that one should not condemn, and why it makes sense in everyone’s cerebral cortex, where it works along with grammar and syntax to make a language, is well explained in the current issue of Nature, Vol. 532, No. 7600, pp. 453–458, of April 28, 2016. Using certain areas of the cerebral cortex much, much more will bring you to the solution of the Beinecke MS 408. Best regards, Tahqiq

  13. Moi LeNeant

    Dear Prof. Bax,

    I hope you will read my thoughts and questions to you about the Voynich Manuscript and reply using your stunning expertise and always scientific approach, which I greatly appreciate you for, though I should refrain from judging you in any way since I am a “layman” (if that’s the word). Also, I’d like to apologise in advance for any mistakes in my English – it isn’t my native language, but I’m trying my best to put my thoughts into (English) words 😉 Please regard me as someone who has only some basic knowledge about the special and different features of the languages and language families of the world. I know a bit of everything, one might say.

    Firstly, I want to thank you for trying to consider every aspect as to what features the “words” (if they can be called that) of the VM might have and to what kind of language (family) they sensibly could be linked to. From semitic, turkic, indo-european to some language that is considered isolated – you’ve really proved to have a wide and broad view onto the matter, and I am glad we have something in common there. Now before I express my thoughts on the VM I’d like to ask you a few simple and general questions you may be able to provide an answer for:

    1. In the VM, are there any notes in Latin or other known script from the time it’s been produced or after that time (by its past owners e.g.)?

    2. Has there ever been an examination of the manuscript using X-Ray, UV-light, Infrared, Laser light or other such techniques?

    3. Has anybody ever tried to investigate the whereabouts of the manuscript’s missing pages? (As some owners and their residences are known.)

    Now some of my thoughts on this mysterious piece of art almost:

    As an outsider looking at a page of Voynichese script you are strongly tempted to dismiss it as a hoax because of the ridiculous amount of reduplication, the only slight change of words within a line or a paragraph, or because of situations where there is an EVA: “o” at the beginning of almost every word in a line, for example. And sometimes the same word isn’t directly reduplicated but is densely clustered in ridiculously great numbers in a paragraph. All in all the text gives a repetitious impression.

    However, you and many others have come up with sensible and rational explanations and I am very thrilled to see how you try to spot grammatical elements, i.e. suffixes, prefixes, conjunctions etc. This may be a fruitful path to go because it can be an explanation for several patterns present in the text. In fact, we already are familiar with these patterns from inflectional languages like Latin and German. You mentioned that the 9-shaped character often stands at the end of words. When I was looking at a line in the VM where this was the case (and it’s quite often the case), I immediately had to think of this:

    “Wir malen einen hölzernen Fensterrahmen.” (in Eng.: “We’re painting a wooden window frame.”) In this German sentence the ending ‘-en’ catches one’s eye. First it marks the 1st person plural of the verb “mal-en” (“we paint”), then the indefinite article “ein-en” (“a”) in its masculine singular accusative form, as well as the adjective “hölzern-en” (“wooden”) to agree with the noun “Fensterrahmen” (“window frame”).

    It is obvious here that in inflectional languages a morpheme doesn’t correspond to only one function, as it mostly would in agglutinating languages like Finnish. In Latin, endings like ‘-us’, ‘-i’, ‘-ae’ can well occur on several words in a row, and they do not correspond to one single function. (“puellae” can be genitive singular or nominative plural, and so on.) And this is true for most languages in Europe. There are suffixes that are more frequent or common and hence appear often and, as the German example shows, can appear even on many words in a row. This is just to show you, Mr. Bax, that I think it’s a way to go. But who knows? Maybe these repetitive prefixes, suffixes, etc. are even some kind of subject, object or possessive markers like the ones Japanese uses (“wa, ga, no…”)? The “o” that precedes many words might be a negative prefix like English “un-”? These things remain to be found out. You have to look in every direction after all.

    A huge problem here is, in my opinion, that we can spot too much. We think we can see the reduplicating features of Armenian or Malay, or signs of articles as they can be found in many many languages in different forms, application and usage. North Germanic languages attach the definite article to the end, but the indefinite they put before the word, Romanian attaches them to the end and German has them in various forms before the noun, adjective, and so forth…

    From Japanese via Latin, Greek, German, Basque, Arabic, Hindi, Xhosa to even an American polysynthetic language you really can see hints of everything, whether it be the script or the sound to a character you think you’ve found out. Yes, at first sight, and surely also at the second sight the text of the VM seems to have a lot of inflectional features, but then again you cannot be sure, ’cause there are some unusual characteristics to it for an Indo-European language.

    So to conclude this, maybe we should look at what IS NOT in the text? At this point I have become quite frustrated about the Voynich Manuscript. So I’d be glad for your reply.

    Best wishes,

    Moi LeNeant

  14. TC

    I wonder if there has been any major attempt to analyze the “grammar” of the VMS, especially to pinpoint things like noun / verb endings. Before Michael Ventris gave a single phonetic value to any Linear B glyph, Alice Kober figured out that it was an inflected language. If we do the same for Voynichese, we could thin out the number of possible underlying languages in a significant way.

    I’ve been working with “qok-” and “qot-” words to see if their endings could be an inflected verb paradigm (think Latin: amare, amaveram, amavissem, amabitur…). Just a thought.
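    One way to start on the kind of paradigm test described above is to tabulate which endings occur after both stems. The sketch below is only an illustration: the word list is a small placeholder sample in EVA-style spelling, not an extract from any transcription, and the helper names are invented for this example. A real test would read the words from a full transcription file.

```python
from collections import defaultdict


def endings_by_stem(words, stems):
    """Map each ending to the set of stems it was seen after."""
    table = defaultdict(set)
    for word in words:
        for stem in stems:
            # Only count words that have something after the stem.
            if word.startswith(stem) and len(word) > len(stem):
                table[word[len(stem):]].add(stem)
                break
    return dict(table)


def shared_endings(table, stems):
    """Endings attested after every stem: candidates for a common paradigm."""
    return sorted(e for e, found in table.items() if found == set(stems))


# Placeholder sample for illustration only (hypothetical input).
sample_words = ["qokedy", "qotedy", "qokain", "qotain", "qokal", "qokeedy"]
stems = ["qok", "qot"]

table = endings_by_stem(sample_words, stems)
shared = shared_endings(table, stems)
print(shared)
```

    Endings that recur after both stems would be the first candidates for inflections. Endings found after only one stem would need a separate explanation.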

  15. Stephen Bax

    I forgot to mention a further point about the third grammatical element I mentioned.

    This symbol resembling a 9 is identical to the Arabic word for ‘and’, pronounced ‘wa’, frequently attached to a following noun. It could be a direct borrowing, but if it is pronounced as ‘wa’ then it does not fit with our analysis in other words, unless it is a nasalised ‘wa’ of some sort.

  16. Notula

    @Stephen Bax
    Thank you for the link. I’m only working on the VMS since its official dating and I’m not so familiar with the decade long discussions.

    Yes, my point is in a certain way dogmatic because I’m on the other hand very familiar with the culture of the Quattrocento and its influence on Europe as a whole.
    Taking into consideration that, during that period of time, an enormous thirst for education also produced an enormous number of books, one should not forget about the script itself and its development along a line of rules. From my point of view the VMS is all the more a curiosity, or rather a paradox, with its obvious pictorial content and its text hidden behind an unknown script, but its handwriting also follows the rules.

    And no, I don’t mean we shouldn’t compare other documents. Comparison is, of course, an important part of scientific research (I’m doing it myself very often), but it doesn’t always make sense. What I mean is that the identification of plants is not that important, as long as not one single word is decoded.

    I should probably apologize in advance for any misunderstandings, since I’m not a native speaker of English.

    • MarcoP

      Hello Notula, you wrote that the VMS is “a paradox with its obvious pictorial content and its text hidden behind an unknown script”.
      In case you have not seen them before, a few manuscripts by Giovanni Fontana are similarly paradoxical. Stephen briefly discusses the subject here.

      • Notula

        Thank you MarcoP!
        I know Fontana’s fascinating work very well, but I don’t see it as being as paradoxical as the VMS. The rapid growth in warfare technologies, combined with the terrible local wars between the noble families and their demands for power, is certainly the main source of the flourishing art of cryptology. The VMS surely has nothing to do with that kind of manuscript.

  17. D.N. O'Donovan

    Thank you for a very interesting post. For someone like me, who considers the written part of the text a combination of maze and potential mine-field, it is fascinating to read analysis not affected by such external factors as personal ambitions, limited horizons and theoretical histories.

    About Romany. At one stage I was intrigued by the history of the Romany, and especially their earlier links to the Indus valley, the Yemen, the region about Baghdad, and the existence of truly ‘Egyptian’ Romany.

    My understanding is that there were originally three distinct forms of Romany dialect – that is before about the twelfth or thirteenth century when the first Latin Christian records of Romany arrivals occur. (one very interesting account speaks of a Romany leader as named ‘Thomas’ who gave his place of origin as southern India).

    Have you considered the degree to which earlier regional influences might be reflected in the later grammar and syntax of recorded Romany speech? Also – I believed we had no example of written or spoken Romani until relatively recently. Is that true?

    And finally, I wonder if you know of any sources for Cuman?
    Thanks again for an interesting post.

    • Marjukka

      I must say I’m starting to go for the Romany language theory of the VMS. Here are some of my reasons, in addition to the things some of you others have said. Bear with me here:

      I started looking at Voynich some months ago (so I admit to being a newbie). I have to say my first thoughts were Greek, Slavic, even Latin or English (due to the similarity of the handwriting to some of the writings from the same period in England).

      There is even a trait pointing to typical writing in that era where letters are combined or endings shortened (like -us, -ion, -arum to only one specific letter).

      Somehow it seems that the drawings and the style of letters point to similarities with other manuscripts.

      So what if the writer is someone who tries to write in a language which has only existed in oral form (as many of you who have researched it have said)

      What if this person has had access to other manuscripts to see the drawings and writing, but hasn’t actually known how to read or write in those languages? So he has invented his own written language to write down the stories in his own language.

      Romany would not have had a written language in the 15th century, and definitely not many Roma people at that time would have known how to read or write, as most of them were just coming to Europe through Greece from Anatolia, where they had been speaking a language made up of different languages (a military koine language which would eventually develop into Urdu), see

      There were, however, some individual Roma people who were part of “civilization”, for example Stefan Raztvan, who lived 100 years after the VMS would have been written, but it seems there were others who had come before and were citizens of the Ottoman Empire and therefore not enslaved.

      Until 15th hundreds before the Romany people started majorly to disperse to other European countries they were speaking quite unified language with loan words only from those areas they had been to, most recently Slavic languages and Greek.

      So there is an explanation which would explain all the similarities, even to Urdu.

      This would also maybe explain the mixture of things, even some of the wilder explanations such as similarities in tradition and the loans suggested by others, even going as far as the Finno-Ugrian or Coptic ends. Given the Roma people’s professions, it might even explain the magical, mythical and alchemical influences, and also how it ended up with Emperor Rudolf. Most likely it would then have been written somewhere in modern Croatia or nearby.

      I have nothing to prove these ideas specifically, but this was something that came to mind when reading up on both the words and etymologies found from VMS as well as the history of Roma-people in Europe

  18. Notula

    The first words in the paragraphs are NOT the plant names, since they all begin with almost the same character, i.e. one of only two characters used for almost every paragraph. It is not very logical for a herbal to show only a range of plants whose names start with character A or B. Whatever the underlying language is, this assumption makes no sense.
    It is also absolutely senseless to compare plants from different herbals, even if they belong to the same historical time frame. Everyone who specializes in medieval manuscripts knows how far from their origins copy after copy was produced and how they changed with each copy over time.

    • Stephen Bax

      Thanks Notula, but I think you are being too dogmatic about it. Your point about the first letters on those herbal pages was made also by Nick Pelling, and you can see from my answer to him
      here that this issue is far more complex than you imply.

      As for your second point, I don’t really understand what you mean. Do you mean we just shouldn’t bother to look at other documents?

  19. I fully agree that most people pretend like grammar doesn’t exist, or exploit it to their advantage in the most absurd ways.

    I wonder, in your own study of plant names – the first words in the paragraphs – to what extent did you consider the possibility of deformation by grammar? Or did you assume the words would appear in a subject position – hence probably nominative?

    • Stephen Bax

      Yes, that is a natural assumption. If I had more time I would try to track more of these words (the ones in initial position on the plant pages, or others which seem likely names) through the manuscript, trying to see if we can find any grammatical inflexions, which might be prefixes, suffixes or even infixes.

  20. Derek Vogt

    I can easily see someone objecting that none of that stuff you just wrote really establishes the language’s grammar, any more than some other people’s “translations” have done so. The difference is acknowledgement of the process and where we are along it. It’s better to admit not knowing what is unknown while suggesting little bits of progress along the way, than to declare that you’re finished just about as soon as you’ve started without having actually even tried to do anything about it yet.
