Voynich: the punctuation problem

One issue which has long interested me concerning the Voynich manuscript (VM), and which has not perhaps been researched as much as it should, is what we can call the punctuation problem.

The script is noteworthy for having no obvious punctuation, which is rare in itself. However, as a linguist what then interests me is how the reader could know where the ‘sense-units’ begin and end. If we assume that we are dealing with a natural underlying language, the reader would have to have signals of some sort, in the absence of punctuation, as to where the sense endings would be, especially on pages containing lengthy chunks of text. Consider this text from f25v. Is it a single sentence? If so, it is quite long. Does it contain any dividers of any sort? Almost certainly, but what and where?



Although historically many scripts had little punctuation, they almost always had instead some form of ‘discourse marker’ to help the reader to follow the writer’s flow of ideas. An example from classical Arabic, which had very little punctuation but tended to connect long sentences with ‘and… and’, is the word ‘hal’, an essentially empty word signifying that the following sense unit was to be read as a question. It had no translatable meaning beyond flagging up the function of the sentence as a question. It is what we term a ‘discourse marker’, no more.

It could be that the Voynich script uses a series of such markers, along with other devices, to flag up to the reader certain aspects of the discourse. In this post I want to consider a number of possible discourse and textual markers in the VM, of different types, which seem to me interesting.

1.  Line breaks: From f103 onwards we see many pages in which the ‘sense units’ seem to be clearly demarcated by simple line breaks, with each new sense unit marked by a star on a string or a flower. See e.g.:


This might seem too obvious to mention, but if we assume, as seems likely, that each ‘paragraph’ of these pages represents one ‘sentence’, then closer examination could elucidate some properties of the typical Voynich ‘sentence’, and even its sentence structure.

2.  Specific signs: Some of the famous ‘gallows’ characters, in particular those transcribed in the EVA system as ‘p’ and ‘f’, do appear to act as initiating discourse markers, flagging up the start of sections. They are often decorated, which seems to add to the possibility that they are discourse ‘flags’ of some sort.

As Currier noted years ago, “[t]hey ( p , f ) appear 90-95% of the time in the first lines of paragraphs, in some 400 occurrences in one section of the manuscript.” This in itself implies that they are being used to indicate or highlight the first line of a text. More to the point, they occur 107 times as page initial (93 pages with ‘p’ and 14 with ‘f’). Since it is highly unlikely that the author could find actual words beginning with these letters to start these pages, it is plausible to suggest that the symbols are perhaps semantically empty markers used simply to flag the start of a page or section, just as we use a semantically empty full-stop to indicate the end of a sense unit.

If you use the wonderful new tool at www.voynichese.com you can see that Currier was largely correct with respect to EVA ‘p’ and ‘f’. However, note that EVA ‘k’ and ‘t’ are far more common and seem not to be limited so much to the first lines, which suggests that they are far more than simple ‘initiating markers’.
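For readers who want to check a first-line claim of this kind against a transcription of their own, here is a minimal sketch in Python. It is not Currier's method and not how voynichese.com works; the function name `first_line_share` and the sample lines are invented for illustration only, and the sample is not a reading of any folio. It simply counts how many occurrences of a glyph fall on the first line of a paragraph. Note that it counts raw characters, so compound glyphs containing ‘p’ or ‘f’ would be counted too.

```python
from collections import Counter

# Hypothetical toy input: each paragraph is a list of lines, each line a
# string of EVA-style words. These strings are invented placeholders,
# NOT readings from the manuscript.
SAMPLE_PARAGRAPHS = [
    ["pchor daiin chol", "chor daiin shol", "okaiin chol dam"],
    ["fchol daiin chor", "qokol chor pchey", "daiin chol"],
]


def first_line_share(paragraphs, glyphs=("p", "f")):
    """For each glyph, return (occurrences on first lines, total occurrences)."""
    first = Counter()
    total = Counter()
    for paragraph in paragraphs:
        for line_index, line in enumerate(paragraph):
            for glyph in glyphs:
                n = line.count(glyph)  # raw character count within the line
                total[glyph] += n
                if line_index == 0:
                    first[glyph] += n
    return {g: (first[g], total[g]) for g in glyphs}


if __name__ == "__main__":
    for glyph, (on_first, overall) in first_line_share(SAMPLE_PARAGRAPHS).items():
        print(glyph, on_first, "of", overall, "on a paragraph-initial line")
```

Running the same function with `glyphs=("k", "t")` on a real transcription would be one way to compare the two pairs of gallows characters, assuming the transcription has already been divided into paragraphs and lines.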

3. Lexical discourse markers: Another possible discourse marker, which could act to indicate to the reader the ways in which ideas link together, could be the most common word in the manuscript, transcribed in EVA as ‘daiin’. In my 2012 paper I suggested that this word might be the equivalent of a comma, or ‘and’. Let me revisit some of that paper here.

My argument started from the last page of the manuscript (f116v), and the analysis offered by Johannes Albus at the Voynich 100 conference in Italy, in which he argued that the text is a recipe in Latin and German, with two words in ‘Voynichese’. He explained that the text prescribed a way of using Billy Goat’s liver as a remedy for wet rot, a skin condition, and his analysis was supported by numerous examples from contemporary recipes and other sources, as well as by reference to the picture of the goat and liver in the margin. I reproduce the original page here.

last page


Albus’ transcription and gloss is as follows:

Transcription with abbreviations and omissions in square brackets

L1 poxleber umen[do] putriter.
L2 + an[te] chiton olei dabas + multas + t[un]c + t[an]ta[a](?) cer[a]e + portas + M[ixtura] +
L3 fix[a] + man[nipulis] IX + mor[sulis] IX + vix + alt[e]ra + matura +
L4 … … (two ciphered words) pals [ein]en pbrey so nim[m] gei[s]smi[l]ch O


Translation (Johannes Albus)

Billy goat’s liver for wet rot
At the membrane you gave oil, then you bring a lot of the much(?) wax, in a
fixed mixture: 9 hands full, 9 morsels (from) the only just double mature
… … (two ciphered words), squash it into a paste, then take goat’s milk.

The fact that the text contains two words in Voynichese is significant, since it means that it was not simply a later addendum by an unrelated scribe, but is linked at least tangentially to the rest of the VM. As such it could serve as a help to its interpretation, for reasons we can now consider.

I won’t consider the German/Latin aspects here, but if we examine Albus’ interpretation we note that the prescription has a clear structure, starting with the heading on line 1 which indicates the nature of the preparation and also its medicinal use. Line 2 and the start of line 3 offer an instruction with verbs in the second person, namely ‘dabas’ (imperfect of ‘dare’, to give) and ‘portas’ (present of ‘portare’, to carry), although why the tenses are different is unclear. This is followed in line 3 with further ingredients and quantities to be added, with line 4 offering the two Voynichese words, followed by further instructions in the form verb + noun. The words have been transliterated as ‘oror sheey’ (Palmer 2004, http://inamidst.com/voynich/michitonese).

I have suggested in my February 2014 paper that the word transcribed as ‘oror’ could refer to juniper, but what interests me here is the ‘punctuation’. Note that the text in f116v is divided up into sense units separated by a + symbol. These do not divide words, but larger units of meaning, so for example the words in “an[te] chiton olei dabas” in line 2 are not each separated by crosses. It is not always clear to the modern reader why the sense units are separated in this text (e.g. why ‘multas’ and ‘tunc’ form separate units), but what is clear is that the author considered it important to indicate these separations specifically, using a cross, in addition to leaving spaces between each word.

This kind of sense-division on f116v is – I suggest – arguably the same as the function of ‘daiin’ on other pages, such as on f25v reproduced above. Look at the page again with ‘daiin’ highlighted (thanks to www.voynichese.com):



The element ‘daiin’ is repeated not only in the middle of the first four lines, but five more times. Considering the fact that this is the most frequent item in the manuscript as a whole, this frequency was perhaps to be expected, but note that it is never inflected in any way, whereas it follows words beginning with ‘ch’ which clearly do inflect in some way, such as ‘cho’, then ‘chor’, then ‘chol’ and so on.

I suggest that the most probable function of ‘daiin’, frequent as it is, yet not changing, is as a kind of divider between sense-units, a discourse marker indicating to the reader the sense break. In plainer language, the function of ‘daiin’ is simple but important – it acts much like the word ‘and’ or a modern comma.

‘Daiin’ might have a literal meaning, but that is fundamentally unimportant in functional terms, since its essential function here seems to be to show the reader where a small sense-unit ends. In some cases it is doubled (as in folio 25v, line 5), perhaps to signal a more substantial sense-break, more like a full-stop. (This doubling occurs 17 times in the manuscript, with one tripling on folio 89r2.) But usually it seems to act as an ‘and’ or comma dividing individual sense units.

If this is so, it goes some way to answering the question posed above about how a reader would break up the text in the absence of any other punctuation marks. ‘Daiin’ gives the reader a clear guide as to how to recognise the start and end of short sense-units.
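To make the idea concrete, here is a small Python sketch of what ‘reading with daiin as a divider’ would mean mechanically. It is only an illustration of the hypothesis above, under the assumption that words are separated by spaces in the transcription; the function name `split_sense_units` is my own, and the example line in the usage note is invented, not taken from f25v. A single ‘daiin’ closes the current unit, and a run of two or more is counted once as a ‘strong’ break, on the analogy of a full-stop.

```python
def split_sense_units(line, marker="daiin"):
    """Split a line of EVA-style words into units, treating `marker` as a divider.

    Returns (units, strong_breaks): `units` is a list of word lists, and
    `strong_breaks` counts runs of two or more consecutive markers.
    """
    units = []
    current = []
    run = 0            # length of the current run of consecutive markers
    strong_breaks = 0
    for word in line.split():
        if word == marker:
            run += 1
            if run == 1 and current:
                # first marker after some words: close the unit
                units.append(current)
                current = []
            elif run == 2:
                # doubled (or longer) marker: count the run once
                strong_breaks += 1
        else:
            run = 0
            current.append(word)
    if current:
        units.append(current)
    return units, strong_breaks
```

For example, on the invented line "chor daiin chol shol daiin daiin cthor" the function returns three units (‘chor’; ‘chol shol’; ‘cthor’) and one strong break. Whether the units so produced correspond to anything meaningful is, of course, exactly what remains to be shown.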

We can take this further by comparing f25v with the ‘prescription’ as analysed by Albus on f116v. This is still speculative, of course, but we notice that if we take ‘daiin’ as a sense divider, the resulting structure resembles the prescription analysed by Albus, as follows (transcribed in EVA):


 From Bax 2012

Although this is speculative, it is possible that this text is a prescription, with the high incidence of daiin markers in the middle of the text indicating different ingredients, mirroring the high number of crosses in the middle of f116v.

Observation of the original text suggests that the single character which has been transcribed as ‘s’ (in ‘s okeeaiin’, line 2) does not look like other characters transcribed as ‘s’, but rather resembles the Arabic numeral ‘2’, so it could in fact be a number for a following ingredient. However, this possibility requires more translation of the underlying language in order to evaluate it fully.


4. Modified characters: Returning to the question of punctuation and discourse markers, a fourth way in which the Voynich script might signal divisions between sense-units is through the use of particular characters. I suggested in my Feb 2014 paper that the character transcribed in EVA as ‘m’ might be a variant of the ‘r’ character, varied in order to mark the end of a sense unit in some way.

In a recent posting on this forum, Cosmo offered some interesting support for this view, using the new tool at www.voynichese.com. Cosmo said:

Stephen, I think there is strong evidence for your suggestion that the transcribed “m” is a final form of another character – “r” being a good candidate – because of how often it occurs as the last character, particularly in the latter folios.


When “m” occurs within a line, it’s usually at the end of a word, for example:

There are a few occurrences of “m” within a word but these are quite infrequent and in some cases unclear.

IMO the consistent usage of “m” as the last character in a block means that it is either an embellished character or an abbreviation. The only question is why it does not occur more often, given how common “r” terminated words are.

For example, see this plot of “r” terminated words in yellow and “m” terminated words in blue:

Naturally I agree ( 🙂 ) and I am most grateful to Cosmo for showing so graphically the way in which the character seems to operate as an end marker.
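The positional pattern Cosmo describes can also be tallied directly from a transcription. The following Python sketch is one possible way to do that, not the method behind the plots above: the function name `glyph_positions` is hypothetical, it assumes one string per manuscript line with words separated by spaces, and the lines in the usage note are invented placeholders. It sorts every occurrence of a glyph into line-final, word-final (but not line-final), or word-internal.

```python
def glyph_positions(lines, glyph="m"):
    """Classify each occurrence of `glyph` by its position.

    'line_final'    : last character of the last word on a line
    'word_final'    : last character of a word that is not the last on its line
    'word_internal' : any other position inside a word
    """
    counts = {"line_final": 0, "word_final": 0, "word_internal": 0}
    for line in lines:
        words = line.split()
        for w_index, word in enumerate(words):
            for c_index, char in enumerate(word):
                if char != glyph:
                    continue
                at_word_end = c_index == len(word) - 1
                at_line_end = w_index == len(words) - 1
                if at_word_end and at_line_end:
                    counts["line_final"] += 1
                elif at_word_end:
                    counts["word_final"] += 1
                else:
                    counts["word_internal"] += 1
    return counts
```

Called on the invented lines ["chol dam shor", "okaiin dar cham", "omor chol"], it reports one occurrence in each category. Running it with `glyph="r"` as well would allow the kind of ‘r’ versus ‘m’ comparison that Cosmo plots, provided the transcription preserves line breaks.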

To conclude the discussion of modified characters acting as discourse markers, it may well be that other characters could also act in this way – for example EVA ‘g’ as a variant of EVA ‘d’:


However, this would need more research.

To sum up [how about that for a discourse marker], it seems to me that the Voynich manuscript does contain various devices which act as signals to the reader regarding sense unit boundaries. This is to be expected if we are dealing with a natural language (or indeed a cipher which imitates natural language patterns).

In my view this is also a further strong reason why the Voynich could not be a gibberish hoax, but that is a story for another day!






  1. Peter

    How I work

    • Peter

      A dictionary in order to reconstruct a meaningful sentence.

      • Peter

        The author of the VM was surely not stupid.
        The match I must refrain even with the encryption of a simple system.

        My opinion: The key consists of a number of ways.

        1. Individual characters
        2. Signs in combination
        3. Prefixes and suffixes
        4. Abbreviation (classical Latin)
        5. Deception
        6. And finally the language, for example Medieval Latin (vulgar / dialect),
        to make matters even more difficult.

        Words from Latin, Italian, Spanish, French, Portuguese, Galician ………
        must be taken into account to ever be able to achieve success.

    • WilliamF

      Hi Peter,

      The ending/suffix idea is interesting to me because I have been thinking the same thing. But do you think a person would need to write “all” so frequently?

      • Peter

        Hi William
        The writing of endings and abbreviations was very common at that time. If you look at old books, you will quickly notice how often they occur.
        If you look at the number of characters, it must be more than just a simple alphabet.
        Then certain signs occur only at the end, or alone. The same applies at the front with the (o). I cannot believe that 80% of a vocabulary begins with the same letter.

        But I think Latin is the first reference point for the solution. After all, it was once the official language of a large empire, and I cannot imagine that a lot of its words have not been kept ……. in whatever language.

  2. Derek Vogt

    The use of a letter like Voynich ^b^ to mark the beginning of a statement does not even need to be separate from its phonetic use; it could have its punctuative use because of something that it represents the sound of. Not only can languages have some words (usually relatively short ones) that are especially likely to be the first in a statement, but sometimes they can also attach such words to whatever else is adjacent. This is a known tendency in various older Indo-European languages, with Anatolian languages such as Hittite being a particularly drastic example. According to “Indo-European Language and Culture” by Benjamin W. Fortson, IV:

    Characteristic of Anatolian sentence structure is the clause-initial clitic chain: all clitic sentential conjunctions, clitic sentential adverbs, and clitic pronouns were attached to the first word of the clause in a particular fixed order with the whole chain written as a single word.

    “Sentential” means “pertaining to the whole sentence (instead of just part of it)”. “Clitics” are short bits of sound with meanings that might equate to independent words in some languages but are only found attached to other words in other languages such as Hittite. What this section is describing is adding not whole words but pronounceable word fragments to the beginning of a sentence, creating a compound word together with whatever else would have been the first word otherwise.

    Thus in Hittite one can begin a sentence kinunmawatuza, where kinun means “now”, -ma is a clitic adversative conjunction meaning “but”, -wa is a quotative particle indicating that the sentence is quoted speech {closest single word in English thus being “quote” or “verbatim”}, -tu is the second person singular accusative pronoun, and -za is a reflexive particle…

    Combined with putting verbs at the end, that’s the equivalent of expressing English “It is now said, however, that you brought this fate on yourself” as
    “Nowhoweververbatimonyourselfyou this fate brought”.

    I don’t see anything that drastic in Voynichese, but looking at ^b^ as a statement-initial proclitic attached to what would still be the first word without it does explain a couple of other things as well as the frequency of ^b^ at the beginnings of paragraphs.

    Consider pages 15v and 16r, where three similar words appear, as Bax first described with the identification of the juniper: ^barar^, ^arar^, and ^garar^. Until I really considered the case of page-initial ^b^, I could only explain two of these forms. The original sound in Hebrew and Arabic would be /ʕarʕar/, and /ʕ/ tends to get dropped or converted into other sounds, typically ^g^ or combinations involving ^g^, and there is also a tendency for initial consonants followed by ^a^ to get dropped from the beginnings of words. So getting ^garar^ and ^arar^ from /ʕarʕar/ are both to be expected, but that left ^barar^ out, with no apparent reason in phonetic evolution to add a /b/, or for such an addition to be inconsistently applied. Consider ^b^ a separate pronounceable element that gets added to other words at the beginnings of statements, and it all fits together perfectly, because ^barar^ is the first word on its page, whereas ^arar^ and ^garar^ occur later on the same pages.

    And another similar example to that is 21r, identified as purslane, which is “perpehen” in Persian (“frfħyn” in Arabic because all /p/ in Arabic shifted to /f/ long ago). The page’s third word is ^ãpnhon^, and almost the same word occurs again as the first word of the second paragraph… but this time in the form of ^bhãpnhn^. So, except for one vowel being written in one case and unwritten in another, this looks like exactly the same word, with a ^bh^ added when it starts a paragraph and nothing added when it doesn’t. At first, while it seemed clear that the latter was just an example of initial consonant-dropping, I didn’t know whether to attribute the ^bh^ to a slight mutation of the initial /p/ (sometimes dropped and sometimes preserved) or think of it as a prefix. But this statement-scale interpretation gives me a bigger context to put the idea of ^bh^ as a prefix into now, and allows a single consistent pronunciation of the root word (with the initial consonant /p/ dropped) in both cases. (The same principle of consistent root word pronunciation would also be satisfied in ^garar^ if the ^g^ is also a prefix, but that’s not needed or indicated by anything else I know of for now.)

    Also, aside from ^b^ at the beginning of a word at the beginning of a statement, consider ^n^ between words within a statement, on folios 6r, 6v, and 17r:

    ^pawr n xaš hašar^
    ^kãwy n $wr hõokwr^
    ^twrwr ntwrhar^

    The first and last word in each line here would appear to be cognates for the plant names: “papauer” and “hašhaš” or “xašxaš” for the first, “xarwaʕ/kharvay” and “kerčak” for the second, and “tarxun” for the third. That would make ^n^ something that goes between alternative names for the same thing, regardless of whether it’s attached or independent. I don’t know of a cognate for it (nor do I expect to find one because that’s especially difficult and problematic for this kind of word), but ^n^ occurs 151 times alone and 1757 as the first letter of a word with at least two letters, so there’s plenty of space for it to have a simple common meaning such as conjoining alternatives, like our “or” or “also” or “alias”.

    Of course, ^$wr^ and ^xaš^ are also candidates for conjunctions or parts of conjunctive phrases like “also called”, along with the latter’s near-homophone ^haš^, which occurs between what look like names for the same plant in folio 38v’s ^ãkhãb haš xagẽs agões^. And they all occur a few hundred times. But, as with ^n^, I haven’t found clear cognates for them, nor would I have expected to yet.

    • Stephen Bax

      Thanks Derek, as always really insightful and detailed. In particular, I really like your suggestion about what you term the Voynich ‘n’ sign (EVA ‘y’) possibly being “something that goes between alternative names for the same thing”, i.e. with a meaning something like ‘or’, as in

      ^pawr n xaš hašar^
      ^kãwy n $wr hõokwr^
      ^twrwr ntwrhar^

      (To those new to this discussion, this ‘n’ is the symbol which looks like a figure ‘9’. See Derek’s scheme at https://stephenbax.net/?p=1550)

      I have for a long time wondered if that character might be some sort of conjunctive, on the basis of the first line of f3v, of which the first word is repeated towards the end of the line with this character in front of it, apparently attached. So your suggestion seems to me plausible.

      • Stephen Bax

        See the attached for examples of the EVA ‘y’ symbol as a possible conjunctive, as Derek suggests, on the first lines of folio 6r and 3v.

      • Derek Vogt

        Funny that you mentioned “detail”…

        I have been collecting my various scattered Voynich ideas to try to organize into a single presentation, not for here but for YouTube. After snipping off a lot of the little sidetracks to keep it as streamlined and simple as I could get away with, I thought it would make a video 5-10 minutes long. But, once I had made what I figured would be about half of the slides, I timed myself talking through them, and that was about 20 minutes. Nobody would ever watch the whole thing! 😮

        Maybe I can sacrifice some more “details” and get it down to something I can split into two separate videos which I might be able to get down to 15 minutes apiece on separate aspects of the whole process…

    • Derek Vogt

      Modern Kalderash Romany for “now” is “ap”. Its cognates include Hindi अभी “abʰī”, Nepali अब “aba”, Bengali অব “aba”, and Urdu ابھی “abʰy”. Back-translations from some of those into English also yield “until”, “yet”, and “of”.

      • Derek Vogt

        Another good translation for a common sentence-starter or paragraph-starter would be “certainly” or “surely”… and Romany for that is “ba”.

  3. WilliamF

    Thanks Stephen for an interesting site. The comments on the last page of the VMS about the billy goat remedy were interesting.

    If the 8 was an “n” instead of an “s”, would that help the conjugation remain consistent?

    Attached is a proposed sound set for the Voynich characters. This is not necessarily haphazard, but there is quite a bit of uncertainty. I have been working on statistical matches and actual word matches for this. This is where I came to the assumption that it may be written in Galician. Since I don’t speak it, I have only Google translate to blame. And that is subjective, at best.

    But, if somewhat correct, and following on with the German/Latin logic proposal, this would suggest the two unknown words on the last page may be “ae(r)” and “quus”. Together they make aequus in Latin (equal) suggesting 9 smaller bits from the liver of a billy goat that is twice as old as the first would be mixed in. I suppose if you have wet rot, getting the remedy correct would be important. Not sure what wet rot is, though.

    The attached sound set assumes there is a difference between upper case and lower case letters. These seem to work when applied to the words in the segments in f67r if they are assumed to be months. If I can attach this image, I’ll try to attach another with the breakdown.

    • WilliamF

      OK, that made it. Attached is a breakdown using the above rules for the words in the pie slices of f67r (the left page). No assertions of accuracy. Just looking for a meaningful text.

      • WilliamF

        OK, one more, Stephen, that uses the sound rules applied to your list of stars. I can’t say they match anything for certain. But it may look familiar to someone else. It’s worth a shot.

      • MarcoP

        Hello William,
        I think this approach is definitely worth trying to test a hypothetical phonetic system.

        Your treatment of the EVA sequence ‘ot’ is not clear to me. For instance, at the beginning of month 1 it reads ‘i’, at the beginning of month 2 it reads ‘f’. I see that this is listed in your first image, but I think I need a more detailed explanation of how that works.

        • WilliamF

          Hi Marco,

          Thanks for asking.

          I am assuming the distinction between an “I” and an “F” would be with the “l” or “y” that follows the vowel immediately after the “ot”.

          For example, “Ia” would be written “otal” or “otay”, while “Fa” would be written as “ota”. The same for something like “Ie”, which would be “otel” or “otey”, while “Fe” is “ote”.

          I am not certain about the overlaps, like “Fal” or “Fel”, but an “F” may be indicated by dropping the vowel. That is “otl” would be “F-l” with the vowel being guessed from context.

          I had used the dash (-) in the definition of “I-” and “J-” as a wildcard vowel. But I might not have been clear on its use.

          • MarcoP

            Thank you William, this makes things clearer. I guess that a system like that would be a cipher, since it seems to make reading deliberately harder.

            • WilliamF

              Hi Marco,

              I suppose you could call it a cipher. But only for a few characters. It might, alternatively, be considered a spelling rule. I am not sure if a cipher is when all sounds are intentionally changed to a different symbol, or if it includes an alphabet with just a few changes or overlaps. In my head, a cipher is intentionally made to be secretive. While spelling rules were created to frustrate elementary school students.

              Specifically for the “F” sound, I had read that in Spain, the H was slowly replaced by F in certain words. (Maybe it was the other way around?) Anyway, if the first two months of a calendar were written as Hanes and Hebes, someone might understand them at the time as Ianes and Febes. But that wouldn’t explain the “l” after the “a” in “otalDy” or the “y” after the “u” in “oteeyDar”. So there seems to be a reason for their placement. My first impression is they modify the starting letter. That way the “ot” in Ianes can be different from the “ot” in Febes.

              My assumption is that they are the names of the months. But that isn’t certain. So I may be reading into something that is completely erroneous.

              Ideally, you can pick a spot in the manuscript and write down the sounds that correspond to the letters. If full sentences of words make sense in a particular language, without too much guessing, or reading into it, or adding absent sounds that aren’t written, then it may be functional. If not, then the alphabet I proposed may be completely wrong. It could also be too open-ended to be valid, with too many variations on sounds for a given word to be reasonable.

              I have seemed, with some struggle, to be able to come up with full sentences. But I may be wishing too hard and deceiving myself.

              • MarcoP

                Thank you William,
                I agree that assuming that those could be month names is a reasonable hypothesis. As you write, when one manages to identify the correct alphabet and the correct language, the text will be readable, so it will be clear when a solution has been found.
                Unluckily, the language could be something very exotic.

                • WilliamF

                  Hi Marco,

                  I agree. The drawings are exotic and the letters are exotic, so it is reasonable to expect the language may be exotic.

                  I have started reading the articles on this site over the past week or so and found you have posted a lot of papers with substantial supporting evidence. That is very encouraging to see. You really do take time to show why you write what you are writing. And you present a lot of connecting information I have not seen anywhere else.

                  Would you be willing to apply the alphabet I suggested to any region you deem to be a sentence in the VMS, without any visual reference (no drawings on the page) and see what you come up with? That would be a fair application of the alphabet. If you have a question about the sounds, or spelling rules, let me know, but don’t tell me where it is. Then, when you are done, see if it makes sense to you when you paste it into Google Translate for Galician.

                  To be honest, I don’t speak that language, but it has seemed to work the best for fit. But there are a lot of unknowns about how the translation algorithm is done with that application, specifically for word fit and grammar, but it is all I have had to work with. So it is, so far, a blind method.

                  If you would like to try this experiment, it would be a secondary test, which may persuade me in correcting my reasoning or help me modify my methodology, or abandon it altogether. I have nothing to lose.

                  If you like, to make it simpler, pick a short word sequence then see what you get. You pick the number of words. Let me know what your results are.

                  I do not know you, and by no means do I want you to think this is an assigned project. Only a simple experiment. If you don’t feel it is worth the effort, I can post my interpretation of the 1st paragraph of the 1st page and then you can review it for methodology and consistency. I am just hoping you will try to find a few minutes to vet the idea and show if my methodology is too loose.

                  I found this page to be useful:

                  • Peter

                    Galician alone is probably not enough. I think all the related languages must be considered.
                    But the plants tell me that it must be around the Alps.

                    See Link, same page

                  • MarcoP

                    Hello William, the application of the phonetics you propose should be a purely mechanical activity, so I don’t see any advantage if someone else does it in your place. Ditto for pasting the output into Google Translate. I think that what would be actually interesting is if you submitted your phonetic readings to Galician speakers to see if they can recognize their language.

                    • WilliamF

                      Exactly Marco. Ideally, a good alphabet allows anyone to do a translation without too much personal interpretation.

                      Unfortunately, I don’t know anyone who speaks Galician. So I will try to post an image of my effort of a portion and see if, maybe, someone here runs across it that speaks Galician, and it makes sense to them.

                    • WilliamF

                      Hi Marco,
                      Below is a screen capture of the excel file of my effort of translating the first paragraph on the first page (f1r).

                      Hopefully, the way I have broken down the sounds corresponds reasonably to the lettering in the VMS in the 1st paragraph. The sounds written under the VMS script are based on the alphabet I posted a few days ago. The words under the sounds are based on Galician words from Google Translate, and the verb conjugation site I posted above.

                      Since punctuation is vague, as this blog page has discussed, I took some liberties with commas and periods.

                      I’ll post the line by line word summary next.

                    • WilliamF

                      Here is the paragraph as written in Galician.

                    • WilliamF

                      And here is the Google Translate screen capture.

                      There are some things that make it difficult to read. I had put “fac” for “fas” since, if you translate it line by line, “fac” translates as a conjugate of “to do” rather than the actual word “fas”. That is “fac orifici” should be “fas orifici” or “make a hole”. “Miras” is lost in the translation, for some reason, but means “you look”. A sentence by sentence translation in Google Translate works better than when doing an entire paragraph at one time.

                      A cleaner translation might be:

                      “To those in spathe and fading, he wants to come. Galen is coming. He emerges for you and offers something to those who hurt, hurt in their body. In the places that hurt, do more. In those spaces will be badness. Make a hole. Doing that lets it pass peacefully. Look, most of the ferocious air goes out. The scene is influenza.”

                      Anyway, it looks like some reference to Galen and a suggestion to cut a hole in the body where it hurts. The reason is influenza or ague.

                      Of course, I am not sure if the name Galen is correct. I had a hard time finding another common sense word for those sounds in that location in the sentence. So I went with a name. But Galen seems to be a little too convenient.

                      If I could find a similar paragraph in another known book, it would make the translation more reasonable, especially if there was a reference to a “ferocious air.” I am not sure of the adjective “ferocious”; it could be “verifiable” instead.

                      Anyway, there it is. But with very small changes in some of the sounds, there might be large differences in translation. This is why I am uncertain of the outcome.

  4. MarcoP

    Considering the Romany hypothesis recently advanced by Derek, I noticed that in some variants of the Romany language the conjunction “and” is “thaj” / “haj” / “aj” (A Handbook of Vlax Romani, Slavica, 1995). Could this be related to the most frequent Voynichese word EVA:daiin?

    For example, in this short Council of Europe document, “thaj” appears 26 times. It is the third most frequent word after “te” and “i” (I attach an image of the last three paragraphs).

    According to Derek’s phonetics, EVA:daiin reads something like “tu”, while, according to Stephen’s 2014 paper “A proposed partial decoding”, it reads like “twaur” or “duur”. So, the sound does not seem very close to “thaj”. Yet there are two more details that could be relevant:

    1) The Romany conjunction can be repeated (“thaj thaj”, possibly with the meaning “and also”). The attached image includes an example. This is an analogy with EVA:daiin, that is often repeated.

    2) It has been noted that in the so-called Currier “language” B the most frequent word is not EVA:daiin, but EVA:aiin (voynichattacks.wordpress.com). Maybe the loss of the initial t could be related to the Vlax variants “thaj” and “haj / aj”.

    • Stephen Bax

      Hi Marco – as always you are ahead of the game! I had also noticed this and was preparing to write a post about it. I have long been on the track of a suitable conjunction to fit with the EVA:daiin word, and I too had noticed the Romany word. Of course we cannot on this basis say that the language is Romany, but it is interesting. I’ll get a move on and try to post my thoughts more fully.

      • MarcoP

        Thank you Stephen, it’s great to know that you are planning a new post!
        The Sanskrit alternative proposed by Derek also seems very promising. I guess that the Romany word could descend from the Sanskrit?

        • Derek Vogt

          I don’t know how to tell which post-Sanskrit Indic languages descended directly from Sanskrit and which ones descended from one of Sanskrit’s cousins. But Sanskrit’s cousins from the same time are generally considered to have been dialects or registers of a single language back then anyway, so their words like “tu” could have still been the same. So the descent of the Voynic word ^tw—^ from one of them would be essentially the same as descent from the identical Sanskrit word anyway.

          I would guess that the probability of the modern Romany word “thaj” being related to Sanskrit/Voynic “tu” is substantially above zero but less than 50%. The meaning fits, but it has only one sound in common out of three or four (“th” being an aspirated plosive, a single phoneme consisting of two distinct components), and short words with few sounds and slippery meanings like this are notorious for creating false cognates.

          English “and” is a good example. English and Latin descend from a common ancestor (PIE), and Latin “et” shares with it a similar vowel at the start and an alveolar plosive at the end, creating an environment where it would be easy for an alveolar nasal to either appear or disappear (both of which have been known to happen in similar environments elsewhere). So we could think they’re related. But they’re not. “And” came from a PIE word that meant “before, facing, opposite, front, forehead”, which ended up as “ante” in Latin. Latin “et” came from PIE for “beyond, over”, which ended up as a Middle English prefix “ed” for “around, circular, back, backward, again”, which fell out of use in Modern English except for the word “eddy” (swirling or turbulent flow in a stream). The only known PIE word to combine things like “and” and “et” do, “kʷe”, didn’t even go between the words it was combining, but at the end of the last one as a suffix; it has no English counterpart, but ended up as Latin “que”, as in “fratres sororesque” instead of “fratres et sorores”.

          When dealing with the kinds of words that pull stunts like that on us, in a language that we don’t know well enough to untie those knots, I’m short on confidence in any phonetic match that’s anything less than perfect. The repeatability of both Romany “thaj” and Voynic ^tw—^, and the “t”-dropping in some contexts, could be related even if the root words aren’t: if “thaj” was unrelated to the original “tu” but took over its job sometime in the last few centuries, then it would be expected to also take on its behavior.

          • MarcoP

            Thank you, Derek! Very clear explanation.

          • Neticis

            If it helps, “thaj” looks similar to Latvian “tā/tas”, which is feminine/masculine for “this”.
            For English “and” in modern Latvian “un” is used, which is influenced by German “und”. But in archaic Latvian (e.g. still preserved in folk songs) “i” was used, which is still used in Latgalian language (which shares words with Russian).

    • Derek Vogt

      Wow… by the time someone else here posted a link to this online Sanskrit dictionary, I hadn’t thought about ^tw—^ for a long time. But now that I put the two together…

      तु, pronounced “tu”, is Sanskrit for both “and” and “but”.

    • Derek Vogt

      It should be noted that the “j” in “thaj” is being used like a vowel, essentially a funny-looking “i”. I couldn’t be sure of this before because not all languages use the letter “j” the same way and there’s no standard Romany/Romani spelling system, so sometimes it’s safe to read “j” as “i” or “y” and sometimes it’s not, but now I know that Ronald Lee’s Kalderash Romany dictionary spells the same word “thai”. So, in assessing what other words might be cognates of this one, we actually have one less consonant-sound to deal with than it might look like.

      • Neticis

        In Latvian “j” is sometimes called “semi-consonant”, because it is spoken as “i” together with other consonants quite often (and sometimes even with other vowels).
        E.g. (writing — spelling): klajš — klaiš, zvejnieks — zveinieks, vējbakas — vēibakas, šuj — šui.

  5. Victoria Hastings


    My browsers are giving me warnings that one of the certificates at “www.voynichese.com” is no good and that the site may be used by hackers.
    Please remedy. The tool looks fantastic.


    • Stephen Bax

      Thank you, but note that that site is nothing to do with this one, and I have no idea who made it. However, I am sure it cannot harm your computer as it does not ask you to download anything.

  6. Darren Worley

    Stephen, you’ve provided us with a fine introduction, and I agree that the text on f116v is significant. What’s your interpretation of the mixing of Voynichese with German/Latin in the section of text on f116v? (see picture).

    It seems quite telling that the writer of f116v is writing both scripts seemingly interchangeably.

    It’s quite easy to imagine that the writer of f116v might come from somewhere where Alemannic German speakers and Latin speakers might have mixed. Such a place would be in the border region between Germany and Italy – in and around what is now Southern Germany, Austria, Liechtenstein and Northern Italy. (Can anyone suggest anywhere else?).

    Southern Germany was the location of a large, little-known, lost “nation” during the Middle Ages. This was the Jewish Yiddish-speaking community. I suspect this Jewish element provides a clue for the Near Eastern imagery in an otherwise European-looking manuscript, and for the presence of the unusual Voynich script. The later destruction and migration of this community might explain the VM’s uniqueness.

    Below is an introduction to the early origins of Yiddish and Jews in the region around Bavaria, Austria, Bohemia, Moravia, and northern Italy.

    [ref: http://www.jewishgen.org/databases/GivenNames/yiddish.htm]
    The initial growth of Yiddish began in Western and West-Central Europe. At the turn of the 9th century, Charlemagne (742-814) invited the Jews of southern France and Italy to the Rhineland to encourage economic growth. Jews had lived in the trading towns along the Rhine River long before, under the Roman Empire. Charlemagne’s initiative caused trade and economic life to develop rapidly in the Rhineland.

    Then, in the Early Yiddish Period tenth and eleventh centuries, Jews from northern Italy and northern France, who spoke Jewish Romance languages (Old French or Tsorfatic (Western Laaz), and Old Italian or Italkic (Southern Laaz)) migrated to Rhineland towns along the middle and upper Rhine Valley in an area called Loter (Lotharingia); this area is close to present-day Lorraine. It is from these Rhineland Jews that Yiddish originated. In their new surroundings, they adopted various medieval Germanic dialects of the region, mixing in their earlier Romance and Hebraic/Aramaic elements. They wrote their new language in Hebrew characters, from right to left.

    The German stock of words itself was affected by a peculiar mingling of elements from different German dialects. Thus, Old Yiddish and medieval German early parted ways as two separate languages. Somewhat later, Slavic elements from Czech, Polish, Ukrainian, and Russian were also introduced into the language.

    The collapse of the Babylonian academies took place during this Early Yiddish Period and many Babylonian teachers arrived at this time in Ashkenaz (the name used in Rabbinic literature for Germany), impacting nascent Yiddish.

    In later centuries, pogroms accompanying the crusades (1095-1272), the black plague (1334-1350), and persecution drove the Rhineland Jews up the Rhine River into Baden/Wuerttemberg in South Germany, where they began creating Yiddish given names based on German names. The first acts of Crusaders on their way to the Holy Land were to slaughter Jews in the Rhine valley. Therefore, in the Old Yiddish Period twelfth to fourteenth centuries, the Rhineland Jews escaped east to Bavaria, Austria, Bohemia, Moravia, and northern Italy, incorporating more German names into the Yiddish lexicon. During the High Middle Ages (1000-1492), Italy was the only European country where Jews were not persecuted en masse, and it was in Italy that the first dawn of the Renaissance mitigated the darkness of medieval barbarism.

    The passage about the collapse of the Babylonian academies might give a clue to the transmission and appearance of Near Eastern imagery into a European context.

    The Jewish Babylonian academies or Geonim were the chief centers of Jewish learning over a 400 year period until approx. 1000CE. There were two major Geonic academies, one in Sura and the other in Pumbedita both in modern-day Iraq.

    Pumbedita was the name of a city in ancient Babylonia close to the modern-day city of Fallujah. Sura was a city in the southern part of ancient Babylonia.

    The composition of later Yiddish is also complex: besides its two main language sources (German and Slavic), its vocabulary also contains words from Hebrew, Aramaic, Latin, Turkic, French, Greek and more. Particularly characteristic is its use of diminutive suffixes derived from both German and Slavic but developed further than either.

    [ref: Yiddish Civilisation: The Rise and Fall of a Forgotten Nation by Paul Kriwaczek ]

    Again, this might give a clue to the composition of Voynichese.

    One advantage I can see in this hypothesis is that the VM was first reported in the Holy Roman Empire in Prague (I believe). If the VM was compiled or copied in this region, perhaps by an immigrant Babylonian teacher or by later pupils several generations on, that neatly explains its presence at this location, in the region around Bavaria, Austria, Bohemia, Moravia, and northern Italy. It also suggests why a European-style manuscript might contain atypical Near Eastern imagery, and the Jewish elements I’ve written about elsewhere.

    • MarcoP

      Hello Darren, I don’t think there were any native Latin speakers in the XV century. The only people who understood Latin were the learned: clergy, scientists, lawyers. I think that if 116v really is written in a mixture of German and Latin, any German speaking area could be a candidate.

      • Darren Worley

        Marco – I see your point. I meant native speakers of German and Romance languages (i.e. derived from Latin.) I had Italian primarily in mind.

        This doesn’t (and shouldn’t) affect my argument – the languages I’m referring to are those used in the areas bordering Germany and Italy, furthermore Yiddish contains elements of German and Latin (amongst others) presumably from the Old French and Old Italian from which it evolved. The development of Yiddish seems to have occurred in these border regions.

        It seems reasonable to assume that Old Italian and Old French are closer to Latin than modern French and Italian.

        I know various contributors to this forum have recognized or proposed similarities of Voynichese words with Slavic and Turkic words, so I find proto-Yiddish a good candidate given its various linguistic borrowings.

        • There were no ‘native’ speakers of Latin in the sense that it was a vernacular language for them, but *every* literate person in Europe before the western Christian schism learned to read and write in Latin, with the book of Psalms (Psalter) as their basic text. So basically, every merchant, every teacher, everyone able to write letters for the illiterate could manage Latin and in that sense it was a ‘native tongue’. Even people who could not read and write would hear enough Latin to know the words of the Mass, and their meaning – by simple repetition as we learn a language when we’re children. So it wasn’t by any means an arcane language. It just came to seem so when national territorial units became emphasised. The idea that priests muttered in a language incomprehensible to the majority is simply not so, no matter how often you’ve heard that old furfy.

    • Darren Worley

      There seem to be a number of similarities between the VM page that contains the “extraneous” writing on f116v and a section found within MS V.b.26(2), a 16th century manuscript held in the Folger Shakespeare collection. An online copy can be viewed here.

      Its description says : “An eclectic anthology of spells and invocations with charts, magic circles, and descriptions and drawings of spirits, angels, and demons. Draws on the Solomonic tradition, with traces of the Lemegeton (including the Goetia), references to “Friar Bacon” (Roger Bacon), and set within a Christian framework. Multiple spells relate to deterring or catching thieves and curing or preventing sicknesses. Includes translations of Psalms 43, 47, 51, 54, 67, 121, 138, and 150 (p. 25-26).”

      I’ve attached an image below to make the comparison clearer. (Note that f116v contains some Voynichese script too)

      Both passages are separated with the curious “+” signs. The MS V.b.26(2) was written between 1577-1583, and is described as “Book of magic, with instructions for invoking spirits, etc.”. It’s in English but has words of Hebrew interspersed.

      A transcription of the passage from MS V.b.26(2) is available here on p231.

      Bgbenvtc wdeeobstc [“Against witchcraft”]

      That thy enemies shall not ouercom thee in thy cause, write on iii lourell leaues & bere them with thee, michaell + Gabriell + Raphaell + hbngc thcm bbrstc yrso ncekc [“hang them about your necke”]

      For axis or ague.

      Write this verse in a aple that is to say in 3 parts & let the sicke confesse himself to God, & the first day to eate one parte that is + in nomine patris + pater est vita vivens alpha & ? the second + et filii + filis est sapiencia patris geniti + Emanuell the thrird [sic] + Et spiritus sancte est amor + ab utroque precedens paraclitus Amen.

      Curiously, part of the text appears to be weakly encoded “hbngc thcm bbrstc yrso ncekc” with the plaintext written below “hang them about your necke”. One of the encoding rules seems to be substituting “a” with a “b”.
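
      One rough way to look for further encoding rules is to compare each encoded word with its plaintext gloss letter by letter. The Python sketch below does only that; the function name is hypothetical, and it skips word pairs whose lengths differ, because the transcription's spellings do not always line up one-to-one.

```python
from collections import Counter

# Encoded words and their plaintext glosses, as given in the transcription.
cipher = "hbngc thcm bbrstc yrso ncekc".split()
plain = "hang them about your necke".split()

def substitution_counts(cipher_words, plain_words):
    """Count (cipher letter, plain letter) pairs for equal-length word pairs."""
    counts = Counter()
    for c_word, p_word in zip(cipher_words, plain_words):
        if len(c_word) != len(p_word):
            continue  # cannot be aligned letter by letter, so skip
        for c, p in zip(c_word, p_word):
            if c != p:
                counts[(c, p)] += 1
    return counts

subs = substitution_counts(cipher, plain)
for (c, p), n in sorted(subs.items()):
    print(f"{c} -> {p}: {n}")
```

      Because “hbngc” and “bbrstc” are longer than their glosses, they are skipped here, so the a/b substitution mentioned above has to be checked by eye in those two words.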

      There seem to be a number of similarities between the VM and the class of books known as Grimoires, the history and origin of which I think will be useful for further research.

  7. First of all, that’s not “daiin.” There is no such word as “daiin.” That’s “som,” a staple Old Norse word used in Swedish and Danish and Norwegian and meaning, variously, depending on the context: as, like, which, who, that, and such as–which explains its frequency. It helps to have a workable transcription alphabet.

    Now let’s talk about why there is no punctuation but quite a bit of repetition. You are reading a piece of literature with its roots in the “loihtoluvut” or Karelian charm rune as well as the Sami joik. By singing a charm about the origin of a thing, the people of this region believed they were able to conquer that thing through the power of the spoken word. Quite a few characteristics indicate that the Voynich consists of such charm songs. For example, the text is:

    1. Largely trochaic, which is the meter of choice for such purposes.
    2. Intensely alliterative and repetitious, playing with sound and depth symmetry, seeming to have no beginning or end, very much like a joik.
    3. Accompanied by several pages of nothing but women involved in some sort of ritual, dancing and holding torcs, spoons or scoops, a drop spindle, a distaff, a seidr staff, and other ceremonial objects directly correlating to a north European traditional belief system such as that which the Karelians and their neighbors espoused for centuries.
    4. Numbered, in some cases, especially on one page that appears to be filled with such charm songs demarcated for various purposes.

    • Neticis

      It is not necessarily a book of magic spells. For example, in Latvian most words have the accent on the first syllable, thus the trochee is the easiest meter. In pre-industrial times Latvian folk songs in trochees were made up on the fly, with any content, as a kind of “ancient rap” competition (and this actually still happens today at modern rap events).

      • WilliamF

        Is the Karelian Rune laula on the left in the pic above? It doesn’t look Karelian. Maybe I am seeing something different. The idea of the VMS as a joik is good, but I don’t get the impression that there is a connection to the VMS.

  8. David Harden

    Something that occurred to me looking at your discussion of prefixes is that, as a taxonomist, I used prefixes to abbreviate text. A typical example is G. species. Sometimes a prefix fills in missing info. For example, “Ansiollum bellea partea p. grandia” where the “p” refers back to an earlier discussion and represents something too long to include here. It is taxonomists’ shorthand and could mean something like “The group being discussed.”

  9. MarcoP

    Hello Stephen,
    EVA:iin looks so similar to the Latin “m” that I keep reading EVA:daiin as the Latin “tum”. Looking at the etymology of ‘tum’ I found that it is possibly related to the Sanskrit word tad which has a great variety of meanings (including “that”, “then”, “and”, “now”). Paralleling “daiin daiin”, it also appears in the repeated form “tad tad” meaning “this and that”, “various”, “different” (I am not sure of how “tad” is inflected when used as a conjunction -“and”- or adverb -“then”-).
    Could EVA:daiin be a relative of the Sanskrit “tad”?

    I also see that linguists studying Proto-Indo-European have used “tom” or “tam” as a conjunction. Do you know where they got that word from?

    • Stephen Bax

      I’m not sure how ‘tad’ could be related etymologically to ‘tam’, as the two last sounds are so different. But it is certainly worth a look.

      I haven’t seen that use of ‘tom’ as a conjunction. Where did you see that?

    • Hello, Marco and Stephen! I think that 8aiin could be a variant of Albanian ‘fund’, meaning ‘end’/‘ending’. As I have already written, the language of the Voynich has a direct relation to the Kartvelian language family. There were lands such as ‘Iveria’ and ‘Alvania’ before the Turk expansion in the Caucasus. These people left their lands, and today we know those lands as ‘Iberica’ (Spain) and ‘Albania’. Georgian is not the only language in modern Georgia. There are also Megrelian (Margalury), Svanetian (Svan) and Lazian. They are considered to be dialects of Georgian, but they are not mutually intelligible and differ greatly from Georgian in syntax. None of them has its own script. (!)

    • MarcoP

      Thank you for your reply, Stephen!
      About the Proto Indo European “tom”, I misunderstood its occurrences in Schleicher’s fable: I am sorry. I think it is actually used as a pronoun.

      I originally found the connection between Latin “tum” and Old Indian “tad” on
      http://starling.rinet.ru/ (in the entry for Proto-IE: *to-).

      Proto-IE: *to-
      Meaning: pron. dem.
      Tokharian: B ta-ka, tkā ‘then’ (Adams 276), ta-ne, tne ‘here’ (278); tu ‘this one, it’ (n.) ( < *od-u?, Adams 299); B te 'this one, it' (n.) (303)
      Old Indian: tád 'this, that', acc. tám, tā́m, tád
      Avestan: tat_ 'das', acc. tǝm, tām, tat_; t-m 'dann'
      Armenian: -d; da 'dieser', doin 'derselbe', the, ethe 'dass, wenn'
      Old Greek: to- demonstrative stem: tóthen, tóthi, tôi̯o- etc.; tó, acc. tón, f. acc. tǟ́n, etc. (article)
      Slavic: *tъ, *to, *tā
      Baltic: *ta-, *tā̂ f.
      Germanic: *ɵẹ̄-z, *ɵiō, *ɵa-ta, etc.
      Latin: istum, -tam, -tud; tam 'so', tum, tunc 'dann, alsdann', topper 'cito, fortasse, celeriter, tamen'

      @Sergei: I find what you write very interesting: I did not know of the existence of Kartvelian languages. I also agree that f77r can help us understand more of the elements. I had not noticed that illustration before reading about it in a comment by Ellie Velinska.

  10. I may verify this, but I think the similarities you noticed don’t come from punctuation but from the layout of the graphics on the other side of the sheet… but it may be mixed with some punctuation; codes are so mysterious…

    • Stephen Bax

      Thanks Eric. I must confess that I do not fully understand your idea about graphics, even after looking at your website!

      • Look closely at the spaces: you’ll see that the majority of them follow the borders of the drawing on the other side… so that means either it’s a coder who used the drawing to elaborate the text, or it’s an artisan (like sawerin) who recorded details of the drawing in order to sculpt an object made of wood or stone; in that case, we will never find a literal meaning in the code… when I feel defeated, I think there’s a 70% chance that it is only an artisan’s code. Butuse sp acesdif ferently doe sn’tm eanth at ithas nosense.
        Here is what the style of the VM’s words may be, or rather: 0Butuse9 0sp9 1sp2 3acesdif4 5ferently6 7doe8 6s’tn3…. and I am not saying whether the boundary number is merged with the letters of the text. The fact is that a space or an “o” is always put in the text where a line crosses on the other side: you may verify, it’s true.

        • To see whether I could extract something from page 25, I enhanced page 24 behind it, which gives http://echapfr.files.wordpress.com/2014/05/page25-24.png

          I think we can clearly see a halo of spaces around the central blue flower… but despite that, nothing convincing for the scheme “8am”… for the moment.

          • Do you see it better with this little illustration? I know how to calculate the correlation of the spaces, but I’m a little bit lazy, and perhaps visual verification is enough… besides, someone will say that mathematical expressions don’t mean anything and that we can make statistics say whatever we want (and that would not be so wrong…)… honest people will recognize that I am telling the truth… and the truth always wins in the end!!!
            Perhaps I’m too convinced that I have the right key-code for the drawings, but not yet for the text. (/n -g is often only a hook, but it can be added to the text-code and give /t…)

          • To be clearer, I used page 3, which you may love and appreciate, with centaura: https://echapfr.files.wordpress.com/2014/05/centaura01.png

            Nobody can dispute that the lines of the Nymphea cut the text line and generate a space… so the word “centaura”, which may well be there, might be found by eliminating the graphic signs (the spaces and some signs before and after).

            Something tells me that you are not as wrong as people will think…
            (Phig)8an(g) can be pronounced [Cen]
            (gPhi)Ro(x) can be pronounced [Tau]
            (8)ar(d) can be inverted in [Ra]
            I’m sorry to use my personal symbols, but you certainly recognized the first three words… and in this case, I admit there is a good chance I’m wrong… but like you, I try, and better explanations will certainly come…

  11. Marco

    Hello Stephen,
    EVA character “q” appears almost exclusively as a prefix (this could suggest that it is some kind of abbreviation, since, at least in Latin, abbreviations often were at the beginning or at the end of words). What I find puzzling is also that “q” only rarely appears immediately after “daiin” (here daiin in yellow and q-starting words in blue):


    It seems to me that star names and other labels in the astrological and cosmological diagrams never start with “q”: I think this could suggest that the function of the character is not only phonetic, but possibly related to a specific language function. Do you think there could be some kind of relation between “q” and “daiin”?
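
    The observation that “q” rarely follows “daiin” could be checked against a transcription by comparing two rates: the share of q-initial words overall, and the share of q-initial words among those that come directly after “daiin”. The Python sketch below is only an illustration: the function name is hypothetical and the word sequence is invented, not taken from the manuscript.

```python
def follow_rate(words, target="daiin", prefix="q"):
    """Return (overall share of prefix-initial words,
               share of prefix-initial words directly after `target`)."""
    overall = sum(w.startswith(prefix) for w in words) / len(words)
    # Collect every word that immediately follows an occurrence of `target`.
    followers = [b for a, b in zip(words, words[1:]) if a == target]
    if followers:
        after = sum(w.startswith(prefix) for w in followers) / len(followers)
    else:
        after = 0.0
    return overall, after

# Toy sequence in EVA-like spelling, invented for illustration only.
sample = "daiin chol qokaiin qokeey daiin shey qotal daiin daiin okal".split()
overall, after = follow_rate(sample)
print(f"q-initial overall: {overall:.0%}, directly after daiin: {after:.0%}")
```

    If the second rate were consistently much lower than the first on a real transcription, that would support the impression given by the coloured image.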

    • Stephen Bax

      This is an intriguing suggestion. It reminds me of the Latin -que for ‘and’, as a suffix. It would be interesting to investigate the distribution further.

      Do I understand that you think ‘daiin’ and ‘q’ might both be sorts of commas or conjunctions, the choice of one or the other depending on the preceding or following word?

      • Stephen Bax

        Note, though, that it need not be an abbreviation. Some languages (e.g. Semitic languages) often use one letter prefixed particles for a variety of functions, e.g. the letter ‘waw’ means ‘and’ when prefixed, and ‘l’ as an attached prefix means ‘to’.

        • Marco

          Hello Stephen,
          I don’t think I can make a specific hypothesis about the function of ‘q’.

          ‘q-words’ are about 14% of all words. So, while the percentage of ‘daiin’ is similar to that of the English ‘and’ (ca. 2.5%), ‘q’ seems too frequent to be a conjunction.

          ‘q-words’ often appear in sequences. 6 consecutive words here:
          I also think you are right about ‘q’ not being an abbreviation: usually, removing the initial ‘q’, one obtains a regular Voynichese word. ‘q’ is likely just a prefix attached in front of standard words, similarly to the Semitic examples you mentioned. By the way, can ‘waw’ be prefixed to any word, or is that type of conjunction only restricted to some word classes? Does the prefix ‘waw’ occur in lists of items?

          I have also analyzed a few sequences based on other prefixes: cho-, qo-, o- and no prefix. qo- and o- prefixes, for the same ‘root’, are the most frequent, with similar numbers of occurrences:
          chokaiin(17) qokaiin(262) okaiin(212) kaiin(65)
          chokchy(16) qokchy(69) okchy(39) kchy(30)
          chopchedy(2) qopchedy(32) opchedy(50) pchedy(34)
          chotal(9) qotal(59) otal(143) tal(20)
          choteey(7) qoteey(42) oteey(200) teey(20)
          choto(3) qoto(3) oto(10) to(2)
          I have no idea what to make of this.
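
          One way to make these counts easier to compare is to turn each row into shares, so that every root shows what fraction of its occurrences carries each prefix. The Python sketch below uses only the counts listed above; the dictionary layout and the function name are my own choices for illustration.

```python
# Occurrence counts per root and prefix, copied from the list above.
# The empty string stands for "no prefix".
counts = {
    "kaiin":  {"cho": 17, "qo": 262, "o": 212, "": 65},
    "kchy":   {"cho": 16, "qo": 69,  "o": 39,  "": 30},
    "pchedy": {"cho": 2,  "qo": 32,  "o": 50,  "": 34},
    "tal":    {"cho": 9,  "qo": 59,  "o": 143, "": 20},
    "teey":   {"cho": 7,  "qo": 42,  "o": 200, "": 20},
    "to":     {"cho": 3,  "qo": 3,   "o": 10,  "": 2},
}

def prefix_shares(row):
    """Convert one root's prefix counts into fractions of that root's total."""
    total = sum(row.values())
    return {prefix: n / total for prefix, n in row.items()}

for root, row in counts.items():
    shares = prefix_shares(row)
    cells = "  ".join(f"{prefix + root}: {share:.0%}" for prefix, share in shares.items())
    print(cells)
```

          For instance, qokaiin accounts for 262 of the 556 occurrences in the “kaiin” row (about 47%), while oteey accounts for 200 of the 269 in the “teey” row (about 74%).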

          With respect to the recipe in f116v, I found this very interesting post by Ellie Velinska:

          It provides a number of examples of medical/magical texts in which the words of a charm are separated by crosses (like lines 2 and 3 of Voynich f116v). It is also noteworthy that in a few cases the main text and the charm are in different languages (e.g. The Hunterian Psalter, 12th/13th century, Glasgow Univers. Libr., Sp Coll MS Hunter U.3.2: text in Latin and charm in “a corrupted form of Anglo-Saxon”; Zurich, Zentral Bibliothek, MS C 101 fol. 91: text in German and charm in Latin!). These two features (the crosses and the different languages) are so exceptional that in my opinion their co-occurrence in f116v can hardly be accidental.

          • Stephen Bax

            Hi – you ask “By the way, can ‘waw’ be prefixed to any word, or is that type of conjunction only restricted to some word classes? ”

            I cannot think of a word class in Arabic which CANNOT take the ‘waw’ as a prefix, meaning ‘and’!

      • I think EVA-q is [v]. In the Mingrelian language the prefix va- marks negation. For example:
        miork – I love; vamiork – I don’t love. moko – I want; vamoko – I don’t want.

    • Derek Vogt

      The only few places I’ve seen where “q” doesn’t look like it’s clearly the first letter of a word are rather long streams of letters crammed together without spaces, as if they were meant as separate words but the spaces just got squished too much. Are there any places at all where the nearby spacing makes it more clear that the “q” really is meant to be in the middle of a word?

  12. Marco

    Hello Stephen,
    thank you very much for your clarifications and for posting the 2012 paper!

    My comment here is not really related to the punctuation problem: I have checked Albus’ transcription again and I think that a couple of words read differently. This does not much affect the possible meaning of the paragraph, which in any case is very obscure. I think Albus has done a great job in extracting a coherent meaning from what can be read of the last page.

    The word that Albus transcribes “olei” looks like “ola” to me. I think it could be “ol[e]a” (oils) so the meaning is almost completely unaffected.

    The second variation is maybe more interesting. What Albus transcribes as “alt[e]ra” is here transcribed “abta”:

    I certainly agree that the second letter of the word is a “b” (comparing it with daBas and pBrey):
    I think I can see a slanted “dot” over the third letter, as in the immediately preceding “vix”. So I think the word is “abia”. That word is not Latin, but it is close to the ancient Italian (and Spanish?) “habia”, corresponding to the first, second and third person singular subjunctive of “havere” (to have).

    Curiously, “abia” is a Romanian word which apparently derived from the Latin “vix” and which has the same meaning (“hardly”, “only just”):

  13. Marco

    Hello Stephen,
    thank you very much for this great post!

    Another manuscript without punctuation is the 15th century Greek herbal, e.g. f11r:
    The fact that these texts were nevertheless understandable certainly depends on the presence of other structure markers, like those you so clearly analyze.

    It seems to me that a few of the points you discuss in this post are features of the Voynichese language, while others are aspects of the script. If I understand correctly, the ‘hal’ Arabic interrogative particle also is an aspect of the language, like the English “do” in “(do) you live in Europe (?)”: so it is part of the language (spoken as well as written).
    Also your discussion of “daiin” is language related. I find your hypothesis of “daiin” meaning “and” very likely and reasonable.
    If your 2012 paper is available online, I would love to read the complete version. Could you please provide a link? I would like to read more of your very interesting speculation about verbs in the second person, numbers etc in the herbal recipes. This linguistic approach seems to me more difficult than working on single words and the phonetic value of single letters, but I guess that the possibility of understanding the manuscript depends on considering all aspects together. And it’s great to see a structured analysis of a page like f25v, which at first sight appears a complete mystery!

    I am sorry that a more detailed presentation of Albus’ work is not available. I would like to see the examples from ancient sources he proposes.
    I think that the structure of the last page can also be compared to the structure of herbal recipes such as the one we discussed here:
    In both cases, the first words give the name of the main ingredient and the ailment it cures:
    * Garlic [unreadable word] its leaf heals wounds
    * Billy goat’s liver for wet rot

    Of the points you discuss, what I find most puzzling are the “specific signs” “p” and “f”. Are they just elements of the script (like the line breaks you mention in point 1) or do they belong to the language? In other words, do you think these symbols have a phonetic value, or are they just prosodic markers like punctuation in current writing? Uppercase initials come to my mind as a possible analogue: are those two characters uppercase versions of other characters?

    I am sure that http://www.voynichese.com/ will be extremely helpful in understanding the structure of the language and in validating new hypotheses and ideas. The task of understanding the manuscript is daunting, but your sound methodology and the availability of tools like voynichese.com will surely yield results in time!

    • Stephen Bax

      Thanks Marco. Here are some responses:

      a) The Greek manuscript is fascinating. I have asked a Greek friend to comment….
      b) The interesting thing about the ‘hal’ particle in Arabic is that it is now semantically empty, so it is not quite like ‘do’ in an English question, which still has semantic import. It is essentially reduced to being just a discourse marker. My suggestion is that some parts of the Voynich script could act in that way.
      c) I have now put my 2012 paper online for you to see, though I see many things I would now change!
      d) Albus – yes a great pity we don’t have a fuller version of his paper. I spoke to him during a taxi ride at the Voynich 100 conference and I understand that his paper is based on a thesis… I’m not sure if it is available in Italy?
      e) I agree 100% with your idea about possible structures of the herbal pages as:
      * Garlic [unreadable word] its leaf heals wounds…
      This seems to fit completely with the structure of the entries in the 15th C Armenian herbal text I have discussed here
      f) re. the “specific signs” “p” and “f”. I am in two minds about this. They occur so frequently that they seem to be mere markers, and yet in other places they seem to be in positions which suggest more.
      g) I agree that http://www.voynichese.com is a wonderful tool, but just one caveat: if, as I suspect, words in the language have even slight inflections, then a search for a particular word will miss those inflected variants and might mislead. So it is useful to use the * sign when you search, and to check out words which might be similar to, or variants of, the word you are looking for.
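
      To make the caveat in g) concrete, here is a minimal sketch of how an exact search and a wildcard search can differ. It assumes that * behaves like a shell-style wildcard (any run of characters); it does not reproduce how voynichese.com itself implements search, and the word list is a small hypothetical sample chosen for illustration, not data from any transcription file.

```python
from fnmatch import fnmatchcase

# Hypothetical sample of transcribed words, for illustration only.
SAMPLE_WORDS = ["daiin", "dain", "daiir", "odaiin", "chol", "chor", "qokaiin"]


def wildcard_search(pattern, words):
    """Return the words matching a shell-style pattern (* = any run of characters)."""
    return [w for w in words if fnmatchcase(w, pattern)]


# An exact search finds only the literal form.
exact = wildcard_search("daiin", SAMPLE_WORDS)

# A wildcard search also surfaces forms that might be inflected variants.
variants = wildcard_search("*dai*", SAMPLE_WORDS)

print(exact)
print(variants)
```

      A wildcard match only shows that two forms share a string of letters; whether they are really variants of the same word still has to be judged from context.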

      Thanks as always for your thoughtful contributions

    • Voynich Manuscript belonged to a Roma Sindh Mahajan and is in Landa Khoji scripts.

