The master places a piece of paper in front of the candidate and orders him to put on a pair of eyeglasses. “Read,” the master commands. The candidate squints, but it’s an impossible task. The page is blank.
The candidate is told not to panic; there is hope for his vision to improve. The master wipes the candidate’s eyes with a cloth and orders preparation for the surgery to commence. He selects a pair of tweezers from the table. The other members in attendance raise their candles.
The master starts plucking hairs from the candidate’s eyebrow. This is a ritualistic procedure; no flesh is cut. But these are “symbolic actions out of which none are without meaning,” the master assures the candidate. The candidate places his hand on the master’s amulet. Try reading again, the master says, replacing the first page with another. This page is filled with handwritten text. Congratulations, brother, the members say. Now you can see.
For more than 260 years, the contents of that page—and the details of this ritual—remained a secret. They were hidden in a coded manuscript, one of thousands produced by secret societies in the 18th and 19th centuries. At the peak of their power, these clandestine organizations, most notably the Freemasons, had hundreds of thousands of adherents, from colonial New York to imperial St. Petersburg. Dismissed today as fodder for conspiracy theorists and History Channel specials, they once served an important purpose: Their lodges were safe houses where freethinkers could explore everything from the laws of physics to the rights of man to the nature of God, all hidden from the oppressive, authoritarian eyes of church and state. But largely because they were so secretive, little is known about most of these organizations. Membership in all but the biggest died out over a century ago, and many of their encrypted texts have remained uncracked, dismissed by historians as impenetrable novelties.
It was actually an accident that brought to light the symbolic “sight-restoring” ritual. The decoding effort started as a sort of game between two friends that eventually engulfed a team of experts in disciplines ranging from machine translation to intellectual history. Its significance goes far beyond the contents of a single cipher. Hidden within coded manuscripts like these is a secret history of how esoteric, often radical notions of science, politics, and religion spread underground. At least that’s what experts believe. The only way to know for sure is to break the codes.
In this case, as it happens, the cracking began in a restaurant in Germany.
FOR YEARS, Christiane Schaefer and Wolfgang Hock would meet regularly at an Italian bistro in Berlin. He would order pizza, and she would get the penne all’arrabbiata. The two philologists—experts in ancient writings—would talk for hours about dead languages and obscure manuscripts.
It was the fall of 1998, and Schaefer was about to leave Berlin to take a job in the linguistics department at Uppsala University, north of Stockholm. Hock announced that he had a going-away present for Schaefer.
She was a little surprised—a parting gift seemed an oddly personal gesture for such a reserved colleague. Still more surprising was the present itself: a large brown paper envelope marked with the words top secret and a series of strange symbols.
Schaefer opened it. Inside was a note that read, “Something for those long Swedish winter nights.” It was paper-clipped to 100 or so photocopied pages filled with a handwritten script that made no sense to her whatsoever.
Arrows, shapes, and runes. Mathematical symbols and Roman letters, alternately accented and unadorned. Clearly it was some kind of cipher. Schaefer pelted Hock with questions about the manuscript’s contents. Hock deflected her with laughter, mentioning only that the original text might be Albanian. Other than that, Hock said, she’d have to find her own answers.
A few days later, on the train to Uppsala, Schaefer turned to her present again. The cipher’s complexity was overwhelming: symbols for Saturn and Venus, Greek letters like pi and gamma, oversize ovals and pentagrams. Only two phrases were left unencoded: “Philipp 1866,” written at the start of the manuscript, and “Copiales 3” at the end. Philipp was traditionally how Germans spelled the name. Copiales looked like a variation of the Latin word for “to copy.” Schaefer had no idea what to make of these clues.
She tried a few times to catalog the symbols, in hopes of figuring out how often each one appeared. This kind of frequency analysis is one of the most basic techniques for deciphering a coded alphabet. But after 40 or 50 symbols, she’d lose track. After a few months, Schaefer put the cipher on a shelf.
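The tally Schaefer attempted by hand is straightforward to sketch in code. The example below is only an illustration: it assumes the cipher has already been transcribed into whitespace-separated symbol names, and the names used here (`pi`, `saturn`, and so on) are hypothetical stand-ins, not a real transcription of the manuscript.

```python
from collections import Counter


def symbol_frequencies(transcription: str) -> list[tuple[str, int]]:
    """Return (symbol, count) pairs, most frequent first."""
    # Each whitespace-separated token is treated as one cipher symbol.
    return Counter(transcription.split()).most_common()


# Hypothetical transcription: invented names, not the real manuscript.
sample = "pi saturn pi gamma venus pi saturn"
for symbol, count in symbol_frequencies(sample):
    print(f"{symbol}: {count}")
```

In a simple cipher, the most frequent symbols are the likeliest candidates for the most frequent letters of the underlying language.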
THIRTEEN YEARS LATER, in January 2011, Schaefer attended an Uppsala conference on computational linguistics. Ordinarily talks like this gave her a headache. She preferred musty books to new technologies and didn’t even have an Internet connection at home. But this lecture was different. The featured speaker was Kevin Knight, a University of Southern California specialist in machine translation—the use of algorithms to automatically translate one language into another. With his stylish rectangular glasses, mop of prematurely white hair, and wiry surfer’s build, he didn’t look like a typical quant. Knight spoke in a near whisper yet with intensity and passion. His projects were endearingly quirky too. He built an algorithm that would translate Dante’s Inferno based on the user’s choice of meter and rhyme scheme. Soon he hoped to cook up software that could understand the meaning of poems and even generate verses of its own.
Knight was part of an extremely small group of machine-translation researchers who treated foreign languages like ciphers—as if Russian, for example, were just a series of cryptological symbols representing English words. In code-breaking, he explained, the central job is to figure out the set of rules for turning the cipher’s text into plain words: which letters should be swapped, when to turn a phrase on its head, when to ignore a word altogether. Establishing that type of rule set, or “key,” is the main goal of machine translators too. Except that the key for translating Russian into English is far more complex. Words have multiple meanings, depending on context. Grammar varies widely from language to language. And there are billions of possible word combinations.
But there are ways to make all of this more manageable. We know the rules and statistics of English: which words go together, which sounds the language employs, and which pairs of letters appear most often. (Q is usually followed by a u, for example, and “quiet” is rarely followed by “bulldozer.”) There are only so many translation schemes that will work with these grammatical parameters. That narrows the number of possible keys from billions to merely millions.
The next step is to take a whole lot of educated guesses about what the key might be. Knight uses what’s called an expectation-maximization algorithm to do that. Instead of relying on a predefined dictionary, it runs through every possible English translation of those Russian words, no matter how ridiculous; it’ll interpret a single Russian word as “yes,” “horse,” “to break dance,” and “quiet!” Then, for each one of those possible interpretations, the algorithm invents a key for transforming an entire document into English: What would the text look like if that word meant “break dancing”?
The algorithm’s first few thousand attempts are always way, way off. But with every pass, it figures out a few words. And those isolated answers inch the algorithm closer and closer to the correct key. Eventually the computer finds the most statistically likely set of translation rules, the one that properly interprets one Russian word as “yes” and another as “quiet.”
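The guess-and-refine loop can be sketched on a much smaller problem: a letter-for-letter cipher instead of a whole language. The code below is a simplified illustration, not Knight's software. It assumes a letter-pair model of English estimated from a tiny sample, and every name in it (`train_bigram_model`, `em_step`, `decipher`) is hypothetical. It starts from a random guess at the key, stored as a table of probabilities that a plain letter is written as a given cipher symbol, and re-estimates that table on every pass. On so short a text it will not necessarily recover the right key; what it does show is that no pass ever makes the text look less likely than the one before.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(row):
    total = sum(row)
    return [v / total for v in row]


def train_bigram_model(text, smoothing=0.1):
    """Estimate smoothed letter and letter-pair probabilities from plain text."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    ids = [index[ch] for ch in text.lower() if ch in index]
    n = len(ALPHABET)
    start = [smoothing] * n
    trans = [[smoothing] * n for _ in range(n)]
    for i in ids:
        start[i] += 1
    for a, b in zip(ids, ids[1:]):
        trans[a][b] += 1
    return normalize(start), [normalize(row) for row in trans]


def em_step(cipher, start, trans, emit):
    """One pass: returns (new key table, log-likelihood under the old table)."""
    n, m, length = len(start), len(emit[0]), len(cipher)
    # E-step, forward pass (rescaled at every position to avoid underflow).
    alpha, scales = [], []
    for t, c in enumerate(cipher):
        if t == 0:
            row = [start[p] * emit[p][c] for p in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[q] * trans[q][p] for q in range(n)) * emit[p][c]
                   for p in range(n)]
        scale = sum(row)
        scales.append(scale)
        alpha.append([v / scale for v in row])
    # E-step, backward pass, using the same scale factors.
    beta = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        nxt, c = beta[t + 1], cipher[t + 1]
        beta[t] = [sum(trans[p][q] * emit[q][c] * nxt[q] for q in range(n))
                   / scales[t + 1] for p in range(n)]
    # M-step: expected counts of "plain letter p written as symbol c".
    counts = [[1e-12] * m for _ in range(n)]  # tiny floor avoids empty rows
    for t, c in enumerate(cipher):
        for p in range(n):
            counts[p][c] += alpha[t][p] * beta[t][p]
    loglik = sum(math.log(s) for s in scales)
    return [normalize(row) for row in counts], loglik


def decipher(cipher_text, plain_sample, iterations=10, seed=0):
    """Run several passes; returns (final key table, log-likelihood per pass)."""
    symbols = sorted(set(cipher_text))
    cipher = [symbols.index(ch) for ch in cipher_text]
    start, trans = train_bigram_model(plain_sample)
    # Random first guess: a perfectly uniform start would leave EM stuck.
    rng = random.Random(seed)
    emit = [normalize([1.0 + rng.random() for _ in symbols]) for _ in ALPHABET]
    history = []
    for _ in range(iterations):
        emit, loglik = em_step(cipher, start, trans, emit)
        history.append(loglik)
    return emit, history


# Toy data: a sentence shifted three places along ALPHABET (illustrative only).
plain = "the candidate answers the master and the master answers the candidate"
coded = "".join(ALPHABET[(ALPHABET.index(ch) + 3) % len(ALPHABET)] for ch in plain)
table, scores = decipher(coded, plain, iterations=5)
print([round(s, 2) for s in scores])
```

The random starting table matters: a perfectly even first guess gives the procedure nothing to tell the symbols apart, which is one reason real decipherment runs are typically restarted many times from different starting points.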
Such algorithms can also help break codes, Knight told the Uppsala conference; generally, the longer the cipher, the better they perform. So he casually told the audience, “If you’ve got a long coded text to share, let me know.”
Funny, Schaefer said to Knight at a reception afterward. I have just the thing.
A COPY OF THE CIPHER arrived at Knight’s office a few weeks later. Despite his comments at the conference, Knight was hesitant to start the project; alleged ciphers often turned out to be hoaxes. But Schaefer’s note stapled to the coded pages was hard to resist. “Here comes the ‘top-secret’ manuscript!!” she wrote. “It seems more suitable for long dark Swedish winter nights than for sunny California days—but then you’ve got your hardworking and patient machines!”
Unfortunately for Knight, there was a lot of human grunt work to do first. For the next two weeks, he went through the cipher, developing a scheme to transcribe the coded script into easy-to-type, machine-readable text. He found 88 symbols and gave them each a unique code: One glyph became “lip,” another became “o..,” a third became “zs.” By early March he had entered the first 16 pages of the cipher into his computer.
Next Knight turned to his expectation-maximization algorithm. He asked the program what the manuscript’s symbols had in common. It generated clusters of letters that behaved alike, appearing in similar contexts. For example, letters with circumflexes were usually preceded by one of two particular symbols. There were at least 10 identifiable character clusters that repeated throughout the document. The only way groups of letters would look and act largely the same was if this was a genuine cipher, one he could break. “This is not a hoax; this is not random. I can solve this one,” he told himself.
A particular cluster caught his eye: the cipher’s unaccented Roman letters, the ones used by English, Spanish, and other European languages. Knight did a separate frequency analysis to see which of those letters appeared most often. The results were typical for a Western language. They suggested that this document might be the most basic of ciphers, a simple substitution in which one letter is swapped for another (a kid’s decoder ring, basically). Maybe, Knight thought, the real code was in the Roman alphabet, and all the funny astronomical signs and accented letters were there just to throw the reader off the scent.
Of course, a substitution cipher was only simple if you knew what language it was in. The German Philipp, the Latin copiales, and Hock’s allusion to Albanian all hinted at different tongues.
Knight asked his algorithm to guess the manuscript’s original language. Five times, it compared the entire cryptotext to 80 languages. The results were slow in coming—the algorithm is so computationally intense that each language comparison took five hours. Finally the computer gave the slightest preference for German. Given the spelling of Philipp, that seemed as good an assumption as any. Knight didn’t speak a word of German, but he didn’t need to. As long as he could learn some basic rules about the language—which letters appeared in what frequency—the machine would do the rest.
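Knight's language comparison was far heavier than anything that fits here. As a rough stand-in, and not his method, the sketch below relies on one property of a one-for-one substitution cipher: it scrambles which symbol is which but leaves the overall shape of the frequency distribution untouched. The function names and the tiny sample texts are hypothetical, and the approach assumes each letter is always written with the same symbol.

```python
from collections import Counter


def sorted_profile(symbols, size=30):
    """Relative frequencies, largest first, padded or cut to a fixed length."""
    counts = Counter(symbols)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return (freqs + [0.0] * size)[:size]


def profile_distance(a, b):
    """Sum of squared differences between two frequency profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def guess_language(cipher_symbols, samples):
    """Pick the sample language whose profile is closest to the cipher's."""
    target = sorted_profile(cipher_symbols)
    return min(samples,
               key=lambda lang: profile_distance(target, sorted_profile(samples[lang])))


# Tiny illustrative samples; a real comparison would need far more text.
samples = {
    "german": "der candidat antwortet",
    "english": "the candidate answers",
}
# Toy cipher: every character of the German sample replaced by the next code point.
cipher = [chr(ord(ch) + 1) for ch in samples["german"]]
print(guess_language(cipher, samples))
```

With samples this small the result means little; the point is only that a ranking of languages can be produced without knowing the key.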
WHILE HIS FAMILY got ready for spring vacation (a “history tour” of the East Coast), Knight looked for patterns in the cipher. He saw that one common cipher letter was often followed by a particular second symbol. The pair appeared together 99 times, and a third symbol frequently came right after it.
Knight reviewed common German letter combinations. He noticed that C is almost always followed by H, and CH is often followed by T. This sequence is used all the time in German words like licht (“light”) and macht (“power”). The three-symbol sequence, Knight guessed, might be cht. It was his first major break.
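Spotting a recurring pair or triple of symbols amounts to counting every short run of neighbors. The sketch below shows the idea on ordinary German text rather than on the cipher itself; the function name `ngram_counts` is hypothetical, and the word nacht (“night”) is added here only to pad the sample.

```python
from collections import Counter


def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols in the sequence."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


# Reference side: which letter triples are common in (a tiny sample of) German?
german = list("licht macht nacht")
print(ngram_counts(german, 3).most_common(1))
```

Running the same count over a transcribed cipher and lining up its most common triples against the language's most common triples gives candidate matches to test.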
During his vacation, as his daughters played on their iPads at night in the hotel room, Knight scribbled in his orange notebook, tinkering with possible solutions to the cipher. So far what he had was a simple substitution code. But that left scores of cipher symbols with no German equivalent.
So one evening Knight shifted his approach. He tried assuming that the manuscript used a more complex code—one that used multiple symbols to stand for a single German letter.
Knight put his theory to the test. He assumed, for example, that three different symbols all stood for I. It worked. He found others, and soon he started assembling small words, like der (“the” in German), which Knight recognized from World War II movies. Then he got his first big word, candidat, followed by antwortet (“the candidate answers”). The cipher’s wall of secrecy was crumbling.
But some of the cipher’s symbols, especially the iconic ones, remained baffling. Worse, he couldn’t get German translations for any of the cipher’s standard Roman letters.
On March 26, Knight reviewed his notebook. The words of his first phrase, Der candidat antwortet, were separated by two of the cipher’s Roman letters. That made no sense if those coded letters stood for German letters. That’s when Knight realized how wrong his initial assumption had been. The unaccented Roman letters didn’t spell out the code. They were the spaces that separated the words of the real message, which was actually written in the glyphs and accented text.
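A code in which several symbols share one plain letter can be pictured as a lookup table with repeated values. The sketch below is purely illustrative: the symbol names (`sym01` and so on) and their assignments are invented, not taken from the manuscript.

```python
# Hypothetical key: several invented symbol names map to the same German letter.
HOMOPHONIC_KEY = {
    "sym01": "d",
    "sym02": "e", "sym03": "e",                # two symbols for E
    "sym04": "r",
    "sym05": "i", "sym06": "i", "sym07": "i",  # three symbols for I
}


def decode(symbols, key, unknown="?"):
    """Translate symbol names to letters, marking unsolved symbols with '?'."""
    return "".join(key.get(s, unknown) for s in symbols)


# Two different spellings in cipher, one word in plain text.
print(decode(["sym01", "sym02", "sym04"], HOMOPHONIC_KEY))
print(decode(["sym01", "sym03", "sym04"], HOMOPHONIC_KEY))
```

Spreading a common letter across several symbols flattens the frequency counts, which is why a plain one-for-one analysis leaves so many symbols unexplained.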
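Knight's realization changes the decoding rule in one small way: an unaccented Roman letter is not looked up in the key at all but treated as a word break. The sketch below illustrates that rule under stated assumptions. The glyph names, the key, and the choice of which Roman letters appear are all invented for the example.

```python
import string

# Hypothetical key for a handful of invented glyph names.
GLYPH_KEY = {
    "glyph_d": "d", "glyph_e": "e", "glyph_r": "r", "glyph_c": "c",
    "glyph_a": "a", "glyph_n": "n", "glyph_i": "i", "glyph_t": "t",
}


def decode_with_separators(symbols, key, unknown="?"):
    """Decode glyphs with the key; treat plain Roman letters as word breaks."""
    pieces = []
    for s in symbols:
        if len(s) == 1 and s in string.ascii_letters:
            pieces.append(" ")  # unaccented Roman letter: a space, not a letter
        else:
            pieces.append(key.get(s, unknown))
    # Collapse repeated or trailing breaks into single spaces.
    return " ".join("".join(pieces).split())


message = ["glyph_d", "glyph_e", "glyph_r", "a",
           "glyph_c", "glyph_a", "glyph_n", "glyph_d",
           "glyph_i", "glyph_d", "glyph_a", "glyph_t", "n"]
print(decode_with_separators(message, GLYPH_KEY))
```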
To read the rest of the article, go to: https://www.wired.com/2012/11/ff-the-manuscript/