MyHeritage Launches Book Matching

By Esther
April 7, 2016

We’re excited to announce the release of a revolutionary new technology, Book Matching, perhaps our best technology yet. Book Matching automatically researches individuals found in family trees on MyHeritage in our vast collection of digitized historical books. Unique to MyHeritage, the innovative new technology uses semantic analysis to understand every sentence on every page of the digitized books, in order to find matches with very high accuracy.

Book Matching has already produced over 80 million new matches for our users! Every match is a paragraph from a book specifically about the person in the family tree, providing direct access to that paragraph and the ability to browse through the rest of the book.

With Book Matching, you’ll discover fascinating family information that you would not find otherwise. You may even discover new relatives and ancestors. Use this information to expand your family tree and add color to it.

By way of background, we first launched SuperSearch™, our search engine for historical records, in 2012. In December 2015, the collection of digitized historical books was added to SuperSearch™. Very recently, we tripled the number of books in the Compilation of Published Sources from 150,000 to 450,000, with a total of 91 million pages. We’ve assembled a team of hard-working curators and plan to add hundreds of millions of additional pages of digitized books to the collection each year.

The challenge

Books have always been one of the best sources for family history research, but searching them efficiently has been nearly impossible. Even after books are photographed and converted to digital, searchable text using optical character recognition (OCR), searching them has required a big investment of time and a willingness to wade through endless false positives.
For example, if you have a Richard Thomas in your family tree, doing a text search in books would find results for people called Richard, or Thomas, with no regard to whether it is a first or last name. Even if a Richard Thomas were found, it would likely not be the one that you were looking for. A plain text search has no way to find the exact Richard Thomas that you are looking for, such as the Richard Thomas born in Virginia in the early 1940s who married a Wilma Griffith.

Book Matching to the rescue

Our Book Matching technology overcomes these difficulties by automatically understanding narrative describing people in the historical books, including names, events, dates, places and relationships, and matching it with extremely high accuracy and speed to the 2 billion individuals in the family trees on MyHeritage. This is repeated automatically as you grow your tree and as we add more books.

A daunting task made simple

Extracting genealogical information from books is not a simple task. In structured documents, such as birth certificates or census records, it is very clear what type of information is presented in the data you encounter. It is clear where to find surnames, birth dates, and so on.

On the other hand, in unstructured free-text data, like digitized historical books, facts such as birth dates, locations, and death dates can be written in many different ways and in varying contexts, and the information has no designated location or order. While general phrases like “death,” “died,” and “passed away” can all refer to a person’s death, so can less commonly used phrases such as “expired,” “ended his earthly career,” or “summoned to the home beyond.” We currently have a huge number of rules just for detecting expressions describing death!

Books often do not refer to a person by a full name. For example, a paragraph can mention a woman by her first name and then continue to name and describe her father. Specialized technology is needed to follow this and piece it together.
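To give a feel for what a rule of this kind could look like, here is a minimal Python sketch of phrase-based detection of death expressions. It is an illustration only: the names `DEATH_PHRASES` and `find_death_expressions` are hypothetical, the phrase list covers just the examples quoted above, and a real rule set would be far larger.

```python
import re
from typing import List

# Hypothetical, deliberately tiny rule set built from the phrases quoted above.
DEATH_PHRASES = [
    r"died",
    r"passed away",
    r"expired",
    r"death",
    r"ended (?:his|her) earthly career",
    r"summoned to the home beyond",
]

# One case-insensitive pattern that matches any phrase as whole words.
DEATH_RE = re.compile(r"\b(?:" + "|".join(DEATH_PHRASES) + r")\b", re.IGNORECASE)


def find_death_expressions(sentence: str) -> List[str]:
    """Return every death-related phrase found in the sentence, lowercased."""
    return [match.group(0).lower() for match in DEATH_RE.finditer(sentence)]


print(find_death_expressions("He ended his earthly career in 1872."))
# prints ['ended his earthly career']
```

Note that a bare phrase list like this one also flags a sentence such as “His lease expired,” which is exactly why free text needs more than keyword matching: the surrounding names, dates and relationships have to be taken into account as well.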
We have worked hard to build numerous algorithms to harvest family history information from books. These have been tested and tweaked, iterated and perfected to ensure a high level of accuracy, and to gather as much information as possible from the books. In the process, we also detected and fixed millions of OCR errors. For example, if the OCR process thought that a person was born in “]\lay”, we understand that it’s really “May”, that “Apnl” is really “April”, and so on.

Currently, some books in the collection of digitized books are duplicated because they were contributed to the public domain multiple times by different groups. Nobody was able to figure out that some of them are redundant. We are currently putting the finishing touches on specialized technology that is able to de-duplicate the books. Shortly, once we complete this work, most of the duplicate matches will automatically disappear.

Book Matching in Action

We recently showed some of the leading genealogy bloggers (or geneabloggers, as they’re sometimes known in the genealogy community) their Book Matches, so they could see first-hand the matches found for their own family trees.

Dick Eastman of Eastman’s Online Genealogy Newsletter has been researching his family history for years. He has about 2780 people in his family tree on MyHeritage, and he received about 500 Book Matches. The majority of the information in the Book Matches was new to him.

For example, Elizabeth Fifield, the aunt of one of Dick’s direct ancestors (8 generations back), appeared in his family tree with only birth and death dates, and siblings. An automatic Book Match was found for Elizabeth in the book “Genealogical and personal memoirs relating to the families of the state of Massachusetts; by Cutter, William Richard, 1847-1918,” a source that Dick Eastman may never have thought to examine himself. The excerpt below is the section that was found by MyHeritage.
The exciting new information here lists Elizabeth’s husband, along with other historical information about him and his family, such as their six children and their dates of birth. Dick did not previously have this information, and he can now use it to add a complete new line to his family tree.

Lifelong genealogist Randy Seaver of Genea-Musings has more than 40,000 people in his family tree on MyHeritage. With a whopping 20,609 Book Matches, he is now able to glean a mountain of new information about people in his family tree!

For example, Randy has a relative, William Seaver Woods, in his family tree with a birth date, and he is listed as unmarried. In the yearbook “Alumni Record of Wesleyan University, Middletown, Connecticut, 1921”, MyHeritage found a perfect match for William. William happened to study at this university, and the page lists his achievements and mentions that he had a wife and child, both of whom are missing from Randy’s tree.

Note that their son, Robert, used the surname Crombie, which came from his mother, Grace. Since Robert didn’t use the Seaver or Woods surname, Randy may not have discovered him without this gem. Now Randy has a fresh lead. He can research this family line, which was previously a dead end, and bring it to the present.

Leland Meitzler of Genealogy Blog has imported his family tree of 5106 people to MyHeritage. He received 694 Book Matches. Leland was notified about a match for Elisha Mills in his family tree, found in the book “A Walloon Family in America: Lockwood de Forest and His Forbears 1500-1848” (1914). The match adds Elisha’s parents and describes his accomplishments during the Revolutionary War.

Finally, Pat Richley of Dear Myrtle also received some Book Matches. Thomas Wasden, Pat’s great-great-grandfather, was previously shown in the tree with basic information, including dates and places.
A match for Thomas was automatically found in the book “Colonial Families of Philadelphia by Jordan, John Woolf, 1840-1921” (1911). The match included a photograph of him from the 19th century. What a great find for Pat to add to her family tree!

The geneabloggers were blown away by these exciting, never-before-seen matches, which add valuable information to their family trees. Literally no false positives were encountered. If Book Matching can introduce such an enormous amount of new data to seasoned genealogists who have been researching their family history for decades, you can imagine how helpful Book Matching can be for you and almost every user of MyHeritage.

The Compilation of Published Sources collection is free to access. Viewing Book Matches requires a MyHeritage Data subscription.

What’s next?

Book Matching is currently available for English books only, but the technology will soon be enhanced to cover other languages. We’re continually expanding our repository of digitized historical records, facilitating easier family history research. We expect the corpus of digitized books on MyHeritage to be doubled soon. We will be adding amazing genealogy books from all over Europe, in all major European languages!

How do you know if you have Book Matches?

Simply log in to your family site and check your Record Matches via the Discoveries menu, or check your inbox for your Record Match emails. These will be sent to our users over the next few days, delivering newly found matches. Any match you receive from a book is made possible by this new technology.

New to MyHeritage?

Sign up from the homepage and upload your family tree as a GEDCOM file, and benefit quickly from Book Matching, which is exclusive to MyHeritage.

With the new Book Matching technology, you’ll be amazed at the value of books and gain a new appreciation for them as a genealogical resource. Enjoy!