Main contributor: Thomas MacEntee

Artificial intelligence (AI) has been available for quite a few years, but only recently have its capabilities been applied to genealogical research.

What is Artificial Intelligence

How to Use AI in Genealogical Research - AI-generated image

Artificial Intelligence (AI) refers to computer-based systems capable of mimicking human intelligence. These systems are designed to perform tasks traditionally requiring human intervention.

For example, an AI system might respond to a customer service query on a company’s website. While this may seem straightforward, involving a simple look-up and response, AI can ask additional questions to provide a more precise answer. Over time, it learns from user interactions to enhance future responses.

Another illustrative example is using ChatGPT—an advanced AI platform—to compose a genealogy-themed poem in the style of Shakespeare, Keats, Robert Frost, or Maya Angelou. This task, demanding extensive knowledge of poetry and the distinct styles of these authors, would take a human considerable time. AI accomplishes it in seconds.

Key features of AI include “deep learning” and “generative AI.” Deep learning emulates the human brain by identifying patterns across vast datasets, enabling it to interpret photos, audio, and text. Generative AI creates new content—photos, audio, text—based on user input, drawing from its training data to match the query accurately.

AI platforms

Genealogy vendors like MyHeritage are integrating AI into their user offerings, but several popular AI platforms are publicly accessible:

  • ChatGPT: Meaning “Chat Generative Pre-trained Transformer,” ChatGPT is the most popular publicly accessible artificial intelligence platform.
  • Claude: Claude is an AI chatbot developed by Anthropic, an AI safety and research company. It is presumably named after Claude Shannon, the father of information theory.
  • Copilot: Developed by Microsoft, Copilot is an AI-powered assistant that helps users get answers and inspiration from across the web, supports creativity and collaboration, and helps them focus on the task at hand.
  • Gemini: Developed by Google, Gemini describes itself as “a family of AI models developed by Google's AI research labs DeepMind and Google Research. Gemini is Google's largest and most flexible AI model, able to run on data centers and mobile devices.”

Current uses of AI by genealogy vendors and others

AI is already benefiting genealogists in several ways:

  • Family photos: MyHeritage has been offering a variety of photo enhancement tools over the past three years, including ways to colorize images and make them clearer[1]. In addition, there are tools that can “animate” an ancestor based on a photo and even help estimate the date of an image based on characteristics such as fashion styles, hair styles, and more.
  • Transcription: The National Archives and Records Administration (NARA) in conjunction with Ancestry and FamilySearch used artificial intelligence to index the 1950 US Census population schedules released in April 2022[2]. Entries made by enumerators were scanned and transcribed then released for use at a much faster rate than what was accomplished with manual indexing performed for the 1940 US Census release in 2012. For the 1950 US Census, users were encouraged to review the transcriptions and submit corrections as part of a community effort by genealogists and other researchers.
  • Searching for and suggesting records: MyHeritage and other genealogy platforms list “related” or “suggested” records in the sidebar of the webpage when a user views a record as part of a search. In addition, “hints” will often pop up suggesting records and family trees that a researcher might want to review because their data is similar.
  • DNA matches: With over 30 million people having used personal DNA testing kits, 23andMe, AncestryDNA, FamilyTreeDNA, and MyHeritage all leverage AI to find connections between testers based on shared DNA data. Given the sheer amount of information involved, these match results are only possible with artificial intelligence[3].
  • Social history: AI can be used to explore how an ancestor may have lived. Using a prompt like "social history Huguenots in New Paltz, New York 1600s" generates a detailed description of daily life for the settlers of New Paltz, New York who arrived from France around 1675. This information can be incorporated into genealogical research, especially when creating family history books to share with family and friends.
  • Research assistance: Copies of specific records cannot currently be generated by AI platforms. However, a prompt such as "How do I find records about a German ancestor who arrived in New York City in 1881?" can create a research "to do list" and offer guidance as to possible record sets and how to access them.
  • Translation: Most AI platforms can quickly translate text from one language to another. Use such AI-generated translations with caution, however, since AI is often "too literal" and cannot pick up the nuances of the language known to a native speaker.
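
To illustrate the idea behind record "hints" mentioned in the list above, here is a minimal Python sketch that scores how similar two person records are. It is not the method used by MyHeritage or any other vendor; the field names (name, birth_year, birthplace), the weights, and the sample people are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

def hint_score(record_a, record_b, max_year_gap=5):
    """Return a 0-1 similarity score between two person records.

    The fields and weights are illustrative assumptions, not a vendor's algorithm.
    """
    # Fuzzy name comparison tolerates spelling variants such as Johann/Johan.
    name_sim = SequenceMatcher(
        None, record_a["name"].lower(), record_b["name"].lower()
    ).ratio()
    # Birth years within max_year_gap earn partial credit; larger gaps earn none.
    year_gap = abs(record_a["birth_year"] - record_b["birth_year"])
    year_sim = max(0.0, 1 - year_gap / max_year_gap)
    # Birthplace is compared exactly (case-insensitive) to keep the sketch short.
    same_place = record_a["birthplace"].lower() == record_b["birthplace"].lower()
    place_sim = 1.0 if same_place else 0.0
    return round(0.5 * name_sim + 0.3 * year_sim + 0.2 * place_sim, 3)

# Hypothetical sample data: one tree entry and two candidate records.
tree_person = {"name": "Johann Schmidt", "birth_year": 1860, "birthplace": "Hamburg"}
likely_match = {"name": "Johan Schmidt", "birth_year": 1861, "birthplace": "Hamburg"}
unlikely_match = {"name": "Mary O'Brien", "birth_year": 1902, "birthplace": "Cork"}

likely_score = hint_score(tree_person, likely_match)
unlikely_score = hint_score(tree_person, unlikely_match)
print(likely_score, unlikely_score)
```

A record scoring above some chosen threshold could be offered to the researcher as a hint. Real systems likely weigh many more fields and learn their weights from data.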

Copyright and AI

AI and Copyright

Can content that is created by artificial intelligence based on a query be copyrighted? If a user asks ChatGPT to generate an image of what an ancestor who fought in the American Revolutionary War might have looked like, can the AI-generated image be copyrighted? Who owns the resulting image?

Currently, courts in the United States have held that AI-generated content cannot be copyrighted because there is no human author. The reasoning parallels the “macaque monkey selfie”[4] dispute, in which a monkey took a selfie photograph using equipment set up by a British photographer. Because the photograph was not created by a human, the U.S. Copyright Office took the position that it could not be registered, and a U.S. court ruled that the animal could not hold the copyright. Courts are applying similar reasoning to decide who owns an ancestor image generated using artificial intelligence.

AI and Source Citations

Those new to genealogy and family history soon learn the importance of source citations in proving relationships as well as facts about an ancestor. Source citations document how genealogists find and use records such as census population schedules, death certificates, and even letters or diaries.

Currently, records useful for genealogical research cannot be located by querying AI platforms. However, information that serves as a clue for further research, or social history about how an ancestor lived, is more likely to be found. In these situations, a method of citing AI-generated content is needed.

For AI content, here is a formula proposed by the Modern Language Association of America (MLA):

“[QUERY]” prompt. [NAME OF AI PLATFORM], [DATE OR VERSION OF PLATFORM], [NAME OF AI COMPANY], [DATE OF QUERY], [PLATFORM URL]

So, for a ChatGPT prompt asking what my great-grandfather’s home, valued at $80,000 in the 1930 US Census, would be worth in 2024 dollars, here is a possible source citation:

“Value of home in the 1930 US Census listed as $80,000 in 2024 dollars” prompt. ChatGPT, ChatGPT 3.5 version, OpenAI, 1 October 2023, https://chat.openai.com/.
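
For researchers who log many AI queries, the formula above can be sketched as a small Python helper. This is an illustrative sketch rather than an official MLA tool; the function name cite_ai_prompt and its parameter names are hypothetical, and the sample values are taken from the example citation above.

```python
from datetime import date

def cite_ai_prompt(query, platform, version, company, query_date, url):
    """Build a citation string following the MLA-style formula for AI content."""
    # Day-month-year with no leading zero on the day, e.g. "1 October 2023".
    formatted_date = f"{query_date.day} {query_date.strftime('%B %Y')}"
    return (
        f"“{query}” prompt. {platform}, {version}, {company}, "
        f"{formatted_date}, {url}."
    )

citation = cite_ai_prompt(
    query="Value of home in the 1930 US Census listed as $80,000 in 2024 dollars",
    platform="ChatGPT",
    version="ChatGPT 3.5 version",
    company="OpenAI",
    query_date=date(2023, 10, 1),
    url="https://chat.openai.com/",
)
print(citation)
```

Recording the query date as a date value rather than free text keeps every citation in the same day-month-year form.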

References

  1. Levy, Daniella, MyHeritage Photo Features: What They Are & How to Use Them, MyHeritage Knowledge Base, accessed 11 July 2024.
  2. Wright, Jason, How Indexing the 1950 Census Will Be Different, FamilySearch blog, 27 January 2022, accessed 11 July 2024.
  3. Pillai, Abhilash, AI Revolutionizes Genealogy: Discovering Family History and Relationships with Data-Driven Insights, LinkedIn, 4 March 2023, accessed 11 July 2024.
  4. Monkey selfie copyright dispute, Wikipedia, 10 June 2024.