Wayback Machine facts for kids

Wayback Machine
Logo: stylized text saying "INTERNET ARCHIVE WAYBACK MACHINE" in black, except for "WAYBACK", which is in red
Type of site: Archive
Founded:
  • May 10, 1996 (private)
  • October 24, 2001 (public)
Area served: Worldwide (except China, Russia, and Bahrain)
Owner: Internet Archive
Commercial: No
Registration: Optional
Current status: Active
Written in: HTML, CSS, JavaScript, Java, Python

The Wayback Machine is like a giant digital library for the World Wide Web. It was created by the Internet Archive, which is a non-profit group in San Francisco, California.

Think of it as a time machine for websites! It started in 1996 and became available to everyone in 2001. Its creators, Brewster Kahle and Bruce Gilliat, wanted to give "universal access to all knowledge." They did this by saving copies of web pages over time. This way, you can see how websites looked years ago, even if they're not online anymore.

By the end of 2009, the Wayback Machine had saved over 38.2 billion web pages. As of January 2024, it has archived more than 860 billion web pages. That's a huge amount of information, over 99 petabytes of data!

Exploring the Past: How the Wayback Machine Works

The Wayback Machine started saving copies of web pages in 1996. One of the very first pages was saved on May 10, 1996.

Internet Archive founders Brewster Kahle and Bruce Gilliat launched the Wayback Machine to the public in October 2001. They wanted to solve a big problem: web content often disappears when websites change or shut down. The service lets people see old versions of web pages. The founders call this collection of saved versions a "three-dimensional index" of the internet.

Kahle and Gilliat hoped to save the entire internet! The name "Wayback Machine" comes from a cartoon called The Adventures of Rocky and Bullwinkle and Friends. In one part, "Peabody's Improbable History," characters Mister Peabody and Sherman use a "Wayback Machine" to travel through time.

From 1996 to 2001, all the information was kept on digital tapes. Only a few researchers could access this data. When the archive turned five in 2001, it was opened to everyone. By then, it already had over 10 billion archived pages.

The data is stored on many computers called Linux nodes. The Wayback Machine regularly visits and saves new versions of websites. You can also save a website yourself by typing its URL into the search box. This works if the website allows the Wayback Machine to "crawl" and save its information.

In May 2021, for its 25th birthday, the Internet Archive launched the "Wayforward Machine." This playful feature lets users imagine the internet in 2046, a future where knowledge is under threat.

How the Wayback Machine Collects Data

The Wayback Machine uses special software called "crawlers" to explore the web. These crawlers download all public information and files from web pages. They also collect data from the Gopher system and Netnews (Usenet) bulletin boards.

However, these crawlers don't get everything. Some data is private or stored in databases that can't be accessed. To help with this, the Internet Archive created Archive-It.org in 2005. This allows organizations and creators to save their own digital content.

The Wayback Machine gets its data from many places. Some data comes from other groups, and some is collected by the Internet Archive itself. For example, the Sloan Foundation and Alexa contribute data. The "Worldwide Web Crawls" have been running since 2010, saving parts of the global web.

When a page is saved, it gets a special time-stamped address. This helps link all parts of the page, like images and styles, to the correct saved version.
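
To make the idea of a time-stamped address concrete, here is a small Python sketch. It assumes the address pattern often seen on saved pages, which is a prefix, then a 14-digit date and time, then the original address. The function names are made up for this example and are not part of any official tool.

```python
from datetime import datetime

# Assumed address pattern: prefix + 14-digit timestamp + "/" + original URL
WAYBACK_PREFIX = "https://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # year, month, day, hour, minute, second

def make_snapshot_url(original_url: str, saved_at: datetime) -> str:
    """Build a time-stamped address for a saved copy of a page."""
    stamp = saved_at.strftime(STAMP_FORMAT)
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"

def read_snapshot_url(snapshot_url: str) -> tuple[datetime, str]:
    """Split a time-stamped address into (time saved, original URL)."""
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a time-stamped archive address")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    # Everything before the first "/" is the timestamp, the rest is the page.
    stamp, original_url = rest.split("/", 1)
    return datetime.strptime(stamp, STAMP_FORMAT), original_url

url = make_snapshot_url("http://example.com/", datetime(1996, 5, 10, 12, 0, 0))
print(url)  # https://web.archive.org/web/19960510120000/http://example.com/
```

Because the date and time sit inside the address itself, an image or style file saved at the same moment can be looked up with the same timestamp.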

How often a website is saved depends on how often it's included in a "crawl list." A crawl can take months or even years to finish. But many crawls run at the same time, so a site might be saved more often.

Since October 2019, each user has been limited to 15 requests per minute to save or view archives.
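
A limit like "15 requests per minute" can be pictured with a short Python sketch. The article does not say how the real service counts requests, so this is only one common approach, called a sliding window. The class name `MinuteLimiter` is invented for this example.

```python
from collections import deque

class MinuteLimiter:
    """Allow at most `limit` requests in any `window` seconds (a sketch)."""

    def __init__(self, limit: int = 15, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.times = deque()  # times of recently allowed requests

    def allow(self, now: float) -> bool:
        # Forget requests that are older than the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.limit:
            self.times.append(now)
            return True
        return False  # too many requests in the last minute

limiter = MinuteLimiter()
# Sixteen requests, one per second: the first 15 pass, the 16th is refused.
answers = [limiter.allow(float(second)) for second in range(16)]
print(answers.count(True), answers[-1])  # 15 False
```

Once the oldest request is more than a minute old, it drops out of the window and a new request is allowed again.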

Growing Storage Capacity

The Wayback Machine's storage has grown a lot over the years. In 2003, it was growing by 12 terabytes every month. The data is stored on special "PetaBox" systems designed by the Internet Archive.

By 2009, the Wayback Machine held about three petabytes of data. It was growing by 100 terabytes each month! In 2011, the Internet Archive added more storage, increasing its capacity by 700 terabytes.

In January 2013, the Internet Archive announced that it had saved 240 billion web addresses (URLs). By December 2014, the Wayback Machine held 435 billion web pages, which was almost nine petabytes of data. It was growing by about 20 terabytes every week!
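
The numbers for December 2014 can be turned into two quick estimates with a little arithmetic. The Python sketch below assumes decimal units (1 petabyte = 1,000 terabytes = 10^15 bytes) and rounds "almost nine petabytes" up to nine, so the results are rough guides, not official figures.

```python
TB_PER_PB = 1000        # assumption: 1 petabyte = 1,000 terabytes
BYTES_PER_PB = 10**15   # assumption: 1 petabyte = 10^15 bytes
WEEKS_PER_YEAR = 52

def yearly_growth_pb(weekly_tb: float) -> float:
    """Turn growth in terabytes per week into petabytes per year."""
    return weekly_tb * WEEKS_PER_YEAR / TB_PER_PB

def average_page_kb(total_pb: float, pages: float) -> float:
    """Average size of one saved page, in kilobytes (1 KB = 1,000 bytes)."""
    return total_pb * BYTES_PER_PB / pages / 1000

print(yearly_growth_pb(20))                 # 1.04 petabytes per year
print(round(average_page_kb(9, 435e9), 1))  # 20.7 kilobytes per page
```

So 20 terabytes a week adds up to about one petabyte a year, and the average saved page took up roughly 20 kilobytes.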

In July 2016, the Wayback Machine held around 15 petabytes of data. By September 2018, it had over 25 petabytes. As of December 2020, it contained over 70 petabytes of data. The Internet Archive says that as of January 2024, they have stored well over 99 petabytes of data.

Wayback Machine Growth

Year    Pages archived
2004     30,000,000,000
2005     40,000,000,000
2008     85,000,000,000
2012    150,000,000,000
2013    373,000,000,000
2014    400,000,000,000
2015    452,000,000,000
2016    459,000,000,000
2017    279,000,000,000
2018    310,000,000,000
2019    345,000,000,000
2020    405,000,000,000
2021    514,000,000,000
2022    640,000,000,000

Website Rules: What Gets Archived?

The Wayback Machine used to follow rules from a file called "robots.txt." This file tells web crawlers what parts of a website they can or cannot save. If a website owner blocked the Internet Archive using robots.txt, any pages saved from that site would become unavailable. The Internet Archive also said they would remove content if a website owner asked them directly.
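
Here is a small Python sketch of how a crawler can read robots.txt rules. It uses Python's built-in `urllib.robotparser` module. The crawler name `ia_archiver` is an assumption made for this example (a name historically linked with the archive's crawls), and the rules and `example.com` addresses are invented.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt file: one crawler is blocked from the whole site,
# and every other crawler is only kept out of the /private/ folder.
robots_txt = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # read the rules line by line

# Ask whether each crawler may fetch each address.
archive_ok = parser.can_fetch("ia_archiver", "https://example.com/index.html")
other_ok = parser.can_fetch("SomeOtherBot", "https://example.com/index.html")
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/x")

print(archive_ok, other_ok, other_private)  # False True False
```

A crawler that follows these rules checks the file first and skips any address it is not allowed to fetch.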

However, in April 2017, this policy changed. Some old websites that were no longer active were using robots.txt to block search engines. This also accidentally blocked them from the Wayback Machine. Now, the Internet Archive needs a clear request to remove a site from the Wayback Machine. They also stopped honoring robots.txt for U.S. government and military websites.

How People Use the Wayback Machine

Since it launched in 2001, people have studied the Wayback Machine to understand how it saves data and what's in its archive. By 2013, about 350 articles had been written about it.

When the Wayback Machine saves a page, it tries to keep most of the links working. This is important because links on the internet can often break. Researchers have found that it saves more than half of the links in online articles.

Journalists use the Wayback Machine to find old news reports, see websites that no longer exist, and track changes to website content. This helps them check facts and hold people accountable. For example, in 2014, an archived social media page of a rebel leader in Ukraine showed him talking about shooting down a plane. This was before it was known that the plane was a civilian jet, Malaysia Airlines Flight 17. He later deleted the post.

In 2017, the idea for the March for Science came from a discussion online. Someone found on Archive.org that all mentions of climate change had been removed from the White House website. This led to the idea of a "Scientists' March on Washington."

Wikipedia editors also use the site a lot to check facts and create new content. When new web addresses are added to Wikipedia, the Internet Archive helps save them.

In September 2020, the Internet Archive teamed up with Cloudflare. This partnership helps automatically archive websites using Cloudflare's "Always Online" service. If a website is down, users can be sent to the archived copy instead.

What the Wayback Machine Can't Do

There are some things the Wayback Machine can't do perfectly. In 2014, there was a six-month delay between when a website was saved and when it became available. Now, this delay is much shorter, usually 3 to 10 hours.

The Wayback Machine also has limited search options. You can search for a site based on words describing it, but not by words found on the web pages themselves.

It doesn't save every web page ever made because its web crawler has limits. For example, it can't fully archive pages with interactive features like Flash games or forms that need you to type things in. It also has trouble with things not coded in HTML, which can lead to broken links or missing images.

The crawler also can't find "orphan pages" that aren't linked to by other pages. And it only follows a certain number of links from each page, so it can't save every single link.
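
Both limits can be seen in a toy example. The Python sketch below is not the real crawler. It is a simple breadth-first crawl over a made-up set of pages, and the limit of two links per page is chosen only to keep the example small.

```python
from collections import deque

def crawl(links: dict[str, list[str]], start: str, max_links_per_page: int = 2) -> set[str]:
    """Visit pages starting from `start`, following only the first few links on each page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Only the first `max_links_per_page` links are followed.
        for target in links.get(page, [])[:max_links_per_page]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# A tiny pretend website: each page lists the pages it links to.
toy_web = {
    "home": ["about", "news", "contact"],
    "about": ["home"],
    "news": ["story"],
    "contact": [],
    "story": [],
    "secret": ["home"],  # orphan page: no other page links to it
}

found = crawl(toy_web, "home")
print(sorted(found))  # ['about', 'home', 'news', 'story']
```

The page "contact" is missed because it is the third link on "home", and "secret" is missed because no page points to it.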

Legal Uses of Archived Web Pages

The Wayback Machine's archives have been used in legal cases. For example, the United States Patent and Trademark Office and the European Patent Office accept dates from the Internet Archive. These dates can prove when a web page was publicly available, which is important for things like prior art in patent applications.

However, there are also limits to using these archives in court. Archiving has technical limits, so a saved copy can be incomplete, and incomplete copies can be misused. For example, the Wayback Machine doesn't fill out forms, so it can't save the contents of online shopping databases.

Rules and Challenges

In Europe, saving content without permission could go against copyright laws. So, if a creator asks, the Internet Archive might have to remove their pages. The Wayback Machine's website has a section explaining its rules for removing content.

The Internet Archive has faced some legal challenges because of its archiving work.

Censorship and Other Dangers

Archive.org is blocked in China. It was also blocked in Russia for a while in 2015–2016.

In 2015, security experts found that the service could accidentally host harmful software from archived sites.

Alison Macrina, who leads the Library Freedom Project, says that while librarians care deeply about privacy, they are also against censorship.

There was one case where an article was removed from the archive soon after it was removed from its original website. A reporter had written an article that revealed the identities of several gay Olympic athletes in 2016. The original news site removed the article after many complaints. The Internet Archive also removed it, saying they did so to protect the safety of the athletes.

Other threats to the archive include natural disasters, attacks that could damage the data, and laws about copyright or watching what users do online.

Experts like Alexander Rose from the Long Now Foundation worry that in the very long term, much of what we save today might not be useful. He thinks that while the basic data might survive, the way it was presented might not be recognizable. This is because modern websites use complex systems that are harder to archive completely.

In 2016, The Atlantic magazine noted that the Internet Archive, which aims to last a long time, is working hard to save data before it disappears.

See also

  • Anna's Archive
  • Heritrix
  • Library Genesis
  • Link rot
  • List of Web archiving initiatives
  • Time capsule
  • Z-Library
Wayback Machine Facts for Kids. Kiddle Encyclopedia.