Wayback Machine facts for kids

Kids Encyclopedia Facts

Quick facts for kids
Wayback Machine

Type of site	Archive
Founded	October 24, 2001
Area served	Worldwide (except China and North Korea)
Owner	Internet Archive
Website	web.archive.org
Commercial	No
Registration	Optional
Current status	Active
Written in	HTML, CSS, JavaScript, Java, Python

The Wayback Machine is like a giant digital library for the World Wide Web. It was created by the Internet Archive, a group that works to save online information. Launched in 2001, this amazing service lets you travel "back in time" to see how websites looked years ago.

The people who started it, Brewster Kahle and Bruce Gilliat, wanted to make sure everyone could access all knowledge. They aimed to save copies of web pages, even if the original websites changed or disappeared. The Wayback Machine started saving pages as early as 1995. By the end of 2009, it had saved over 38 billion web pages. As of November 2024, it holds more than 916 billion web pages and over 100 petabytes of data. That's a huge amount of information!

History of the Wayback Machine
- Why the Wayback Machine Was Created
- How the Archive Grew
How the Wayback Machine Works
Uses of the Wayback Machine
- How People Use It
- What the Wayback Machine Cannot Do
Challenges and Threats
- Censorship and Other Issues
See also

History of the Wayback Machine

The Internet Archive began saving web pages way back in 1995. One of the very first pages was saved on May 8, 1995.

Why the Wayback Machine Was Created

Brewster Kahle and Bruce Gilliat officially launched the Wayback Machine in October 2001. They wanted to solve a big problem: web content often vanishes when websites change or shut down. The Wayback Machine lets users see old versions of web pages. The archive calls this a "three-dimensional index," because it lets you look at web pages across time.

The creators hoped to archive the entire internet. They wanted to provide "universal access to all knowledge." The name "Wayback Machine" comes from a cartoon from the 1960s. In The Adventures of Rocky and Bullwinkle and Friends, the characters Mister Peabody and Sherman use a time machine called the "WABAC machine" (pronounced "way-back") to travel through history.

How the Archive Grew

From 1996 to 2001, the information was stored on digital tapes. Researchers could sometimes access this "clunky" system. When the archive turned five in 2001, it was opened to the public. By then, it already held over 10 billion archived pages.

The data is stored on many computers called Linux nodes. The Wayback Machine regularly visits websites to save new versions. You can also save a website manually. Just type its URL into the search box on the Wayback Machine's main page. This works if the website allows the machine to "crawl" and save its data.

In October 2013, a "Save Page Now" feature was added. It lets anyone save a web page instantly, and the saved copy gets a permanent link. In May 2021, for the Internet Archive's 25th anniversary, the Wayback Machine also launched a "Wayforward Machine." It lets users imagine what the internet might be like in 2046, in a future where knowledge is under attack.

How the Wayback Machine Works

The Wayback Machine uses special software called "crawlers." These crawlers explore the web and download publicly available information. This includes web pages, files, and even old bulletin board systems.
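Crawlers find pages by following links from one page to the next. The Python sketch below is a minimal, simplified illustration of that idea, not the Internet Archive's actual crawler software. It uses a small made-up "web" (the example.org addresses and the `crawl` function are hypothetical) and visits pages in breadth-first order, meaning it checks every page one link away before moving on to pages two links away. It also shows why an "orphan page," one that no other page links to, is never found.

```python
from collections import deque

# A tiny, made-up "web": each page maps to the pages it links to.
# These addresses are hypothetical examples, not real archived sites.
LINKS = {
    "https://example.org/": ["https://example.org/about", "https://example.org/news"],
    "https://example.org/about": ["https://example.org/"],
    "https://example.org/news": ["https://example.org/news/2001"],
    "https://example.org/news/2001": [],
    # Nothing links to this page, so it is an "orphan page".
    "https://example.org/secret": [],
}


def crawl(seed, links, max_pages=100):
    """Visit pages breadth-first, starting from a seed address.

    Returns the addresses in the order they were visited.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # a real crawler would download and save the page here
        for link in links.get(page, []):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("https://example.org/", LINKS)
print(pages)
```

Running it lists four linked pages, but `https://example.org/secret` never appears because nothing links to it. Real crawlers also have to download pages, follow site rules, and avoid overloading servers, which this sketch leaves out.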

What Gets Archived

The crawlers try to collect as much as possible. However, they don't get everything. Some data is private or stored in databases that can't be reached. To help with this, the Internet Archive created Archive-It.org in 2005. This allows organizations to save their own digital content.

The archived content comes from many places. Some are from partners like the Alfred P. Sloan Foundation and Alexa Internet. Others are from crawls run by the Internet Archive itself. The "Worldwide Web Crawls" have been running since 2010. They aim to capture the global web.

When a page is saved, it gets a special time-stamped address. This helps link all parts of the page, like images and styles, to the correct saved version.
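Archived addresses on the Wayback Machine follow the pattern `https://web.archive.org/web/`, then a 14-digit timestamp (year, month, day, hour, minute, second), then the original address. The Python sketch below builds such an address and shows, in a simplified way, how a link on a saved page (for example, to an image) could be pointed at the copy saved at the same time. The function names and example addresses are hypothetical, and the real Wayback Machine handles many more cases than this.

```python
from datetime import datetime
from urllib.parse import urljoin

ARCHIVE_PREFIX = "https://web.archive.org/web/"


def snapshot_address(original_url, captured_at):
    """Build a time-stamped archive address (hypothetical helper).

    The timestamp is 14 digits: YYYYMMDDhhmmss.
    """
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{ARCHIVE_PREFIX}{stamp}/{original_url}"


def rewrite_link(page_url, link, captured_at):
    """Point a link found on a saved page at the archived copy with the
    same timestamp, instead of at the live web (simplified sketch)."""
    absolute = urljoin(page_url, link)  # turn relative links like "../logo.png" into full addresses
    return snapshot_address(absolute, captured_at)


when = datetime(2001, 10, 24, 12, 0, 0)
print(snapshot_address("http://example.com/", when))
# https://web.archive.org/web/20011024120000/http://example.com/
print(rewrite_link("http://example.com/news/index.html", "../images/logo.png", when))
# https://web.archive.org/web/20011024120000/http://example.com/images/logo.png
```

In this sketch, `rewrite_link` first turns a relative link into a full address with `urljoin`, then adds the same timestamp as the page, so the page and its pieces stay matched to one moment in time.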

How Often Pages Are Saved

How often a website is saved depends on many things. Websites in the "Worldwide Web Crawls" are saved once per crawl. A crawl can take months or even years to finish. But many crawls can happen at the same time. A website might also be on more than one crawl list. So, some sites are saved more often than others.
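Because some sites are saved much more often than others, the saved copy closest to the date you want may be days or years away. The Python sketch below shows one simple way to pick the nearest capture, using binary search over a sorted list of capture times. The capture dates and the `closest_capture` function are made up for illustration; the Wayback Machine's own rules for choosing a capture may differ.

```python
from bisect import bisect_left
from datetime import datetime


def closest_capture(captures, wanted):
    """Return the capture time nearest to `wanted` (hypothetical helper).

    `captures` must be a non-empty list of datetimes sorted oldest first.
    """
    i = bisect_left(captures, wanted)  # index of first capture at or after `wanted`
    if i == 0:
        return captures[0]  # wanted date is before (or at) every capture
    if i == len(captures):
        return captures[-1]  # wanted date is after every capture
    before, after = captures[i - 1], captures[i]
    # Choose whichever neighbour is closer in time; ties go to the earlier one.
    return before if wanted - before <= after - wanted else after


# Hypothetical capture times for one website (made up for this example).
captures = [
    datetime(2005, 3, 1),
    datetime(2007, 6, 15),
    datetime(2007, 6, 20),
    datetime(2012, 1, 5),
]

print(closest_capture(captures, datetime(2009, 1, 1)))  # 2007-06-20 00:00:00
```

With these made-up dates, asking for January 1, 2009 returns the June 20, 2007 capture, because it is closer than the 2012 one. A site saved every day would give a much closer match.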

Saving Pages Yourself

The "Save Page Now" feature is easy to use. You can find it on the main page of the Wayback Machine. Just type in a website's address and click save. The page will then become part of the Wayback Machine. You can even upload files like PDFs. The Wayback Machine creates a permanent link for your uploaded content.

How Much Data It Stores

The Wayback Machine's storage has grown a lot over the years. In 2003, it was growing by 12 terabytes each month. The data is stored on special computer systems called PetaBox racks.

By 2009, the Wayback Machine held about three petabytes of data. It was growing by 100 terabytes every month. In 2011, the Internet Archive added more storage, increasing capacity by 700 terabytes.

In January 2013, the archive reached 240 billion web addresses. By December 2014, it held 435 billion web pages, which was almost nine petabytes of data. It was growing by about 20 terabytes a week.

In July 2016, the Wayback Machine had about 15 petabytes of data. In September 2018, it had over 25 petabytes. As of December 2020, it contained over 70 petabytes of data.
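The growth figures above use different time units (per month and per week). This small Python calculation converts the quoted rates into rough yearly amounts, assuming 12 months or 52 weeks per year and 1 petabyte = 1,000 terabytes. The inputs are the rounded numbers from the text, so the results are only estimates.

```python
# Growth rates quoted in the text, in terabytes (TB) per period.
RATES = {
    2003: (12, "month"),
    2009: (100, "month"),
    2014: (20, "week"),
}

# Assumptions: 12 months or 52 weeks in a year, and decimal units (1 PB = 1,000 TB).
PERIODS_PER_YEAR = {"month": 12, "week": 52}
TB_PER_PB = 1000


def yearly_growth_tb(amount_tb, period):
    """Convert a growth rate per month or per week into a rough rate per year."""
    return amount_tb * PERIODS_PER_YEAR[period]


for year, (amount, period) in RATES.items():
    per_year = yearly_growth_tb(amount, period)
    print(f"{year}: {amount} TB per {period} is about {per_year:,} TB "
          f"({per_year / TB_PER_PB:.2f} PB) per year")
```

This works out to about 144 TB a year in 2003, about 1,200 TB (1.2 PB) a year in 2009, and about 1,040 TB (just over 1 PB) a year in 2014.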

Wayback Machine growth
Year	Pages archived
2004	30,000,000,000
2005	40,000,000,000
2008	85,000,000,000
2012	150,000,000,000
2013	373,000,000,000
2014	400,000,000,000
2015	452,000,000,000
2016	459,000,000,000
2017	279,000,000,000
2018	310,000,000,000
2019	345,000,000,000
2020	405,000,000,000
2021	514,000,000,000
2022	640,000,000,000
2024	866,000,000,000

Website Exclusion Policy

The Wayback Machine used to follow rules from a file called "robots.txt." This file tells web crawlers which parts of a website they should not visit or save. If a website blocked the Internet Archive, any pages saved from that site would become unavailable. Also, website owners could ask the Internet Archive directly to stop saving their site.

However, this policy changed on April 17, 2017. Now, the Internet Archive needs a clear request to remove sites from the Wayback Machine. This means that if a website disappears and its "robots.txt" file changes, its old pages might still be available.
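The robots.txt check described above can be tried with Python's standard `urllib.robotparser` module. The sketch assumes a made-up robots.txt that blocks a crawler named `ia_archiver` (the name is used here only as an example). The `old_snapshots_visible` helper is hypothetical and is only a rough model of the before-and-after policy described in this section, not the Internet Archive's real rules.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that blocks one crawler from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_allowed(robots_txt, user_agent, url):
    """Check whether robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def old_snapshots_visible(robots_blocks_archive, removal_requested, after_2017_change):
    """Rough, hypothetical model of the exclusion policy described in the text.

    Before April 17, 2017: a robots.txt block or a direct request hid saved pages.
    After the change: only an explicit removal request hides them.
    """
    if after_2017_change:
        return not removal_requested
    return not (robots_blocks_archive or removal_requested)


url = "https://example.org/page.html"
blocked = not crawler_allowed(ROBOTS_TXT, "ia_archiver", url)
print(blocked)  # True: this robots.txt disallows ia_archiver everywhere
print(old_snapshots_visible(blocked, False, after_2017_change=False))  # False
print(old_snapshots_visible(blocked, False, after_2017_change=True))   # True
```

The same robots.txt block hides old pages under the earlier policy but not under the later one, unless the owner also sends a removal request.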

Uses of the Wayback Machine

Since its launch, the Wayback Machine has been very useful. People use it to study how websites have changed over time.

How People Use It

When the Wayback Machine saves a page, it tries to keep the page's links working by pointing them to saved copies. This is helpful because links on the internet often break, a problem known as "link rot." Researchers have found that it saves more than half of the links in online articles.

Journalists use the Wayback Machine to find old news reports and see how website content has changed. This helps them check facts and hold people accountable. For example, in 2014, an archived social media page of a rebel leader in Ukraine showed him talking about shooting down a plane. This was before it was known that the plane was a civilian jet. He later deleted the post.

In 2017, the idea for the March for Science began in an online discussion. Someone had used Archive.org to discover that all mentions of climate change had been removed from a government website. In response, another user suggested a "Scientists' March on Washington."

Wikipedia editors use the site a lot to check facts and create new content. The Internet Archive also saves new web addresses added to Wikipedia.

What the Wayback Machine Cannot Do

The Wayback Machine has some limits. It doesn't save every web page ever made. There can also be a delay between when a website is saved and when it becomes available. As of 2024, this delay is usually 3 to 10 hours.

It also has limited search options. You can search for a site based on words describing it, but not usually by words found on the pages themselves.

The Wayback Machine can't fully archive websites with interactive features. This includes things like Flash games or forms that need you to type information. For example, it can't show YouTube comments because they are not loaded directly on the video page. Also, its crawlers have trouble with things not coded in HTML. This can lead to broken links or missing images. It also can't archive "orphan pages" that aren't linked to by other pages.

Challenges and Threats

The Wayback Machine faces some challenges.

Censorship and Other Issues

The Internet Archive is blocked in China. It was also blocked in Russia for a time in 2015–16.

Security experts have sometimes found that the service accidentally hosts harmful files, such as malware, saved from archived sites.

There have been cases where articles were removed from the archive. For example, a Daily Beast article that revealed information about gay athletes was removed from its original site. The Internet Archive also removed it to protect the athletes' safety.

Other threats include natural disasters, damage to its systems, and problems with copyright laws. It's also hard to archive very complex websites that use lots of different systems.

In September 2024, the Internet Archive had a data breach. This meant some personal information, like email addresses, was exposed. On October 9, 2024, the site went down because of a distributed denial-of-service attack. It came back online on October 14 but was in "read-only" mode until November 4. During this time, the "Save Page Now" feature was turned off.