Crawling
Written by Julia Voortman

Crawling, sometimes known as "spidering", is a technique that computers use to discover the content of a website. It's a method relied upon by major search engines like Google, and by Silktide.

How crawling works

Crawling is a straightforward process, illustrated by the short sketch after this list:

  1. Start with a known web page (like a website's homepage)

  2. Download that page

  3. Find all the links on that page

  4. For each link, repeat the process
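
To make these steps concrete, here's a minimal sketch of the loop in Python. It's purely illustrative: the function names, the page limit, and the "stay on the same website" rule are assumptions made for the example, not how Silktide or Google actually crawl.

```python
# Minimal illustrative crawler: start at one page, download it, collect its
# links, and repeat for every new link found. Not Silktide's real crawler.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=100):
    """Breadth-first crawl of a single website, returning the URLs found."""
    seen = {start_url}
    queue = deque([start_url])            # 1. start with a known page
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url).read().decode("utf-8", errors="ignore")  # 2. download it
        except OSError:
            continue                       # skip pages that fail to download
        parser = LinkParser()
        parser.feed(html)                  # 3. find all the links on the page
        for href in parser.links:
            absolute = urljoin(url, href)
            # stay on the same website and never revisit a page
            if urlparse(absolute).netloc == urlparse(start_url).netloc and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)     # 4. repeat the process for each new link
    return seen
```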

Crawling only finds linked pages

If a page isn't linked from another page, there's no way for a crawler to discover it. This is important for both Silktide and Google.

For example, a web address printed on a poster but never linked to from anywhere on your website is known as an 'orphaned page' and will never be crawled.

You can manually add the URL of an orphaned page to a Silktide website report for testing.

Crawling takes time

Crawling a website involves downloading a page, finding new links, following those links, and testing any new pages. This process repeats until every linked page has been found. Most crawlers, including Google's and Silktide's, download multiple pages at once to speed things up. However, downloading pages too quickly can overload a website and cause it to crash.

To prevent this, Silktide limits the number of simultaneous connections to 6, equivalent to 6 regular website users browsing at once.
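
As an illustration of how a connection limit like this works, here's a minimal sketch using Python's standard library. The limit of 6 matches the figure above; everything else (the function names, the timeout) is assumed for the example and isn't Silktide's implementation.

```python
# Minimal sketch of downloading pages in parallel while capping the number
# of simultaneous connections, using only Python's standard library.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download a single page, returning its HTML (or None on failure)."""
    try:
        return urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    except OSError:
        return None


def fetch_all(urls, max_connections=6):
    """Download many pages, but never more than max_connections at once."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

In practice a limit like this is applied inside the crawl loop itself, so no more than 6 pages are ever being downloaded at the same time.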

Crawling can go on forever

Some websites include 'spider traps', which can cause a crawler to go on crawling forever. A common example is a calendar widget with links to view the next day, and the next, and so on. A crawler can't tell that following these links is pointless, so it will keep trying to find the end of a series of URLs that never ends.

To avoid this, Silktide can be configured to ignore the URLs that lead to spider traps, while still including the pages you do want to test in your website reports.
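
To show the general idea, here's a minimal sketch of filtering out trap-like URLs before a crawler queues them. The patterns are hypothetical examples; in Silktide this is handled through configuration rather than code you write.

```python
# Minimal sketch of skipping spider-trap URLs. The patterns below are made-up
# examples of the kind of URL series that can go on forever.
import re

IGNORE_PATTERNS = [
    re.compile(r"/calendar/\d{4}-\d{2}-\d{2}"),  # endless "next day" calendar links
    re.compile(r"[?&]page=\d{3,}"),              # implausibly deep pagination
]


def should_crawl(url):
    """Return False for URLs that look like spider traps."""
    return not any(pattern.search(url) for pattern in IGNORE_PATTERNS)


# Example: should_crawl("https://example.com/calendar/2031-01-01") returns False,
# so the endless calendar pages are never added to the crawl queue.
```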
