Design and build
The following guidelines will help ensure that the technical build and design of any new Aviva website or online functionality is optimised for search engine access.
Technical and design tips
Adhering to good coding practice makes it easier for search engines to index our web pages.
- Use cascading style sheets.
- Keep HTML code as simple as possible and use mark-up headings (H1, H2, etc.).
- Structure code such that the main content appears before the main navigation.
- Use text links, rather than image-based links.
The keywords within a text link are given more weight by search engines than text within an image “alt” attribute. You can use CSS formatting to give a text link a more eye-catching appearance.
- Don’t use imagery or Flash animations to replace essential content.
- Don’t use tiny fonts or “hidden” text to include more keywords on the page.
- Don’t rely solely on dropdown menus for navigation.
Ensure that pages are also accessed through contextual navigation links.
File naming convention
Directory and file naming structure is important for SEO. Search engine friendly URLs present a proper directory structure using words, slashes and filenames only. They should also contain targeted keywords wherever possible.
- Use keywords within the directory and filename structure.
- Use hyphens to separate multiple keywords in URLs rather than underscores or spaces.
- Keep file and directory names in lower case.
This will help to avoid the creation of duplicate content due to variable capitalisation.
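As an illustrative sketch (the function name and rules here are assumptions, not an Aviva standard), the naming rules above can be applied automatically when generating file and directory names from page titles:

```python
import re

def seo_filename(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated file name.

    Applies the guidelines above: keywords kept, hyphens as separators,
    no underscores, spaces or capital letters.
    """
    slug = title.lower()
    # Replace any run of non-alphanumeric characters (spaces, underscores,
    # punctuation) with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")
```

For example, a page titled “Street to School_India” would become street-to-school-india, avoiding the duplicate-content risk of mixed capitalisation.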
Navigation
Good navigational structure is important to the user experience and is usually built with users in mind. However, it is also important to consider search engines, which make use of the links within your website to crawl and index the pages.
Some navigation may make it difficult or impossible for search engine spiders to access your content. Possible navigational blocks include:
- pages accessible only via a select form and submit button
- pages requiring a drop down menu to access them
- documents accessible only via a search box
- pages requiring a login
- pages that redirect before showing content (search engines call this cloaking or bait-and-switch, and may ban sites that use this tactic)
It is important to be aware of these blocks and ensure that they are not preventing spiders reaching content that you would like to be indexed.
- Build a comprehensive HTML sitemap.
- Use breadcrumb navigation.
- Create W3C standards–compliant code.
- Avoid graphic based links.
HTML sitemap
All Aviva websites should contain a text sitemap, as this helps search engines navigate through and index the website, as well as supporting our visitors.
- Include a link to a sitemap in the header of every page.
- Keep the sitemap to no more than 100 or so links if possible.
If it has to be bigger than this, consider breaking it into separate pages.
- Use the ‘nofollow’ HTML attribute on links to less important pages.
For example, select language pages and legal pages. This helps guide search engines to the more relevant pages.
XML Sitemaps
XML Sitemaps are a way of telling Google, and other search engines, about pages on your website that they might not otherwise discover. They can also be used to provide additional metadata about specific content types and additional information about your site.
XML Sitemaps are particularly useful if your site contains:
- Dynamic content
- Pages rich in Ajax or Flash content, which may be hard for search bots to crawl or index
- XML Sitemaps should only include URLs that return a 200 response code, are the canonical version of the page, and are able to be indexed.
- Re-submit updated XML Sitemaps to Google whenever amendments to the XML Sitemap have been made.
- Include a reference to the location of your XML Sitemap within the robots.txt file.
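As a minimal sketch of the format (using Python's standard library; the example URL is illustrative), an XML Sitemap is a `urlset` of `url` entries, each carrying the canonical location in a `loc` element:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal XML Sitemap string from a list of canonical,
    indexable URLs (per the guidelines, only 200-status pages belong here)."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Real sitemaps can also carry optional `lastmod`, `changefreq` and `priority` elements per URL; this sketch shows only the required structure.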
Excluding pages from search engines
In general, we want our web pages to appear in search engine results, but occasionally there may be exceptions.
The robots.txt file contains instructions for automated search engine robots to ignore specified files or directories that are otherwise publicly accessible. Please note that although these rules will be followed by reputable search engines, they are not enforceable. Therefore, keep any confidential information in a password-protected directory or offline.
- Use a robots.txt file to guide the search engines.
Indicate which areas of your site you do not want to be included in search results.
- Place the robots.txt file in the top-level directory of the website.
e.g. www.aviva.com/robots.txt. The location is important, as search engines only look here for the robots.txt file rather than searching the whole site. If they do not find this file, they assume that the whole site can be indexed.
Alternatively, on individual pages you can use the robots HTML meta tag to tell robots not to index the page content and/or not to follow its links.
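Robots rules can be sanity-checked before deployment with Python's standard-library parser; the `/internal/` directory below is an illustrative example, not a real Aviva path:

```python
from urllib import robotparser

# Example robots.txt blocking a hypothetical /internal/ directory for all robots.
RULES = """\
User-agent: *
Disallow: /internal/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Reputable crawlers honour these rules, but they are advisory only:
# confidential content still belongs behind a login or offline.
blocked = parser.can_fetch("*", "https://www.aviva.com/internal/report.html")
allowed = parser.can_fetch("*", "https://www.aviva.com/pensions/")
```

Here `blocked` is False (crawlers should skip the page) and `allowed` is True, confirming the rules do what was intended before they go live.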
Redirects and error pages
- Make use of error pages for pages that no longer exist.
Error pages should trigger a 404 file not found response code so search engines know that the requested URL doesn’t exist anymore.
- Use 301 (Permanent redirect) to redirect traffic from pages that have moved or been renamed.
This preserves your search engine rankings for that page.
- Do not use 302 (Temporary redirect) for pages that have moved permanently.
A 302 tells search engines the move is temporary, so use it only if you plan to reinstate the page in the future.
- Do not use meta refresh tags for pages that have moved or been renamed.
This is seen as a spam technique by some search engines and you may be penalised for using them.
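The redirect rules above reduce to a simple decision per request. The sketch below is illustrative (the paths and mapping are hypothetical), showing which status code each case should produce:

```python
# Hypothetical mapping of permanently moved pages to their new locations.
MOVED = {"/life-insurance-old": "/life-insurance/"}

def status_for(path, exists):
    """Choose the HTTP response for a requested path.

    301 for pages that have moved (preserves search rankings),
    404 for pages that no longer exist, 200 otherwise.
    `exists` is a callable reporting whether the path is still served.
    """
    if path in MOVED:
        return 301, MOVED[path]   # permanent redirect to the new URL
    if not exists(path):
        return 404, None          # tells search engines the URL is gone
    return 200, None
```

A meta refresh tag never appears here: moved pages always answer with a 301 at the HTTP level.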
URL structure
Search engine friendly URLs present a clean directory structure using words, slashes and filenames only, and should contain targeted keywords wherever possible.
- Try to use keywords within the directory and filename structure.
- Separate keywords in file and directory names.
Search engines ascribe a high value to keywords in URLs. For example, if you use /corporate-responsibility/street-to-school/india/ rather than /corporateresponsibility/streettoschool/india/, search engines will recognise these keywords and give the page higher rankings for them.
- Use dashes to separate multiple keywords in URLs rather than underscores.
Dashes are preferable to underscores because some search engines do not treat the underscore as a word separator.
- Directory and filenames should be in the correct language for the market.
Do not use English directory or filenames for non-English language sites. You will reduce your search engine ranking if you do not use words that people are searching for.
Page size
Search engines limit the size of the page that they cache. They will generally only fully cache pages that weigh less than 150K.
- Minimise the web page size.
Web pages above 150K should be broken down into smaller pages.
- Keep styling within a separate CSS file.
RSS feeds
RSS feeds can be an effective way of getting external links to your website. They can also generate additional entries within search engine results if other people syndicate your content. Google also views news feeds as fresh content, which is a positive signal.
- Consider adding a content feed for content that is regularly updated, such as news.
- Include a snippet of the content and a link to the content within your website.
Publishing a snippet with a link back, rather than the full content, encourages visitors to click through to your website and avoids syndicated copies duplicating the full page.
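A feed entry along these lines contains only a title, a snippet and a link back to the site. This is a minimal RSS 2.0 sketch (the example URL and text are hypothetical):

```python
import xml.etree.ElementTree as ET

def rss_item(title, link, snippet):
    """Build one RSS 2.0 <item> carrying a snippet and a link back to the site."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # The link drives feed readers (and syndicators' audiences) to the website.
    ET.SubElement(item, "link").text = link
    # Only a snippet, not the full content, goes into the feed.
    ET.SubElement(item, "description").text = snippet
    return ET.tostring(item, encoding="unicode")
```

A full feed wraps items like this in `rss`/`channel` elements; only the per-item structure is shown here.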
Frames
Search engines cannot easily navigate through or provide helpful links to content that sits within a frameset.
Search engines will typically index only the content within the frameset, as this is the substantive information on the page. A search user who clicks on such a result will land on an orphan page, without any contextual navigation or links to the rest of the website.
- Avoid frames if there is any other way of presenting your content.
Wherever possible, use other solutions such as CSS to control scrolling.
Framesets are strongly discouraged, but are sometimes necessary. If they are used, make sure that users can re-establish the frame context: include a link to the homepage with the attribute target="_top", which also prevents your framed site becoming nested within your own frames.
- Include NOFRAMES element with meaningful NOFRAMES content.
- Include links to internal areas of your site within NOFRAMES content.
This will allow the search engines to index all areas of your website.
- Include meaningful titles on frame pages that give a clear idea of the frame content.
Canonical URLs
The canonical URL is the ‘primary’ or preferred version of a URL, in cases where there are duplicates. Search engines can be confused by, or even penalise, duplicate content. If your website contains identical or very similar content accessible through multiple URLs, you should use the canonical link element to point search engines to your preferred version of the page. This is specified in the header area of the page code.
- Consider specifying a canonical URL for pages with duplicate content.
This is particularly useful in cases where the URL of a page varies due to sort or tracking parameters, or session ID.
- For any page where you are not specifying a separate canonical URL, the canonical tag should be self-referencing.
For example, the canonical tag on https://www.aviva.co.uk would point to https://www.aviva.co.uk.
- Use a 301 redirect to point your domain to the www version.
For example, redirect aviva.com to www.aviva.com.
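The www normalisation can be expressed as a small helper that computes the 301 redirect target; this is a sketch using the standard library, assuming the site's preferred version is the www host:

```python
from urllib.parse import urlsplit, urlunsplit

def to_www(url):
    """Return the www version of a URL, for use as a 301 redirect target.

    URLs already on the www host are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.netloc
    if not host.startswith("www."):
        host = "www." + host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

So a request for aviva.com/pensions would be permanently redirected to www.aviva.com/pensions, keeping link equity on one host.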
Addressing SEO within the CMS
Some behaviours may be built into the CMS to help maintain overall levels of SEO. Possibilities include:
- Enforce the population of the title tag.
One possibility is to extract the content of the main page heading to populate this, but allow the editor to modify it if required.
- Specify page headings and subheadings using HTML header tags.
The main page heading should be <h1>, and heading tags should be used hierarchically. Many search engines consider text between heading tags more important than the content on the rest of the page.
- Make the population of metadata description mandatory.
The metadata description can be pre–populated from the relevant part of the page content.
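Pre-populating the description could be as simple as the sketch below (the length limit and function name are assumptions; editors should still be able to override the result):

```python
def meta_description(body_text, max_len=155):
    """Pre-populate a meta description from page content.

    Collapses whitespace and truncates at a word boundary so the
    description reads cleanly in search results.
    """
    text = " ".join(body_text.split())  # collapse runs of whitespace
    if len(text) <= max_len:
        return text
    # Cut at the last complete word within the limit and mark the truncation.
    return text[:max_len].rsplit(" ", 1)[0] + "…"
```

Making the field mandatory in the CMS then only requires rejecting pages where the (possibly edited) description is empty.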
Dynamic URLs
Static, human-readable page URLs perform better in search engines, and people are more likely to create inbound links to your site. If your CMS cannot produce static URLs, the following guidelines are relevant:
- Use ‘masking’ to create human–readable URLs for key pages.
- Limit the number of parameters in the URL to a maximum of two.
Complex URLs with more than one dynamic parameter may not be indexed (a dynamic URL is one where the URL contains a “?” character).
- Make sure that the URL functions if all the dynamic items are removed.
- For multiple versions of a page, use the Canonical tag to point search engines to the authoritative version.
An example of where multiple versions may be created is when parameters are used to track where visitors are coming from.
- Be careful with multiple pages containing duplicate content: Google will view duplicates as less relevant, and typically only one version will rank.
- Do not use multiple dynamically generated URLs to refer to a single page.
Otherwise search engines may not index your pages at all.
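When tracking parameters create multiple URLs for one page, the canonical URL is simply the URL with those parameters removed. This sketch assumes an illustrative set of parameter names; real sites would list their own:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative tracking/session parameter names, not an exhaustive list.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonical_url(url):
    """Strip tracking and session parameters so one canonical URL remains.

    Functional parameters (e.g. page numbers) are preserved.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

The resulting URL is what the canonical link element in each page variant should point to.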
Pagination
Pagination tagging is needed to ensure that when content is spread over multiple pages (for example site.com/page/1, site.com/page/2 or site.com/topic?page=1, site.com/topic?page=2), search engines treat the “page one” (or “view all”, if there is one) version as the principal content and do not treat the remaining page variations as duplicates.
It is recommended to use Google’s specification for pagination tagging. For example, if you have content paginated across different URLs like this:
On the first page, http://www.example.com/news, include in the <head>:
<link rel="next" href="http://www.example.com/news?page=2" />
On the second page, http://www.example.com/news?page=2, include in the <head>:
<link rel="prev" href="http://www.example.com/news" />
<link rel="next" href="http://www.example.com/news?page=3" />
On the third page, http://www.example.com/news?page=3:
<link rel="prev" href="http://www.example.com/news?page=2" />
<link rel="next" href="http://www.example.com/news?page=4" />
And on the last page, http://www.example.com/news?page=4:
<link rel="prev" href="http://www.example.com/news?page=3" />
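A template can generate these tags from the page number and total page count. This sketch reproduces the example above (function name and URL scheme are assumptions matching that example):

```python
def pagination_links(base, page, last):
    """Build rel="prev"/"next" link tags for page `page` of `last`.

    Page 1 uses the bare base URL, later pages append ?page=N,
    matching the example.com/news pattern above.
    """
    def url(n):
        return base if n == 1 else f"{base}?page={n}"
    links = []
    if page > 1:   # every page except the first links back
        links.append(f'<link rel="prev" href="{url(page - 1)}" />')
    if page < last:  # every page except the last links forward
        links.append(f'<link rel="next" href="{url(page + 1)}" />')
    return links
```

For the second of four pages this yields exactly the prev/next pair shown in the example.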
Session IDs – cookies and URL based sessions
Some sites will use session IDs to collect and maintain specific data about visitors as they navigate through the web site. A session ID is a unique identifier that is usually stored in a cookie or in a URL as a parameter. It is important to make sure that session IDs are not required to access a page that should be indexed by search engine spiders.
- Avoid using session IDs for pages that you would like to be indexed.
Session IDs that change the URL parameters every time that a page is accessed may cause the search engine to attempt to index the page hundreds of times. This “duplicate content” may incur a penalty.
- Avoid requiring cookies for pages that you would like to be indexed.
- Detect spiders and switch off URL based sessions for non–logged in users.
Search engine spiders can be detected (and sessions switched off accordingly) by detecting the robot’s identifier in the HTTP headers.
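Detection from the User-Agent header can be sketched as below; the signature list is illustrative only, since real crawler identification should use a maintained list (and ideally reverse-DNS verification):

```python
# Illustrative substrings found in well-known crawler User-Agent headers.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "duckduckbot")

def is_spider(user_agent):
    """Detect a search engine spider from the User-Agent request header.

    When this returns True, URL-based session IDs can be switched off
    so the spider sees stable, indexable URLs.
    """
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)
```

Note that User-Agent strings can be spoofed, so this technique is only suitable for harmless optimisations like dropping session parameters, never for serving different content (which would be cloaking).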
Social Media (SCO)
Many companies are optimising their social media presence to improve their SEO positioning. An integrated campaign with links from large social media sites can help a site within Google, owing to the fast-changing, fresh content these sites provide. Similarly, adding links from your site to sites like Facebook (like or share buttons) can help the public spread your content around the web and onto other sites, which will again boost position. We envisage a separate SCO paper being produced this year.
Mobile SEO (MSEO)
The growing use of tablets and smartphones has changed the keywords that users enter into search engines. On tablets, consumers still use five-word searches, but on mobile phones they tend to use much shorter two- or three-word terms. This needs to be accounted for in your list of keywords.
Lazy loading and infinite scrolling
Google has created a good guide on implementing lazy loading for images and videos.
If lazy loading isn’t implemented correctly, it can hide content from Google, which will cause pages not to rank well in search engines.
To ensure Google can see all content on a page, their guidelines recommend the following:
- Make sure that the lazy loading implementation loads all relevant content whenever it is visible in the viewport, for example by using the IntersectionObserver API and a polyfill.
- To implement infinite scrolling, Google recommends supporting paginated loading – this allows Google to show a link to a specific point in the content, rather than the top of an infinite scrolling page.
Test the implementation using a Puppeteer script – you’ll need Node.js installed, then run:
git clone https://github.com/GoogleChromeLabs/puppeteer-examples
cd puppeteer-examples
npm i
node lazyimages_without_scroll_events.js -h
- After running the script, share the screenshot images with the SEO team to make sure all content is accessible to Googlebot and other search engines.