Validating Sitemap Namespace with Pytest

Sitemaps help search engines discover and index your site's pages, but a malformed or incorrectly namespaced sitemap can cause indexing problems. This post walks through using pytest to check that your sitemap uses the correct XML namespace.

Why Namespace Validation Matters

The XML namespace declared on the root element identifies the document as following the sitemaps.org protocol. If the namespace is missing or incorrect, search engines may ignore your sitemap, which can hurt your site's visibility in search results.
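
To make the namespace concrete, here is a small sketch of how Python's xml.etree.ElementTree reports the root tag with and without the namespace declaration. The two sample documents and the example.com URL are made up for illustration and do not come from any real site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample documents, used only for illustration
GOOD = f'<urlset xmlns="{SITEMAP_NS}"><url><loc>https://example.com/</loc></url></urlset>'
MISSING = "<urlset><url><loc>https://example.com/</loc></url></urlset>"

good_root = ET.fromstring(GOOD)
missing_root = ET.fromstring(MISSING)

# ElementTree folds the namespace into the tag as "{namespace}localname"
print(good_root.tag)     # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset
print(missing_root.tag)  # urlset
```

This is why the namespace test compares the root tag against a string that starts with the namespace in curly braces.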

The Test Code

Below is the code that checks whether the sitemap at https://www.americanfreight.com/sitemap.xml can be fetched, parses as XML, and has a correctly namespaced root element:


import pytest
import requests
import xml.etree.ElementTree as ET

# Define the sitemap URL
SITEMAP_URL = "https://www.americanfreight.com/sitemap.xml"


@pytest.fixture
def sitemap_content():
    """Fixture to fetch and return the sitemap XML content."""
    # A timeout keeps the test from hanging if the server never responds
    response = requests.get(SITEMAP_URL, timeout=10)
    assert response.status_code == 200, f"Failed to fetch sitemap: {response.status_code}"
    return response.content


@pytest.fixture
def sitemap_root(sitemap_content):
    """Fixture to parse the sitemap XML and return the root element."""
    return ET.fromstring(sitemap_content)


def test_sitemap_fetch(sitemap_content):
    """Test that the sitemap can be fetched successfully."""
    assert sitemap_content, "Sitemap content is empty"
    assert b"<?xml" in sitemap_content, "Sitemap is missing the XML declaration"


def test_sitemap_root_namespace(sitemap_root):
    """Test that the sitemap has the correct XML namespace."""
    expected_namespace = "http://www.sitemaps.org/schemas/sitemap/0.9"
    # ElementTree reports namespaced tags as "{namespace}localname"
    assert sitemap_root.tag == f"{{{expected_namespace}}}urlset", (
        f"Unexpected root tag or namespace: {sitemap_root.tag}"
    )

Running the Test

Save the code as test_sitemap_namespace.py, then run the tests with the following command in your terminal:

pytest test_sitemap_namespace.py

Test Breakdown

  • sitemap_content: Fetches the sitemap and ensures it's accessible (status 200).
  • sitemap_root: Parses the XML content and returns the root element; malformed XML raises a parse error here.
  • test_sitemap_fetch: Confirms the response is non-empty and contains the <?xml declaration.
  • test_sitemap_root_namespace: Verifies the root element is urlset in the namespace defined by sitemaps.org.
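
When the namespace test fails, it can help to see whether the namespace or the element name is the problem. The sketch below separates the two and works on inline bytes, so it needs no network access. The helper names split_tag and namespace_problems are hypothetical and not part of the tests above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_tag(tag):
    """Split an ElementTree tag into (namespace, local_name).

    Hypothetical helper for illustration; not part of the original tests.
    """
    if tag.startswith("{"):
        namespace, _, local_name = tag[1:].partition("}")
        return namespace, local_name
    return "", tag


def namespace_problems(xml_bytes):
    """Return a list of human-readable problems found in the root element."""
    root = ET.fromstring(xml_bytes)
    namespace, local_name = split_tag(root.tag)
    problems = []
    if namespace != SITEMAP_NS:
        problems.append(f"unexpected namespace: {namespace!r}")
    if local_name != "urlset":
        problems.append(f"unexpected root element: {local_name!r}")
    return problems


# An "https" scheme in the namespace URI is a different namespace
bad = b'<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"/>'
print(namespace_problems(bad))
```

Note that this sketch, like the test above, expects urlset as the root element. A sitemap index file uses sitemapindex as its root instead, so the expected name would need adjusting for that case.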

Conclusion

These tests are simple, but they can catch sitemap problems early when run in your CI pipeline. Whether you're managing SEO for a large e-commerce platform or a personal blog, a namespace check is worth including in your automated checks.
