Validating Sitemap Namespace with Pytest
Basic Pytest setup
Sitemaps help search engines discover and index the pages on your website. A malformed or incorrectly namespaced sitemap, however, can cause indexing problems. This post walks through using pytest to validate that your sitemap uses the correct XML namespace.
Why Namespace Validation Matters
The XML namespace (http://www.sitemaps.org/schemas/sitemap/0.9) identifies the document as a sitemap that follows the sitemaps.org protocol. If the namespace is incorrect or missing, search engines may ignore your sitemap, which can hurt your site's visibility in search results.
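To see why this matters to a parser, it helps to look at how Python's xml.etree.ElementTree reports tags: a namespaced element's tag is written as {namespace-uri}localname, so a root declared with a different or missing namespace yields a different tag string. The snippet below parses three small inline documents (hypothetical samples, not taken from any real site) and compares each root tag with the expected one.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample documents, for illustration only.
samples = {
    "correct": b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>',
    "missing": b"<urlset/>",
    "https_typo": b'<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"/>',
}

# ElementTree folds the namespace into the tag as "{uri}localname".
tags = {name: ET.fromstring(doc).tag for name, doc in samples.items()}

expected = f"{{{SITEMAP_NS}}}urlset"
for name, tag in tags.items():
    print(name, tag, tag == expected)
```

Only the "correct" sample matches. Namespace URIs are compared as exact strings, so even swapping http for https produces a different namespace as far as the parser is concerned.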
The Test Code
Below is the code that checks whether the sitemap at https://www.americanfreight.com/sitemap.xml can be fetched, parses as XML, and is correctly namespaced:
```python
import pytest
import requests
import xml.etree.ElementTree as ET

# Define the sitemap URL
SITEMAP_URL = "https://www.americanfreight.com/sitemap.xml"


@pytest.fixture(scope="module")
def sitemap_content():
    """Fixture to fetch the sitemap once per module and return its raw bytes."""
    # A timeout keeps the test from hanging if the server never responds.
    response = requests.get(SITEMAP_URL, timeout=30)
    assert response.status_code == 200, f"Failed to fetch sitemap: {response.status_code}"
    return response.content


@pytest.fixture(scope="module")
def sitemap_root(sitemap_content):
    """Fixture to parse the sitemap XML and return the root element."""
    # Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    return ET.fromstring(sitemap_content)


def test_sitemap_fetch(sitemap_content):
    """Test that the sitemap can be fetched successfully."""
    assert sitemap_content, "Sitemap content is empty"
    assert b"<?xml" in sitemap_content, "Sitemap is missing the XML declaration"


def test_sitemap_root_namespace(sitemap_root):
    """Test that the sitemap has the correct XML namespace."""
    expected_namespace = "http://www.sitemaps.org/schemas/sitemap/0.9"
    # ElementTree reports namespaced tags as "{namespace}localname".
    assert sitemap_root.tag == f"{{{expected_namespace}}}urlset", (
        f"Unexpected root tag or namespace: {sitemap_root.tag}"
    )
```
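One caveat: larger sites often serve a sitemap index at /sitemap.xml, whose root element is sitemapindex (also defined by the sitemaps.org protocol, in the same namespace) rather than urlset, and the namespace test above would reject it. Below is a minimal sketch that accepts either root, assuming you want to treat both as valid. The helper root_namespace_ok and the inline samples are hypothetical. Because it works on bytes rather than a live URL, the same checks can also run offline in CI; pytest collects plain test_* functions, so no pytest import is needed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Root elements defined by the sitemaps.org protocol: a regular sitemap
# uses <urlset>, a sitemap index uses <sitemapindex>.
ALLOWED_ROOTS = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}


def root_namespace_ok(xml_bytes: bytes) -> bool:
    """Hypothetical helper: True if the root is a namespaced urlset or sitemapindex."""
    root = ET.fromstring(xml_bytes)
    return root.tag in ALLOWED_ROOTS


# Offline pytest-style tests on hypothetical inline samples.
def test_urlset_accepted():
    xml = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
    assert root_namespace_ok(xml)


def test_sitemapindex_accepted():
    xml = b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
    assert root_namespace_ok(xml)


def test_missing_namespace_rejected():
    assert not root_namespace_ok(b"<urlset/>")
```

In the live suite you could call root_namespace_ok(sitemap_content) from a test that uses the existing sitemap_content fixture.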
Running the Tests
Save the code above as test_sitemap_namespace.py, then run the following command in your terminal:
```
pytest test_sitemap_namespace.py
```
Test Breakdown
- sitemap_content (fixture): Fetches the sitemap and ensures it's accessible (HTTP status 200).
- sitemap_root (fixture): Parses the XML content into an element tree for further inspection.
- test_sitemap_fetch: Confirms that content was returned and that it contains the <?xml declaration.
- test_sitemap_root_namespace: Verifies that the root element is urlset in the namespace defined by sitemaps.org.
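The root check does not look at child elements. Sitemap children inherit the default namespace declared on urlset, and ElementTree's find/findall only match them when you query with that namespace. The sketch below is a hypothetical extension, not part of the original suite: it counts url entries that lack the loc child the protocol requires. The function name, the "sm" prefix, and the sample document are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for find/findall; the "sm" prefix is arbitrary.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: the second <url> has no <loc>.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><lastmod>2024-01-01</lastmod></url>
</urlset>"""


def urls_missing_loc(xml_bytes: bytes) -> int:
    """Count namespaced <url> children of the root that have no namespaced <loc>."""
    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    return sum(1 for url in urls if url.find("sm:loc", NS) is None)


print(urls_missing_loc(SAMPLE))  # prints 1
```

Entries without the sitemaps.org namespace are not matched by "sm:url" at all, so a check like this only makes sense alongside the root namespace test.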
Conclusion
These simple tests can catch sitemap namespace problems early in your CI pipeline. Whether you're managing SEO for a large e-commerce platform or a personal blog, namespace correctness should be a non-negotiable part of your automated checks.