Hugo supports multilingual sites natively, and it generates an RSS feed for each language sub-site. However, it might be desirable to generate a global feed that includes all articles from all sub-sites.
One possibility is to define a custom output format for the homepage in config.toml:
[outputs]
home = ["HTML", "RSS", "FEED"]
[mediaTypes]
[mediaTypes."application/rss"]
suffixes = ["xml"]
[outputFormats]
[outputFormats.FEED]
mediatype = "application/rss"
baseName = "feed"
In layouts/index.feed.xml, we then use the following range loop to iterate through all pages in all languages:
{{ range .Site.AllPages }}
{{ if .IsPage }}
{{/* render one <item> for this page here */}}
{{ end }}{{ end }}
Of course, the generated file would still be located under the language prefix lang-code/. So to make it appear “global”, you might need to manually copy it to the root directory.
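The manual copy can be scripted as a post-build step. The following is a minimal sketch, assuming the site is published to a directory such as public and the custom output ends up as feed.xml under a language prefix such as en/. The helper name copy_feed_to_root and its parameters are hypothetical and not part of Hugo.

```python
import shutil
from pathlib import Path


def copy_feed_to_root(public_dir, lang_code, feed_name="feed.xml"):
    """Copy <public_dir>/<lang_code>/<feed_name> to <public_dir>/<feed_name>.

    All three arguments are assumptions about the site layout.
    """
    source = Path(public_dir) / lang_code / feed_name
    target = Path(public_dir) / feed_name
    # copyfile overwrites an existing target, so the step can be re-run
    # after every build.
    shutil.copyfile(source, target)
    return target
```

For example, copy_feed_to_root("public", "en") would place a copy of public/en/feed.xml at public/feed.xml.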
In addition, I wrote a simple Python script to combine the RSS feeds from different languages.
Basically, it grabs all feeds and merges them into a single feed.
First, I use lxml to load and parse the RSS feeds. We define root_dir as the root directory of the published site, which is usually a directory called public, and paths are the relative paths of the RSS files, which are usually {LANG_CODE}/index.xml.
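import os

from lxml import etree


def load_feeds(root_dir, paths):
    """Parse each RSS file under root_dir and return the parsed trees."""
    feeds = []
    for path in paths:
        # Binary mode lets lxml honor the encoding declared in the XML.
        with open(os.path.join(root_dir, path), 'rb') as infile:
            feeds.append(etree.parse(infile))
    return feeds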
We assume there are only two sub-sites to simplify the logic, but it’s easy to extend the script to handle more. This is left as an exercise for the reader.
Second, we use toml to load the configuration file config.toml and obtain the baseURL of the site. This is used to set the location of the global RSS feed.
In addition, we obtain all entries from each feed and sort them by publication date.
Finally, we inject all items into an RSS file.
import os
from datetime import datetime
from urllib.parse import urljoin

import toml

NAMESPACES = {
    'atom': 'http://www.w3.org/2005/Atom',
}
# RFC 822 date format used by pubDate and lastBuildDate
D_FORMAT = '%a, %d %b %Y %H:%M:%S %z'


def process_feeds(main_feed, alt_feed, output_path, config_path):
    with open(config_path) as infile:
        config = toml.load(infile)
    base_url = config['baseURL'].rstrip('/') + '/'

    # Point the channel link and the atom self link at the global feed.
    link_node = main_feed.xpath('//rss/channel/link')[0]
    link_node.text = base_url
    atom_node = main_feed.xpath(
        '//rss/channel/atom:link', namespaces=NAMESPACES)[0]
    # urljoin builds a URL; os.path.join would use "\" on Windows.
    atom_node.attrib['href'] = urljoin(base_url, output_path)

    # Keep the more recent lastBuildDate of the two feeds.
    last_build_node = main_feed.xpath('//rss/channel/lastBuildDate')[0]
    last_build_alt_node = alt_feed.xpath('//rss/channel/lastBuildDate')[0]
    if datetime.strptime(last_build_node.text, D_FORMAT) < datetime.strptime(
            last_build_alt_node.text, D_FORMAT):
        last_build_node.text = last_build_alt_node.text

    # Collect the items of both feeds and sort them, newest first.
    all_items = []
    main_items = main_feed.xpath('//rss/channel/item')
    all_items.extend(main_items)
    all_items.extend(alt_feed.xpath('//rss/channel/item'))
    all_items.sort(
        key=lambda x: datetime.strptime(x.xpath('pubDate')[0].text, D_FORMAT),
        reverse=True)

    # Replace the main feed's items with the merged, sorted list.
    channel = main_feed.xpath('//rss/channel')[0]
    for item in main_items:
        channel.remove(item)
    for item in all_items:
        channel.append(item)
    return main_feed
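The two-feed merge above generalizes to any number of feeds, since only the collection step depends on the feed count. The following is a minimal sketch of that generalization, not the script used for this site. To stay dependency-free it swaps lxml for the standard library's xml.etree.ElementTree and strptime for email.utils.parsedate_to_datetime. The function name merge_items is hypothetical, and the sketch only merges the items; it does not update the channel link or lastBuildDate.

```python
import copy
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_items(feeds):
    """Merge the <item> elements of any number of parsed RSS feeds.

    `feeds` is a list of ElementTree objects. The first one is treated
    as the main feed and receives every item, newest first.
    """
    main_channel = feeds[0].getroot().find("channel")
    all_items = []
    for feed in feeds:
        all_items.extend(feed.getroot().findall("channel/item"))
    # pubDate is an RFC 822 date, e.g. "Mon, 02 Jan 2006 15:04:05 -0700".
    all_items.sort(
        key=lambda item: parsedate_to_datetime(item.findtext("pubDate")),
        reverse=True)
    # Drop the main feed's own items, then append the merged list.
    for item in main_channel.findall("item"):
        main_channel.remove(item)
    for item in all_items:
        # Copy each item so the source feeds are left untouched.
        main_channel.append(copy.deepcopy(item))
    return feeds[0]
```

Calling merge_items([ET.parse(p) for p in paths]) would return the first tree with the items of all feeds attached to its channel.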
The command-line options can be handled by the following code.
import argparse

if __name__ == "__main__":
    # Parse the command-line arguments
    parser = argparse.ArgumentParser(description='Merge RSS feeds.')
    parser.add_argument(
        '--root-dir', required=True, help='publish root directory')
    parser.add_argument(
        '-o', '--output', required=True, help='output rss file')
    parser.add_argument(
        '-i', '--input', required=True, nargs='+', help='path of the feeds')
    parser.add_argument(
        '-c', '--config', required=True, help='path of the config')
    args = parser.parse_args()
    # process_feeds handles exactly two feeds (main and alternate)
    if len(args.input) != 2:
        parser.error('exactly two input feeds are required')

    feeds = load_feeds(args.root_dir, args.input)
    feed = process_feeds(
        *feeds, output_path=args.output, config_path=args.config)
    feed.write(
        os.path.join(args.root_dir, args.output),
        pretty_print=True,
        encoding='utf-8')
Running this script produces a combined RSS feed at the output path.
This is also how the current global RSS feed for City of Wings is generated.
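To check a merged feed after a build, one can verify that its items really are ordered newest first. The following is a small sketch of such a check using only the standard library; the function name is_sorted_newest_first is hypothetical, and it assumes every item carries an RFC 822 pubDate with an explicit timezone offset.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def is_sorted_newest_first(feed_path):
    """Return True if the feed's items are in descending pubDate order."""
    root = ET.parse(feed_path).getroot()
    dates = [
        parsedate_to_datetime(item.findtext("pubDate"))
        for item in root.findall("channel/item")
    ]
    # Compare each date with its successor; an empty or single-item
    # feed counts as sorted.
    return all(a >= b for a, b in zip(dates, dates[1:]))
```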