In this post, we’re going to explore another of my adventures investigating and debugging issues related to Blazor, specifically with Latin characters.
1 - Indexing Issue on Google and Blazor
Some time ago I migrated this blog from ASP.NET Core to Blazor and made a YouTube video about it. Because I didn’t check during the migration whether Google was indexing the site properly, I lost a lot of visits.
As you can see in the image, it’s obvious when I started running the site as an SPA and lost all the traffic, even though I fixed the bug in just two days.
To solve this, I had to render the site as if it were a static website. It’s not ideal, but since this is a blog that needs to be indexed by Google, it was necessary. For a corporate website, it wouldn’t be needed.
<component type="typeof(App)" render-mode="Static" />
@* More code *@
When I fixed this, I assumed the drop in traffic was because Google had de-indexed the website and that things would recover over time.
Since then, however, I haven’t regained my previous traffic levels, which is disappointing given how much time I’ve dedicated to the website.
Not being an SEO expert, I simply assumed that others were doing things better and that my content was ranking lower.
2 - Problem with Latin Characters in Blazor
That was until yesterday, when I searched Google for one of my posts: “Cache NetMentor”.
As you can see, the main site, YouTube, and even GitHub show up, but the post about distributed caching does not. I started to investigate and discovered there was an issue with the site’s encoding.
My first thought was: “Oh, I’m not using charset utf-8, that’s it.” But that wasn’t the problem; I was indeed using it. I also tried a ton of other scenarios, and none gave the right result.
To make matters worse, if I put the text with an accent directly on the webpage, it would print fine, but if I used a variable, it would not. For example:
<div>
Caché
</div>
Prints Caché correctly, as expected.
But the following code:
<div>
@variable @* variable is a string containing "Caché" *@
</div>
Prints Cach&#xE9; in the generated HTML, with the accent HTML-encoded.
This was ruining Google indexing.
So at this point I could be sure it wasn’t the charset, since the accent printed correctly when typed directly. The problem was definitely in Blazor/Razor/C#.
3 - Printing Latin Characters in Blazor
It turns out that, by default, Blazor (Razor) HTML-encodes every character that isn’t ASCII, that is, everything outside the Basic Latin range: accents, circumflexes, dieresis, etc.
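To check this outside Blazor, here is a minimal console sketch (assuming .NET 5 or later with top-level statements) that uses HtmlEncoder.Default from System.Text.Encodings.Web. As far as I know, this is the encoder Razor falls back to when WebEncoderOptions is not configured, and it only leaves the Basic Latin range untouched. The string "Caché" is just a sample:

```csharp
using System;
using System.Text.Encodings.Web;

// HtmlEncoder.Default only allows the Basic Latin (ASCII) range;
// any other character is written as a numeric character reference.
string encoded = HtmlEncoder.Default.Encode("Caché");

// é is U+00E9, outside Basic Latin, so it becomes &#xE9;
Console.WriteLine(encoded); // Cach&#xE9;
```

The browser still displays é, but the raw HTML sent by the server contains Cach&#xE9;.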
To print these characters correctly, we have to tell the app to leave those characters alone.
This is done in ConfigureServices. In our case, we want to exclude the Latin characters from encoding, specifically the Latin1Supplement range, which contains the accented letters used in Spanish (á, é, ñ, ü…).
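using Microsoft.Extensions.WebEncoders;   // WebEncoderOptions
using System.Text.Encodings.Web;          // TextEncoderSettings
using System.Text.Unicode;                // UnicodeRanges
// Inside Startup.ConfigureServices: the ranges listed here are written without HTML encoding
services.Configure<WebEncoderOptions>(options =>
{
    options.TextEncoderSettings = new TextEncoderSettings(UnicodeRanges.BasicLatin, UnicodeRanges.Latin1Supplement);
});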
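To check what this configuration does without running the whole app, the following sketch (same assumptions: .NET 5 or later, top-level statements, made-up sample strings) builds an encoder from the same TextEncoderSettings through HtmlEncoder.Create:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Same allow list as the WebEncoderOptions configuration:
// Basic Latin plus Latin-1 Supplement (U+0080 to U+00FF, which includes á, é, ñ, ü...).
var settings = new TextEncoderSettings(UnicodeRanges.BasicLatin, UnicodeRanges.Latin1Supplement);
HtmlEncoder encoder = HtmlEncoder.Create(settings);

// Accented letters are now written as-is.
string accented = encoder.Encode("Caché");
Console.WriteLine(accented); // Caché

// Characters with a special meaning in HTML are still encoded.
string markup = encoder.Encode("<más>");
Console.WriteLine(markup); // &lt;más&gt;
```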
- Note: for other Unicode characters, check the Unicode code charts.
In my opinion, this is a bit counterintuitive: usually, when we specify configuration, we indicate what we do want to happen, but in this case we are listing the characters we do NOT want encoded.
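The allow-list behaviour is easier to see with individual characters instead of whole ranges. In this sketch (again .NET 5 or later, top-level statements), é and ñ are arbitrary picks added with TextEncoderSettings.AllowCharacters; everything else outside Basic Latin is still escaped:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// TextEncoderSettings is an allow list: whatever is added here is written as-is,
// and everything else is escaped.
var settings = new TextEncoderSettings(UnicodeRanges.BasicLatin);
settings.AllowCharacters('é', 'ñ'); // arbitrary characters chosen for this example

HtmlEncoder encoder = HtmlEncoder.Create(settings);

string allowed = encoder.Encode("caché");      // é was allowed explicitly
string notAllowed = encoder.Encode("canción"); // ó (U+00F3) was not

Console.WriteLine(allowed);    // caché
Console.WriteLine(notAllowed); // canci&#xF3;n
```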
Alternatively, you can exclude every range from encoding with UnicodeRanges.All; characters with a special meaning in HTML, such as < and &, are still escaped.
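If the site mixes several scripts, UnicodeRanges.All avoids listing them one by one. This sketch (.NET 5 or later, top-level statements) compares both settings, using a Greek word chosen only as a non-Latin sample:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

const string greek = "Ελληνικά";

// Only Latin ranges allowed: Greek letters fall outside them, so they get escaped.
HtmlEncoder latinOnly = HtmlEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Latin1Supplement);

// UnicodeRanges.All (U+0000 to U+FFFF): Greek is allowed too,
// but HTML-sensitive characters are still escaped.
HtmlEncoder all = HtmlEncoder.Create(UnicodeRanges.All);

string fromLatinOnly = latinOnly.Encode(greek);
string fromAll = all.Encode(greek);

Console.WriteLine(fromLatinOnly);       // each Greek letter escaped as a numeric reference
Console.WriteLine(fromAll);             // Ελληνικά
Console.WriteLine(all.Encode("a < b")); // a &lt; b
```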
Once this change is made, the characters print correctly.
Conclusion
In this post, we’ve seen how Blazor’s default configuration can really mess up SEO for those of us who write in a language other than English.
Let’s hope that with this change, the blog will be indexed like it used to be.
If you have any problems, you can leave a comment below or contact me through the website’s contact form.