I’ve now started looking at another site with the same issues, albeit on a much smaller scale. This site uses Alloy, not Poster, but the issue is identical.
When pretty urls are used, random urls that include a page which uses pretty urls will return an actual page, although it’s an unformatted page with random content. Nonetheless, crawler bots try to index it, and constantly crawl it.
So, this is a much wider issue than Poster2. It, for sure, happens with Alloy, and it might be happening with other uses of pretty urls too. Perhaps Poster2 and Alloy do the whole pretty url thing in the same way?
I don’t know what the solution is. If anyone knows an expert on htaccess then perhaps we can start talking with them to see if the solution is a different way to do pretty urls. But for now, my solution is to turn off pretty urls.
I’m gonna tag @dave in this, as he seems to know his way around an htaccess file. Maybe he has some ideas.
I’ve now turned off pretty urls for all pages in the la novia site, so if you click that link you will be sent to the homepage, with a 404 error being recorded. If you’d like to see what the page would look like with pretty urls on, tell me, and I’ll add them back for some page.
I think what you are saying is that you need to see the code for the page where this url is generated. Correct?
If so, I don’t know where that link is generated. All I know is that Google has it as a link, and so was trying to crawl it.
I have now added pretty urls back for the page: Elysee collection wedding dresses at La Novia Edinburgh and sure enough, that link in the last post, which shouldn’t land on a page, now does. Albeit a page that is garbage.
In the garbage page above: https://www.lanovia.co.uk/oscarlili/lilianadabic/wedding-dresses/elysee/files/files/files/real-brides/ The following stands out to me…
It’s made up of lots of page folder names, all mashed together in one url.
For instance /oscarlili/ is a page, /lilianadabic/ is a page, /wedding-dresses/ is a page. And so on.
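To make that observation easier to check across many urls, here is a minimal Python sketch that splits a url into its folder names and reports any that repeat. The helper name `path_segments` is just made up for illustration; the url is the garbage one quoted above.

```python
from collections import Counter
from urllib.parse import urlsplit


def path_segments(url):
    """Return the non-empty folder names that make up the url's path."""
    return [part for part in urlsplit(url).path.split("/") if part]


garbage_url = (
    "https://www.lanovia.co.uk/oscarlili/lilianadabic/"
    "wedding-dresses/elysee/files/files/files/real-brides/"
)

segments = path_segments(garbage_url)

# Folder names that occur more than once are a strong hint that the url
# was built up by appending the same relative path again and again.
repeated = {name: count for name, count in Counter(segments).items() if count > 1}

print(segments)
print(repeated)
```

Running the same check over an export of the crawled urls might show which folder names get appended most often.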
It “feels” like something is telling search bots to append page folder names to the end of working page urls. Perhaps this is how Google is getting the garbage urls?
For sure, there is no page anywhere on the site that actually has the url https://www.lanovia.co.uk/oscarlili/lilianadabic/wedding-dresses/elysee/files/files/files/real-brides/ on it. So these garbage urls haven’t come about from the page content. Somehow (I think) they are being created dynamically, and search bots are reading them.
Nah, there are just too many of them. There were 60k pages with such urls. It’s humanly impossible to make that many url errors. Even for me ;-)
They are somehow getting produced dynamically. Then, thanks to pretty urls, they were not being returned as a 404 error. This is the only explanation I can think of.
Since putting pretty urls back on for the page https://www.lanovia.co.uk/elysee/ the following php warnings have started to appear in the error log…
[16-Mar-2025 14:07:54 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:54 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:54 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:54 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:56 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:56 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:57 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:58 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:58 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:58 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:07:59 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:08:01 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:08:02 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
[16-Mar-2025 14:08:02 UTC] PHP Warning: Cannot modify header information - headers already sent by (output started at /home/caffeine/site-lanovia/elysee/index.php:1335) in /home/caffeine/site-lanovia/elysee/index.php on line 4272
I think they are being generated when the page is hit.
The lines around line 4247 in the index.php file on that page look like this…
What are you using on some of those pages: Poster2, Alloy, something else?
I’m going to have to remove pretty urls for this page: https://www.lanovia.co.uk/elysee/ as having them on is causing a spike on the server again. If anyone needs them added back let me know.
Would you be so kind as to generate the whole website content with pretty URLs activated, and send me the content, including the htaccess files, as a zip file? Then I can test everything locally on my MAMP server.
I use both Alloy and Poster2.
But I have only noticed the messy-URL problem in my Alloy projects. My little project with Poster2 + pretty urls looks good so far.
Not necessarily. It’s more about how the htaccess rules are set up, and the fact that relative links are included in the HTML (generated by RW or Alloy/Poster, or added manually), combined with the PHP renderer.
These wrong links in Google Search Console can originate from a single wrong (relative) link, either generated by RW or Alloy/Poster, or added manually. This one wrong link leads to a “garbage URL”, and the page served at that URL then produces more “garbage URLs”, which in turn produce more again, in a self-feeding loop. This explains the enormous number of incorrect URLs.
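The loop described above can be sketched in a few lines of Python. This is only a simulation under stated assumptions: that every requested URL returns a page instead of a 404, and that each returned page contains the same two relative links. The links `files/` and `real-brides/` are hypothetical, picked because those folder names show up in the garbage URL posted earlier; the function name `crawl_relative_links` is made up too.

```python
from urllib.parse import urljoin


def crawl_relative_links(start_url, relative_links, depth):
    """Simulate a bot following relative links for `depth` rounds.

    Assumes every URL answers with a page that contains the same
    relative links, so nothing ever stops the crawl with a 404.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(depth):
        next_frontier = []
        for base in frontier:
            for link in relative_links:
                # A relative link is resolved against the URL of the page
                # it was found on, so the path grows by one folder each time.
                url = urljoin(base, link)
                if url not in seen:
                    seen.add(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return sorted(seen)


urls = crawl_relative_links(
    "https://www.lanovia.co.uk/elysee/",
    ["files/", "real-brides/"],  # hypothetical relative links
    depth=3,
)

for url in urls:
    print(url)
print(len(urls), "distinct URLs after 3 rounds")
```

With just two relative links the number of new URLs doubles every round, so a handful of relative links and a few more rounds would be enough to reach tens of thousands of URLs. Root-relative links (starting with `/`) or absolute links would not grow like this, since they resolve to the same URL no matter which page they are found on.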
We have to find out which wrong links are the first ones, leading to the other incorrect ones.