Wikipedia:Link rot/URL change requests

WP:URLREQ

This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.

RuPaul

I noticed that on the page for RuPaul, citation № [102], concerning his spouse Georges LeBar, links to a site known as georgeslebar.com, which when clicked leads to nothing but an open domain. Sincerely, MetricPin. MetricPin (talk) 03:03, 28 July 2023 (UTC)

Done, I removed that citation and replaced it with a more appropriate WP:RS for the citation. Phuzion (talk) 17:07, 5 September 2023 (UTC)

Purging all mainspace links to fmg.ac/Projects/MedLands

Hi, I'd like to request that all mainspace links to fmg.ac/Projects/MedLands be removed, because consensus was reached it is an unreliable source, but it's impractical to manually remove all 1,300+ links. More explanation is at Wikipedia:Bot requests#Erasing all links to fmg.ac/Projects/MedLands from mainspace. Thanks! Nederlandse Leeuw (talk) 21:56, 2 June 2023 (UTC)

User:Nederlandse Leeuw, is this to remove links within citations (keeping the rest of the citation); or, remove complete citations including the surrounding ref tags? -- GreenC 00:05, 3 June 2023 (UTC)

The latter. Nederlandse Leeuw (talk) 03:12, 3 June 2023 (UTC)

"Terminate with extreme prejudice". I'll start on it soon. -- GreenC 03:49, 3 June 2023 (UTC)

Thanks! Nederlandse Leeuw (talk) 04:33, 3 June 2023 (UTC)

I'll probably be able to get most of them automated (with programming) but there will be some that can't be automated, which I hope you are or someone else can manually remove to finish it with more refined work. Wiki has endless ways to do things, I can't program for every possibility, or sometimes it can't be done safely. -- GreenC 13:36, 3 June 2023 (UTC)

Please ping me once the automated process has finished, I'm happy to help with the final clean up. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:04, 3 June 2023 (UTC)

ActivelyDisinterested and Nederlandse Leeuw: The user User:Roelof Hendrickx is systematically reverting the work to remove the fmg.ac references. IMO the correct action is to remove the text the references were citing, not to restore an unreliable source. I edited about 300 of 1000 pages. I won't do anything further until this is resolved. Good luck. -- GreenC 00:46, 5 June 2023 (UTC)

They are also edit warring now, I can't continue while there is a dispute. Suggest either convince the editor, or an RfC/RSN resolution to settle the issue. Let me know when you are ready to continue! -- GreenC 00:55, 5 June 2023 (UTC)

Of course I am reverting the edits you have made. Those edits are demolishing articles, deleting valuable references and footnotes and external links. Roelof Hendrickx (talk) 01:15, 5 June 2023 (UTC)

Keeping unreliable sources is, arguably, "demolishing" those articles. You have to make the argument why not, respond to the points made in the discussion at Wikipedia:Reliable_sources/Noticeboard/Archive_405#fmg.ac_(Foundation_for_Medieval_Genealogy). -- GreenC 03:22, 5 June 2023 (UTC)

Under the header External links there are no sources, just external links. So they shouldn't be deleted, that's censoring. Furthermore the bot removed references but retained the texts. So texts with sources have become unsourced now, does that make Wikipedia better? And finally, a bot should be tested before being used. Which it wasn't as it sometimes only removed part of the link in footnotes and left a mess of the texts. Roelof Hendrickx (talk) 08:52, 5 June 2023 (UTC)

I cannot reply in an archived discussion. A discussion for which the users that have used the website as source for information were not invited. Then it's easy to reach "consensus". But a so-called consensus between persons with the same biased POV is worthless imho. Roelof Hendrickx (talk) 08:57, 5 June 2023 (UTC)

@Roelof Hendrickx you can always unarchive the discussion and raise the points you have made. – robertsky (talk) 09:03, 5 June 2023 (UTC)

I don't know how, and I don't care anymore. This isn't the first problem I have with the way it works here. I have enough of it all. I'm leaving and will not contribute anymore. So go on with destroying articles with bots that don't work well and keep on censoring external links. You have my blessing. Roelof Hendrickx (talk) 09:09, 5 June 2023 (UTC)

The editor has retired with the message "I will not accept any responsibility for the reliability of the information in the articles I have edited when those articles have been altered by another user of Wikipedia", which seems to fundamentally misunderstand how Wikipedia works, and previously edit warred with citation bot trying to stop it from replacing curly apostrophes per MOS:CURLY. If they had wished to discuss the matter I would have suggested halting, but as they don't I believe this should continue. As to the question of external links, the argument against MedLands is that its interpretation of primary sources leaves a lot to be desired, to put it mildly, and per WP:ELNO #2 we're better off without these links. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:11, 5 June 2023 (UTC)

No, I now know exactly how Wikipedia works. That's the reason why I decided to retire, because the way Wikipedia works has nothing to do with being an encyclopedia. An encyclopedia doesn't use censorship and it doesn't use a manual of style that forces users to make texts less readable for readers. Neither does it delete references while retaining the text, nor does it use bots that cripple texts.

As for Medieval Lands, it's a source just as reliable as other secondary sources. It has its errors, mistakes and typos, just as other sources. All sources should be used with caution. One has to know who's the author, what's the objective of the source, where does the information come from, and are the sources known. When a user writes or edits an article using multiple sources including Medieval Lands, and shows that he/she is interpreting the sources in a scientific way, imho one should try to talk to that user first before deleting information and references. I have used Medieval Lands for articles about members of the House of Nassau, and that part of that website is more reliable than Europäische Stammtafeln and also more reliable than Wikipedia.

I regret ever starting contributing to Wikipedia, it has proven to be a complete waste of my time and energy. Roelof Hendrickx (talk) 10:56, 5 June 2023 (UTC)

Maybe you could have taken part in the multiple discussions over the last decade that have shown the many issues with MedLands, and that such issues go unresolved. Those have included that if MedLands uses sources that are reliable then use those instead of MedLands. If you wish to continue the discussion I suggest opening a thread at WP:RSN. As with curly apostrophes, if you disagree with community standards the solution is to open discussions about them, not edit war. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:54, 5 June 2023 (UTC)

I started editing articles here 2 years ago. I was not aware of any discussion on Medieval Lands until the changes by the bot. As I mentioned above, as a user who used this website to edit articles, I would have thought it would have been neat if I had been invited to the discussion. New users don't automatically become familiar with ongoing discussions. But by now, I am no longer interested in continuing the discussion. I am done with Wikipedia. Roelof Hendrickx (talk) 14:13, 5 June 2023 (UTC)

Hi everyone, I hadn't been active on Wikipedia today until now, so I had some catching up to do. I am quite perplexed by the reactions and responses made by Roelof Hendrickx towards the virtually unanimous consensus that fmg.ac/Projects/MedLands

is an unreliable source;
has always been an unreliable source;
was already agreed in 2012 to be eventually phased out as an unreliable source; and
has in the past few weeks been agreed to be finally purged as an unreliable source (both at WP:RSN and at TfD).

The claim that these are valuable references and footnotes and external links had already been rejected by community consensus. Deleting unreliable sources has nothing to do with WP:CENSOR: Content will be removed if it is judged to violate Wikipedia's policies, in this case WP:RS.

Hendrickx: texts with sources have become unsourced now, does that make Wikipedia better? I have argued (successfully) at RSN that I'd rather move from unreliably-sourced to unsourced statements; readers will treat the latter with more skepticism, and editors will be more motivated to fix the problem. The statements aren't necessarily untrue just because Cawley made them; we just want to give the opportunity to editors to find reliable sources supporting the same statements instead rather than continuing to rely on a source that has been known for over 11 years to be repeatedly unreliable. Because: Every day Cawley stays up across c. 700 articles (576 through the template, 123 outside the template minus the c. 10 that I manually purged already), we are potentially misleading more readers into a false sense of security that certain claims made by Cawley are somewhat reliable, and giving the impression that only a "better source (is) needed" to what Cawley has already "proven". (..) I don't want future readers to be misled.

Hendrickx has also shown a (regrettable) unwillingness to appeal the process and seek a solution within our policies and guidelines, even when offered to do so by reopening the RSN, or other options. As a relatively new user, he may indeed not have been familiar with the policies and guidelines which we have, let alone ongoing discussions about particular sources (which have been had many times in the RSN archives, as I also found it). But he does have an obligation to find out or check those that are or may be relevant to the kind of edits he wants to do. Since his first edit on 9 June 2021, he has had plenty of opportunity to read the relevant policies and guidelines, ask questions about anything he didn't understand, or question any rule which did not make sense to him. (Which is entirely possible, as rules are changed, modified and refined all the time. Wikipedia wasn't developed overnight, and its policies and guidelines aren't set in stone, although obviously some policies and guidelines are more strongly accepted and important than others).

If he hasn't read or understood WP:RS, WP:CENSOR or other relevant policies and guidelines before making edits, that has been at his own risk, and he only has his own carelessness to blame. Especially if his edits have been so incredibly dependent on this one single - and as it turns out very unreliable - source, that seeing it purged leads him to lose all willingness to participate in the project anymore. Again, here the community cannot be held responsible for the poor editing decisions Hendrickx has made about how sustainable those edits might be in the long term if they are based on an unreliable source. Every editor knows, or at least should know, that Wikipedia isn't a free-for-all, and one cannot ignore existing rules at one's pleasure. Especially the edit-warring is something he should know (WP:EW) would not help with what he wanted to do. I'm afraid there is nothing more that we can do for Hendrickx. We've given him all the chances, but if he retires on his own accord after having misunderstood how Wikipedia works, that's all we can do. Nederlandse Leeuw (talk) 14:39, 5 June 2023 (UTC)

Every chance? Yeah right, keep on dreaming. Indeed, I retire on my own accord. And that's due to Wikipedia's editors, not to me. Roelof Hendrickx (talk) 17:42, 5 June 2023 (UTC)

Unfortunately it's not possible to invite people to such discussions, because it's not possible to determine which editors to invite. Discussions have shown a myriad of issues with MedLands. I can understand this is annoying, as it lays out a lot of details in a very simple manner. The only suggestion I have is using it for study, and then going from there to the sources it uses (much in the same way that Wikipedia is used by many people). With all that said, unless you are willing to come to WP:RSN and convince editors this is a reliable source there's little more to say. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:23, 5 June 2023 (UTC)

Well said. Nederlandse Leeuw (talk) 16:58, 5 June 2023 (UTC)

Not possible to invite people to such discussions, but it is possible to find the articles where the said website is mentioned. Right. It says enough about willingness to new users. Roelof Hendrickx (talk) 17:43, 5 June 2023 (UTC)

But then there are still dozens, if not hundreds, of editors for some articles, many if not most of whom have no interest. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 18:27, 5 June 2023 (UTC)

As a separate comment I don't believe that saying the bot was making a mess is a fair description. GreenC was doing this work only because they were asked to, and had been checking their work as it went along. As with WP:NOTVANDAL it's important to not describe edits you believe are mistaken in that way. They may have been removing links you believe are useful, but that is something to discuss with myself, Nederlandse Leeuw, and the other editors on RSN, rather than a reason to denigrate an editor who was only editing in good faith. It's important to remember to assume good faith and that other editors are only acting to help improve the project. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:32, 5 June 2023 (UTC)
I fully agree. Nothing GreenC or their bot did was inappropriate. And, since Hendrickx has indicated that he is no longer interested in explaining or otherwise contributing to Wikipedia at all, the bot may now resume the process. Cheers, Nederlandse Leeuw (talk) 17:01, 5 June 2023 (UTC)

So this edit isn't crippling text? Thanks for clarifying that. And you wonder why I have retired? Roelof Hendrickx (talk) 17:49, 5 June 2023 (UTC)
@Roelof Hendrickx The edit in question was already corrected 46 minutes later, so your complaint is frivolous. If you are really retiring, I advise you to Wikipedia:Leave gracefully: If you choose to leave the project, do so in a graceful and dignified fashion. It is not necessary to secure the last word, and it is not fair to put other editors in the difficult position of having to assist you withdraw from the project while you attempt to do so. Nederlandse Leeuw (talk) 18:24, 5 June 2023 (UTC)
No, it was corrected before by me. The edit you refer to again has crippling of text, including the removal of a header and an entire paragraph. I leave in the fashion I have been treated here, not just in this debate but also in four earlier encounters. Roelof Hendrickx (talk) 22:38, 5 June 2023 (UTC)
You leave making little sense, and making no actual argument for your points. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:42, 5 June 2023 (UTC)
On the contrary, it's you that's not making actual arguments for your points. The removal of a header and an entire paragraph is not making a point? Says enough doesn't it? Roelof Hendrickx (talk) 22:44, 5 June 2023 (UTC)
I'm not going to continue this discussion with you. If mistakes were made they would have been corrected, your continued aspersions and nitpicking change nothing. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:47, 5 June 2023 (UTC)

The edits are done in batches and then checked, again this is neither malicious in any way nor an effect of incompetence. It was a temporary issue that was corrected a short time later. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 18:32, 5 June 2023 (UTC)
This kind of work deleting entire citations is complex and inherently error prone. The mere existence of errors means nothing. What is important is how many errors, and what was done to correct them. The answer: a 7.3% error rate ie. 73 out of 1000 articles will have an error (based on the results of the first 300 edits). And every edit is manually reviewed and corrected. There are many ways to do automated editing on Wikipedia, a manual review of every edit is acceptable. A 100% correct fully automatic bot is very difficult to make, that kind of labor is not warranted for this few articles. I'm not going to spend days programming just so 73 articles don't have a temporary error that I can manually review and fix in a few minutes. The nature of this work follows the 80/20 Rule which is to say the first 80% is trivial to fix, the next 20% is hard. The last 5% is the hardest of all, taking as long to program for as the first 95% was - so I don't bother with those edge cases rather do them manually since it's only 73 pages. -- GreenC 21:35, 5 June 2023 (UTC)
I know it's a waste of time, but still I give you this advice if you wanna retain users. Invite them to the discussion, and invite them to think about changing text they wrote. If I had not been completely surprised by the bot edits, and proper explanations had been given, I would've been willing to think with you and help you out on the texts I wrote. It's just a little investment of time that doesn't scare new and inexperienced users away. Roelof Hendrickx (talk) 22:42, 5 June 2023 (UTC)
You've been here long enough that you should understand at least some of how Wikipedia works. What you have shown here is that you are unwilling to actually discuss issues, or allow anyone to dare touch "your" articles (something I note you have already been warned about). -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 22:45, 5 June 2023 (UTC)
Well said. To which I might add that WP:OWNERSHIP of articles is not a thing. Nederlandse Leeuw (talk) 22:52, 5 June 2023 (UTC)

I know it's a waste of time. Then do yourself and us all a favour, stop responding, Wikipedia:Leave gracefully, and go do something else that makes you happy, please. Nederlandse Leeuw (talk) 22:50, 5 June 2023 (UTC)

Corrected by me, and then crippled again. Roelof Hendrickx (talk) 22:38, 5 June 2023 (UTC)
Please just leave, Roelof. There's no point in continuing to complain and being hostile to us if you're WP:NOTHERE to build an encyclopedia anymore anyway, and still refuse to learn how our policies and guidelines work. Go do something else that makes you happy. We'll take it from here, thanks. Nederlandse Leeuw (talk) 22:44, 5 June 2023 (UTC)
GreenC, Nederlandse Leeuw I suggest we pause this for a week. That will allow Roelof Hendrickx time to make any changes they desire or to formulate an argument to WP:RSN on MedLands reliability if they want to. Failing any other objections we would then start back up again. Sorry to jerk you around GreenC, but this seems a better idea than continuing the current conversation. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:48, 6 June 2023 (UTC)
I second AD's suggestion. Nederlandse Leeuw (talk) 11:25, 6 June 2023 (UTC)

This edit suggests an implicit agreement: Special:Diff/1158585285/1158736035 .. if they are making active changes no rush. Completed through page 400 of about 1000. -- GreenC 14:48, 6 June 2023 (UTC)
I agree, but giving them some time to manually correct any articles they wish to doesn't hurt anything. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:39, 6 June 2023 (UTC)
No problem ping me when you are ready. I may be in other jobs but everything is setup now. -- GreenC 20:54, 6 June 2023 (UTC)
Will do, thanks GreenC. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 20:59, 6 June 2023 (UTC)
Is it really necessary to remove from External links? I wasn't under the impression RS applied to those. Srnec (talk) 21:13, 6 June 2023 (UTC)
I believe it comes under WP:ELNO #2. MedLands presents dubious interpretation of primary documents as fact. Something that has been mentioned in the RSN threads. Linking will present details that are not in Wikipedia's articles, as historians have rejected them, and leave readers wondering why. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 21:36, 6 June 2023 (UTC)
Ok it's been seven days without any reply. GreenC apologies for jerking you about, could you complete the work when you have time. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:51, 13 June 2023 (UTC)
OK. If the bot edits a page where Roelof has an active interest (no pun intended!), recommend they revert the bot, then edit to remove the unreliable sources and content, as they want to handle it, for that page. -- GreenC 14:41, 13 June 2023 (UTC)
No offence taken. Although I haven't checked all the articles for which texts I'm responsible, in the sense that I wrote the texts, I believe that the articles I still have to check, haven't references or footnotes that contain links to Medieval Lands. So, I doubt I will have to revert the bot edits in those articles. I might do some additional changes, but I doubt that those changes will have anything to do with the bot edits.

In the articles I have checked and changed the bot won't find Medieval Lands anymore. Roelof Hendrickx (talk) 17:09, 13 June 2023 (UTC)
@Roelof Hendrickx I would like to thank you for the efforts you have undertaken in recent days to improve the articles in question. (For instance, this edit; I'm pleasantly surprised that there is a journal about genealogy and heraldry called De Nederlandsche Leeuw – almost identical to my username 'Nederlandse Leeuw' – which can be used as a WP:RS for genealogical information in articles such as Adelaide of Vianden). I would also like to apologise for some remarks I made to you on 5 June that were a little harsh; I should have dealt with my frustration more constructively. You have shown in the past 9 days that you are WP:HERE to help make Wikipedia better, and willing to follow the policies and guidelines in order to do so. I'm very glad with that, and I look forward to working with you if our paths should ever cross again. Happy editing! Nederlandse Leeuw (talk) 07:28, 14 June 2023 (UTC)
@Nederlandse Leeuw First of all my apologies for my late reply. I unfortunately was occupied with private matters until now. Thanks very much for your message, much appreciated. And I immediately admit that I also made comments that were not civilised at all. I think emotions got the better of me, which shouldn't have happened. I'm only human, and sometimes I make that mistake. My apologies for that too, also to @ActivelyDisinterested and @GreenC. I hope for forgiveness and hope we can continue as if it didn't happen. Or as we Dutch say "zand erover".

As for the magazine, I'm surprised that you didn't know it. I really thought you had named yourself after it! Or did you name yourself after the order of chivalry? Roelof Hendrickx (talk) 17:32, 16 June 2023 (UTC)
@Roelof Hendrickx Much appreciated! It might not seem like it, but on this side of the screen, there is also a human being with his own flaws and limitations who doesn't always do things right. As for my nickname, it's not really named after anything in particular, except maybe the Dutch Republic Lion in heraldry, or the Leo Belgicus in cartography (those maps looked pretty cool). But I don't take any of them very seriously, and the nickname has no connections with organisations, publications, or orders of chivalry. Nederlandse Leeuw (talk) 17:43, 16 June 2023 (UTC)
@Nederlandse Leeuw Now that we both gracefully admit that we're only human with flaws and limitations and that we could admit that we both are able to make mistakes, I'm sure it will work out fine in the future between us. I'm really glad that we cleared it!

Good to know that you didn't name yourself after anything in particular. It taught me again not to assume knowing something I cannot know for sure. Roelof Hendrickx (talk) 18:02, 16 June 2023 (UTC)
@Roelof Hendrickx Likewise! I assumed you would be Flemish because of your last name's spelling until you said "as we Dutch say". Happy editing! Nederlandse Leeuw (talk) 18:05, 16 June 2023 (UTC)

@ActivelyDisinterested and Nederlandse Leeuw: All refs removed (1,042 pages).[1] .. recommend submitting a request to WP:BLACKLIST otherwise users will re-add over time since they show on Google and might appear reliable. -- GreenC 01:04, 14 June 2023 (UTC)
@GreenC Thanks a lot for your work! Nederlandse Leeuw (talk) 07:08, 14 June 2023 (UTC)
It's already being re-added: Special:Diff/1160003284/1160040573 -- GreenC 13:36, 14 June 2023 (UTC)
It will happen, as with other unreliable and misleading sources they keep coming back. Thanks for your work. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 13:43, 14 June 2023 (UTC)

I have no problem with this website being deleted as a source. It is not an RS because it is self-published, but as far as I can see most of the bot edits are deletions from external links sections? I don't see any rationale anywhere for mass removals from external links sections. Is that the intention?--Andrew Lancaster (talk) 05:29, 14 June 2023 (UTC)
Yes, per WP:EXT: Some acceptable external links include those that contain further research that is accurate. We have already established repeatedly at RSN that Cawley is frequently inaccurate, and Cawley admitted himself that the early part of his detailed Rurikid genealogy may be of little factual significance but is reproduced by way of interest. This is just one of many examples where Cawley shows that he values presenting things that interest him (WP:IJUSTLIKEIT) over being a reliable source (WP:RS). Some of his sources are also unverifiable, such as the 7 sources which are 'private emails' with certain authors, violating WP:V. We should not continue to host known inaccurate/unreliable or unverifiable sources in external links sections per WP:ELNO (as ActivelyDisinterested pointed out above): Any site that misleads the reader by use of factually inaccurate material or unverifiable research, except to a limited extent in articles about the viewpoints that the site is presenting. Cawley is not presenting 'viewpoints' other than his own hobbyist interest in genealogy, and presenting Cawley's non-expert viewpoint is WP:UNDUE. Cheers, Nederlandse Leeuw (talk) 07:07, 14 June 2023 (UTC)

bollywoodhungama.com

4,231 pages. Many soft404s redirecting to the home page. -- GreenC 17:19, 18 June 2023 (UTC)

A good idea is to set the archive to the earliest archive. @GreenC Notrealname1234 (talk) 14:35, 19 June 2023 (UTC)

Done. -- GreenC 12:08, 6 July 2023 (UTC)

NewspaperSG url change for articles.

The newspaper archive of Singapore, NewspaperSG, has moved its newspaper archive URLs from eresources.nlb.gov.sg/newspapers/Digitised/Article/straitstimes19331015.2.83 to eresources.nlb.gov.sg/newspapers/digitised/articles/straitstimes19331015.2.83 . You can use the first three references from Lim Yong Liang as an example. While the case change does not affect the URL, the introduction of an "s" breaks the URL. This should be a quick and easy change. Appreciate if anyone can assist with this. Thanks. Justanothersgwikieditor (talk) 04:34, 24 June 2023 (UTC)

User:Justanothersgwikieditor: I started working on this last night and ran into some complications with the remote site blocking bots; and other things like for example "/Digitised/Article.aspx?articleid=straitstimes19730106-1.2.98" needs to be converted to "/digitised/articles/straitstimes19730106-1.2.98" .. that was from a test of the first 10 pages so there are probably other issues to be discovered. Won't have much time today but I continue to work on this. -- GreenC 14:35, 24 June 2023 (UTC)

@GreenC, thanks for working on this. As far as speed is concerned on this, take your time. This issue was noted the last time NewspaperSG changed their format. I did a fix using AWB previously but my fix did not cover all cases, it seems (I still find articles in this old format from time to time, as you did now). Thank you. Justanothersgwikieditor (talk) 15:23, 24 June 2023 (UTC)

There's "/Digitised/Page.aspx?pageid=" and "/Digitised/Page/" which convert to "/digitised/pages/" .. there are also a few search results but I can't find a working URL for them: [2] (bottom of page). I found the bot block problem (User-agent string). For URLs with a trailing ".aspx", it needs to be removed. Pages like this should be treated as dead links. Some URLs don't convert, i.e. the remote site returns a dead link at the new URL (eg. [3]), thus archive URLs should be added where available. Sometimes, the old format URL works [4] and the new format does not [5]. -- GreenC 16:36, 24 June 2023 (UTC)
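The path conversions mentioned so far in this thread could be sketched roughly as below. This is only an illustration in Python, not the bot's actual code: the function name and rule table are made up for the example, the page ID used in the usage note is hypothetical, and the handling of a trailing ".aspx" is an assumption about what such URLs look like.

```python
import re

HOST = "eresources.nlb.gov.sg/newspapers/"

# (pattern, replacement) pairs for the old forms described in this thread.
# Matching is case-insensitive because old links appear in mixed case.
RULES = [
    (re.compile(r"/Digitised/Article\.aspx\?articleid=", re.I), "/digitised/articles/"),
    (re.compile(r"/Digitised/Article/", re.I), "/digitised/articles/"),
    (re.compile(r"/Digitised/Page\.aspx\?pageid=", re.I), "/digitised/pages/"),
    (re.compile(r"/Digitised/Page/", re.I), "/digitised/pages/"),
]


def convert_newspapersg(url: str) -> str:
    """Return the new-format URL, or the input unchanged if no rule applies."""
    if HOST not in url:
        return url
    for pattern, replacement in RULES:
        new_url, count = pattern.subn(replacement, url, count=1)
        if count:
            url = new_url
            break
    else:
        # No old-format path found (for example, already converted).
        return url
    # Assumption: a leftover ".aspx" suffix is simply dropped.
    if url.lower().endswith(".aspx"):
        url = url[: -len(".aspx")]
    return url
```

For example, convert_newspapersg("https://eresources.nlb.gov.sg/newspapers/Digitised/Article.aspx?articleid=straitstimes19730106-1.2.98") gives the "/digitised/articles/straitstimes19730106-1.2.98" form. A converted URL would still need a liveness check, since some new-format URLs return dead pages.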

@GreenC, for /Search, these are some of the working comparisons:

http://eresources.nlb.gov.sg/newspapers/Digitised/Search?ST=1&AT=search&k=%22barbara%20yu%20ling%22%20albery vs

https://eresources.nlb.gov.sg/newspapers/search?q=%22barbara+yu+ling%22+albery

https://eresources.nlb.gov.sg/newspapers/Digitised/Search?ST=1&AT=search&k=%22chew%20pok%20cheong%22 vs https://eresources.nlb.gov.sg/newspapers/search?q=%22chew+pok+cheong%22

For https://eresources.nlb.gov.sg/newspapers/digitised/articles/straitstimes20060702-1.2.5.2, this is available within the national library's intranet due to copyright agreement between the library and the publisher. Rather than dead link, we should class them with the 'url-access' parameter set with 'subscription' value, since the 'paywall' is the need to travel down to the library to access the articles via paid by time use terminals. – robertsky (talk) 23:24, 24 June 2023 (UTC)
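The /Search comparisons above suggest a simple mapping: the old "k" parameter becomes "q" on the new "/newspapers/search" path, with spaces encoded as "+". A minimal Python sketch of that mapping, assuming the other old parameters (ST, AT) can be dropped and that the result should always use https, as in the two examples; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def convert_search_url(url: str) -> str:
    """Map an old /Digitised/Search URL to the new /search form."""
    parts = urlsplit(url)
    if parts.path.lower() != "/newspapers/digitised/search":
        return url
    # parse_qs decodes %22 and %20, so the keyword comes back as plain text.
    keywords = parse_qs(parts.query).get("k")
    if not keywords:
        return url
    # urlencode re-encodes with quote_plus: '"' -> %22, space -> '+'.
    query = urlencode({"q": keywords[0]})
    return urlunsplit(("https", parts.netloc, "/newspapers/search", query, ""))
```

Applied to the first old URL above, this reproduces the second one exactly. Only these two pairs were checked, so other search parameters may need their own handling.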

'subscription' could work for CS1|2 templates but not anything else (eg. bare and square URLs). It's also probable there are URLs that were previously converted by AWB that didn't update the url-status. So on the first pass it will convert /Digitised/Article/ only. Then will need to check all URLs with /digitised/articles/, and if in CS1|2 set url-access to subscription. -- GreenC 08:21, 25 June 2023 (UTC)
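To illustrate the CS1|2 case only: a rough Python sketch that adds "|url-access=subscription" to citation templates whose URL is in a list of known restricted articles. The function name, the regex, and the idea of passing in a set of restricted URLs are all assumptions for the example; it ignores nested templates, and bare or square-bracket URLs are left alone, as noted above.

```python
import re

# Matches simple, non-nested {{cite ...}} / {{citation ...}} templates.
CITE_RE = re.compile(r"\{\{\s*(?:cite|citation)[^{}]*\}\}", re.I)
HAS_ACCESS_RE = re.compile(r"\|\s*url-access\s*=", re.I)


def mark_subscription(wikitext: str, restricted_urls: set) -> str:
    """Add |url-access=subscription to templates citing a restricted URL."""

    def fix(match):
        template = match.group(0)
        if not any(url in template for url in restricted_urls):
            return template
        if HAS_ACCESS_RE.search(template):
            return template  # already has a url-access value; leave it
        return template[:-2].rstrip() + " |url-access=subscription}}"

    return CITE_RE.sub(fix, wikitext)
```

Running it twice gives the same result as running it once, which makes it safer to re-run after a manual review pass.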

Justanothersgwikieditor: It looks like the remote site has changed, as of this morning. The old format now works: https://eresources.nlb.gov.sg/newspapers/Digitised/Article/straitstimes19331015.2.83 and the new format no longer works: https://eresources.nlb.gov.sg/newspapers/digitised/articles/straitstimes19331015.2.83 .. the site admins are actively changing things. At some point I may need to check all new form URLs and convert them back to the old form. -- GreenC 16:38, 26 June 2023 (UTC)

25 pages had the new form, they have been switched back to the old form eg. Special:Diff/1161768762/1162056617 -- GreenC 19:08, 26 June 2023 (UTC)
102 pages converted old-old form --> old form eg. Special:Diff/1156309139/1162078234 -- GreenC 22:05, 26 June 2023 (UTC)

@GreenC Sheesh. I am sorry for wasting your time. It seems like they did an update to the new display format and then a switcheroo with the URLs in live mode, i.e. reverting to the old URL style.

There was no new maintenance notice as of yesterday (SGT time).

Thank you for helping to fix the old-old style. Justanothersgwikieditor (talk) 23:56, 26 June 2023 (UTC)

I think we can consider this request closed and if NewspaperSG decides to change again, I will put in a new request. Thanks so much! Justanothersgwikieditor (talk) 01:14, 27 June 2023 (UTC)

Blacklist healthlinedotcom

According to RfC Wikipedia:Reliable_sources/Noticeboard#Healthline:_deprecate_or_blacklist? the domain is to be blacklisted, and discussions were to remove all citations containing the domain. Request for help by bot due to scale made by User:Zefr and User:David Gerard. Every edit by the bot will be manually reviewed. Some errors and subsequent corrections are expected. The domain is in 840 pages. -- GreenC 18:52, 7 July 2023 (UTC)

If the bot can leave a comment tagging that it was Healthline (if that's possible), that would be very helpful afterwards - David Gerard (talk) 19:17, 7 July 2023 (UTC)

In the edit summary, or the citation needed template? For the former it looks like Special:Diff/1163511203/1164069714 .. for the latter I formatted it as "citation needed|date= July 2023" .. note the date + the space after the = .. I figured that would be sufficient to disambiguate it from other uses of the template on the page. It's kind of cryptic but works. Search on "=<space>July 2023". If you prefer more explicit could add "|reason=Healthline". -- GreenC 20:53, 7 July 2023 (UTC)

On more thought, a longer reason is probably a good idea. How about "|reason=WP:healthlinedotcom" which should make it obvious why the cite needed tag exists and why it was done, without cluttering the page too much. It could also redirect here instead. -- GreenC 20:59, 7 July 2023 (UTC)

that would be ideal, "reason=WP:healthlinedotcom" is a searchable flag that the claim itself really needs human inspection. thank you! - David Gerard (talk) 23:05, 7 July 2023 (UTC)

I have worked on a few from GreenC's initial bot work, but can already see this is going to be a long, tedious process of a) reviewing/editing content of the existing passage, b) finding suitable MEDRS-quality sources for what is often soft content (where healthline thrived on Wikipedia), c) leaving the 'cn' in place because there are no good sources readily identified, and d) fighting an edit war with healthline diehards, such as here.

GreenC - it might be best to nuke the healthline sources all at once, and I'll work on your list a few at a time. There are many other matters calling. Zefr (talk) 23:38, 7 July 2023 (UTC)

602 articles containing WP:healthlinedotcom -- GreenC 14:51, 8 July 2023 (UTC)

All gone. I submitted a Blacklist request to SBL: MediaWiki_talk:Spam-blacklist#healthline.com. @Zefr and David Gerard: -- GreenC 14:46, 8 July 2023 (UTC)

thanks, both of you :-) - David Gerard (talk) 15:57, 8 July 2023 (UTC)

GreenC - checking on the total number nuked, you first had 602 today, and 6 more in the "all gone" result. Yesterday, when I checked, there were 850 remaining. I know you have edited 20-30 yourself (with thanks) and I have done a dozen, but where might the other ~ 200 be? Zefr (talk) 16:24, 8 July 2023 (UTC)

Not all the citations removed had a [citation needed] tag added, because they were adjacent to other citations, they don't show up in the All Gone search. Or things like this Special:Diff/1163225910/1164248686. And the first 30 didn't use the WP:healthlinedotcom reason. The number of pages edited by the bot is 823, the rest were manual edits. -- GreenC 16:27, 8 July 2023 (UTC)

According to Healthline they also own Medical News Today (548 pages), DiabetesMine (11), and MediLexicon (160). They also own Greatist and Psych Central. -- GreenC 16:55, 8 July 2023 (UTC)

Each is a spam site giving opportunities to both advertise and link to other spam published under Red Ventures (parent of healthline and medical news today et al., discussed in the RSN evaluation on healthline), and described as "intent-based media — a term for specialist sites that attract people who are already looking to spend money in a particular area (travel, tech, health) and guide them to their purchases, while taking a cut." In other words, having these Red Venture sites as sources on Wikipedia enables further spam-spread, nonsense promotion, and commercialization. They should all be blacklisted.

David Gerard - as the RFC on healthline discussed this, would admin allow a fast-track to blacklisting for all Red Ventures sites? Zefr (talk) 17:18, 8 July 2023 (UTC)

I'm not totally convinced https://greatist.com is a spam site. Their policy page is pretty good. They were purchased by Red, started out independently. The others I have not looked at. Maybe it's best to filter them through RSN first. Some may be OK, others not, or prior to Red's purchase etc. I'd be more comfortable with more eyes on this before we delete references and ban them entirely. -- GreenC 20:57, 8 July 2023 (UTC)

Caution is ok. We have enough to do to purge healthline! The greatist.com is the same MO as the others: non-expert author + non-expert "medical reviewer" + right display promotion of other Red Venture articles + prominent subscribe bar + "best list", commercial promotion and spam within the article, as for CBD = BS. Should not be on Wikipedia, but there are only 35 insource hits. Zefr (talk) 21:37, 8 July 2023 (UTC)

The Greatist policy page explicitly says that they're run by Healthline Media, so they should be blacklisted as well. JoelleJay (talk) 23:24, 8 July 2023 (UTC)

GreenC, David Gerard and JoelleJay - removal of Healthline source notices is complete.

As noted in the discussion above, Healthline Media's brands are pervasive in public health-related content and continue to be cited on Wikipedia (MedicalNewsToday - the largest - is in 901 articles). All Healthline Media brands have advertising pitches to 3rd parties and promotion of the parent company, indicating the concerns that led to blacklisting Healthline remain in its other brands. Thoughts on blacklisting other Healthline brands? Zefr (talk) 23:20, 20 September 2023 (UTC)

"Healthline" is generally considered as reliable source, most articles on that website is based on other studies and researches. It has covered articles on way more things than most other websites that are generally considered reliable health website. Some articles on that website are actually unreliable but still It should be considered as generally reliable source here. It is definitely not a spam site. It is relatively better than its rest branches like "Psych Central", "Medical news today", "Greatist" etc. I think it should be considered as generally reliable or atleast not unreliable source. In fact most other health websites that are generally considered reliable are worse than it, I think more consideration should be done Polarbear678 (talk) 15:00, 9 July 2023 (UTC)

I have no opinion either way, this isn't the right forum to resolve that question. The RfC Wikipedia:Reliable_sources/Noticeboard#Healthline:_deprecate_or_blacklist? was open for over a month and is closed now. The references have been removed, it would be very difficult to restore them. Very difficult to reverse the RfC results. -- GreenC 16:21, 9 July 2023 (UTC)

Samsung

samsungusanews.com is a usurped link I found; please fix the citations, thanks. Notrealname1234 (talk) 22:21, 8 July 2023 (UTC)

Another link is radiosyndicationtalk.com, it is usurped, please fix the citations. Notrealname1234 (talk) 22:55, 8 July 2023 (UTC)

These are both WP:JUDI spam. I added them to batch for future processing (Special:Diff/1159155008/1164372081). samsungusanews.com only has 3 pages so I manually fixed those. If you find more JUDI spam that would be greatly appreciated, it's everywhere it seems and always adding new ones. -- GreenC 00:34, 9 July 2023 (UTC)

archive.fiba.com

The http://archive.fiba.com site changed URL format from http to https. So all those http-links are already dead and they need to be rewritten to https://archive.fiba.com. Thanks. Maiō T. (talk) 20:30, 9 July 2023 (UTC)
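
As a rough illustration of the requested http --> https change, here is a minimal Python sketch. It is not the bot's real code; `upgrade_to_https` is a hypothetical helper. It only rewrites the scheme when the URL is not embedded inside another URL (such as a Wayback Machine archive URL), and it does not check that the https link actually works.

```python
import re


def upgrade_to_https(text: str, domain: str) -> str:
    """Rewrite http://<domain> to https://<domain> in wikitext.

    The lookbehind skips occurrences preceded by "/" or a word character,
    which is how the URL appears inside an archive URL. The lookahead
    avoids matching a longer hostname that merely starts with <domain>.
    """
    pattern = re.compile(
        r"(?<![/\w])http://" + re.escape(domain) + r"(?=[/?#\s|\]}]|$)"
    )
    return pattern.sub("https://" + domain, text)


if __name__ == "__main__":
    print(upgrade_to_https("|url=http://archive.fiba.com/pages/eng/fa/x.html",
                           "archive.fiba.com"))
```

The example path above is made up for the demonstration.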

2,805 pages. Working .. -- GreenC 21:25, 9 July 2023 (UTC)

Oh, I didn't think it would be that bad. Can you do it? Maiō T. (talk) 09:12, 10 July 2023 (UTC)

Yes it's running.. it's taking a while because the site has bot blockers and I am verifying every URL still works (around 12,000) so anyway goes slow. -- GreenC 12:36, 10 July 2023 (UTC)

User:Maiō T.: It's done. -- GreenC 04:10, 12 July 2023 (UTC)

Unbelievable... Thank you very much! Maiō T. (talk) 08:49, 12 July 2023 (UTC)

You are welcome. I started a feature request for IABot to change http->https on demand, no idea if it will ever get implemented.[6] -- GreenC 13:57, 12 July 2023 (UTC)

Teledramaturgia

Please replace "teledramaturgia.com.br" links ending with ".asp", www.teledramaturgia.com.br/tele/uga.asp for example, the links no longer work. Notrealname1234 (talk) 23:34, 15 July 2023 (UTC)

The example changed to http://www.teledramaturgia.com.br/tele/uga which now redirects to http://teledramaturgia.com.br/uga-uga/ .. need to check for a redirect at the source URL with .asp removed. -- GreenC 02:12, 16 July 2023 (UTC)

I tried with teledramaturgia.com.br/tele/dono.asp by removing ".asp" but it didn't work. Notrealname1234 (talk) 14:45, 16 July 2023 (UTC)

It will try that first, if it doesn't work, add archive-url/dead link -- GreenC 15:14, 16 July 2023 (UTC)
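
The "try that first, otherwise archive" logic can be sketched as below. This is a hedged Python sketch under stated assumptions: `fix_asp_link` is a hypothetical name, and `is_live` is a caller-supplied check (for example an HTTP request that follows redirects), not something provided by the bot.

```python
from typing import Callable, Optional


def fix_asp_link(url: str, is_live: Callable[[str], bool]) -> Optional[str]:
    """Return the URL with a trailing ".asp" removed if that URL is live.

    Returns None when the URL does not end in ".asp" or the shortened URL
    is not live; the caller would then add an archive URL or a dead link tag.
    """
    if not url.lower().endswith(".asp"):
        return None
    candidate = url[: -len(".asp")]
    return candidate if is_live(candidate) else None


if __name__ == "__main__":
    # Stand-in liveness check for the demo only: pretend just /tele/uga is live.
    demo_live = lambda u: u.endswith("/tele/uga")
    print(fix_asp_link("http://www.teledramaturgia.com.br/tele/uga.asp", demo_live))
    print(fix_asp_link("http://www.teledramaturgia.com.br/tele/dono.asp", demo_live))
```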

Did it work? @GreenC Notrealname1234 (talk) 16:39, 17 July 2023 (UTC)

Done. It edited 109 pages, added 88 archive links, converted 70 links by removing .asp -- GreenC 16:43, 19 July 2023 (UTC)

worldaerodata.com

The domain has been usurped and now delivers malware/phishing/spam instead of useful content for all URLs. Count Count (talk) 09:45, 19 July 2023 (UTC)

Count Count: Added to WP:JUDI (Special:Diff/1164928221/1166116515) for the next run of WaybackMedic to usurp domains. -- GreenC 13:01, 19 July 2023 (UTC)

Facebook

facebook.com/photo.php urls no longer work, you can check every link here and it will not work, please fix these links Notrealname1234 (talk) 21:28, 25 July 2023 (UTC)

While I can confirm some of the URLs are broken most seem to be working. In the first 10 results that appeared for me 4 are broken, with the remaining 6 loading as expected. There's no difference in behaviour when I'm logged in versus logged out of Facebook, except that two of the errors are replaced with login screens. Sideswipe9th (talk) 21:34, 25 July 2023 (UTC)

I am logged in to Facebook; it just shows me an error: "this content is not available". Notrealname1234 (talk) 21:43, 25 July 2023 (UTC)

Could you give some example URLs that are broken for you? From the first ten results, this one linked on ENGERERcycrus' user talk page is broken, and this one linked on File:The Decemberists - The King Is Dead.jpg is fine. Sideswipe9th (talk) 21:45, 25 July 2023 (UTC)

The links are in 2,403 pages. Some probably work and some don't, the page needs to be scraped for keywords because the headers return 200 (soft-404s). I might run into bot blockers when at volume, Facebook.-- GreenC 21:50, 25 July 2023 (UTC)

From the small sample I checked, now up to the first twenty results, around 70ish percent were still live. I suspect what's happened here is just that the original files have either been deleted or made private on Facebook by the original uploader. There doesn't seem to be a URL schema change, and some of these like the broken one I linked above were added ten or more years ago. Unless there's some A/B testing on Facebook's end, where Notrealname1234 is being directed to a different version of the site which might in the future become the live version, I don't think there's really anything to do here bot wise. Sideswipe9th (talk) 21:56, 25 July 2023 (UTC)

At the 13th link, for the brazilian version (still the same link) of facebook, it says this: "Este conteúdo não está disponível no momento" @GreenC @Sideswipe9th Notrealname1234 (talk) 00:39, 26 July 2023 (UTC)

and the 13th link is this https://www.facebook.com/photo.php?fbid=195469590484313&set=a.180687601962512.39754.180681685296437&type=1 Notrealname1234 (talk) 00:43, 26 July 2023 (UTC)

This link https://www.facebook.com/photo.php?fbid=461539374654&set=a.56890239654.63538.5229624654 you provided does work. Notrealname1234 (talk) 00:44, 26 July 2023 (UTC)

If you check the user talk page where that URL is linked, it seems it was added in May 2011. I suspect what has happened here is that the original uploader of the image has deleted it sometime in the last twelve years. Sideswipe9th (talk) 03:46, 26 July 2023 (UTC)

Still, most links do not work and need to be fixed. Notrealname1234 (talk) 20:27, 27 July 2023 (UTC)

I think if a page scrape finds "The link may be broken, or the page may have been removed. Check to see if the link you're trying to open is correct" it should be converted to an archive URL. Example. There might be other variations of a dead-link landing page. Any help identifying other versions appreciated. -- GreenC 01:57, 29 July 2023 (UTC)

there is also another dead link landing page here: https://www.facebook.com/photo.php?fbid=1986142294321&id=1266183626&notif_t=photo_comment&refid=0#!/pages/I-LOVE-MOCHUDI-CENTRE-CHIEFS/83747398194/ Notrealname1234 (talk) 22:25, 31 July 2023 (UTC)

Thank you. -- GreenC 01:50, 1 August 2023 (UTC)

I've looked through the first 100 links in the search, and the "This page isn't available" and "Sorry, this content isn't available at this time" seem to be the only variations I've come across. The error message is localised though, the link that Notrealname1234 provided here appears in English not Brazilian for me. I think it might be IP geolocation based, as the localisation I get whenever I'm not logged in is always English. Sideswipe9th (talk) 01:58, 1 August 2023 (UTC)
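
A keyword scrape for these landing pages might look like the following minimal Python sketch. The phrase list contains only the messages quoted in this thread; the function name is hypothetical, and it assumes the page body has already been fetched, which is the part that ran into trouble with Facebook.

```python
# Landing-page messages quoted in this discussion, lowercased for matching.
DEAD_PHRASES = (
    "the link may be broken, or the page may have been removed",
    "this page isn't available",
    "sorry, this content isn't available at this time",
    "este conteúdo não está disponível no momento",
)


def looks_like_soft_404(html: str) -> bool:
    """Return True if the page body contains a known dead-link message.

    The server answers 200 for these pages, so the status code alone
    cannot be used; the text has to be inspected instead.
    """
    # Normalise curly apostrophes so "isn’t" and "isn't" both match.
    text = html.replace("\u2019", "'").lower()
    return any(phrase in text for phrase in DEAD_PHRASES)


if __name__ == "__main__":
    print(looks_like_soft_404("<h2>This page isn\u2019t available</h2>"))
```

Since the messages are localised, the list would likely need more languages than the two seen here.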

Thanks for checking. -- GreenC 03:33, 1 August 2023 (UTC)

Sideswipe9th & Notrealname1234: When logged into Facebook a URL takes me to one page. When not logged in it goes to another. Example: https://www.facebook.com/photo.php?v=1405623292982594&set=vb.108930319155814&type=2&permPage=1IAAF .. it works when not logged in, does not work when logged in. The bot runs in a 'not logged in' mode, and initial testing shows all links work, that otherwise don't work when logged in. I don't know what to make of it other than Facebook is a complicated service whose behavior is subjective to the viewer. Is anyone seeing something different? -- GreenC 14:19, 1 August 2023 (UTC)

The URL that Notrealname1234 provided in this comment doesn't appear to work when logged in or out. Of the first ten links from the search above: links 2-4, 6, 9-10 are not broken; links 1 and 5 provide a "This page isn't available" message when logged in, and a login screen when logged out; links 7 and 8 provide a "Sorry, this content isn't available at this time" when logged in or out. I can't seem to find any way to make the broken links work. Sadly I don't have time right now to test any deeper than this.

The link you provided though, provides a "This page isn't available" message when logged in, and redirects to an entirely different URL (https://www.facebook.com/WorldAthletics/videos/1405623292982594/ ) when logged out. Not sure what the reason for this change in behaviour is though, other than as you say Facebook being a complicated service with subjective behaviour. Sideswipe9th (talk) 15:18, 1 August 2023 (UTC)

Sideswipe9th & Notrealname1234: Sorry, I am abandoning this project. There is too much weird stuff happening. Links work in a browser (when not logged in) but don't work when checking via bot. Every single link I check in the first 100 articles has the same result. This is probably bot protection by Facebook to prevent mass scraping of content. This is what I was afraid of, Facebook is more sophisticated than my tools, which is not surprising! -- GreenC 19:00, 1 August 2023 (UTC)

It's fine. GreenC Notrealname1234 (talk) 19:02, 1 August 2023 (UTC)

Mediastatements.wa.gov.au

As far as I can tell, every single one of the hundreds of links to this archive of Western Australian government media statements going back 30 years or so is now dead (random example which is how I found this: "Health centre renamed in honour of Busselton doctor"). Please get a bot to add archive links to all URLs in this domain. Thanks! Graham87 11:23, 30 July 2023 (UTC)

User:Graham87, it looks like the example page is a soft 404. My bot WP:WAYBACKMEDIC can deal with these after some checking to see where they redirect to and configuring the bot to treat those landing pages as dead. It will check each URL because it's possible some are still working. Based on the results of this discovery, we can then decide to mark the entire domain as dead in the IABot database, for purposes of fixing dead links in 100s of other wikis. -- GreenC 14:30, 30 July 2023 (UTC)

@GreenC: Hmmm, it looks like *some* of them are working, but from an unscientific check, only the ones for 2023 (which I hadn't checked before). They're grouped by the name of the state premier at the time, then the year and month, so this link search will catch those from early 2017 to mid-2023 when Mark McGowan was premier. https://www.mediastatements.wa.gov.au/Pages/McGowan/2023/01/King-Neptune-statue-given-heritage-recognition-within-Sun-City-Precinct-.aspx works fine but https://www.mediastatements.wa.gov.au/Pages/McGowan/2022/01/Mandurah-Line-now-open-following-successful-20-day-Shutdown.aspx and https://www.mediastatements.wa.gov.au/Pages/McGowan/2022/12/Premier-unveils-new-team-with-a-focus-on-renewal-and-experience.aspx don't work. (Probably needless to say but my first example was from 1993). Graham87 15:02, 30 July 2023 (UTC)

I should have noted this before but I also started a discussion about this at Wikipedia talk:WikiProject Western Australia#Fair warning: incoming watchlist onslaught. There are some 2022 media statements that were transferred over but as noted there, the URL pattern is not *quite* amenable to a pattern-based change. As I said there the earliest I could get to was https://www.wa.gov.au/government/media-statements/McGowan-Labor-Government/Support-for-young-offenders-to-turn-their-lives-around-20220731. Graham87 15:13, 30 July 2023 (UTC)

User:Graham87: Ugh it's probably one of those sites with a really short expire time which means it would need constant checking for dead status, but since they are soft 404s the standard tools don't work. It might be better to mark the entire domain as permanent dead in IABot. Unless there was a way to find them at www.wa.gov.au/government/media-statements but I think converting to archive URL via IABot is going to be best solution long term in particular for all wikis not only enwiki. -- GreenC 15:17, 30 July 2023 (UTC)

I can see statements back to 1991 here, and as far as I can tell, all have been migrated to the new site. I don't know what it's like for you on a screen reader, but on my screen, there is a box on the right where I can select an "Administration", which allows me to select different premiers. Steelkamp (talk) 15:20, 30 July 2023 (UTC)

Oh wow I completely missed that box (which does work but it's fiddly ... but I never really got to know the old site and always navigated it using Google searches ... which can't be done now). For example, this one from 2008 still works. Hmmm ... Graham87 15:33, 30 July 2023 (UTC)

FWIW I ran IABot on the domain, after setting it to status "permadead" ie. treat all instances as a dead link. It added archives for most of them. Some are actually still alive but they'll be dead soon enough. For the parallel version at www.wa.gov.au if/when someone figures out how to map between the sites I can update the links. -- GreenC 13:42, 1 August 2023 (UTC)

ARA News - aranews.(net|org)

Before it had news, now aranews.net has Indonesian blog-styled advertisements while aranews.org redirects to an app. Count Count (talk) 11:21, 1 August 2023 (UTC)

aranews[.](net|org|com) has 282 pages. -- GreenC 13:32, 1 August 2023 (UTC)

Done. Also updated IABot database, it will propagate to other wikis. -- GreenC 16:17, 1 August 2023 (UTC)

Deleting the query

Looking to have something that deletes the queries out of Google Books URLs. So basically https://books.google.com/text&q=abc&moretext becomes https://books.google.com/text&moretext. Note, I'd also like to do the same with dq= . (I think a query can only have one.) I've managed to figure out how to do this with AutoWikiBrowser, but it seems like a *very* large thing to do, so are there bots that can do it? (I'm using AWB to do the ones related to Fraternities and Sororities.) Naraht (talk) 21:40, 3 August 2023 (UTC)

I don't have a good-enough understanding of Google Books links to know if removing queries is a good idea (in any, some or all cases). However someone who might be able to help is User:AManWithNoPlan who maintains WP:CITATIONBOT which already does Google Books link maintenance. -- GreenC 21:49, 3 August 2023 (UTC)

Often, the reference should only have the book ID, and no query or page numbers. Secondly, even when the query is present, the URL is wrong, since much of GB is JavaScript driven, so the "click me to link" correct URL is often not used, and people use the browser URL. Also, usually the search or page needs to be removed since only one of them is really what is intended (either: all instances of the word "jones" or explicitly page 23, but I used "jones" to find it) - and that requires human intervention and thought. Another problem is that the stuff after the hash # is often the final location, and much of the URL is the path taken to get there. https://github.com/ms609/citation-bot/blob/master/expandFns.php function normalize_google_books() deals with the vq, dq, q both before and after the # sign. It also has to deal with the evil & within quotes. Other oddities, like if article_id is set, then you need #v=onepage to be set. Other than that one oddball, the code trims URLs down so that they have only one search, one book ID, and one page. AManWithNoPlan (talk) 12:42, 4 August 2023 (UTC)
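
To make the simplest part of this concrete, here is a minimal Python sketch that drops q= and dq= from the query string only. It is not Citation bot's normalize_google_books() and does none of the harder work described above: it ignores parameters after the # sign, does not pick between search and page, and re-encodes the remaining parameters. The function name and the book ID in the demo are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_search_params(url: str, drop=("q", "dq")) -> str:
    """Remove the named parameters from the query string of a URL.

    Only the part between "?" and "#" is examined; the fragment is
    passed through untouched.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # "abc123" is a placeholder ID, not a real book.
    print(strip_search_params("https://books.google.com/books?id=abc123&q=jones&pg=PA23"))
```

Given the caveats in the comment above, a sketch like this is at best a starting point for manual review rather than something to run unattended.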

Thanks, AMWNP. User:Naraht, it looks like GB URLs are complex and error-prone if the wrong things are removed/changed. I would suggest relying on Citation bot, it can be run for selected articles. -- GreenC 16:48, 4 August 2023 (UTC)

OK. Let me take a look at running Citation Bot on the articles in the WikiProject.Naraht (talk) 16:57, 4 August 2023 (UTC)

drdo.org

Another WP:JUDI case. I have not seen it listed there. Count Count (talk) 16:47, 4 August 2023 (UTC)

Done. Or will be. Thanks, Count Count! -- GreenC 16:50, 4 August 2023 (UTC)

User generated genealogy site

I looked to start removing links to gov.genealogy.net, a user generated website, before realising it has several thousand uses. Could a bot replace all such uses with {{citation needed}}? -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 09:33, 6 August 2023 (UTC)

ActivelyDisinterested: are there discussions about removing it, like at RSN or somewhere? I want to link to something in the edit summary. Otherwise it looks like no problem for the bot. -- GreenC 12:58, 6 August 2023 (UTC)

Sorry I thought there was more, but checking the archives shows I was mistaken. I'll start a new discussion and come back to this later. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 13:17, 6 August 2023 (UTC)

In case you want to delete them all, there are 1,934 pages with gov.genealogy.net and 2,116 with *.genealogy.net -- GreenC 00:16, 7 August 2023 (UTC)

The RSN discussion closed with the outcome of waiting until November, when it will be discussed again. -- GreenC 21:59, 8 August 2023 (UTC)

Canadian Indigenous profile links

Links to http://fnp-ppn.aandc-aadnc.gc.ca say there's no such site; but if you replace "aandc-aadnc" with "aadnc-aandc" it works. To avoid changing archive links I would appreciate if someone would change "url=http://fnp-ppn.aandc-aadnc.gc.ca" to "url=http://fnp-ppn.aadnc-aandc.gc.ca" (548 pages) and also change "[http://fnp-ppn.aandc-aadnc.gc.ca" to "[http://fnp-ppn.aadnc-aandc.gc.ca" (632 pages). See my last few edits for examples.

(Background info AFAIK: These are pages provided by the Canadian federal government about indigenous nations/bands/groups. Depending on which province the headquarters of the department is in, the French or English acronym comes first; maybe the headquarters moved. Now the department is no longer called AANDC anyway; it's CIRNAC and most of its web pages are now at rcaanc-cirnac.gc.ca but this one for example https://www.rcaanc-cirnac.gc.ca/eng/1100100013791/1535470872302 links to fnp-ppn.aadnc-aandc.gc.ca to provide these pages.) ☺Coppertwig (talk) 22:52, 11 August 2023 (UTC)
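
A minimal Python sketch of the requested substitution follows, limited to the two contexts named in the request (a `url=` parameter and a square-bracket link) so that copies of the old host inside archive URLs are left alone. It is an illustration only, not what GreenC bot ran, and `fix_wikitext` is a hypothetical name.

```python
import re

OLD_HOST = "fnp-ppn.aandc-aadnc.gc.ca"
NEW_HOST = "fnp-ppn.aadnc-aandc.gc.ca"

# Group 1 is the context to preserve: "url=" (but not "archive-url=") or "[".
PATTERN = re.compile(
    r"((?<![\w-])url\s*=\s*|\[)http://" + re.escape(OLD_HOST)
)


def fix_wikitext(text: str) -> str:
    """Swap the old host for the new one in url= and [http:// contexts only."""
    return PATTERN.sub(lambda m: m.group(1) + "http://" + NEW_HOST, text)


if __name__ == "__main__":
    print(fix_wikitext("[http://fnp-ppn.aandc-aadnc.gc.ca/y Profile]"))
```

The two hosts differ only in the order of the acronyms, so it is easy to get the direction backwards; keeping them in named constants makes that mistake easier to spot.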

Coppertwig: Done: Special:Contributions/GreenC_bot. -- GreenC 04:52, 12 August 2023 (UTC)

Wow! Thanks!! ☺Coppertwig (talk) 13:52, 12 August 2023 (UTC)

LOL! GreenC, I see you fixed a significant typo in my request. I thought I had read it over carefully. Anyway, you knew what I meant. Thanks again! ☺Coppertwig (talk) 14:49, 12 August 2023 (UTC)

You are welcome. I found this change to be unusually easy to make mistakes with, I made a few during the programming process. -- GreenC 15:21, 12 August 2023 (UTC)

Coppertwig, any ideas what new URLs there might be for these?

https://pse5-esd5.ainc-inac.gc.ca/fnp/Main/Search/FNMain.aspx?BAND_NUMBER=564&lang=eng (391)

-- GreenC 15:21, 12 August 2023 (UTC)

Maybe simply replace the domain https://pse5-esd5.ainc-inac.gc.ca --> http://fnp-ppn.aadnc-aandc.gc.ca ? -- GreenC 15:28, 12 August 2023 (UTC)

Yes I think this is the case, the site moved: https://web.archive.org/web/20170810144106/http://pse5-esd5.ainc-inac.gc.ca/fnp/Main/Search/FNMain.aspx?BAND_NUMBER=564&lang=eng .. I'll update these 391 pages -- GreenC 16:14, 12 August 2023 (UTC)

Looks right. ☺Coppertwig (talk) 17:25, 12 August 2023 (UTC)

They're all converted (except 1 dead link). I also standardized the metadata so it's |website=Crown–Indigenous Relations and Northern Affairs Canada and |publisher=Government of Canada, and converted everything to https. I could also work on the title field with web scraping but I think this is enough for now. Government links are a mess, each new administration might redo the last admin, and not be so polite to maintain redirects. -- GreenC 20:45, 12 August 2023 (UTC)

Thanks!! I did some limited regex searches looking for BAND_NUMBER and didn't find any other old url's for the same thing, though my searches wouldn't have found everything. You've done way more than I asked for; great work. ☺Coppertwig (talk) 21:46, 12 August 2023 (UTC)

smmercury.com and asianrehub.com

Both serve online gambling spam now (WP:JUDI). There are only three links for asianrehub but it should be marked permadead in IABot in any case. --Count Count (talk) 19:16, 23 August 2023 (UTC)

Added. Thank you! -- GreenC 04:00, 26 August 2023 (UTC)

UOL (Natelinha)

natelinha.ne10.uol.com.br is a dead link, if you replace the link with natelinha.uol.com.br, it will work, there are over 91 pages with this link. (see http://natelinha.ne10.uol.com.br/noticias/2012/08/31/a-grande-familia-e-confirmada-na-grade-da-globo-para-2013-145754.php, it does not work, but if you remove the ".ne10" like this http://natelinha.uol.com.br/noticias/2012/08/31/a-grande-familia-e-confirmada-na-grade-da-globo-para-2013-145754.php, it will work.) Notrealname1234 (talk) 00:38, 26 August 2023 (UTC)

Notrealname1234. Done. Example. In 7 pages an archive URL was added example. -- GreenC 04:49, 26 August 2023 (UTC)

thisisjersey.com

Apologies if this has already been done, but thisisjersey.com, a formerly reliable source, has been hijacked and is now a gambling site. The former articles are in the Wayback Machine, so I believe a bot can rescue them? — Trey Maturin™ 11:43, 28 August 2023 (UTC)

User:Trey Maturin: Added to WP:JUDI .. this is part of a larger problem and will get processed in batches with other domains. If you find any more, please add them to JUDI, or post here, there are likely others, thank you! -- GreenC 15:32, 29 August 2023 (UTC)

Ooh, that's useful to know! Bookmarked. Thank you, GreenC! — Trey Maturin™ 16:31, 29 August 2023 (UTC)

Film Companion

Many old links don't redirect to their new versions. Like this doesn't take us to this. Kailash29792 (talk) 09:28, 29 August 2023 (UTC)

Is the correct new URL this? I found it by going to the Wayback Machine for the old link, and found an old redirect there going to this new link. What happens is the website creates a redirect at first, then fails to maintain it, and over time the old page turns into a 404. The Wayback Machine has a record of the deleted redirect, but retrieving it is another matter. I'm investigating. GreenC 15:28, 29 August 2023 (UTC)

This retrieves the redirect via the Wayback Machine:

curl -ILs 'http://web.archive.org/web/2id_/https://www.filmcompanion.in/madhumati-dil-tadap-tadap-ke-keh-raha-hai-song-inspired-by-18-century' | awk '/^[ ]*[Ll]ocation:/{sub("^[ ]*[Ll]ocation:[ ]*https?://web[.]archive[.]org/web/[0-9]{14}id_/", "", $0); a[++i]=$0}END{print a[i]}'

Output: https://www.filmcompanion.in/music/madhumatis-dil-tadap-tadap-ke-kah-raha-was-inspired-by-an-18th-century-song/

-- GreenC 00:58, 30 August 2023 (UTC)
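
For readers less used to awk, the parsing half of the command above can be sketched in Python. This assumes the response headers (what `curl -ILs` prints) are already in a string; the function name is hypothetical and the sketch does no network access itself. As in the awk version, it keeps the last Location header seen and strips the Wayback Machine prefix with the 14-digit timestamp and `id_` suffix.

```python
import re
from typing import Optional

# Same prefix the awk sub() removes: scheme, host, /web/, 14 digits, "id_/".
WAYBACK_PREFIX = re.compile(r"^https?://web\.archive\.org/web/\d{14}id_/")


def last_redirect_target(headers: str) -> Optional[str]:
    """Return the final Location value from a chain of response headers.

    Header names are matched case-insensitively. Returns None when no
    Location header is present.
    """
    target = None
    for line in headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            target = WAYBACK_PREFIX.sub("", value.strip())
    return target
```

Keeping only the last Location matters because the Wayback Machine may itself redirect once or more before replaying the archived redirect.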

Yes. Perhaps this was an erroneous title that the site later fixed, although Wayback had already archived it by then. Kailash29792 (talk) 08:30, 30 August 2023 (UTC)

User:Kailash29792: it has been processed. Example. Note in this example the source URL https://www.filmcompanion.in/mami-2018-soni-director-ivan-ayr-interview is a hard 404 with no redirect info. I was able to find an old now-deleted redirect in the Wayback Machine which pointed to a working page [7]. This is nifty and took a while to figure out. Now I know how to do it, and can reapply it to other domains in the future, that have deleted redirects. -- GreenC 15:59, 2 September 2023 (UTC)

Kailash29792

support this url Chaya20 (talk) 01:04, 1 September 2023 (UTC)

/* Kailash29792 */ Reply Chaya20 (talk) 01:05, 1 September 2023 (UTC)

www.fiba.com

The www.fiba.com page can no longer be opened. Unfortunately, it is used in thousands of articles. Sometimes renaming to https://www.fiba.basketball helps (for example here), but in most cases the site is dead. Can something be done about it? Maiō T. (talk) 19:25, 1 September 2023 (UTC)

Fiba.com has at least 12,000 mainspace links, in a few thousand articles. Most of them are archive.fiba.com and they are working. OTOH for www.fiba.com there is an https error. Example says "SEC_ERROR_EXPIRED_CERTIFICATE". They need to pay to renew their certificate. They probably will. Do you know how long it's been down? -- GreenC 04:39, 2 September 2023 (UTC)

I don't know. Some two or three weeks ago I clicked on a www.fiba.com link and was redirected to www.fiba.basketball, so it was relatively fine back then. Now these redirects don't work anymore. Maybe it would be a good idea to wait a few days. Maiō T. (talk) 10:03, 2 September 2023 (UTC)

The Wayback Machine has info. It looks like on August 18, 2017, the home page www.fiba.com began redirecting to www.fiba.basketball (it may have been August 17, but there were no snapshots for that day). This could explain why the SSL cert expired, they abandoned the domain a long time ago and no longer maintain it. Now, if we look at a deep link: https://www.fiba.com/pages/eng/fa/statistics/p/sid/2271/_/1950_European_Championship_for_Women/player-leaders.html and change the www.fiba.com to archive.fiba.com .. https://archive.fiba.com/pages/eng/fa/statistics/p/sid/2271/_/1950_European_Championship_for_Women/player-leaders.html .. it works! So trying either archive.fiba.com or www.fiba.basketball. If they don't work, add an archive URL. I can do this. -- GreenC 15:50, 2 September 2023 (UTC)
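
The plan above (try archive.fiba.com, then www.fiba.basketball, otherwise fall back to an archive URL) can be sketched as follows. This is a minimal Python sketch, not the bot's code; `relocate` is a hypothetical name and `is_live` is a caller-supplied liveness check that the sketch does not implement.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

# Replacement hosts, in the order they are tried.
MIRRORS = ("archive.fiba.com", "www.fiba.basketball")


def relocate(url: str, is_live: Callable[[str], bool]) -> Optional[str]:
    """Return the first live replacement for a www.fiba.com URL.

    The path, query and fragment are kept; only the host changes and the
    scheme is set to https. Returns None if the URL is on another host or
    no replacement is live, in which case the caller would add an archive
    URL or a dead link tag.
    """
    parts = urlsplit(url)
    if parts.hostname != "www.fiba.com":
        return None
    for host in MIRRORS:
        candidate = urlunsplit(parts._replace(scheme="https", netloc=host))
        if is_live(candidate):
            return candidate
    return None
```

A real run would also need to follow redirects on www.fiba.basketball, since the final URL there can differ from the simple host swap.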

Wow GreenC, you're a genius! Thank you in advance! Maiō T. (talk) 18:18, 2 September 2023 (UTC)

Update what was done today/tonight:

It ran in 3,024 articles and processed an unknown number of links, estimated at around 20,000
It ran in two passes, the first for archive.fiba.com and second for www.fiba.basketball. Example pass 1 and pass 2. It's messy but works out: combined diff.
Generally for every link "www.fiba.com" the result will be one of: 1. link remains with a {{dead link}}; 2. link remains converted to an archive URL; 3. link is converted to archive.fiba.com ; 4. link converted to www.fiba.basketball which might be a new URL due to redirects. For 3 & 4, pre-existing archive URLs are either removed (bare and square links), or converted to url-status=live
Many of the diffs are dense with changes eg. 2016 in sports. If you see any problems, let me know, I can go back and try to repair it. It's easier looking at the final combined diff. There are so many permutations of link changes, and so many links in a page, it can create some spectacular diffs.
Some links don't do anything eg. https://www.fiba.basketball/calendar is only good for future dates, but is often cited for back dates. There are 27 links. They fail Verification and technically the entire cite should be deleted.
Template: and File: were processed (100 total).
There are other types of FIBA links like "alha.www.fiba.com" which were not processed. More work should be done to discover these edge cases, and what to do with them.

-- GreenC 06:06, 4 September 2023 (UTC)
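The four possible results listed above can be read as a simple decision order. Here is a minimal sketch, assuming the three checks have already been done elsewhere and are passed in as booleans. The function name and the returned labels are hypothetical, not taken from the bot.

```python
def resolve_fiba_link(archive_host_ok, new_host_ok, has_snapshot):
    """Pick one of the four outcomes for a www.fiba.com link.

    archive_host_ok: the same path works on archive.fiba.com (pass 1)
    new_host_ok:     the link works on www.fiba.basketball (pass 2)
    has_snapshot:    a web archive snapshot of the original URL exists
    """
    if archive_host_ok:
        return "convert to archive.fiba.com"
    if new_host_ok:
        return "convert to www.fiba.basketball"
    if has_snapshot:
        return "convert to archive URL"
    # Nothing works and nothing is archived: keep the link and tag it.
    return "tag with {{dead link}}"
```

The order mirrors the two passes described above, with archive URLs and {{dead link}} as fallbacks.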

Thank you very much! It's amazing what you have done. Maiō T. (talk) 10:15, 4 September 2023 (UTC)

Found another 70 domains *.fiba.com, of which 68 are dead and replaced with archive URLs Example. -- GreenC 04:48, 5 September 2023 (UTC)

Call sign history for U.S. radio stations

Any URL starting with http://licensing.fcc.gov/cgi-bin/ws.exe/prod/cdbs/pubacc/prod/call_hist.pl? needs to be changed to https://licensing.fcc.gov/cgi-bin/ws.exe/prod/cdbs/pubacc/prod/call_hist.pl? — Vchimpanzee • talk • contributions • 15:28, 2 September 2023 (UTC)

Frankly, any url starting with http://licensing.fcc.gov should be converted to https://licensing.fcc.gov, as all links to the HTTP version of that domain return a 403. According to my quick research, we're looking at about 3,400 instances of HTTP links to that domain. Phuzion (talk) 18:06, 5 September 2023 (UTC)

Yes, I'll take care of it and check https, plus any other problems like redirects and soft-404s. Each URL will be verified as working rather than assuming they all work; they rarely all do after a migration. -- GreenC 21:28, 5 September 2023 (UTC)
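The scheme change requested above is a small rewrite. A minimal sketch follows, assuming only links on licensing.fcc.gov are touched. The function name `upgrade_to_https` is hypothetical, and the per-URL verification mentioned above is not shown.

```python
from urllib.parse import urlsplit, urlunsplit


def upgrade_to_https(url, host="licensing.fcc.gov"):
    """Switch http:// to https:// for links on the given host.

    Links on other hosts, or links that are already https, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname == host:
        return urlunsplit(parts._replace(scheme="https"))
    return url
```

The rewritten URL would still have to be requested to confirm it is not a redirect to an error page or a soft-404.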

Thanks to both of you. Yes, I should have realized the problem might be more extensive.— Vchimpanzee • talk • contributions • 16:56, 14 September 2023 (UTC)

User:Vchimpanzee - the FCC fixed the 403 error as the http link now redirects to https. We could in theory change all fcc.gov subdomains to https but it's probably redundant. Below is a list of all domains in use on Wikipedia. -- GreenC 14:13, 16 September 2023 (UTC)


Andhimazhai

Many old links such as this don't redirect to the new versions like this. Kailash29792 (talk) 09:24, 11 September 2023 (UTC)

User:Kailash29792: This must have happened recently because the WaybackMachine has a snapshot at the old URL in January [8] and the new URL in May [9]. There is no redirect info in headers or the WaybackMachine. The options would be to convert them to archive URLs, or to wait and hope they add redirects eventually. If the latter happens, my bot is capable of rolling back archive URLs and adding the new URL. I suggest adding archive URLs for now. What do you think? -- GreenC 14:28, 16 September 2023 (UTC)

Even better might be to tag existing links as dead. Am travelling right now, so I can't add the archive links. Kailash29792 (talk) 14:43, 16 September 2023 (UTC)

OK I'll add archive URLs. I don't know if they are all dead so will check them individually. -- GreenC 15:10, 16 September 2023 (UTC)

It's done: 21 pages edited. They were all soft-404s pointing back to the home page. -- GreenC 18:18, 16 September 2023 (UTC)
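A soft-404 of the kind described above (a deep link that lands on the home page) can be detected with a simple heuristic. This is a rough sketch under the assumption that the caller has already followed redirects and knows the final URL. The function name is hypothetical, and real sites often need extra site-specific checks.

```python
from urllib.parse import urlsplit


def looks_like_soft_404(requested_url, final_url):
    """Heuristic: a deep link that ends up on a home page is a soft-404.

    requested_url: the URL cited in the article
    final_url:     the URL reached after following redirects (fetched by the caller)
    """
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    requested_is_deep = requested.path.strip("/") != "" or requested.query != ""
    final_is_home = final.path.strip("/") == "" and final.query == ""
    return requested_is_deep and final_is_home
```

A link that was cited as the home page in the first place is not flagged, since landing on the home page is then the expected result.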

emertainmentmonthly.com

This domain now leads to a spam site. Referenced on many pages. https://en.m.wikipedia.org/wiki/Special:LinkSearch?target=Http%3A%2F%2Femertainmentmonthly.com

Looks like these references should be updated to point to https://emertainmentmonthly.org/ ?

I was able to find reference #4 from Bernard Cornwell on the new site (below). URL and content seem to match the archive.org record, aside from the extension of course.

https://emertainmentmonthly.org/2014/01/31/bernard-cornwell-talks-the-pagan-lord-the-challenges-of-historical-fiction-and-future-plans/

Baunno (talk) 12:23, 16 September 2023 (UTC)

OK, thank you for the report. I'll convert to .org and check that the new link works, otherwise convert to an archive URL. I will work on this. There are 61 pages. -- GreenC 14:33, 16 September 2023 (UTC)

It's done. If there was an archive URL there, it left it in place but flipped the status to live, Example. A couple didn't convert because the new URL is a dead link. -- GreenC 18:36, 16 September 2023 (UTC)
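The handling described above (switch to .org, keep any existing archive URL but flip the status to live) might look roughly like this for a single citation. It is a sketch only: the citation is modelled as a plain dict of template parameters, the function name is hypothetical, and whether the new URL works is decided by the caller.

```python
OLD_DOMAIN = "emertainmentmonthly.com"
NEW_DOMAIN = "emertainmentmonthly.org"


def migrate_citation(params, new_url_works):
    """Return an updated copy of a citation's parameters.

    params:        dict with "url" and optionally "archive-url" / "url-status"
    new_url_works: result of checking the .org URL (done by the caller)
    """
    updated = dict(params)
    if not new_url_works:
        # The new URL is dead: leave the citation as it was.
        return updated
    # Swap the domain once; the scheme and path are kept as-is.
    updated["url"] = params["url"].replace(OLD_DOMAIN, NEW_DOMAIN, 1)
    if "archive-url" in updated:
        # Keep the existing archive link in place, but mark the source as live.
        updated["url-status"] = "live"
    return updated
```

Citations without an archive URL only get the new link, since there is no status to flip.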

Thank you! Baunno (talk) 15:56, 17 September 2023 (UTC)

USA Basketball

Hello. I was wondering if the broken URLs for USA Basketball (usab.com) could be fixed. These include:

link at Ashley Houts. It can't be replaced with one of the links here as the event is no longer held.
Multiple articles at United States women's national under-19 basketball team, such as this link. It's no longer there in the news section.

I was also wondering if the archive.usab.com and usabasketball.com could be checked as they're broken as well. These two domains might have already been checked for archives, but I'd like to double check. Altogether, these total up to 3,000+ links.

Thank you! MrLinkinPark333 (talk) 17:31, 16 September 2023 (UTC)

Yes, I see usab.com has been excluded from InternetArchiveBot. Same for about half of usabasketball.com .. so they are not being maintained. I'll go through them; it will take some time. Everything I spot checked is dead, with a long timeout response. Between this and FIBA above, I wonder what is happening in the basketball world. -- GreenC 18:53, 16 September 2023 (UTC)

MrLinkinPark333: The bot checked each URL in all sub-domains for usab.com and usabasketball.com -- it edited about 900 pages including Template: and File:, added about 1,200 new archive lines, flipped about 400 |url-status=live to dead - it also updated the IABot database (for each URL) so the results will propagate to 100s of other wikis (Example). -- GreenC 05:17, 17 September 2023 (UTC)

Happened to notice http://basketball.teamusa.org is also soft-404ing. 4 pages only. But teamusa.org has over 3,500 pages and spot checking there are many dead pages. I'll process this as a separate project in a new section. -- GreenC 05:39, 17 September 2023 (UTC)

Thank you for the quick response! MrLinkinPark333 (talk) 15:21, 17 September 2023 (UTC)

teamusa.org

Over 3,500 pages with many unfixed dead links of various types. -- GreenC 05:44, 17 September 2023 (UTC)

Edited 3,231 pages. Fixed over 4,000 links. Most were soft-404s. -- GreenC 00:33, 19 September 2023 (UTC)

Historic Hansard

Back in ~2018, the content of hansard.millbanksystems.com (digitised copies of Hansard for the UK Parliament) was transferred to an official site at api.parliament.uk/historic-hansard. The old site remained online, however, and continued to be pretty widely used as references - there are about 7800 links to it. (The new site has around 13k, 10k of which are via templates, changed back in 2018.)

The old site has finally gone offline, possibly forever, and so it's probably a good time to finally change all of these over. I believe the URL patterns are very simple - hansard.millbanksystems.com/... becomes api.parliament.uk/historic-hansard/... - which hopefully will make it straightforward. Andrew Gray (talk) 19:52, 19 September 2023 (UTC)
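The URL pattern described above can be sketched as a single rewrite. This is an illustration under assumptions: the function name is hypothetical, and using https for the new site is an assumption not stated in the request.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "hansard.millbanksystems.com"
NEW_HOST = "api.parliament.uk"
NEW_PREFIX = "/historic-hansard"


def to_historic_hansard(url):
    """Map hansard.millbanksystems.com/... to api.parliament.uk/historic-hansard/...

    URLs on other hosts are returned unchanged. The https scheme is an assumption.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    return urlunsplit(
        parts._replace(scheme="https", netloc=NEW_HOST, path=NEW_PREFIX + parts.path)
    )
```

The rewritten link would still need to be checked against the new site, since some pages may not have been carried over.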

OK. I'll check each one to make sure it exists at the new site, since it's common for admins to miss some during migrations. If it doesn't exist it will add an archive URL. I'll also check for soft-404s (redirects to home pages etc). The 10k in templates is scary because there is no easy way to check them for link rot without special code for parsing the template. But that's a general problem with the thousands of custom URL templates, which create huge link rot problems over time. -- GreenC 20:09, 19 September 2023 (UTC)

@GreenC amazing, thank you! I'm in touch with the maintainers at Parliament so happy to poke them about any pages which don't exist on the new site, if you do spot any.

To clarify, the 10k in templates have been switched to the new site for about five years now - I don't think there have been any reported issues arising from it. Andrew Gray (talk) 21:10, 19 September 2023 (UTC)

We'll see how clean the migration is to api.parliament.uk - if there are enough errors it's a signal there might be other problems. Templates can hide natural entropy. But some sites can surprise and are well maintained. If we can help them find problems via your contacts all the better. -- GreenC 22:20, 19 September 2023 (UTC)