diff --git a/config/_default/menus.yaml b/config/_default/menus.yaml
index e33bc7d2..5937e87c 100644
--- a/config/_default/menus.yaml
+++ b/config/_default/menus.yaml
@@ -19,6 +19,10 @@ main:
title: Un texte d’intention sur la démarche du site
pageRef: /manifeste/
parent: Accueil
+ - name: Liens morts
+ title: Rapport sur les liens morts du site
+ pageRef: /liens-morts/
+ parent: Accueil
- name: Revue de presse
title: "Mes apparitions sur le web"
pageRef: /revue-de-presse/
diff --git a/content/liens-morts/index.md b/content/liens-morts/index.md
new file mode 100644
index 00000000..b476a24c
--- /dev/null
+++ b/content/liens-morts/index.md
@@ -0,0 +1,99 @@
+---
+title: Liens morts
+---
+
+When you create a lot of links, running into dead ones is inevitable.
+This is what is known as [_link rot_](https://en.wikipedia.org/wiki/Link_rot).
+
+For the sake of transparency, to make dead links easier to track, and to encourage the people I link to (should one of those links die) to [tell me](/contact/) how to fix it, this automatically generated page presents the report of dead links detected on my site.
+
+I try to automate the detection of these dead links, for links internal to my site as well as for external ones.
+While it is perfectly legitimate to hold me responsible for keeping my own internal links alive, **nobody can hold me responsible for external links**.
+That is not my job.
+I am under no obligation to maintain a checking tool, nor to publish its results transparently.
+I do it out of a taste for work well done and out of respect for my visitors, but I have no control over the many external factors that determine whether my tool can reach a link or not.
+
+## Methodology
+
+I wrote a script that drives [cURL](https://curl.se/docs/) with the following parameters:
+
+```javascript
+const args = [
+ "--silent",
+ "--location",
+ "--fail",
+ "--max-time",
+ `${REQUEST_TIMEOUT_SECONDS}`,
+ "--output",
+ "/dev/null",
+ "--write-out",
+ "%{http_code}",
+ "--user-agent",
+ DEFAULT_USER_AGENT,
+ "--request",
+ method,
+ url,
+];
+```
+
+`DEFAULT_USER_AGENT` is a valid, regularly updated user-agent string.
+I first send a request with the [`HEAD`](https://developer.mozilla.org/fr/docs/Web/HTTP/Reference/Methods/HEAD) method; if it fails, I send another one with the `GET` method after a 5 s delay.
+
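The HEAD-then-GET fallback described above can be sketched as follows. This is a simplified view, not the tool's exact code: `request` stands in for the cURL wrapper, and the names are illustrative.

```javascript
const RETRY_DELAY_MS = 5000; // the 5 s pause between the two attempts

// A HEAD response warrants a GET retry when it timed out, produced no
// status at all, or came back with an error status.
function shouldRetryWithGet(result) {
  if (result.errorType) return true;
  if (result.status === null) return true;
  return result.status >= 400;
}

// `request(url, method)` is assumed to resolve to { status, errorType }.
async function checkUrl(url, request) {
  let result = await request(url, "HEAD");
  if (shouldRetryWithGet(result)) {
    await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
    result = await request(url, "GET");
  }
  return result;
}
```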
+Three situations can arise at this point.
+
+### HTTP status between 200 and 400
+
+My tool systematically treats any HTTP status greater than or equal to 200 and strictly below 400 as an accessible page.
+
+This can produce false positives (pages reported as accessible when they are not), notably in the following cases:
+
+- the site serves an error page without returning the corresponding HTTP error status;
+- the URL has been reused for content entirely different from the original page.
+
+Once I observe that a URL returns a status strictly below 400, it is not re-tested for 1 month.
+
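In the checker, this translates into a freshness test on the cached entry. A minimal sketch, with illustrative field and variable names rather than the tool's exact ones:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when `entry` was checked recently enough for its status class,
// so no new request is needed. `ttlDays` is 30 for a 2xx/3xx status.
function isCacheFresh(entry, ttlDays, now = Date.now()) {
  if (!entry || !entry.checkedAt) return false;
  const ageMs = now - new Date(entry.checkedAt).getTime();
  return ageMs < ttlDays * DAY_MS;
}
```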
+### HTTP status between 400 and 499
+
+Any response with an HTTP status between 400 and 499 is treated as an error, in accordance with [RFC 7231](https://datatracker.ietf.org/doc/html/rfc7231).
+
+This produces many false negatives (pages reported as inaccessible when they are in fact reachable), symptomatic either of a deliberate attempt to block automated browsing, or of a configuration problem in my tool.
+
+By design, out of intellectual honesty and out of courtesy, my tool is built to be non-intrusive.
+Its configuration would in theory allow more aggressive techniques that would reduce these false negatives.
+I made the deliberate choice not to make my tool more aggressive, and to mark every link returning a status of 400 or higher as inaccessible, whatever the actual reason.
+
+I consider ignoring RFC 7231 a destructive practice.
+Servers that answer with an inappropriate status code therefore deserve to be marked as inaccessible.
+
+The problem here is that when a server returns a 403 for content that actually exists, merely because the request did not come from a "traditional" browser, I have no way of telling whether the page has moved, whether I made a copy-and-paste mistake in the URL, or whether I hit a password-protected URL (one legitimate use of a 403).
+
+There are too many such cases for me to spend time identifying them by hand.
+
+Requests that ended with an HTTP status between 400 and 499 are not retried for 1 week.
+
+### HTTP status of 500 or higher
+
+Requests that ended with an HTTP status of 500 or higher are not retried for 1 day: these errors are supposed to be genuine, transient and promptly fixed.
+
+I have nevertheless noticed that some servers answer an automated client with a 500 error.
+I refuse to build and maintain a list of those servers.
+
+### Timeout
+
+Many sites choose to punish automated browsing by simply never answering the request, leaving the client hanging.
+A well-behaved script therefore cannot tell whether the remote server is blocking the request or merely having a transient problem.
+
+One could argue endlessly about whether this technique is justified.
+For my part, I consider it destructive.
+Servers that never answer are therefore marked as inaccessible, even though some of them may genuinely be only temporarily unreachable.
+
+Requests that ended in a _timeout_ are not retried for 1 week.
+
+### Other cases
+
+cURL sometimes reports an HTTP status of 0 (which does not actually exist).
+Reviewing the detailed logs of these requests shows that, usually (but not always), the problem comes down to the server's certificates (expired, domain name mismatch, and so on).
+
+Requests that ended with an HTTP status of 0 are not retried for 1 week.
+
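The four cases above boil down to a recheck interval derived from the status alone. A sketch with illustrative names (the real tool reads these values from its configuration):

```javascript
// Days to wait before rechecking a URL, per status class described above.
function recheckDelayDays(status, errorType) {
  if (errorType === "timeout") return 7; // server never answered
  if (status === 0 || status === null) return 7; // curl-level failure (TLS, DNS, ...)
  if (status >= 200 && status < 400) return 30; // considered accessible
  if (status < 500) return 7; // 4xx: marked as dead
  return 1; // 5xx: assumed transient
}
```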
+## Report
diff --git a/package.json b/package.json
index c9f0e5ff..229bae5d 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
{
"scripts": {
- "links:refresh": "node tools/run_link_checks.js"
+ "links:refresh": "node tools/check_external_links.js"
},
"dependencies": {
"postcss-import": "^16.1.0",
diff --git a/themes/42/assets/css/links.css b/themes/42/assets/css/links.css
index e4d0e177..cf66fb35 100644
--- a/themes/42/assets/css/links.css
+++ b/themes/42/assets/css/links.css
@@ -1,9 +1,14 @@
a {
color: var(--color-link);
text-decoration: underline;
+ display: inline-block;
+ max-width: 100%;
+ overflow-wrap: anywhere;
+ word-break: break-word;
+ white-space: normal;
&:hover,
&:focus {
color: var(--color-link-hover);
}
-}
+}
\ No newline at end of file
diff --git a/themes/42/assets/css/table.css b/themes/42/assets/css/table.css
index 4b35f043..5ee2394f 100644
--- a/themes/42/assets/css/table.css
+++ b/themes/42/assets/css/table.css
@@ -44,3 +44,8 @@ tfoot td {
padding: var(--padding-half) var(--padding);
font-style: italic;
}
+
+.table-wrapper {
+ width: 100%;
+ overflow-x: auto;
+}
\ No newline at end of file
diff --git a/themes/42/layouts/_markup/render-link.html b/themes/42/layouts/_markup/render-link.html
index d701c004..e3255dff 100644
--- a/themes/42/layouts/_markup/render-link.html
+++ b/themes/42/layouts/_markup/render-link.html
@@ -7,22 +7,16 @@
{{- $site := $page.Site -}}
{{- $aff := index $site.Data.affiliates.sites $host -}}
{{- $isAffiliated := false -}}
-{{- $scratch := $page.Scratch -}}
-{{- $externalCache := $scratch.Get "externalLinksCache" -}}
-{{- if not $externalCache -}}
- {{- $externalCache = dict -}}
- {{- if fileExists "tools/cache/external_links.yaml" -}}
- {{- with readFile "tools/cache/external_links.yaml" -}}
- {{- $parsedCache := transform.Unmarshal . -}}
- {{- if $parsedCache -}}
- {{- $externalCache = $parsedCache -}}
- {{- end -}}
- {{- end -}}
- {{- end -}}
- {{- $scratch.Set "externalLinksCache" $externalCache -}}
+{{- $report := dict -}}
+{{- if fileExists "tools/cache/external_links.yaml" -}}
+  {{- $report = default (dict) (transform.Unmarshal (readFile "tools/cache/external_links.yaml")) -}}
+{{- end -}}
+{{- $deadList := default (slice) (index $report "links") -}}
+{{- $entriesMap := default (dict) (index $report "entries") -}}
+{{- $cacheEntry := index $entriesMap .Destination -}}
+{{- $deadInfo := dict -}}
+{{- $isDeadLink := false -}}
+{{- with (first 1 (where $deadList "url" .Destination)) -}}
+ {{- $deadInfo = index . 0 -}}
+ {{- $isDeadLink = true -}}
{{- end -}}
-{{- $cacheEntry := index $externalCache .Destination -}}
-{{- $isDeadLink := and $cacheEntry (eq (index $cacheEntry "manually_killed") true) -}}
{{- $newURL := .Destination -}}
{{- if and $isExternal $aff -}}
{{- $param := $aff.param -}}
@@ -34,12 +28,7 @@
{{- $newURL = printf "%s://%s%s?%s=%s" $parsed.Scheme $host $path $param $value -}}
{{- end -}}
{{- end -}}
-{{- $titlePrefix := "" -}}
-{{- if $isAffiliated -}}
-{{- $titlePrefix = "Lien affilié" -}}
-{{- else if $isExternal -}}
-{{- $titlePrefix = "Lien externe" -}}
-{{- end -}}
+{{- $titlePrefix := cond $isAffiliated "Lien affilié" (cond $isExternal "Lien externe" "") -}}
{{- $classes := slice -}}
{{- if $isExternal -}}
{{- $classes = $classes | append "external" -}}
@@ -50,9 +39,32 @@
{{- if $isDeadLink -}}
{{- $classes = $classes | append "dead" -}}
{{- end -}}
-
+<a href="{{ $newURL }}"
+  {{- with $classes }} class="{{ delimit . " " }}"{{ end }}
+  {{- with $titlePrefix }} title="{{ . }}"{{ end }}
+  {{- if $isExternal }} target="_blank" rel="noopener noreferrer"{{ end -}}
+>
{{- .Text | safeHTML -}}
-{{- /* */ -}}
\ No newline at end of file
+</a>
+{{- /* */ -}}
diff --git a/themes/42/layouts/_partials/liens-morts/report.html b/themes/42/layouts/_partials/liens-morts/report.html
new file mode 100644
index 00000000..2165b186
--- /dev/null
+++ b/themes/42/layouts/_partials/liens-morts/report.html
@@ -0,0 +1,82 @@
+{{- $defaultReportPath := "tools/cache/external_links.yaml" -}}
+{{- $reportPath := default $defaultReportPath .ReportPath -}}
+{{- $report := default (dict) .Report -}}
+{{- if or (eq (len $report) 0) (not (isset $report "links")) -}}
+ {{- if fileExists $reportPath -}}
+ {{- with readFile $reportPath -}}
+ {{- $report = . | unmarshal -}}
+ {{- end -}}
+ {{- else -}}
+ {{- warnf "Rapport des liens morts introuvable (%s)" $reportPath -}}
+ {{- end -}}
+{{- end -}}
+{{- $allPages := where site.Pages ".File" "!=" nil -}}
+{{- $links := default (slice) $report.links -}}
+{{- $linkCount := len $links -}}
+{{- $generatedLabel := "" -}}
+{{- with $report.generatedAt -}}
+ {{- $ts := time . -}}
+ {{- $generatedLabel = $ts.Format "02/01/2006" -}}
+{{- end -}}
+
+<div class="stats">
+  {{ partial "stat.html" (dict "title" "Dernière mise à jour" "value" $generatedLabel) }}
+  {{ partial "stat.html" (dict "title" "Liens morts détectés" "value" $linkCount) }}
+</div>
+
+<div class="table-wrapper">
+  <table>
+    <thead>
+      <tr>
+        <th>URL</th>
+        <th>Emplacements</th>
+        <th>Statut</th>
+      </tr>
+    </thead>
+    <tbody>
+      {{- range $links }}
+      <tr>
+        <td>
+          <a href="{{ .url }}">{{ .url }}</a>
+        </td>
+        <td>
+          {{- $locations := default (slice) .locations -}}
+          {{- if gt (len $locations) 0 -}}
+          <ul>
+            {{- range $locations }}
+            {{- $file := .file -}}
+            {{- $line := .line -}}
+            {{- $pagePath := .page -}}
+            {{- $matchedPage := false -}}
+            {{- if $pagePath -}}
+              {{- $candidate := site.GetPage $pagePath -}}
+              {{- if $candidate -}}
+                {{- $matchedPage = $candidate -}}
+              {{- end -}}
+            {{- end -}}
+            {{- if and (not $matchedPage) $file -}}
+              {{- $normalized := replaceRE "^content/" "" $file -}}
+              {{- $candidates := where $allPages "File.Path" $normalized -}}
+              {{- if gt (len $candidates) 0 -}}
+                {{- $matchedPage = index $candidates 0 -}}
+              {{- end -}}
+            {{- end -}}
+            <li>
+              {{- if $matchedPage -}}
+              <a href="{{ $matchedPage.RelPermalink }}">{{ $matchedPage.Title }}</a>
+              {{- else if $file -}}
+              <code>{{ $file }}{{ if $line }}:{{ $line }}{{ end }}</code>
+              {{- else -}}
+              Emplacement inconnu
+              {{- end -}}
+            </li>
+            {{- end }}
+          </ul>
+          {{- else -}}
+          Emplacements inconnus
+          {{- end -}}
+        </td>
+        <td>{{ .status }}</td>
+      </tr>
+      {{- end }}
+    </tbody>
+  </table>
+</div>
diff --git a/themes/42/layouts/liens-morts/single.html b/themes/42/layouts/liens-morts/single.html
new file mode 100644
index 00000000..04581195
--- /dev/null
+++ b/themes/42/layouts/liens-morts/single.html
@@ -0,0 +1,34 @@
+{{ define "main" }}
+{{ partial "hero-page.html" . }}
+
+<div class="container">
+  <article>
+    {{- if ne .Page.Parent.RelPermalink "/interets/liens-interessants/" -}}
+      {{- if .Params.cover -}}
+        {{- partial "media/render-image.html" (dict
+          "Page" .Page
+          "Destination" .Params.cover
+        ) -}}
+      {{- end -}}
+    {{- end -}}
+
+    {{ with .TableOfContents }}
+    {{ if gt (len (plainify .)) 0 }}
+    <details class="toc">
+      <summary>Sommaire</summary>
+      {{ . | safeHTML }}
+    </details>
+    {{ end }}
+    {{ end }}
+
+    {{ .Content }}
+
+    {{- partial "liens-morts/report.html" (dict "Page" .) -}}
+
+    {{- partial "asides/keywords.html" . }}
+  </article>
+</div>
+{{ end }}
diff --git a/tools/check_external_links.js b/tools/check_external_links.js
index f55f6a73..3eea5116 100644
--- a/tools/check_external_links.js
+++ b/tools/check_external_links.js
@@ -1,77 +1,98 @@
+#!/usr/bin/env node
+
const fs = require("fs");
const path = require("path");
-const yaml = require("js-yaml");
const util = require("util");
-const { execFile } = require("child_process");
+const yaml = require("js-yaml");
const UserAgent = require("user-agents");
+const { execFile } = require("child_process");
const {
collectMarkdownLinksFromFile,
extractLinksFromText,
} = require("./lib/markdown_links");
+const execFileAsync = util.promisify(execFile);
+
const SITE_ROOT = path.resolve(__dirname, "..");
+const CONTENT_DIR = path.join(SITE_ROOT, "content");
const CONFIG_PATH = path.join(__dirname, "config.json");
+const DAY_MS = 24 * 60 * 60 * 1000;
-let config = {};
-if (fs.existsSync(CONFIG_PATH)) {
- try {
- config = JSON.parse(fs.readFileSync(CONFIG_PATH, "utf8"));
- } catch (error) {
- console.warn(
- `Failed to parse ${path.relative(SITE_ROOT, CONFIG_PATH)}. Using defaults. (${error.message})`
- );
- }
-}
-
-const externalConfig = {
+const DEFAULT_CONFIG = {
cacheDir: path.join(__dirname, "cache"),
cacheFile: "external_links.yaml",
hostDelayMs: 2000,
retryDelayMs: 5000,
requestTimeoutSeconds: 5,
- cacheTtlSuccessDays: 7,
- cacheTtlClientErrorDays: 0,
- outputFormat: "markdown",
- outputFile: path.join(__dirname, "cache", "external_links_report.md"),
+ cacheTtlSuccessDays: 30,
+ cacheTtlClientErrorDays: 7,
+ cacheTtlServerErrorDays: 1,
+ cacheTtlTimeoutDays: 7,
+ maxConcurrentHosts: 4,
userAgent: null,
enableCookies: true,
cookieJar: path.join(__dirname, "cache", "curl_cookies.txt"),
- ...(config.externalLinks || {}),
};
-const CONTENT_DIR = path.join(SITE_ROOT, "content");
-const CACHE_DIR = path.isAbsolute(externalConfig.cacheDir)
- ? externalConfig.cacheDir
- : path.resolve(SITE_ROOT, externalConfig.cacheDir);
-const CACHE_PATH = path.isAbsolute(externalConfig.cacheFile)
- ? externalConfig.cacheFile
- : path.join(CACHE_DIR, externalConfig.cacheFile);
-const OUTPUT_FILE = path.isAbsolute(externalConfig.outputFile)
- ? externalConfig.outputFile
- : path.resolve(SITE_ROOT, externalConfig.outputFile);
-const COOKIE_JAR = externalConfig.cookieJar
- ? path.isAbsolute(externalConfig.cookieJar)
- ? externalConfig.cookieJar
- : path.resolve(SITE_ROOT, externalConfig.cookieJar)
+function loadConfig() {
+ if (!fs.existsSync(CONFIG_PATH)) {
+ return {};
+ }
+ try {
+ return JSON.parse(fs.readFileSync(CONFIG_PATH, "utf8"));
+ } catch (error) {
+ console.warn(
+ `Impossible de parser ${path.relative(SITE_ROOT, CONFIG_PATH)} (${error.message}).`
+ );
+ return {};
+ }
+}
+
+const rawConfig = loadConfig();
+const settings = {
+ ...DEFAULT_CONFIG,
+ ...(rawConfig.externalLinks || {}),
+};
+
+const CACHE_DIR = path.isAbsolute(settings.cacheDir)
+ ? settings.cacheDir
+ : path.resolve(SITE_ROOT, settings.cacheDir);
+const REPORT_PATH = path.isAbsolute(settings.cacheFile)
+ ? settings.cacheFile
+ : path.join(CACHE_DIR, settings.cacheFile);
+const COOKIE_JAR = settings.cookieJar
+ ? path.isAbsolute(settings.cookieJar)
+ ? settings.cookieJar
+ : path.resolve(SITE_ROOT, settings.cookieJar)
: path.join(CACHE_DIR, "curl_cookies.txt");
-const CACHE_TTL_SUCCESS_DAYS = Number(externalConfig.cacheTtlSuccessDays) || 0;
-const CACHE_TTL_CLIENT_ERROR_DAYS = Number(externalConfig.cacheTtlClientErrorDays) || 0;
-const HOST_DELAY_MS = Number(externalConfig.hostDelayMs) || 0;
-const RETRY_DELAY_MS = Number(externalConfig.retryDelayMs) || 0;
-const REQUEST_TIMEOUT_SECONDS = Number(externalConfig.requestTimeoutSeconds) || 0;
-const maxConcurrentConfig = Number(externalConfig.maxConcurrentHosts);
-const MAX_CONCURRENT_HOSTS =
- Number.isFinite(maxConcurrentConfig) && maxConcurrentConfig > 0
- ? maxConcurrentConfig
- : 4;
+const HOST_DELAY_MS = Math.max(0, Number(settings.hostDelayMs) || 0);
+const RETRY_DELAY_MS = Math.max(0, Number(settings.retryDelayMs) || 0);
+const REQUEST_TIMEOUT_SECONDS = Math.max(1, Number(settings.requestTimeoutSeconds) || 5);
+const MAX_CONCURRENT_HOSTS = Math.max(
+ 1,
+ Number.isFinite(Number(settings.maxConcurrentHosts))
+ ? Number(settings.maxConcurrentHosts)
+ : DEFAULT_CONFIG.maxConcurrentHosts
+);
const DEFAULT_USER_AGENT =
- typeof externalConfig.userAgent === "string" && externalConfig.userAgent.trim()
- ? externalConfig.userAgent.trim()
+ typeof settings.userAgent === "string" && settings.userAgent.trim()
+ ? settings.userAgent.trim()
: new UserAgent().toString();
-const ENABLE_COOKIES = externalConfig.enableCookies !== false;
-const PROGRESS_FILE = path.join(__dirname, "cache", "external_links_progress.csv");
-const execFileAsync = util.promisify(execFile);
+const ENABLE_COOKIES = settings.enableCookies !== false;
+
+const CACHE_TTL_SUCCESS_MS = daysToMs(
+ pickNumber(settings.cacheTtlSuccessDays, DEFAULT_CONFIG.cacheTtlSuccessDays)
+);
+const CACHE_TTL_CLIENT_ERROR_MS = daysToMs(
+ pickNumber(settings.cacheTtlClientErrorDays, DEFAULT_CONFIG.cacheTtlClientErrorDays)
+);
+const CACHE_TTL_SERVER_ERROR_MS = daysToMs(
+ pickNumber(settings.cacheTtlServerErrorDays, DEFAULT_CONFIG.cacheTtlServerErrorDays)
+);
+const CACHE_TTL_TIMEOUT_MS = daysToMs(
+ pickNumber(settings.cacheTtlTimeoutDays, DEFAULT_CONFIG.cacheTtlTimeoutDays)
+);
fs.mkdirSync(CACHE_DIR, { recursive: true });
if (ENABLE_COOKIES) {
@@ -81,267 +102,249 @@ if (ENABLE_COOKIES) {
}
}
-try {
- if (fs.existsSync(PROGRESS_FILE)) {
- fs.unlinkSync(PROGRESS_FILE);
+function pickNumber(value, fallback) {
+ const parsed = Number(value);
+ if (Number.isFinite(parsed)) {
+ return parsed;
}
-} catch (error) {
- console.warn(`Unable to remove existing progress file: ${error.message}`);
+ return fallback;
}
-let cache = {};
-if (fs.existsSync(CACHE_PATH)) {
- cache = yaml.load(fs.readFileSync(CACHE_PATH, "utf8")) || {};
-}
-let cacheDirty = false;
-
-const now = new Date();
-const BAD_LINKS = [];
-const lastHostChecks = new Map();
-const runResults = new Map();
-
-function updateProgress(processed, total) {
- process.stdout.write(`\rURL ${processed}/${total}`);
-}
-
-function isCacheValid(entry) {
- if (!entry?.checked) return false;
- const date = new Date(entry.checked);
- const ttlDays = (() => {
- const status = entry.status;
- if (typeof status === "number") {
- if (status < 400) return CACHE_TTL_SUCCESS_DAYS;
- if (status < 500) return CACHE_TTL_CLIENT_ERROR_DAYS;
- return 0;
- }
+function daysToMs(days) {
+ if (!Number.isFinite(days) || days <= 0) {
return 0;
- })();
- if (ttlDays <= 0) return false;
- return (now - date) / (1000 * 60 * 60 * 24) < ttlDays;
-}
-
-async function collectMarkdownLinks(filePath, occurrencesMap) {
- const entries = await collectMarkdownLinksFromFile(filePath);
- for (const { url, line } of entries) {
- recordOccurrence(occurrencesMap, filePath, line, url);
}
+ return days * DAY_MS;
}
-function recordOccurrence(occurrencesMap, filePath, lineNumber, url) {
- if (!occurrencesMap.has(url)) {
- occurrencesMap.set(url, { url, occurrences: [] });
+function ensureDirectoryExists(targetFile) {
+ fs.mkdirSync(path.dirname(targetFile), { recursive: true });
+}
+
+function toPosix(relativePath) {
+ return typeof relativePath === "string" ? relativePath.split(path.sep).join("/") : relativePath;
+}
+
+function relativeToSite(filePath) {
+ return toPosix(path.relative(SITE_ROOT, filePath));
+}
+
+function toPagePath(relativeContentPath) {
+ if (!relativeContentPath) return null;
+ let normalized = toPosix(relativeContentPath);
+ if (!normalized) return null;
+ normalized = normalized.replace(/^content\//, "");
+ if (!normalized) {
+ return "/";
}
- const entry = occurrencesMap.get(url);
- const alreadyRecorded = entry.occurrences.some(
- (item) => item.file === filePath && item.line === lineNumber
- );
- if (!alreadyRecorded) {
- entry.occurrences.push({ file: filePath, line: lineNumber });
+ normalized = normalized.replace(/\/index\.md$/i, "");
+ normalized = normalized.replace(/\/_index\.md$/i, "");
+ normalized = normalized.replace(/\.md$/i, "");
+ normalized = normalized.replace(/\/+/g, "/");
+ normalized = normalized.replace(/\/+$/, "");
+ normalized = normalized.replace(/^\/+/, "");
+ if (!normalized) {
+ return "/";
}
+ return `/${normalized}`;
}
-function delay(ms) {
- return new Promise((resolve) => setTimeout(resolve, ms));
-}
-
-async function applyHostDelay(host) {
- if (!host) return;
- const last = lastHostChecks.get(host);
- if (last) {
- const elapsed = Date.now() - last;
- const waitTime = HOST_DELAY_MS - elapsed;
- if (waitTime > 0) {
- await delay(waitTime);
+function deriveBundlePagePath(contentRelative) {
+ if (!contentRelative) return null;
+ const bundleRoot = contentRelative.replace(/\/data\/.*$/, "");
+ const candidates = [`${bundleRoot}/index.md`, `${bundleRoot}/_index.md`];
+ for (const candidate of candidates) {
+ const absolute = path.join(CONTENT_DIR, candidate);
+ if (fs.existsSync(absolute)) {
+ return toPagePath(candidate);
}
}
+ return toPagePath(bundleRoot);
}
-function recordHostCheck(host) {
- if (host) {
- lastHostChecks.set(host, Date.now());
+function derivePagePath(relativeFile) {
+ if (typeof relativeFile !== "string") return null;
+ const normalized = toPosix(relativeFile);
+ if (!normalized || !normalized.startsWith("content/")) return null;
+ const contentRelative = normalized.slice("content/".length);
+ if (contentRelative.includes("/data/")) {
+ return deriveBundlePagePath(contentRelative);
}
+ return toPagePath(contentRelative);
}
-function extractHost(url) {
+function loadState() {
+ if (!fs.existsSync(REPORT_PATH)) {
+ return { generatedAt: null, links: [], entries: {} };
+ }
try {
- return new URL(url).hostname;
- } catch (_) {
- return null;
- }
-}
-
-function persistCache() {
- if (!cacheDirty) return;
- ensureDirectoryExists(CACHE_PATH);
- fs.writeFileSync(CACHE_PATH, yaml.dump(cache));
- cacheDirty = false;
-}
-
-function formatLocations(occurrences) {
- return occurrences
- .map(({ file, line }) => `${path.relative(SITE_ROOT, file)}:${line}`)
- .join("; ");
-}
-
-function escapeCsvField(value) {
- const stringValue = String(value);
- if (/[",\n]/.test(stringValue)) {
- return `"${stringValue.replace(/"/g, '""')}"`;
- }
- return stringValue;
-}
-
-function appendProgress(url, occurrences, status) {
- const locationText = formatLocations(occurrences);
- const statusText =
- typeof status === "number" && status < 400 && status !== null ? "" : status ?? "";
- const line = [
- escapeCsvField(url),
- escapeCsvField(locationText),
- escapeCsvField(statusText),
- ].join(",");
- fs.appendFileSync(PROGRESS_FILE, `${line}\n`);
-}
-
-function groupEntriesByHost(entries) {
- const result = new Map();
- for (const entry of entries) {
- const host = extractHost(entry.url);
- const key = host || `__invalid__:${entry.url}`;
- if (!result.has(key)) {
- result.set(key, { host, entries: [] });
+ const payload = yaml.load(fs.readFileSync(REPORT_PATH, "utf8")) || {};
+ if (payload.entries && typeof payload.entries === "object") {
+ return {
+ generatedAt: payload.generatedAt || null,
+ links: Array.isArray(payload.links) ? payload.links : [],
+ entries: normalizeEntries(payload.entries),
+ };
}
- result.get(key).entries.push(entry);
- }
- return Array.from(result.values());
-}
-
-async function runWithConcurrency(items, worker, concurrency) {
- const executing = new Set();
- const promises = [];
- for (const item of items) {
- const promise = Promise.resolve().then(() => worker(item));
- promises.push(promise);
- executing.add(promise);
- const clean = () => executing.delete(promise);
- promise.then(clean).catch(clean);
- if (executing.size >= concurrency) {
- await Promise.race(executing);
- }
- }
- return Promise.all(promises);
-}
-
-async function curlRequest(url, method) {
- const args = [
- "--silent",
- "--location",
- "--fail",
- "--max-time",
- `${REQUEST_TIMEOUT_SECONDS}`,
- "--output",
- "/dev/null",
- "--write-out",
- "%{http_code}",
- "--user-agent",
- DEFAULT_USER_AGENT,
- "--request",
- method,
- url,
- ];
-
- if (ENABLE_COOKIES) {
- args.push("--cookie", COOKIE_JAR, "--cookie-jar", COOKIE_JAR);
- }
-
- try {
- const { stdout } = await execFileAsync("curl", args);
- const status = parseInt(stdout.trim(), 10);
return {
- status: Number.isNaN(status) ? null : status,
- errorType: null,
- method: method.toUpperCase(),
+ generatedAt: payload.generatedAt || null,
+ links: Array.isArray(payload.links) ? payload.links : [],
+ entries: normalizeEntries(payload),
};
} catch (error) {
- const rawStatus = error?.stdout?.toString().trim();
- const status = rawStatus ? parseInt(rawStatus, 10) : null;
- const errorCode = Number(error?.code);
- const errorType = errorCode === 28 ? "timeout" : null;
- return {
- status: Number.isNaN(status) ? null : status,
- errorType,
- method: method.toUpperCase(),
+ console.warn(
+ `Impossible de lire ${path.relative(SITE_ROOT, REPORT_PATH)} (${error.message}).`
+ );
+ return { generatedAt: null, links: [], entries: {} };
+ }
+}
+
+function normalizeEntries(rawEntries) {
+ const normalized = {};
+ if (!rawEntries || typeof rawEntries !== "object") {
+ return normalized;
+ }
+ for (const [url, data] of Object.entries(rawEntries)) {
+ if (!url.includes("://")) {
+ continue;
+ }
+ normalized[url] = normalizeEntryShape(url, data);
+ }
+ return normalized;
+}
+
+function normalizeEntryShape(url, raw) {
+ const checkedAt = raw?.checkedAt || raw?.checked || null;
+ const locations = normalizeLocations(raw?.locations, raw?.files);
+ return {
+ url,
+ status: typeof raw?.status === "number" ? raw.status : null,
+ errorType: raw?.errorType || null,
+ method: raw?.method || null,
+ checkedAt,
+ locations,
+ };
+}
+
+function normalizeLocations(locations, fallbackFiles) {
+ const items = [];
+ if (Array.isArray(locations)) {
+ for (const entry of locations) {
+ if (!entry) continue;
+ if (typeof entry === "string") {
+ const [filePart, linePart] = entry.split(":");
+ const filePath = toPosix(filePart.trim());
+ items.push({
+ file: filePath,
+ line: linePart ? Number.parseInt(linePart, 10) || null : null,
+ page: derivePagePath(filePath),
+ });
+ } else if (typeof entry === "object") {
+ const file = sizeof(entry.file) ? entry.file : null;
+ if (file) {
+ const normalizedFile = toPosix(file);
+ items.push({
+ file: normalizedFile,
+ line: typeof entry.line === "number" ? entry.line : null,
+ page:
+ typeof entry.page === "string" && entry.page.trim()
+ ? toPosix(entry.page.trim())
+ : derivePagePath(normalizedFile),
+ });
+ }
+ }
+ }
+ }
+
+ if (items.length === 0 && Array.isArray(fallbackFiles)) {
+ for (const file of fallbackFiles) {
+ if (!file) continue;
+ const normalizedFile = toPosix(file);
+ items.push({
+ file: normalizedFile,
+ line: null,
+ page: derivePagePath(normalizedFile),
+ });
+ }
+ }
+
+ return dedupeAndSortLocations(items);
+}
+
+// Despite its name, this helper simply checks for a non-empty string value.
+function sizeof(value) {
+ return typeof value === "string" && value.trim().length > 0;
+}
+
+function dedupeAndSortLocations(list) {
+ if (!Array.isArray(list) || list.length === 0) {
+ return [];
+ }
+ const map = new Map();
+ for (const item of list) {
+ if (!item?.file) continue;
+ const key = `${item.file}::${item.line ?? ""}`;
+ if (!map.has(key)) {
+ const normalizedFile = toPosix(item.file);
+ map.set(key, {
+ file: normalizedFile,
+ line: typeof item.line === "number" ? item.line : null,
+ page:
+ typeof item.page === "string" && item.page.trim()
+ ? toPosix(item.page.trim())
+ : derivePagePath(normalizedFile),
+ });
+ }
+ }
+ return Array.from(map.values()).sort((a, b) => {
+ const fileDiff = a.file.localeCompare(b.file);
+ if (fileDiff !== 0) return fileDiff;
+ const lineA = a.line ?? Number.POSITIVE_INFINITY;
+ const lineB = b.line ?? Number.POSITIVE_INFINITY;
+ return lineA - lineB;
+ });
+}
+
+function saveState(state) {
+ ensureDirectoryExists(REPORT_PATH);
+ fs.writeFileSync(REPORT_PATH, yaml.dump(state), "utf8");
+}
+
+function createEntry(url, existing = {}) {
+ return {
+ url,
+ status: typeof existing.status === "number" ? existing.status : null,
+ errorType: existing.errorType || null,
+ method: existing.method || null,
+ checkedAt: existing.checkedAt || null,
+ locations: Array.isArray(existing.locations) ? dedupeAndSortLocations(existing.locations) : [],
+ };
+}
+
+function mergeOccurrences(entries, occurrences) {
+ const merged = {};
+ for (const [url, urlOccurrences] of occurrences.entries()) {
+ const existing = entries[url] || createEntry(url);
+ merged[url] = {
+ ...existing,
+ url,
+ locations: dedupeAndSortLocations(urlOccurrences),
};
}
+ return merged;
}
-function shouldRetryWithGet(result) {
- if (result.errorType) return true;
- if (result.status === null) return true;
- return result.status >= 400;
-}
-
-async function checkLink(url) {
- let info = runResults.get(url);
- if (!info) {
- const cachedInfo = cache[url];
- if (cachedInfo?.manually_killed === true) {
- // Do not re-test manually killed links
- info = cachedInfo;
- } else if (!isCacheValid(cachedInfo)) {
- const host = extractHost(url);
- if (host) {
- await applyHostDelay(host);
- }
-
- let result = await curlRequest(url, "HEAD");
- recordHostCheck(host);
-
- if (shouldRetryWithGet(result)) {
- await delay(RETRY_DELAY_MS);
- if (host) {
- await applyHostDelay(host);
- }
- result = await curlRequest(url, "GET");
- recordHostCheck(host);
- }
-
- info = {
- ...(cachedInfo || {}),
- status: result.status ?? null,
- errorType: result.errorType || null,
- method: result.method,
- checked: new Date().toISOString(),
- };
- cache[url] = info; // preserves files, manual flags, etc.
- cacheDirty = true;
- persistCache();
- } else if (cachedInfo) {
- info = cachedInfo;
- } else {
- info = {
- status: null,
- errorType: "unknown",
- method: "HEAD",
- checked: new Date().toISOString(),
- };
- }
- runResults.set(url, info);
+function recordOccurrence(map, filePath, line, url) {
+ if (!map.has(url)) {
+ map.set(url, []);
}
- return info;
-}
-
-function processYamlRecursively(obj, links = new Set()) {
- if (typeof obj === "string") {
- for (const link of extractLinksFromText(obj)) {
- links.add(link);
- }
- } else if (Array.isArray(obj)) {
- for (const item of obj) processYamlRecursively(item, links);
- } else if (typeof obj === "object" && obj !== null) {
- for (const key in obj) processYamlRecursively(obj[key], links);
+ const relativeFile = relativeToSite(filePath);
+ const normalizedLine = typeof line === "number" && Number.isFinite(line) ? line : null;
+ const pagePath = derivePagePath(relativeFile);
+ const list = map.get(url);
+ const key = `${relativeFile}:${normalizedLine ?? ""}`;
+ if (!list.some((item) => `${item.file}:${item.line ?? ""}` === key)) {
+ list.push({ file: relativeFile, line: normalizedLine, page: pagePath });
}
- return links;
}
function stripYamlInlineComment(line) {
@@ -380,36 +383,64 @@ function isBlockScalarIndicator(line) {
return /:\s*[>|][0-9+-]*\s*$/.test(cleaned);
}
-async function collectYamlLinks(filePath, occurrencesMap) {
+function processYamlRecursively(obj, links = new Set()) {
+ if (typeof obj === "string") {
+ for (const link of extractLinksFromText(obj)) {
+ links.add(link);
+ }
+ } else if (Array.isArray(obj)) {
+ for (const item of obj) {
+ processYamlRecursively(item, links);
+ }
+ } else if (obj && typeof obj === "object") {
+ for (const value of Object.values(obj)) {
+ processYamlRecursively(value, links);
+ }
+ }
+ return links;
+}
+
+async function collectMarkdownLinks(filePath, occurrences) {
+ const entries = await collectMarkdownLinksFromFile(filePath);
+ for (const { url, line } of entries) {
+ recordOccurrence(occurrences, filePath, line, url);
+ }
+}
+
+async function collectYamlLinks(filePath, occurrences) {
let linkSet = new Set();
try {
const doc = yaml.load(fs.readFileSync(filePath, "utf8"));
- linkSet = processYamlRecursively(doc);
- } catch (e) {
- console.error(`Failed to parse YAML file: ${filePath}`);
+ if (doc) {
+ linkSet = processYamlRecursively(doc);
+ }
+ } catch (error) {
+ console.warn(`Impossible de parser ${relativeToSite(filePath)} (${error.message}).`);
+ return;
+ }
+ if (linkSet.size === 0) {
return;
}
- if (linkSet.size === 0) return;
-
const recorded = new Map();
- const rawLines = fs.readFileSync(filePath, "utf8").split(/\r?\n/);
+ const lines = fs.readFileSync(filePath, "utf8").split(/\r?\n/);
let inBlockScalar = false;
let blockIndent = 0;
- const markRecorded = (url, lineNumber) => {
+ const mark = (url, lineNumber) => {
if (!recorded.has(url)) {
recorded.set(url, new Set());
}
- const lines = recorded.get(url);
- if (lines.has(lineNumber)) return;
- lines.add(lineNumber);
- recordOccurrence(occurrencesMap, filePath, lineNumber, url);
+ const set = recorded.get(url);
+ if (!set.has(lineNumber)) {
+ set.add(lineNumber);
+ recordOccurrence(occurrences, filePath, lineNumber, url);
+ }
};
- for (let index = 0; index < rawLines.length; index++) {
+ for (let index = 0; index < lines.length; index++) {
const lineNumber = index + 1;
- const line = rawLines[index];
+ const line = lines[index];
const indent = line.match(/^\s*/)?.[0].length ?? 0;
const trimmed = line.trim();
@@ -419,12 +450,11 @@ async function collectYamlLinks(filePath, occurrencesMap) {
continue;
}
if (trimmed === "" || indent >= blockIndent) {
- if (isYamlCommentLine(line)) {
- continue;
- }
- for (const link of extractLinksFromText(line)) {
- if (linkSet.has(link)) {
- markRecorded(link, lineNumber);
+ if (!isYamlCommentLine(line)) {
+ for (const link of extractLinksFromText(line)) {
+ if (linkSet.has(link)) {
+ mark(link, lineNumber);
+ }
}
}
continue;
@@ -440,221 +470,346 @@ async function collectYamlLinks(filePath, occurrencesMap) {
blockIndent = indent + 1;
}
- if (isYamlCommentLine(line)) continue;
-
- if (!trimmedWithoutComment) continue;
+ if (isYamlCommentLine(line) || !trimmedWithoutComment) {
+ continue;
+ }
for (const link of extractLinksFromText(withoutComment)) {
if (linkSet.has(link)) {
- markRecorded(link, lineNumber);
+ mark(link, lineNumber);
}
}
}
for (const link of linkSet) {
if (!recorded.has(link) || recorded.get(link).size === 0) {
- recordOccurrence(occurrencesMap, filePath, "?", link);
+ recordOccurrence(occurrences, filePath, null, link);
}
}
}
function walk(dir, exts) {
let results = [];
- const list = fs.readdirSync(dir);
- for (const file of list) {
- const fullPath = path.resolve(dir, file);
- const stat = fs.statSync(fullPath);
- if (stat.isDirectory()) {
+ const entries = fs.readdirSync(dir, { withFileTypes: true });
+ for (const entry of entries) {
+ const fullPath = path.join(dir, entry.name);
+ if (entry.isDirectory()) {
results = results.concat(walk(fullPath, exts));
- } else if (exts.includes(path.extname(fullPath))) {
+ } else if (exts.includes(path.extname(entry.name))) {
results.push(fullPath);
}
}
return results;
}
-function ensureDirectoryExists(targetFile) {
- fs.mkdirSync(path.dirname(targetFile), { recursive: true });
+function delay(ms) {
+ return new Promise((resolve) => setTimeout(resolve, ms));
}
-function escapeMarkdownCell(value) {
- return String(value).replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
-}
+const lastHostChecks = new Map();
-function generateMarkdownReport(entries) {
- const header = [
- "# Broken External Links",
- "",
- `Generated: ${new Date().toISOString()}`,
- "",
- ];
- if (entries.length === 0) {
- return header.concat(["No broken external links found."]).join("\n");
- }
- const rows = entries.map((entry) => {
- const url = escapeMarkdownCell(entry.url);
- const location = escapeMarkdownCell(entry.location);
- const status = escapeMarkdownCell(entry.status);
- return `| ${url} | ${location} | ${status} |`;
- });
- return header
- .concat(["| URL | Location | Status |", "| --- | --- | --- |", ...rows])
- .join("\n");
-}
-
-function generateCsvReport(entries) {
- const lines = [`"url","location","status"`];
- for (const entry of entries) {
- const line = [entry.url, entry.location, entry.status]
- .map((field) => `"${String(field).replace(/"/g, '""')}"`)
- .join(",");
- lines.push(line);
- }
- return lines.join("\n");
-}
-
-function writeReport(entries) {
- const format = String(externalConfig.outputFormat || "markdown").toLowerCase();
- const content =
- format === "csv" ? generateCsvReport(entries) : generateMarkdownReport(entries);
- ensureDirectoryExists(OUTPUT_FILE);
- fs.writeFileSync(OUTPUT_FILE, content, "utf8");
-}
-
-(async () => {
- const occurrencesByUrl = new Map();
- const mdFiles = walk(CONTENT_DIR, [".md", ".markdown"]);
- const yamlFiles = walk(CONTENT_DIR, [".yaml", ".yml"]);
- for (const file of mdFiles) {
- await collectMarkdownLinks(file, occurrencesByUrl);
- }
- for (const file of yamlFiles) {
- await collectYamlLinks(file, occurrencesByUrl);
- }
-
- const uniqueEntries = Array.from(occurrencesByUrl.values());
- const activeUrls = new Set(uniqueEntries.map((entry) => entry.url));
- let cachePruned = false;
- for (const url of Object.keys(cache)) {
- if (!activeUrls.has(url)) {
- delete cache[url];
- cachePruned = true;
- }
- }
- if (cachePruned) {
- cacheDirty = true;
- }
- // Update file paths, line numbers and ensure manual flags exist
- for (const entry of uniqueEntries) {
- const files = Array.from(
- new Set(entry.occurrences.map((o) => path.relative(SITE_ROOT, o.file)))
- ).sort((a, b) => a.localeCompare(b));
- const locations = Array.from(
- new Set(
- entry.occurrences.map(
- (o) => `${path.relative(SITE_ROOT, o.file)}:${o.line}`
- )
- )
- ).sort((a, b) => a.localeCompare(b));
- const existing = cache[entry.url] || {};
- cache[entry.url] = {
- ...existing,
- manually_validated: existing.manually_validated === true,
- manually_killed: existing.manually_killed === true,
- files,
- locations,
- };
- cacheDirty = true;
- }
- if (cacheDirty) {
- ensureDirectoryExists(CACHE_PATH);
- fs.writeFileSync(CACHE_PATH, yaml.dump(cache));
- cacheDirty = false;
- }
-
- // Exclude manually killed from re-checking and reporting
- const entriesToCheck = uniqueEntries.filter(
- (e) => !(cache[e.url] && cache[e.url].manually_killed === true)
- );
-
- ensureDirectoryExists(PROGRESS_FILE);
- fs.writeFileSync(PROGRESS_FILE, `"url","locations","status"\n`, "utf8");
-
- const total = entriesToCheck.length;
- if (total === 0) {
- process.stdout.write("No external links found.\n");
- ensureDirectoryExists(CACHE_PATH);
- fs.writeFileSync(CACHE_PATH, yaml.dump(cache));
- writeReport([]);
+async function applyHostDelay(host) {
+ if (!host || HOST_DELAY_MS <= 0) {
return;
}
+ const last = lastHostChecks.get(host);
+ if (last) {
+ const elapsed = Date.now() - last;
+ const wait = HOST_DELAY_MS - elapsed;
+ if (wait > 0) {
+ await delay(wait);
+ }
+ }
+}
+function recordHostCheck(host) {
+ if (host) {
+ lastHostChecks.set(host, Date.now());
+ }
+}
+
+function extractHost(url) {
+ try {
+ return new URL(url).hostname;
+ } catch (_) {
+ return null;
+ }
+}
+
+async function curlRequest(url, method, hostHeader) {
+ const args = [
+ "--silent",
+ "--location",
+ "--fail",
+ "--max-time",
+ `${REQUEST_TIMEOUT_SECONDS}`,
+ "--output",
+ "/dev/null",
+ "--write-out",
+ "%{http_code}",
+ "--user-agent",
+ DEFAULT_USER_AGENT,
+ "--request",
+ method,
+ ];
+
+ if (ENABLE_COOKIES) {
+ args.push("--cookie", COOKIE_JAR, "--cookie-jar", COOKIE_JAR);
+ }
+ if (hostHeader) {
+ args.push("-H", `Host: ${hostHeader}`);
+ }
+ args.push(url);
+
+ try {
+ const { stdout } = await execFileAsync("curl", args);
+ const status = parseInt(stdout.trim(), 10);
+ return {
+ status: Number.isNaN(status) ? null : status,
+ errorType: null,
+ method: method.toUpperCase(),
+ };
+ } catch (error) {
+ const rawStatus = error?.stdout?.toString().trim();
+ const status = rawStatus ? parseInt(rawStatus, 10) : null;
+ const errorCode = Number(error?.code);
+      const timeout = errorCode === 28 ? "timeout" : null; // curl exit code 28 = operation timed out
+ return {
+ status: Number.isNaN(status) ? null : status,
+ errorType: timeout,
+ method: method.toUpperCase(),
+ };
+ }
+}
+
+function shouldRetryWithGet(result) {
+ if (!result) return true;
+ if (result.errorType) return true;
+ if (typeof result.status !== "number") return true;
+  return result.status === 0 || result.status >= 400; // 000 = no HTTP response received
+}
+
+function getTtlMs(entry) {
+ if (!entry) return 0;
+ if (entry.errorType === "timeout" || entry.status === 0 || entry.status === null) {
+ return CACHE_TTL_TIMEOUT_MS;
+ }
+ const status = Number(entry.status);
+ if (Number.isNaN(status)) {
+ return CACHE_TTL_TIMEOUT_MS;
+ }
+ if (status >= 500) {
+ return CACHE_TTL_SERVER_ERROR_MS;
+ }
+ if (status >= 400) {
+ return CACHE_TTL_CLIENT_ERROR_MS;
+ }
+ if (status >= 200 && status < 400) {
+ return CACHE_TTL_SUCCESS_MS;
+ }
+ return CACHE_TTL_TIMEOUT_MS;
+}
+
+function needsCheck(entry) {
+ if (!entry?.checkedAt) {
+ return true;
+ }
+ const checked = Date.parse(entry.checkedAt);
+ if (Number.isNaN(checked)) {
+ return true;
+ }
+ const ttl = getTtlMs(entry);
+ if (ttl <= 0) {
+ return true;
+ }
+ return Date.now() - checked >= ttl;
+}
+
+function groupEntriesByHost(entries) {
+ const groups = new Map();
+ for (const entry of entries) {
+ const host = extractHost(entry.url);
+ const key = host || `__invalid__:${entry.url}`;
+ if (!groups.has(key)) {
+ groups.set(key, { host, entries: [] });
+ }
+ groups.get(key).entries.push(entry);
+ }
+ return Array.from(groups.values());
+}
+
+async function runWithConcurrency(items, worker, concurrency) {
+ const executing = new Set();
+ for (const item of items) {
+ const task = Promise.resolve().then(() => worker(item));
+ executing.add(task);
+ const clean = () => executing.delete(task);
+ task.then(clean).catch(clean);
+ if (executing.size >= concurrency) {
+ await Promise.race(executing);
+ }
+ }
+ await Promise.allSettled(executing);
+}
+
+function updateEntryWithResult(entry, result) {
+ const now = new Date().toISOString();
+ entry.status = typeof result.status === "number" ? result.status : null;
+ entry.errorType = result.errorType || null;
+ entry.method = result.method;
+ entry.checkedAt = now;
+}
+
+function formatStatusForReport(entry) {
+ if (!entry) return "error";
+ if (entry.errorType === "timeout") return "timeout";
+ if (typeof entry.status === "number") return entry.status;
+ return "error";
+}
+
+function isDead(entry) {
+ if (!entry) return false;
+ if (entry.errorType === "timeout") return true;
+ if (typeof entry.status !== "number") return true;
+  return entry.status === 0 || entry.status >= 400; // 000 = no HTTP response, consistent with getTtlMs
+}
+
+function getStatusOrder(value) {
+ if (typeof value === "number" && Number.isFinite(value)) {
+ return value;
+ }
+ const label = typeof value === "string" ? value.toLowerCase() : "";
+ if (label === "timeout") {
+ return 10000;
+ }
+ return 10001;
+}
+
+function buildDeadLinks(entries) {
+ const list = [];
+ for (const entry of Object.values(entries)) {
+ if (!isDead(entry)) continue;
+ list.push({
+ url: entry.url,
+ status: formatStatusForReport(entry),
+ locations: entry.locations || [],
+ });
+ }
+ return list.sort((a, b) => {
+ const orderDiff = getStatusOrder(a.status) - getStatusOrder(b.status);
+ if (orderDiff !== 0) return orderDiff;
+ if (typeof a.status === "number" && typeof b.status === "number") {
+ return a.status - b.status;
+ }
+ const labelDiff = String(a.status).localeCompare(String(b.status));
+ if (labelDiff !== 0) return labelDiff;
+ return a.url.localeCompare(b.url);
+ });
+}
+
+function logProgress(processed, total) {
+ process.stdout.write(`\rURLs vérifiées ${processed}/${total}`);
+}
+
+async function collectOccurrences() {
+ const occurrences = new Map();
+ const mdFiles = walk(CONTENT_DIR, [".md", ".markdown"]);
+ for (const file of mdFiles) {
+ await collectMarkdownLinks(file, occurrences);
+ }
+ const yamlFiles = walk(CONTENT_DIR, [".yaml", ".yml"]);
+ for (const file of yamlFiles) {
+ await collectYamlLinks(file, occurrences);
+ }
+ return occurrences;
+}
+
+function persistEntriesSnapshot(entries, snapshotMeta) {
+ const payload = {
+ generatedAt: snapshotMeta?.generatedAt || null,
+ links: Array.isArray(snapshotMeta?.links) ? snapshotMeta.links : [],
+ entries,
+ };
+ saveState(payload);
+}
+
+async function checkEntries(entriesToCheck, entries, snapshotMeta) {
+ if (entriesToCheck.length === 0) {
+ return;
+ }
const hostGroups = groupEntriesByHost(entriesToCheck);
- const concurrency = Math.max(1, Math.min(MAX_CONCURRENT_HOSTS, hostGroups.length || 1));
+ const concurrency = Math.max(1, Math.min(MAX_CONCURRENT_HOSTS, hostGroups.length));
let processed = 0;
+ process.stdout.write(`Vérification de ${entriesToCheck.length} URL...\n`);
await runWithConcurrency(
hostGroups,
- async ({ entries }) => {
- for (const entry of entries) {
- const info = await checkLink(entry.url);
- const status = typeof info?.status === "number" ? info.status : null;
- const errorType = info?.errorType || null;
- const hasHttpError = status !== null && status >= 400;
- const isTimeout = errorType === "timeout";
- const statusLabel = isTimeout ? "timeout" : status ?? "error";
-
- if (status === null || hasHttpError || isTimeout) {
- BAD_LINKS.push({
- location: formatLocations(entry.occurrences),
- url: entry.url,
- status: statusLabel,
- });
+ async ({ host, entries: groupEntries }) => {
+ for (const entry of groupEntries) {
+ if (host) {
+ await applyHostDelay(host);
}
-
- appendProgress(entry.url, entry.occurrences, hasHttpError || isTimeout || status === null ? statusLabel : status);
+ const hostHeader = host || extractHost(entry.url);
+ let result = await curlRequest(entry.url, "HEAD", hostHeader);
+ recordHostCheck(host);
+ if (shouldRetryWithGet(result)) {
+ if (RETRY_DELAY_MS > 0) {
+ await delay(RETRY_DELAY_MS);
+ }
+ if (host) {
+ await applyHostDelay(host);
+ }
+ result = await curlRequest(entry.url, "GET", hostHeader);
+ recordHostCheck(host);
+ }
+ updateEntryWithResult(entries[entry.url], result);
+ persistEntriesSnapshot(entries, snapshotMeta);
processed += 1;
- updateProgress(processed, total);
+ logProgress(processed, entriesToCheck.length);
}
},
concurrency
);
process.stdout.write("\n");
+}
- ensureDirectoryExists(CACHE_PATH);
- fs.writeFileSync(CACHE_PATH, yaml.dump(cache));
-
- if (BAD_LINKS.length === 0) {
- writeReport([]);
- console.log(
- `No broken external links detected. Report saved to ${path.relative(
- SITE_ROOT,
- OUTPUT_FILE
- )}.`
- );
+async function main() {
+ const occurrences = await collectOccurrences();
+ if (occurrences.size === 0) {
+ const emptyState = { generatedAt: new Date().toISOString(), links: [], entries: {} };
+ saveState(emptyState);
+ console.log("Aucun lien externe détecté.");
return;
}
- const sorted = BAD_LINKS.sort((a, b) => {
- const rank = (entry) => {
- if (entry.status === "timeout") return 2;
- if (typeof entry.status === "number") {
- return entry.status === 404 ? 0 : 1;
- }
- return 1;
- };
- const diff = rank(a) - rank(b);
- if (diff !== 0) return diff;
- if (typeof a.status === "number" && typeof b.status === "number") {
- return a.status - b.status;
- }
- return a.url.localeCompare(b.url);
- });
+ const state = loadState();
+ const mergedEntries = mergeOccurrences(state.entries, occurrences);
+ const entriesArray = Object.values(mergedEntries);
+ const pending = entriesArray.filter((entry) => needsCheck(entry));
+
+ const snapshotMeta = {
+ generatedAt: state.generatedAt || null,
+ links: Array.isArray(state.links) ? state.links : [],
+ };
+
+ await checkEntries(pending, mergedEntries, snapshotMeta);
+
+ const deadLinks = buildDeadLinks(mergedEntries);
+ const nextState = {
+ generatedAt: new Date().toISOString(),
+ links: deadLinks,
+ entries: mergedEntries,
+ };
+ saveState(nextState);
- writeReport(sorted);
console.log(
- `Found ${sorted.length} broken external link(s). Report saved to ${path.relative(
+ `Liens externes analysés: ${entriesArray.length} URL (${deadLinks.length} mort(s)). Données écrites dans ${path.relative(
SITE_ROOT,
- OUTPUT_FILE
- )}.`
+ REPORT_PATH
+ )}`
);
-})();
+}
+
+main().catch((error) => {
+ console.error("Erreur lors de la vérification des liens:", error);
+ process.exitCode = 1;
+});
diff --git a/tools/check_internal_links.js b/tools/check_internal_links.js
deleted file mode 100644
index b64b37fc..00000000
--- a/tools/check_internal_links.js
+++ /dev/null
@@ -1,102 +0,0 @@
-const fs = require("fs");
-const path = require("path");
-const http = require("http");
-const readline = require("readline");
-
-const BASE_URL = "http://127.0.0.1:1313";
-const CONTENT_DIR = path.join(__dirname, "..", "content");
-const SITE_ROOT = path.resolve(__dirname, "..");
-const BAD_LINKS = [];
-
-function isInternalLink(link) {
- return !link.includes("://") && !link.startsWith("mailto:") && !link.startsWith("tel:");
-}
-
-function extractLinksFromLine(line) {
- const regex = /\]\(([^)"]+)\)/g;
- let match;
- const links = [];
- while ((match = regex.exec(line)) !== null) {
- links.push(match[1]);
- }
- return links;
-}
-
-function getBundleRelativeUrl(mdPath, link) {
- const bundleRoot = path.dirname(mdPath);
- let urlPath;
-
- if (link.startsWith("/")) {
- urlPath = link;
- } else {
- const fullPath = path.resolve(bundleRoot, link);
- const relative = path.relative(CONTENT_DIR, fullPath);
- urlPath = "/" + relative.replace(/\\/g, "/");
- }
-
- return urlPath;
-}
-
-async function checkLink(file, lineNumber, link) {
- const relativeUrl = getBundleRelativeUrl(file, link);
- const fullUrl = `${BASE_URL}${relativeUrl}`;
- return new Promise((resolve) => {
- http.get(fullUrl, (res) => {
- if (res.statusCode !== 200) {
- BAD_LINKS.push([path.relative(SITE_ROOT, file), link, lineNumber]);
- }
- res.resume();
- resolve();
- }).on("error", () => {
- BAD_LINKS.push([path.relative(SITE_ROOT, file), link, lineNumber]);
- resolve();
- });
- });
-}
-
-async function processFile(filePath) {
- const fileStream = fs.createReadStream(filePath);
- const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });
- let lineNumber = 0;
-
- for await (const line of rl) {
- lineNumber++;
- const links = extractLinksFromLine(line);
- for (const link of links) {
- if (isInternalLink(link)) {
- process.stdout.write(".");
- await checkLink(filePath, lineNumber, link);
- }
- }
- }
-}
-
-function walk(dir) {
- let results = [];
- const list = fs.readdirSync(dir);
- list.forEach((file) => {
- file = path.resolve(dir, file);
- const stat = fs.statSync(file);
- if (stat && stat.isDirectory()) {
- results = results.concat(walk(file));
- } else if (file.endsWith(".md")) {
- results.push(file);
- }
- });
- return results;
-}
-
-(async () => {
- const files = walk(CONTENT_DIR);
- console.log(`Analyzing ${files.length} Markdown files...`);
- for (const file of files) {
- await processFile(file);
- }
-
- console.log("\n\n=== Broken Internal Links Report ===");
- if (BAD_LINKS.length === 0) {
- console.log("✅ No broken internal links found.");
- } else {
- console.table(BAD_LINKS.map(([f, u, l]) => ({ File: f + '#' + l, URL: u })));
- }
-})();
diff --git a/tools/mark_dead_links.js b/tools/mark_dead_links.js
deleted file mode 100644
index 60260ecc..00000000
--- a/tools/mark_dead_links.js
+++ /dev/null
@@ -1,450 +0,0 @@
-#!/usr/bin/env node
-
-const fs = require("fs");
-const path = require("path");
-const yaml = require("js-yaml");
-
-const SITE_ROOT = path.resolve(__dirname, "..");
-const CONFIG_PATH = path.join(__dirname, "config.json");
-
-function loadConfig() {
- if (!fs.existsSync(CONFIG_PATH)) {
- return {};
- }
- try {
- return JSON.parse(fs.readFileSync(CONFIG_PATH, "utf8"));
- } catch (error) {
- console.warn(
- `Impossible de parser ${path.relative(SITE_ROOT, CONFIG_PATH)} (${error.message}).`
- );
- return {};
- }
-}
-
-const config = loadConfig();
-const externalConfig = {
- cacheDir: path.join(__dirname, "cache"),
- cacheFile: "external_links.yaml",
- ...(config.externalLinks || {}),
-};
-
-const CACHE_DIR = path.isAbsolute(externalConfig.cacheDir)
- ? externalConfig.cacheDir
- : path.resolve(SITE_ROOT, externalConfig.cacheDir);
-const CACHE_PATH = path.isAbsolute(externalConfig.cacheFile)
- ? externalConfig.cacheFile
- : path.join(CACHE_DIR, externalConfig.cacheFile);
-
-function loadCache(cachePath) {
- if (!fs.existsSync(cachePath)) {
- return {};
- }
- try {
- return yaml.load(fs.readFileSync(cachePath, "utf8")) || {};
- } catch (error) {
- console.error(`Erreur lors de la lecture du cache YAML (${error.message}).`);
- return {};
- }
-}
-
-function getCheckedDate(info) {
- if (info && typeof info.checked === "string") {
- const parsed = new Date(info.checked);
- if (!Number.isNaN(parsed.valueOf())) {
- return parsed.toISOString();
- }
- }
- return new Date().toISOString();
-}
-
-function getStatusCode(info) {
- if (info && typeof info.status === "number") {
- return info.status;
- }
- return null;
-}
-
-const frenchDateFormatter = new Intl.DateTimeFormat("fr-FR", {
- day: "numeric",
- month: "long",
- year: "numeric",
-});
-
-function formatDisplayDate(isoString) {
- if (typeof isoString === "string") {
- const parsed = new Date(isoString);
- if (!Number.isNaN(parsed.valueOf())) {
- return frenchDateFormatter.format(parsed);
- }
- }
- return frenchDateFormatter.format(new Date());
-}
-
-function getFilesForUrl(info) {
- if (!info) return [];
- if (Array.isArray(info.files) && info.files.length > 0) {
- return info.files;
- }
- if (Array.isArray(info.locations) && info.locations.length > 0) {
- return Array.from(new Set(info.locations.map((entry) => String(entry).split(":")[0])));
- }
- return [];
-}
-
-function splitFrontmatter(content) {
- if (!content.startsWith("---")) {
- return null;
- }
- const match = content.match(/^---\n([\s\S]*?)\n---\n?/);
- if (!match) {
- return null;
- }
- const frontmatterText = match[1];
- let frontmatter = {};
- try {
- frontmatter = yaml.load(frontmatterText) || {};
- } catch (error) {
- console.error(`Frontmatter YAML invalide (${error.message}).`);
- return null;
- }
- const block = match[0];
- const body = content.slice(block.length);
- return { frontmatter, block, body };
-}
-
-function escapeRegExp(value) {
- return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
-}
-
-function ensureTrailingNewline(value) {
- if (!value.endsWith("\n")) {
- return `${value}\n`;
- }
- return value;
-}
-
-function ensureBlankLineBeforeAppend(body) {
- if (body.endsWith("\n\n")) {
- return body;
- }
- if (body.endsWith("\n")) {
- return `${body}\n`;
- }
- return `${body}\n\n`;
-}
-
-function markInterestingLink(filePath, url, info) {
- const original = fs.readFileSync(filePath, "utf8");
- const parsed = splitFrontmatter(original);
- if (!parsed) {
- console.warn(`Frontmatter introuvable pour ${path.relative(SITE_ROOT, filePath)}, ignoré.`);
- return { changed: false };
- }
-
- const { frontmatter } = parsed;
- let body = parsed.body;
- const checkedDate = getCheckedDate(info);
- const displayDate = formatDisplayDate(checkedDate);
- const httpCode = getStatusCode(info);
- let changed = false;
-
- if (typeof frontmatter.title === "string" && !frontmatter.title.startsWith("[Lien mort]")) {
- frontmatter.title = `[Lien mort] ${frontmatter.title}`;
- changed = true;
- }
-
- let statusEntries = [];
- if (Array.isArray(frontmatter.status)) {
- statusEntries = [...frontmatter.status];
- }
-
- let statusEntry = statusEntries.find(
- (entry) => entry && typeof entry === "object" && entry.date === checkedDate
- );
- if (!statusEntry) {
- statusEntry = { date: checkedDate, http_code: httpCode };
- statusEntries.push(statusEntry);
- changed = true;
- } else if (statusEntry.http_code !== httpCode) {
- statusEntry.http_code = httpCode;
- changed = true;
- }
- frontmatter.status = statusEntries;
-
- const noteLine = `> Lien inaccessible depuis le ${displayDate}`;
- const noteRegex = /(>\s*Lien inaccessible depuis le\s+)([^\n]+)/;
- const existing = body.match(noteRegex);
- if (existing) {
- const current = existing[2].trim();
- if (current !== displayDate) {
- body = body.replace(noteRegex, `> Lien inaccessible depuis le ${displayDate}`);
- changed = true;
- }
- } else {
- body = ensureBlankLineBeforeAppend(body);
- body += `${noteLine}\n`;
- changed = true;
- }
-
- if (!changed) {
- return { changed: false };
- }
-
- const newFrontmatter = yaml.dump(frontmatter);
- const updatedContent = `---\n${newFrontmatter}---\n${body}`;
- if (updatedContent === original) {
- return { changed: false };
- }
- fs.writeFileSync(filePath, updatedContent, "utf8");
- return { changed: true };
-}
-
-function collectDeadlinkMaxId(body) {
- let maxId = 0;
- const regex = /\[\^deadlink-(\d+)\]/g;
- let match;
- while ((match = regex.exec(body)) !== null) {
- const value = parseInt(match[1], 10);
- if (Number.isInteger(value) && value > maxId) {
- maxId = value;
- }
- }
- return maxId;
-}
-
-function findExistingDeadlinkReference(line, url) {
- if (!line.includes(url)) return null;
- const escapedUrl = escapeRegExp(url);
- const markdownRegex = new RegExp(`\\[[^\\]]*\\]\\(${escapedUrl}\\)`);
- const angleRegex = new RegExp(`<${escapedUrl}>`);
-
- let referenceId = null;
-
- const searchers = [
- { regex: markdownRegex },
- { regex: angleRegex },
- ];
-
- for (const { regex } of searchers) {
- const match = regex.exec(line);
- if (!match) continue;
- const start = match.index;
- const end = start + match[0].length;
- const tail = line.slice(end);
- const footnoteMatch = tail.match(/^([\s)*_~`]*?)\[\^deadlink-(\d+)\]/);
- if (footnoteMatch) {
- referenceId = `deadlink-${footnoteMatch[2]}`;
- break;
- }
- }
- return referenceId;
-}
-
-function insertDeadlinkReference(line, url, nextId) {
- const escapedUrl = escapeRegExp(url);
- const markdownRegex = new RegExp(`\\[[^\\]]*\\]\\(${escapedUrl}\\)`);
- const angleRegex = new RegExp(`<${escapedUrl}>`);
-
- const footnoteRef = `[^deadlink-${nextId}]`;
-
- const markdownMatch = markdownRegex.exec(line);
- if (markdownMatch) {
- const end = markdownMatch.index + markdownMatch[0].length;
- let insertPos = end;
- while (insertPos < line.length && /[*_]/.test(line[insertPos])) {
- insertPos += 1;
- }
- return line.slice(0, insertPos) + ' ' + footnoteRef + line.slice(insertPos);
- }
-
- const angleMatch = angleRegex.exec(line);
- if (angleMatch) {
- const end = angleMatch.index + angleMatch[0].length;
- return line.slice(0, end) + footnoteRef + line.slice(end);
- }
-
- return null;
-}
-
-function upsertFootnoteDefinition(body, footnoteId, isoDate) {
- const displayDate = formatDisplayDate(isoDate);
- const desired = `Lien inaccessible depuis le ${displayDate}`;
- const definitionRegex = new RegExp(`^\\[\\^${footnoteId}\\]:\\s*(.+)$`, "m");
- const match = definitionRegex.exec(body);
- if (match) {
- if (match[1].trim() !== desired) {
- return {
- body: body.replace(definitionRegex, `[^${footnoteId}]: ${desired}`),
- changed: true,
- };
- }
- return { body, changed: false };
- }
- let updated = ensureTrailingNewline(body);
- updated = ensureBlankLineBeforeAppend(updated);
- updated += `[^${footnoteId}]: ${desired}\n`;
- return { body: updated, changed: true };
-}
-
-function markMarkdownLink(filePath, url, info) {
- const original = fs.readFileSync(filePath, "utf8");
- const parsed = splitFrontmatter(original);
- const hasFrontmatter = Boolean(parsed);
- const block = parsed?.block ?? "";
- const bodyOriginal = parsed ? parsed.body : original;
-
- const lines = bodyOriginal.split("\n");
- let inFence = false;
- let fenceChar = null;
- let referenceId = null;
- let changed = false;
- let maxId = collectDeadlinkMaxId(bodyOriginal);
-
- for (let i = 0; i < lines.length; i += 1) {
- const line = lines[i];
-
- const trimmed = line.trimStart();
- const fenceMatch = trimmed.match(/^([`~]{3,})/);
- if (fenceMatch) {
- const currentFenceChar = fenceMatch[1][0];
- if (!inFence) {
- inFence = true;
- fenceChar = currentFenceChar;
- continue;
- }
- if (fenceChar === currentFenceChar) {
- inFence = false;
- fenceChar = null;
- continue;
- }
- }
-
- if (inFence) {
- continue;
- }
-
- if (!line.includes(url)) {
- continue;
- }
-
- const existingRef = findExistingDeadlinkReference(line, url);
- if (existingRef) {
- referenceId = existingRef;
- break;
- }
-
- const nextId = maxId + 1;
- const updatedLine = insertDeadlinkReference(line, url, nextId);
- if (updatedLine) {
- lines[i] = updatedLine;
- referenceId = `deadlink-${nextId}`;
- maxId = nextId;
- changed = true;
- break;
- }
- }
-
- if (!referenceId) {
- return { changed: false };
- }
-
- let body = lines.join("\n");
- const { body: updatedBody, changed: definitionChanged } = upsertFootnoteDefinition(
- body,
- referenceId,
- getCheckedDate(info)
- );
-
- body = updatedBody;
- if (definitionChanged) {
- changed = true;
- }
-
- if (!changed) {
- return { changed: false };
- }
-
- const updatedContent = hasFrontmatter ? `${block}${body}` : body;
- if (updatedContent === original) {
- return { changed: false };
- }
- fs.writeFileSync(filePath, updatedContent, "utf8");
- return { changed: true };
-}
-
-function processFile(absolutePath, url, info) {
- if (!fs.existsSync(absolutePath)) {
- console.warn(`Fichier introuvable: ${absolutePath}`);
- return { changed: false };
- }
- const relative = path.relative(SITE_ROOT, absolutePath);
- if (relative.startsWith("content/interets/liens-interessants/")) {
- return markInterestingLink(absolutePath, url, info);
- }
- if (path.extname(relative).toLowerCase() === ".md") {
- return markMarkdownLink(absolutePath, url, info);
- }
- return { changed: false };
-}
-
-function main() {
- if (!fs.existsSync(CACHE_PATH)) {
- console.error("Cache introuvable. Exécutez d'abord tools/check_external_links.js.");
- process.exit(1);
- }
-
- const cache = loadCache(CACHE_PATH);
- const entries = Object.entries(cache).filter(
- ([, info]) => info && info.manually_killed === true
- );
-
- if (entries.length === 0) {
- console.log("Aucun lien marqué comme mort manuellement dans le cache.");
- return;
- }
-
- let updates = 0;
- let warnings = 0;
-
- for (const [url, info] of entries) {
- const files = getFilesForUrl(info);
- if (files.length === 0) {
- console.warn(`Aucun fichier associé à ${url}.`);
- warnings += 1;
- continue;
- }
- for (const relativePath of files) {
- const absolutePath = path.isAbsolute(relativePath)
- ? relativePath
- : path.resolve(SITE_ROOT, relativePath);
- try {
- const { changed } = processFile(absolutePath, url, info);
- if (changed) {
- updates += 1;
- console.log(
- `✅ ${path.relative(SITE_ROOT, absolutePath)} mis à jour pour ${url}`
- );
- }
- } catch (error) {
- warnings += 1;
- console.error(
- `Erreur lors du traitement de ${path.relative(SITE_ROOT, absolutePath)} (${error.message}).`
- );
- }
- }
- }
-
- if (updates === 0) {
- console.log("Aucune modification nécessaire.");
- } else {
- console.log(`${updates} fichier(s) mis à jour.`);
- }
-
- if (warnings > 0) {
- console.warn(`${warnings} fichier(s) n'ont pas pu être traités complètement.`);
- }
-}
-
-if (require.main === module) {
- main();
-}
diff --git a/tools/run_link_checks.js b/tools/run_link_checks.js
deleted file mode 100644
index 73f95360..00000000
--- a/tools/run_link_checks.js
+++ /dev/null
@@ -1,54 +0,0 @@
-#!/usr/bin/env node
-
-const path = require("path");
-const { spawn } = require("child_process");
-
-const SITE_ROOT = path.resolve(__dirname, "..");
-
-const steps = [
- { label: "check_internal_links", script: path.join(__dirname, "check_internal_links.js") },
- { label: "check_external_links", script: path.join(__dirname, "check_external_links.js") },
- { label: "update_external_links", script: path.join(__dirname, "update_external_links.js") },
- { label: "mark_dead_links", script: path.join(__dirname, "mark_dead_links.js") },
-];
-
-function runStep({ label, script }) {
- return new Promise((resolve, reject) => {
- const child = spawn("node", [script], {
- cwd: SITE_ROOT,
- stdio: "inherit",
- });
-
- child.on("exit", (code, signal) => {
- if (typeof code === "number" && code === 0) {
- resolve();
- return;
- }
- const reason =
- typeof code === "number"
- ? `code ${code}`
- : signal
- ? `signal ${signal}`
- : "unknown reason";
-      reject(new Error(`Step "${label}" exited with ${reason}`));
- });
-
- child.on("error", (error) => {
-      reject(new Error(`Failed to run "${label}": ${error.message}`));
- });
- });
-}
-
-async function main() {
- for (const step of steps) {
- const relative = path.relative(SITE_ROOT, step.script);
-    console.log(`\n➡️ Running ${relative}...`);
- await runStep(step);
- }
-  console.log("\n✅ Link workflow complete.");
-}
-
-main().catch((error) => {
-  console.error(`\n❌ Workflow failed: ${error.message}`);
- process.exitCode = 1;
-});
diff --git a/tools/update_external_links.js b/tools/update_external_links.js
deleted file mode 100644
index 835a609e..00000000
--- a/tools/update_external_links.js
+++ /dev/null
@@ -1,254 +0,0 @@
-const fs = require("fs");
-const path = require("path");
-const util = require("util");
-const yaml = require("js-yaml");
-const readline = require("readline");
-const { execFile } = require("child_process");
-
-const execFileAsync = util.promisify(execFile);
-
-const SITE_ROOT = path.resolve(__dirname, "..");
-const CONFIG_PATH = path.join(__dirname, "config.json");
-
-let config = {};
-if (fs.existsSync(CONFIG_PATH)) {
- try {
- config = JSON.parse(fs.readFileSync(CONFIG_PATH, "utf8"));
- } catch (error) {
- console.warn(
-      `Failed to parse ${path.relative(
-        SITE_ROOT,
-        CONFIG_PATH
-      )}. Falling back to defaults. (${error.message})`
- );
- }
-}
-
-const externalConfig = {
- cacheDir: path.join(__dirname, "cache"),
- cacheFile: "external_links.yaml",
- ...(config.externalLinks || {}),
-};
-
-const CACHE_DIR = path.isAbsolute(externalConfig.cacheDir)
- ? externalConfig.cacheDir
- : path.resolve(SITE_ROOT, externalConfig.cacheDir);
-const CACHE_PATH = path.isAbsolute(externalConfig.cacheFile)
- ? externalConfig.cacheFile
- : path.join(CACHE_DIR, externalConfig.cacheFile);
-
-function ensureDirectoryExists(targetFile) {
- fs.mkdirSync(path.dirname(targetFile), { recursive: true });
-}
-
-function loadCache() {
- if (!fs.existsSync(CACHE_PATH)) return {};
- try {
- return yaml.load(fs.readFileSync(CACHE_PATH, "utf8")) || {};
- } catch (e) {
-    console.error("Failed to read the YAML cache:", e.message);
- return {};
- }
-}
-
-function saveCache(cache) {
- ensureDirectoryExists(CACHE_PATH);
- fs.writeFileSync(CACHE_PATH, yaml.dump(cache), "utf8");
-}
-
-function promptFactory() {
- const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
- const question = (q) =>
- new Promise((resolve) => rl.question(q, (ans) => resolve(ans.trim())));
- return {
- async ask(q) {
- return await question(q);
- },
- close() {
- rl.close();
- },
- };
-}
-
-async function ensureCheckRanIfNeeded() {
- if (fs.existsSync(CACHE_PATH)) return;
- console.log(
-    "Cache not found. Running tools/check_external_links.js first..."
- );
- await execFileAsync("node", [path.join(__dirname, "check_external_links.js")], {
- cwd: SITE_ROOT,
- env: process.env,
- });
-}
-
-function listBrokenUrls(cache) {
- const result = [];
- for (const [url, info] of Object.entries(cache)) {
- const status = info && typeof info.status === "number" ? info.status : null;
- const killed = info && info.manually_killed === true;
- const validated = info && info.manually_validated === true;
-    if (killed) continue; // these URLs are no longer processed
-    if (validated) continue; // already validated manually
- if (status !== null && (status >= 400 || status === 0)) {
- result.push({ url, info });
- }
- }
- return result;
-}
-
-function getFilesForUrl(info) {
- let files = [];
- if (Array.isArray(info?.files) && info.files.length > 0) {
- files = info.files;
- } else if (Array.isArray(info?.locations) && info.locations.length > 0) {
- files = Array.from(new Set(info.locations.map((s) => String(s).split(":")[0])));
- }
- return files.map((p) => path.resolve(SITE_ROOT, p));
-}
-
-function replaceInFile(filePath, from, to) {
- if (!fs.existsSync(filePath)) return { changed: false };
- const original = fs.readFileSync(filePath, "utf8");
- if (!original.includes(from)) return { changed: false };
- const updated = original.split(from).join(to);
- if (updated !== original) {
- fs.writeFileSync(filePath, updated, "utf8");
- return { changed: true };
- }
- return { changed: false };
-}
-
-async function main() {
- await ensureCheckRanIfNeeded();
- let cache = loadCache();
-
- const broken = listBrokenUrls(cache);
- if (broken.length === 0) {
-    console.log("No failing links (status >= 400) to process.");
- return;
- }
-
- const p = promptFactory();
- try {
- for (const { url, info } of broken) {
-      const statusLabel = typeof info.status === "number" ? String(info.status) : "unknown";
- const locations = Array.isArray(info.locations) ? info.locations : [];
- const files = Array.isArray(info.files) ? info.files : Array.from(new Set(locations.map((s) => String(s).split(":")[0])));
- console.log("\nURL: ", url);
-      console.log("Status: ", statusLabel);
-      if (locations.length > 0) {
-        console.log("Locations:");
-        for (const loc of locations) console.log("  - ", loc);
-      } else if (files.length > 0) {
-        console.log("Locations:");
-        for (const f of files) console.log("  - ", `${f}:?`);
-      } else {
-        console.log("Files: (no path recorded)");
- }
-
- const choice = (
- await p.ask(
-          "Action? [i]gnore, [c]onfirm, [r]eplace, [m]ark dead, [q]uit (default: i): "
- )
- ).toLowerCase() || "i";
-
- if (choice === "q") {
-        console.log("Stopping as requested.");
- break;
- }
-
- if (choice === "i") {
-        // Ignore: skip this URL
- continue;
- }
-
- if (choice === "c") {
- const nowIso = new Date().toISOString();
- cache[url] = {
- ...(cache[url] || {}),
- manually_validated: true,
- manually_killed: cache[url]?.manually_killed === true,
- status: 200,
- errorType: null,
- method: "MANUAL",
- checked: nowIso,
- };
- saveCache(cache);
-        console.log("Marked as manually validated.");
- continue;
- }
-
- if (choice === "m") {
- cache[url] = {
- ...(cache[url] || {}),
- manually_killed: true,
- manually_validated: cache[url]?.manually_validated === true,
- status: cache[url]?.status ?? null,
- errorType: cache[url]?.errorType ?? null,
- method: cache[url]?.method ?? null,
- };
- saveCache(cache);
-        console.log("Marked as dead (never tested again).");
- continue;
- }
-
- if (choice === "r") {
- if (!(Array.isArray(files) && files.length > 0)) {
- console.log(
-            "Cannot replace: no file recorded for this URL. Run tools/check_external_links.js first."
- );
- continue;
- }
-        const newUrl = await p.ask("New URL: ");
- if (!newUrl || !newUrl.includes("://")) {
-          console.log("Invalid URL, operation cancelled.");
- continue;
- }
-        // Replace the URL in the listed files
- let changedFiles = 0;
- for (const rel of files) {
- const abs = path.resolve(SITE_ROOT, rel);
- const { changed } = replaceInFile(abs, url, newUrl);
- if (changed) changedFiles++;
- }
-        console.log(`Replacements made in ${changedFiles} file(s).`);
-
-        // Update the cache: move the entry to the new key
- const oldEntry = cache[url] || {};
- const newEntryExisting = cache[newUrl] || {};
- cache[newUrl] = {
- ...newEntryExisting,
- files: Array.isArray(oldEntry.files) ? [...oldEntry.files] : files,
- locations: Array.isArray(oldEntry.locations)
- ? [...oldEntry.locations]
- : Array.isArray(oldEntry.files)
- ? oldEntry.files.map((f) => `${f}:?`)
- : Array.isArray(locations)
- ? [...locations]
- : [],
- manually_validated: false,
- manually_killed: false,
- status: null,
- errorType: null,
- method: newEntryExisting.method || null,
- checked: null,
- };
- delete cache[url];
- saveCache(cache);
-        console.log("Cache updated for the new URL.");
- continue;
- }
-
-      console.log("Unrecognized choice. Ignored.");
- }
- } finally {
- p.close();
- }
-
-  console.log("\nDone. You can rerun 'node tools/check_external_links.js' to refresh the statuses.");
-}
-
-main().catch((err) => {
-  console.error("Error:", err);
- process.exitCode = 1;
-});