Hacker News Daily

Perplexity AI exposed for stealthily scraping the web, dodging no-crawl rules


Cloudflare exposes Perplexity AI’s stealth crawling tactics
  • Perplexity’s crawlers bypass common no-crawl directives (robots.txt) by switching from declared bot user agents to generic browser strings, primarily mimicking Chrome on macOS.
  • When blocked, Perplexity rotates through IP addresses outside its published ranges and switches ASNs to evade detection, violating ethical web crawling norms.
  • Cloudflare’s tests with private domains blocking all crawlers still showed Perplexity returning detailed data, indicating covert scraping.
  • Cloudflare responded by delisting Perplexity as a verified bot and adding detection for these evasive crawlers to its managed rules, which are available even on free plans.
  • The case highlights tensions between AI companies’ aggressive data harvesting for training and the web ecosystem’s control measures, underscoring the need for transparent bot behavior standards.
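
The evasion described above works because robots.txt is purely advisory and simple blocking rules key on what a client says about itself. The minimal Python sketch below illustrates why a disguised request slips past a user-agent rule. It assumes a hypothetical crawler token `ExampleBot` and a hypothetical published IP range, both invented for illustration. It is not Cloudflare's detection logic or Perplexity's real configuration.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts the whole site out of crawling.
ROBOTS_LINES = ["User-agent: *", "Disallow: /"]

# Hypothetical crawler token and "published" IP range.
# 203.0.113.0/24 is a documentation-only block (RFC 5737).
DECLARED_TOKEN = "ExampleBot"
DECLARED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def robots_allows(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler checks before fetching.

    robots.txt is advisory only: nothing stops a client that skips this check.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


def ua_rule_blocks(user_agent: str) -> bool:
    """Naive server-side rule: block any request that declares the bot token."""
    return DECLARED_TOKEN.lower() in user_agent.lower()


def is_verified_bot(user_agent: str, ip: str) -> bool:
    """Treat a request as the declared bot only if UA and source IP both match."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in DECLARED_RANGES)
    return ua_rule_blocks(user_agent) and in_range


# A declared request vs. a disguised one: generic Chrome-on-macOS UA and a
# source IP outside the published range (198.51.100.0/24 is also RFC 5737).
DECLARED = ("Mozilla/5.0 (compatible; ExampleBot/1.0)", "203.0.113.7")
STEALTH = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "198.51.100.42",
)

for name, (ua, ip) in [("declared", DECLARED), ("stealth", STEALTH)]:
    print(
        name,
        "robots allows:", robots_allows(ua, "https://example.com/private/page"),
        "UA rule blocks:", ua_rule_blocks(ua),
        "verified bot:", is_verified_bot(ua, ip),
    )
```

Both requests are disallowed by robots.txt, but only a compliant client ever consults it. The user-agent rule catches the declared crawler and misses the disguised one, and the IP check cannot tie the disguised request to the bot either. This is why detecting such crawlers has to rely on other signals. The notes do not say which signals Cloudflare's managed rules use.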

“Objects should shut the fuck up”: a critique of excessive device noise
  • Modern consumer products like cars, washing machines, and baby monitors produce intrusive, often unnecessary audible alerts with little user control or configurability.
  • Examples include persistent, startling LPG warnings in cars and beeps on every washing machine control interaction that cannot be disabled, increasing user annoyance and potentially reducing safety.
  • The author’s frustrated tone reflects widespread alert fatigue caused by default sounds that prioritize notifications over user context or wellbeing.
  • The devices praised as exceptions use subtle, considerate signals, such as dishwashers that silently open their doors after a cycle, or e-readers that make no sound at all.
  • The post calls for design philosophies that prioritize user control and reduce noise pollution in everyday technology.

Could interstellar object 3I/ATLAS be alien technology?
  • Researchers analyzed the recently discovered 3I/ATLAS’s unusual orbital dynamics and non-gravitational acceleration, hypothesizing that it might be a technological artifact with possible intelligence and intent.
  • The object’s orbital tilt and its trajectory near the inner planets are statistically improbable for a random interstellar visitor and could enable stealthy access to the Solar System.
  • The paper entertains a “Dark Forest” scenario, in which advanced civilizations behave with hostility, and considers that 3I/ATLAS could be either benign or malign.
  • The authors present the hypothesis primarily as a pedagogical exercise, stressing the importance of scientific openness to speculative but testable ideas.
  • The study provokes debate on how to interpret limited data about interstellar visitors and on the implications for SETI and planetary defense.
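
The notes do not say how the paper quantified “statistically improbable.” The sketch below only illustrates this style of reasoning. It makes the simplifying assumption that interstellar visitors arrive with orbital planes oriented uniformly at random. Under that assumption, cos i is uniform on [-1, 1], so the chance of a retrograde orbit lying within θ of the ecliptic is (1 - cos θ)/2. The 5° threshold is an arbitrary example, not a figure taken from the paper.

```python
import math
import random


def p_near_ecliptic(theta_deg: float, retrograde_only: bool = True) -> float:
    """Probability that an isotropically oriented orbit lies within theta_deg
    of the ecliptic plane.

    For an isotropic orbit normal, cos(i) is uniform on [-1, 1], so
    P(i >= 180 - theta) = (1 - cos(theta)) / 2 for the retrograde side alone.
    Also counting prograde orbits (i <= theta) doubles it.
    """
    p_one_side = (1.0 - math.cos(math.radians(theta_deg))) / 2.0
    return p_one_side if retrograde_only else 2.0 * p_one_side


def monte_carlo(theta_deg: float, n: int = 200_000, seed: int = 0) -> float:
    """Estimate the retrograde-side probability by sampling random orbit normals.

    Drawing cos(i) uniformly on [-1, 1] is equivalent to taking the
    z-component of a uniformly random direction on the unit sphere.
    """
    rng = random.Random(seed)
    threshold = -math.cos(math.radians(theta_deg))
    hits = sum(1 for _ in range(n) if rng.uniform(-1.0, 1.0) <= threshold)
    return hits / n


print(p_near_ecliptic(5.0))  # about 0.0019, roughly 0.19%
print(monte_carlo(5.0))      # sampled estimate close to the analytic value
```

Real interstellar objects may not arrive isotropically, so a careful estimate would weight directions by the distribution of incoming velocities rather than assume uniform orientations.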

ChatGPT in university writing classes: a year-long experiment
  • UVA professor Piers Gelly integrated ChatGPT into his writing curriculum, tasking 72 students with critically engaging with AI tools rather than banning them.
  • Students viewed AI skeptically yet pragmatically, using it for brainstorming and editing while recognizing its tendency toward bland and hallucinated content.
  • Classroom discussions contrasted AI-generated “romanticized” prose with more mundane human writing, sparking reflection on storytelling and creativity.
  • Faculty found AI useful for speeding up grading and for assignment design, though students largely preferred human feedback, and most agreed that human instructors remain essential.
  • The experiment illustrates a nuanced “messy middle” where human creativity and AI support coexist, suggesting a collaborative rather than adversarial future for education.

Hacker News Daily by The Podcast Collective - AI Podcasts