LLM Coding

(I made this image with ChatGPT’s Sora 💅🏼)

The past few years have seen numerous AI companies emerge. Investors have spent trillions of dollars hoping for an eventual payday. The hype coming from people working at these companies is so loud it can be deafening!

As a software engineer, it’s become increasingly tough to discern what is truly valuable versus what might be a waste of time…

The goal of this series of posts is to separate the hype from the practical value of using LLMs while developing software.

The AI tools I use frequently include:

You’ll see them referenced throughout my experiences in this series:


📊📚 The State of the Art – Recent Research Highlights

To ground us a bit I want to start by summarizing some key research findings from the last 12 months.

The Good

Stanford Study - Predicting Expert Evaluations in Software Code Reviews

LLM productivity graph

This study found lots of productivity gains for engineers in the workplace, but with a fair amount of nuance.

  • Complexity and Maturity: The largest gains (30-40%) were seen in low-complexity, greenfield tasks, while high-complexity, brownfield tasks showed the smallest gains (0-10%).

  • Language Popularity: AI was found to be less helpful for low-popularity languages and more effective for popular languages like Python (10-20% gains).

  • Codebase Size: As the codebase grows, the productivity gains from AI decrease significantly due to limitations in context windows and signal-to-noise ratio.

In conclusion, the study found that AI increases developer productivity by an average of 15-20%, but its effectiveness is highly dependent on the specific task, codebase, and language.

The Bad

bad LLM productivity graph

Continuing to reference the Stanford study above, we should also highlight that code created with LLM tools has some issues.

Along with the productivity gains described above, there seems to be a lot of technical debt, or re-work, accumulated because of these tools. So much so that it cuts the productivity gains roughly in half. Even with this friction, it still seems to be a net positive for productivity, but it’s something to keep in mind as we merge features to main.

See the YouTube video by the lead author here.

UTSA - Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities

All tested models exhibited package hallucination, with rates ranging from 0.22% to 46.15%. This included packages that could potentially be security risks.

Organizations must carefully balance the trade-offs between model performance, security, and resource constraints when selecting models for code generation tasks.

The Ugly

NANDA - an MIT-led initiative - ROI Study

95% of enterprise generative AI pilot projects fail to deliver a meaningful ROI within 6 months.

Projects often stall in “pilot purgatory” because AI outputs are “confidently wrong,” requiring employees to spend extra time double-checking and correcting the results. This “verification tax” erases any potential ROI.

METR Study

Using AI tools actually slowed experienced open-source developers down by roughly 20% on average, contrary to their own predictions of a roughly 20% speedup.

The slowdown was attributed to developers spending more time reviewing AI outputs and prompting, due to AI’s unreliability and lack of implicit context in complex, familiar codebases.

The Takeaway

LLM based coding tools seem to add the most value when you are working with:

  • type-ahead (where you can digest and verify each line more readily),
  • a prototype,
  • new feature,
  • greenfield app,
  • small to mid-size codebases, or
  • more compact brownfield features

Beyond those contexts, their value begins to erode.

Sadly, these tools are not magic 🪄😞

If you are working with:

  • highly ambiguous specs,
  • a large, highly complex codebase, written in a low popularity language, or
  • building generative features on your own bespoke model at your company

you should ask yourself: “Is using this tool in this context going to make me more productive?”

Still, at the end of the day there seem to be clear boosts to productivity across many projects when these tools are used in the right contexts.


✍🏼 Anecdotal Experiences – Real‑World Stories

Below are some anecdotes from my use of AI over the past year.

The One‑Line Fix

I can’t tell you how many times I’ve asked an LLM for a regex or complex SQL query syntax and gotten a correct solution in seconds, saving hours of time!

LLM-based auto-complete is my favorite feature of all these tools (thank you Tabnine)! Once in a while you describe the method, hit the tab button, and something surprising that saves you days appears!

AI can Tempt Veterans to take Shortcuts 🤖☠️

I’ve seen good engineers rely on Claude like a crutch and stop thinking critically. I’ve even seen staff engineers open up PRs with embarrassing regressions, poorly written code, and tests that miss core parts of a feature. PRs created with AI must be read and reviewed critically, even when they come from more senior engineers.

Pair‑Programming with an LLM

I love to explore a new framework or an unfamiliar part of a codebase with Claude. It’s great at summarizing a class or feature set. It can also be a very helpful “pair” when you prompt it with smaller questions and actively partner with it. I’ve also had great experiences working with Gemini in the browser, layering on increasingly complex queries.

It is Magic with Boilerplate Code 🪄

Recently I used Cursor to help a team transition from Rails ERB templates to React views by generating boilerplate code. I’ve also been able to verbally describe the shape and attributes of a JSON payload to the LLM, then used the result as example data while testing API endpoints (a quick illustration follows below).

This can save engineers from mind-numbing tasks.
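
For instance, here’s the kind of payload an LLM might draft from a spoken description. The endpoint, field names, and values below are all hypothetical; this is just a sketch of how I use the generated example in a quick endpoint check:

import requests

# Hypothetical payload shape an LLM drafted from a verbal description
example_user_payload = {
    "user": {
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "roles": ["admin", "editor"],
        "preferences": {"newsletter": True, "theme": "dark"},
    }
}

# Illustrative endpoint on a staging host; adjust to whatever API you are testing
response = requests.post(
    "https://staging.example.com/api/users",
    json=example_user_payload,
    timeout=10,
)
assert response.status_code == 201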

When the Model Hallucinates

More times than I care to admit, I have been excited to see an LLM suggest a perfect library, module, or bit of syntactic sugar to complement the feature I’m working on, only to find out it is non-existent. 😿

Try Using it in Weird Places

I like to use Cursor to make a first draft of an ERD diagram in Mermaid format. Sure I might have to adjust a few data types and relationships, but it provides a helpful first pass.

I’ve also set up MCP (Model Context Protocol) integrations with Figma (design app), Notion (wiki), and Linear (ticket manager) to read PRDs (Product Requirement Documents), ADRs (Architecture Decision Records), and the tickets in an epic. Usually the more context the better the outcome (up to around 100,000 lines of context). Often it will at least produce something useful as a first draft.

Fool Me Twice, Shame on Me

If it doesn’t create something useful in the first two passes, I will often just cut bait and do it on my own. Don’t waste your time~!

If you are using these tools right, you are feeding it something small to medium that you could likely do mostly on your own in under 15 minutes. You’re not going to find the perfect magic incantation every time.

Wile E. Coyote Effect 🐺🏔️

Sometimes while working with LLMs you can start to think you are a borg-minded genius.

Then days later… after the LLM-assisted code is deployed to production, you discover a subtle but major untested regression.

You realize at that moment that you are in fact Wile E. Coyote standing just past the edge of a cliff.


🛠️🧰 Tips for Making LLMs Work for You

When I first started using AI, I cautiously tried to use it everywhere for everything. Over time I began to find the ways of applying it that made me the most productive.

Below are some tips I’ve compiled while working with AI the past few years. It is by no means exhaustive~!

  • Focus on Smaller Problems: The larger your codebase, the more the context window is degraded. Example: build boilerplate for a form that updates the user model.

  • Start with a Clear Prompt: Reducing ambiguity yields more accurate snippets. Example: “Generate a Python function that parses ISO-8601 dates.”

  • Validate with Tests: Catch hallucinations early. Write a unit test that must pass before accepting the LLM’s suggestion, write an e2e test before you start the feature, and make new tests along the way. Make sure they all pass! (See the example after this list.)

  • Always Verify Packages: LLMs will confidently give you a magic package that’s perfect for your problem; sometimes it doesn’t exist! In the worst case they can link you to something with security vulnerabilities. Google the npm package before you commit, and look at the commit history and the maintainers.

  • Assume the Code was Written by a Junior: This will challenge you to think critically about the context (LLMs don’t) and surface issues. Read every line in your editor as if it were written by a junior engineer.

  • Iterate & Refine: Treat the model as a conversational partner. Follow up with “Add error handling for invalid strings,” and apply Fowler’s Refactoring principles to evaluate trade-offs.

  • Leverage Contextual Files: Upload relevant code files so the model sees the surrounding architecture; you’ll get better results. Use OpenAI’s file upload feature with spreadsheets, integrate your issue tracker and wiki with MCP, and use Cursor rule files in each directory.

  • Documentation & Knowledge Transfer: Living documentation helps teams move quicker. LLMs can auto-generate API docs, inline comments, and migration guides.
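
As a concrete illustration of “Validate with Tests”: below is a minimal sketch, assuming a hypothetical LLM-suggested helper parse_iso_date, where the tests are written up front and the suggestion is only accepted once they pass.

import pytest
from datetime import datetime, timezone

# Hypothetical LLM-suggested helper under review (the kind of snippet I would not merge untested)
def parse_iso_date(value: str) -> datetime:
    return datetime.fromisoformat(value)

def test_parses_utc_timestamp():
    assert parse_iso_date("2024-03-01T12:30:00+00:00") == datetime(2024, 3, 1, 12, 30, tzinfo=timezone.utc)

def test_rejects_garbage_input():
    with pytest.raises(ValueError):
        parse_iso_date("not-a-date")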

🤓 Good Engineering Practices

I’d like to add some good engineering practices here as well. They are helpful for all teams, especially ones that use LLMs. They guide you towards higher levels of collaboration, productivity, and maintainability.

I find them imperative given the volume of code, re-work, and regressions that coding with AI creates.

  • Design Over Implementation: A good upfront design leads to long-term development advantages. LLMs can help generate scaffolding but shouldn’t have the final say in high-level design. Create designs and architectural plans and present them to your team to look for holes and new ways of seeing the problem.

  • Do Smoke Tests Often: Catch issues early on. Run the app in the background and do a click-through every 15 minutes.

  • Linting is More Important than Ever: Be sure to use linters deeply and expansively. Harness tools like ESLint, Pylint, and RuboCop.

  • Stay Security-Aware: Prevent accidental inclusion of secrets or vulnerable patterns. Run static analysis on generated code.

  • Keep Pull Requests Focused: Always assume the code can and will have flaws; the more slop you let into the codebase, the harder it will be to clean up. I like to set a team agreement to keep PRs to 15 files or fewer, or 750 lines or fewer (a rough sketch of a CI size check follows below).
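
Here is a rough sketch of how a team might enforce that PR-size agreement in CI. It shells out to git and diffs against origin/main; the thresholds and base ref are assumptions to adjust for your own repo and workflow:

import subprocess
import sys

MAX_FILES = 15
MAX_LINES = 750

# Diff the current branch against main; adjust the base ref to match your repo
numstat = subprocess.run(
    ["git", "diff", "--numstat", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip().splitlines()

changed_files = len(numstat)
changed_lines = 0
for row in numstat:
    added, deleted, _path = row.split("\t", 2)
    if added != "-" and deleted != "-":  # git reports "-" for binary files
        changed_lines += int(added) + int(deleted)

if changed_files > MAX_FILES or changed_lines > MAX_LINES:
    sys.exit(f"PR too large: {changed_files} files / {changed_lines} lines "
             f"(limits: {MAX_FILES} files / {MAX_LINES} lines)")

print(f"PR size OK: {changed_files} files / {changed_lines} lines")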

🤖👨🏽 Closing Thoughts – The Future of Human-AI Collaboration

Returning to the original question for this series: is AI an autopilot or a co-pilot?

Personally, I think AI performs a role closer to co-pilot right now. It’s clear that software engineering will never be the same again. However, LLMs are augmentative, not a replacement for deep expertise. We don’t currently have agentic programmers, and we can’t get away with just vibe coding (at least not yet!).

In many ways the role of the senior / staff engineer is even more important than ever in this context. If we are using these tools for productivity boosts, we must also double down on ensuring our codebases are maintainable, performant, secure, and well architected.

I encourage folks to experiment with these tools and evolve their best practices. As new tools and models come out, we should continue to engage with them and find their best purposes.

Happy coding with your robot friends! 🎉

Web scraping guy

(I made this image with ChatGPT’s Sora 💅🏼)

📖 Overview

In this tutorial we’ll build a Python‑based web scraper that:

  • Navigates dynamic sites with Selenium.
  • Extracts and cleans HTML using BeautifulSoup.
  • Analyzes the scraped text with LangChain (LLM‑powered summarization / keyword extraction).
  • Presents the results through a lightweight Streamlit dashboard.

By the end you’ll have a reusable pipeline that can scrape any public web page, feed the raw content to an LLM, and display the AI‑generated insights—all in a single, reproducible script.

🛠️ Prerequisites

Assuming you have an up-to-date version of Homebrew and macOS:

Python ≥ 3.13

 brew install python@3.13 

Selenium

 pip install selenium 

ChromeDriver

Download from https://chromedriver.chromium.org/
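
Alternatively, on macOS with Homebrew, the chromedriver cask should also work (assuming the cask version matches your installed Chrome):

 brew install --cask chromedriver 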

BeautifulSoup4

 pip install beautifulsoup4 

LangChain + OpenAI SDK

 pip install langchain openai 

Streamlit

 pip install streamlit 

OpenAI

You’ll also need to sign up for an OpenAI account

Get an API key (or use any other LLM provider supported by LangChain).

Set it as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"

🚀 Step‑by‑Step Implementation

1️⃣ Initialise Selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Path to your ChromeDriver executable
driver_path = "/path/to/chromedriver"
service = Service(driver_path)

options = webdriver.ChromeOptions()
options.add_argument("--headless")          # Run without opening a browser window
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")

driver = webdriver.Chrome(service=service, options=options)

def fetch_page(url: str) -> str:
    driver.get(url)
    time.sleep(3)  # Wait for dynamic content to load (adjust as needed)
    html = driver.page_source  # Capture the HTML before shutting the driver down
    driver.quit()  # Always close the Selenium session when done
    return html

2️⃣ Parse HTML with BeautifulSoup

from bs4 import BeautifulSoup

def extract_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    
    # Remove scripts / styles
    for tag in soup(["script", "style"]):
        tag.decompose()
    
    # Grab the main article body – adapt the selector to the target site
    article = soup.select_one("article") or soup.body
    return article.get_text(separator="\n", strip=True)
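
As a quick sanity check before wiring things into the UI (example.com is just a stand-in URL here):

if __name__ == "__main__":
    html = fetch_page("https://example.com")  # any public page works as a smoke test
    text = extract_text(html)
    print(text[:500])  # first 500 characters of the cleaned text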

3️⃣ Analyse Text with LangChain

from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

# gpt-3.5-turbo is a chat model, so use ChatOpenAI rather than the completion-style OpenAI class
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.2)

def summarize(content: str) -> str:
    # The summarize chain expects a list of Documents rather than a raw string
    docs = [Document(page_content=content)]
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)

def extract_keywords(content: str) -> list[str]:
    prompt = (
        "Extract the top 5 keywords from the following text. "
        "Return them as a comma-separated list.\n\n"
        f"{content}"
    )
    response = llm.predict(prompt)
    return [kw.strip() for kw in response.split(",")]
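
Very long pages can blow past the model’s context window. One option (a minimal sketch, assuming LangChain’s RecursiveCharacterTextSplitter; the chunk sizes here are arbitrary) is to split the text into overlapping chunks before running the map_reduce chain:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def summarize_long(content: str) -> str:
    # Split long pages into overlapping chunks so each piece fits in the model's context window
    splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
    docs = [Document(page_content=chunk) for chunk in splitter.split_text(content)]
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)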

4️⃣ Build a Streamlit UI

Create a file called app.py:

import streamlit as st
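
# fetch_page, extract_text, summarize, and extract_keywords are the functions from the sections
# above; for a single-file app, paste them into this file (or import them from your own module).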

st.title("🕸️ AI‑Powered Web Scraper")
url = st.text_input("Enter a URL to scrape:", "")

if st.button("Run scraper"):
    if url:
        html = fetch_page(url)
        raw_text = extract_text(html)

        with st.spinner("Summarising..."):
            summary = summarize(raw_text)

        with st.spinner("Extracting keywords..."):
            keywords = extract_keywords(raw_text)

        st.subheader("📄 Summary")
        st.write(summary)

        st.subheader("🔑 Keywords")
        st.write(", ".join(keywords))

        st.subheader("🧾 Raw extracted text")
        st.text_area("Full text", raw_text, height=300)
    else:
        st.warning("Please enter a URL first.")

Run the app locally:

$ streamlit run app.py

You’ll see a clean web interface where you can paste any URL, click Run scraper, and instantly receive a concise AI‑generated summary plus key terms.

✅ What We’ve Built

  • Dynamic navigation (Selenium) → handles JS‑rendered pages.
  • Robust text extraction (BeautifulSoup) → strips boilerplate.
  • LLM‑driven insight (LangChain + OpenAI) → summarises and highlights keywords.
  • Interactive front‑end (Streamlit) → no need for a separate web server.

From here one could easily extend the pipeline:

  • Add pagination support for multi‑page articles.
  • Store results in a SQLite or PostgreSQL database (a minimal SQLite sketch follows below).
  • Swap the OpenAI model for a local LLM (e.g., Llama 2) via LangChain’s adapters.
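
For the SQLite option, here is a minimal sketch using Python’s built-in sqlite3 module; the table name and schema are purely illustrative:

import sqlite3

def save_result(db_path: str, url: str, summary: str, keywords: list[str]) -> None:
    # Persist each scrape so results can be reviewed or compared later
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrapes (url TEXT, summary TEXT, keywords TEXT)"
    )
    conn.execute(
        "INSERT INTO scrapes (url, summary, keywords) VALUES (?, ?, ?)",
        (url, summary, ", ".join(keywords)),
    )
    conn.commit()
    conn.close()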

Happy scraping and enjoy watching AI turn raw web data into actionable knowledge!