Build Your Own Data Agent

This is a hands-on field guide, not a Python tutorial. In about thirty minutes, you'll wire a Google Sheet to an AI model and end up with a small "data agent" that answers plain-English questions about your own donor list, grant tracker, or program budget — and shows its work. It's designed for nonprofit staff who don't code, costs roughly $2–5 a month at typical volume, and runs on tools your organization probably already uses.

Download the PDF guide | Clone the Replit template

The PDF is free, no email required. The template runs free on Replit; OpenAI usage costs pennies at small scale.

Honest frame: This is a starter kit, not a product. It's a working, forkable example you can point at your own data, plus a guide to the decisions that actually matter (which data, which prompts, what to verify). It will not replace a data analyst, and it's not a substitute for professional judgment on sensitive information.

What it is

A "data agent" here is deliberately small: a short program that takes a plain-English question, looks at a single spreadsheet you already maintain, and writes back an answer — with the rows or numbers it used, so you can verify it. You don't write code. You rename a sample file, paste in your own data, and ask questions like you'd ask a thoughtful intern who can count.

The pattern was pulled into the open this week at MIT's EmTech AI conference, where OpenAI described Kepler, an internal data agent built by two engineers that now serves more than 4,000 of the company's employees (VentureBeat coverage). Kepler is industrial-scale: connected to warehouses, guarded by deep permissions, tuned over months. This guide takes the same shape and shrinks it to one sheet, one API key, one afternoon. If you want the fuller picture from EmTech AI, see our write-up, "Notes from MIT: What OpenAI's Head of ChatGPT Engineering Says Comes Next."

What it does

Reads one spreadsheet you control
Answers plain-English questions about that sheet
Shows the rows or totals it used
Runs on a laptop, Replit, or any free Python host
Costs a few dollars a month at low volume

What it does not

Replace a data analyst or financial pro
Connect to your donor CRM or case-management system
Write back to the sheet or take actions
Handle PHI, client case files, or legal records
Explain its answers perfectly every time

How it works

Under the hood, the template is about 300 lines of Python. Here's the shape of what happens when you type a question, with no code required to understand it:

You type a question — e.g. "Which donors gave at least twice last year?"
The agent loads your sheet into memory as a table (the same sheet you've been using).
It sends your question + a description of the sheet to OpenAI's model. It does not upload the whole sheet — it sends column names, data types, and a tiny preview so the model can reason about the shape of your data.
The model writes a small, sandboxed query in pandas (a standard data library) that only reads — it can never delete, overwrite, or call the outside internet.
The agent runs the query locally, then sends the result back through the model to phrase a plain-English answer.
You see the answer plus the rows it used, so you can click through and confirm it's right before forwarding to your board.
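The "description of the sheet" step above can be sketched in a few lines of pandas. This is a minimal illustration, not the template's actual code: the function name `describe_sheet`, the JSON layout, and the sample rows are all assumptions made for the example.

```python
import io
import json

import pandas as pd


def describe_sheet(df: pd.DataFrame, preview_rows: int = 3) -> str:
    """Summarize a table's shape (hypothetical helper, not from the template).

    Returns column names, data types, and a small preview as JSON text,
    so the full sheet never has to be sent to the model.
    """
    summary = {
        "row_count": len(df),
        "columns": [
            {"name": str(col), "dtype": str(df[col].dtype)} for col in df.columns
        ],
        # to_json handles dates and missing values; json.loads turns it back
        # into plain Python objects so it nests cleanly in the summary.
        "preview": json.loads(
            df.head(preview_rows).to_json(orient="records", date_format="iso")
        ),
    }
    return json.dumps(summary, indent=2)


# Made-up sample rows standing in for a donor sheet.
SAMPLE = """donor_name,gift_date,amount
Ada Example,2024-03-01,250
Ben Sample,2024-06-15,100
Ada Example,2024-11-20,300
Cy Placeholder,2024-12-02,50
"""

df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["gift_date"])
payload = describe_sheet(df, preview_rows=2)
print(payload)
```

Note that the preview rows are real rows from your sheet. Even a tiny preview leaves your machine, which is why the choice of sheet matters more than the size of the preview.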

Why this shape: Three things make this safe enough to use on real data: the agent only reads (never writes), the sandbox blocks network calls and file access, and every answer is rendered with the underlying rows so a human can verify. The model helps you reason about your data; it doesn't replace your judgment about it.
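One way to picture the "only reads" rule is an allowlist check that runs before any model-written query is executed. The sketch below is an assumption about how such a guard could look, not a description of the template's sandbox: the names `check_read_only`, `run_query`, and the allowlists are hypothetical, and a check like this is a teaching aid rather than a hardened sandbox.

```python
import ast

import pandas as pd

# Hypothetical allowlists: the query may only mention `df` and only call
# read-style pandas methods. Anything else is rejected before it runs.
ALLOWED_NAMES = {"df"}
ALLOWED_ATTRS = {
    "groupby", "size", "sum", "mean", "count", "min", "max",
    "duplicated", "sort_values", "head", "value_counts", "nunique",
}


def check_read_only(expr: str) -> ast.Expression:
    """Parse a query and reject anything outside the allowlists."""
    try:
        # mode="eval" accepts a single expression only, so statements such
        # as imports, assignments, and `del` fail to parse at all.
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError("only a single expression is allowed") from exc
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"name not allowed: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr not in ALLOWED_ATTRS:
            raise ValueError(f"attribute not allowed: {node.attr}")
    return tree


def run_query(df: pd.DataFrame, expr: str):
    """Run a checked query against a copy of the table, with no builtins."""
    code = compile(check_read_only(expr), "<query>", "eval")
    # A copy means even an unexpected mutation cannot touch the loaded table.
    return eval(code, {"__builtins__": {}}, {"df": df.copy()})


# Made-up rows standing in for a donor sheet.
df = pd.DataFrame(
    {
        "donor_name": ["Ada Example", "Ben Sample", "Ada Example", "Cy Placeholder"],
        "amount": [250, 100, 300, 50],
    }
)

# "Which donors gave at least twice?" answered with the rows themselves.
rows = run_query(df, 'df[df["donor_name"].duplicated(keep=False)]')
print(rows)
```

Because the query returns the matching rows rather than just a count, the same result can be shown to the reader as evidence. Queries such as `df.to_csv("out.csv")` or `__import__("os")` raise `ValueError` before anything executes.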

What's in the Replit template

The template is a single public repository you can fork in one click. Everything is pre-wired; you rename files and paste in an API key.

File	What it's for
`agent.py`	The data agent itself — ~290 lines, heavily commented.
`data/donors.csv`	Synthetic donor list (60 rows) — safe to experiment with.
`data/grants.csv`	Synthetic grant tracker (24 rows) with start / end dates and statuses.
`data/program_budget.csv`	Synthetic program budget (7 programs) with planned vs. actual spend.
`generate_sample_data.py`	Regenerates the synthetic data if you want to start fresh.
`SAFETY.md`	What to run on your own data — and what to never feed it.
`README.md`	Ten-minute walkthrough, zero code required.

What it costs

The software itself is free. The only ongoing cost is the OpenAI API, which is pay-as-you-go by token. Typical early use lands around $2 to $5 per month — the price of a coffee, not a SaaS seat. Below is a rough estimate for three usage levels using gpt-4o-mini, the small, fast, inexpensive model the template is configured to use.

Usage	Questions / month	Est. monthly cost
Occasional	~30	~$0.50 – $1.50
Regular	~150	~$2 – $5
Heavy (small team)	~500	~$8 – $15
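The ranges in the table above can be turned into a per-question figure, and from there into a rough count of how many questions a spending cap allows. The sketch below uses only the table's own numbers plus the $10 first-month cap suggested later in this guide; the function names are made up for the example, and the results are budgeting arithmetic, not a quote of OpenAI's pricing.

```python
import math

# Ranges copied from the table: (questions per month, low USD, high USD).
TIERS = {
    "Occasional": (30, 0.50, 1.50),
    "Regular": (150, 2.00, 5.00),
    "Heavy (small team)": (500, 8.00, 15.00),
}


def per_question_range(questions: int, low_usd: float, high_usd: float):
    """Implied cost per question at the low and high end of a tier."""
    return low_usd / questions, high_usd / questions


def questions_under_cap(cap_usd: float, questions: int, high_usd: float) -> int:
    """How many questions fit under a cap at the tier's high-end rate."""
    # Multiply before dividing to avoid rounding surprises with tiny floats.
    return math.floor(cap_usd * questions / high_usd)


for name, (questions, low, high) in TIERS.items():
    lo_q, hi_q = per_question_range(questions, low, high)
    fits = questions_under_cap(10.00, questions, high)
    print(
        f"{name}: {lo_q * 100:.1f} to {hi_q * 100:.1f} cents per question; "
        f"a $10 cap covers about {fits} questions at the high end"
    )
```

Reading the estimate conservatively (at the high end of each range) is deliberate: a cap should be sized against the worst plausible month, not the best one.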

OpenAI's nonprofits program offers eligible 501(c)(3) organizations discounted access to paid ChatGPT plans and API credits; check the program for current terms. You can also put a hard monthly ceiling on spend in the usage limits panel — we recommend you set one on day one, before you paste your key anywhere.

Safety — read before you use it on real data

This is the section we most want nonprofit staff to actually read. A data agent is a small but real software system that sends information about your sheet to an AI provider. That's fine for many nonprofit use cases, and wrong for others. Here is the honest line.

Do not run it on

Protected health information (PHI)
Client case files (housing, DV, immigration, etc.)
Personally identifying data on minors
Sealed legal or court records
Anything under a data-use agreement that prohibits third-party processing

Generally fine to run on

Donor lists with names and amounts (your public-facing data)
Grant trackers (funders, statuses, due dates)
Program budgets, actuals, and reconciliation views
Event attendance, volunteer hours
Anonymized or aggregated program data
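A quick pre-flight check on column names can catch the most obvious mistakes before a sheet goes anywhere near the agent. This is a rough heuristic sketched for illustration: the keyword list and the function name `flag_sensitive_columns` are assumptions, it is not part of the template as far as this guide states, and it cannot detect sensitive data hiding under an innocent column name. It supplements the two lists above; it does not replace reading them.

```python
import pandas as pd

# Illustrative keyword hints only. Extend or change these for your own data;
# passing this check does not make a sheet safe to use.
SENSITIVE_HINTS = (
    "ssn", "social_security", "birth", "dob",
    "diagnosis", "medical", "case_note", "immigration",
)


def flag_sensitive_columns(df: pd.DataFrame) -> list[str]:
    """Return column names that look like they hold sensitive data."""
    flagged = []
    for col in df.columns:
        # Normalize "Date of Birth" to "date_of_birth" before matching.
        normalized = str(col).strip().lower().replace(" ", "_")
        if any(hint in normalized for hint in SENSITIVE_HINTS):
            flagged.append(str(col))
    return flagged


# Made-up sheet with two columns that should stop you in your tracks.
sheet = pd.DataFrame(
    columns=["Donor Name", "Amount", "Date of Birth", "Diagnosis Code"]
)
flagged = flag_sensitive_columns(sheet)
if flagged:
    print("Stop and review before using this sheet:", flagged)
```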

Two more practical safeguards. First: use your own OpenAI API key (not the free ChatGPT web app) so your data falls under OpenAI's API data-usage policy, which does not use your inputs to train their models by default. Second: set a usage cap in the OpenAI limits panel so a runaway loop can never cost you more than the dollar amount you're comfortable with. We suggest $10 for a first month.
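On the first safeguard, a common habit is to keep the API key in an environment variable rather than typing it into a file that might be shared. The sketch below assumes the key is stored under `OPENAI_API_KEY`, the variable name OpenAI's Python library looks for by default; whether the template reads its key this way is an assumption, and `load_api_key` and `mask` are hypothetical helper names.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Fetch the key from the environment and fail loudly if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it as a secret or environment "
            "variable instead of pasting it into the code."
        )
    return key


def mask(key: str) -> str:
    """Show just enough of a key to recognize it in logs or screenshots."""
    return key[:3] + "..." + key[-4:]


# A fake key and a plain dict stand in for the real environment here.
fake_env = {"OPENAI_API_KEY": "sk-test-1234567890"}
print("Using key:", mask(load_api_key(fake_env)))
```

Printing only the masked form means a screenshot or a shared log never exposes the full key.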

If you take one thing away: You don't need to be technical to use this. You do need to be thoughtful about what data you point it at. The credibility of AI in the nonprofit sector is going to be built one careful deployment at a time.

Walkthrough video

A short screen-recorded walkthrough (about 8 minutes) is coming soon. Kim is recording it separately and we'll embed it here. You'll watch us fork the template, paste a key, and ask the first question. In the meantime, the README in the template and the PDF guide above are written to stand on their own.

Open questions for the reader

A deliberate choice in this first version was to keep the surface area small. Below are three natural next steps we'd love your help thinking through. If you run into one of these, we'd like to hear from you:

Donor DB integration

Connect it to our CRM

Should the next version read directly from your donor CRM (Bloomerang, Neon, Salesforce NPSP, DonorPerfect)? That's a bigger lift and changes the safety model — but for many orgs it's the real workflow.

Multi-source

Ask across two sheets at once

Right now the agent sees one sheet at a time. A common question — "which grant funded which program over-spend?" — needs two. We're exploring the smallest-safe way to do this.

Scheduled reports

Send me this every Monday

Turn a good question into a recurring email digest (board dashboard, grant-deadline alert, cash-flow snapshot). Straightforward to add — we want to hear which reports would actually land.

FAQ

Do I need to know Python?

No. The template is pre-written. You'll rename one file, paste a key, and ask questions in English. If you can work in Google Sheets, you can use this.

Will it really stay cheap?

Yes, if you use gpt-4o-mini (the default) and set a usage cap. Each question typically costs a few cents at most. Heavy use by a small team usually stays under $15 / month. Set a $10 monthly cap on day one; you can always raise it later.

What kinds of data work best?

Anything that lives cleanly in a spreadsheet: donor lists, grant trackers, program budgets, event attendance, volunteer hours. Clean, well-labeled columns help the model a lot. If your sheet is a jumble of merged cells and notes, clean it up once before pointing the agent at it.
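If you're comfortable running a few lines of Python, a short check like the one below can point out the usual trouble spots before you clean up by hand. It is a sketch under assumptions: `sheet_problems` is a hypothetical helper, the sample rows are made up, and the three checks (missing headers, empty columns, numbers mixed with text) are examples rather than a complete audit.

```python
import io

import pandas as pd


def sheet_problems(df: pd.DataFrame) -> list[str]:
    """List common spreadsheet issues that make questions harder to answer."""
    problems = []
    for col in df.columns:
        name = str(col)
        # pandas labels a blank header cell "Unnamed: N" when reading a CSV.
        if name.startswith("Unnamed:"):
            problems.append(f"column {name!r} has no header")
        if df[col].isna().all():
            problems.append(f"column {name!r} is empty")
        elif not pd.api.types.is_numeric_dtype(df[col]):
            # A text column where some, but not all, values parse as numbers
            # usually means amounts were typed with stray words or symbols.
            as_number = pd.to_numeric(df[col], errors="coerce")
            parsed = int(as_number.notna().sum())
            present = int(df[col].notna().sum())
            if 0 < parsed < present:
                problems.append(f"column {name!r} mixes numbers and text")
    return problems


# Made-up sheet: a blank header, a text-polluted amount, an empty notes column.
MESSY = """donor_name,,amount,notes
Ada Example,x,250,
Ben Sample,y,about 100,
Cy Placeholder,z,50,
"""

problems = sheet_problems(pd.read_csv(io.StringIO(MESSY)))
for problem in problems:
    print("-", problem)
```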

What should I never put in it?

Protected health information, client case files, data on minors, sealed records, and anything covered by a data-use agreement that prohibits third-party processing. When in doubt, ask your ED or board — the cost of being careful here is near zero.

Will OpenAI train on my data?

Not when you use your own API key. OpenAI's API data-usage policy states API inputs and outputs are not used to train models by default. That's different from the free ChatGPT web app, where consumer settings apply. This template uses the API.

Can this replace my data analyst?

No. And we'd encourage you not to frame it that way with your team. This is a way to turn a sheet into something you can ask questions of at 10pm without bothering anyone — and a way for a small org without an analyst to get faster answers on routine questions. The judgment calls stay with the human.

How do I get help if something breaks?

The README in the template has a troubleshooting section. If you're stuck, open an issue on the GitHub repo and tag @ourcommunity-tech — we read them. For hands-on support on your own deployment, get in touch.

Sources & further reading

OpenAI — "Inside our in-house data agent" (Kepler announcement, April 2026). openai.com/…inside-our-in-house-data-agent
VentureBeat — "OpenAI's AI data agent, built by two engineers, now serves 4,000 employees." venturebeat.com/…serves-4-000-employees-and
Our Community Tech — "Notes from MIT: What OpenAI's Head of ChatGPT Engineering Says Comes Next." ourcommunity.tech/insights/emtech-ai-2026-choudhry.html
OpenAI — Nonprofits program. openai.com/nonprofits
OpenAI — API data-usage policy. openai.com/policies/api-data-usage-policies
OpenAI — Usage limits settings. platform.openai.com/settings/organization/limits