How to Seed Your Test Database with AI-Generated User Data

Somewhere around the third time a developer on your team checks in a users.json fixture with twelve rows of "John Doe" and "test@example.com", you start to feel the problem viscerally. The tests pass. The staging environment looks fine. Then a real user signs up with a name in Arabic script, a role combination nobody put in the fixture, and three permission flags set at once — and something breaks.

The root issue isn't the bug. It's that your test data was never realistic enough to surface it.

Why Fixture Files Fail at Scale

Static fixture files have three failure modes that compound over time.

First, they're written by whoever set up the project initially, which means they reflect that person's mental model of a "typical user." If the product has grown, that mental model is probably wrong.

Second, they don't cover edge cases by default. Nobody writes a fixture for a user whose preferred_pronouns field is an empty array, or whose locale is pt-BR when the app was only tested with en-US.

Third, they're updated manually — which means they aren't updated at all. The fixture from 18 months ago has schema fields that no longer exist and is missing three that were added since.

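The alternative isn't to copy production data: under GDPR and similar privacy laws, real personal data sitting in test environments is a liability, and you'd also import whatever malformed records past bugs left behind. The alternative is to generate synthetic data that's realistic, varied, and driven by your actual schema.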

The Setup: AI Persona Generator API

AI Persona Generator exposes a REST endpoint at POST /api/v1/personas that accepts a schema description and generates structured user records. Here's the minimal request shape:

{
  "personaCount": 20,
  "format": "json",
  "projectDescription": "A B2B SaaS project management tool with role-based access",
  "outputLanguage": "english",
  "schemaMode": "hybrid",
  "personalPrompt": false,
  "customFields": true,
  "enableAvatar": false,
  "fields": [
    {
      "id": "f1",
      "name": "role",
      "type": "Job Role",
      "description": "User's role in their organization",
      "values": ["admin", "editor", "viewer", "billing_manager"],
      "valueWeightPercent": [10, 40, 40, 10],
      "multi": false
    },
    {
      "id": "f2",
      "name": "subscription_tier",
      "type": "Subscription Tier",
      "description": "Account plan",
      "values": ["free", "pro", "enterprise"],
      "valueWeightPercent": [60, 30, 10],
      "multi": false
    }
  ]
}

The valueWeightPercent array is what separates this from a random-value generator. You're not asking for random data; you're asking for data with a realistic distribution. In most SaaS products, free users far outnumber enterprise accounts, and that ratio matters when you're testing pagination, billing logic, or any feature gated by tier.
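Before sending a request, it can be worth sanity-checking field definitions locally, and afterwards checking what actually came back. The sketch below assumes (the request shape above implies it but doesn't state it) that valueWeightPercent pairs with values by position and should sum to 100; the helper names validateWeightedField and observedDistribution are hypothetical.

```javascript
// Hypothetical helpers. Assumption: valueWeightPercent pairs with `values`
// by index and is meant to sum to 100 (not confirmed by API documentation).
function validateWeightedField(field) {
  const errors = [];
  const { name, values, valueWeightPercent } = field;
  if (!Array.isArray(values) || values.length === 0) {
    errors.push(`${name}: values must be a non-empty array`);
    return errors;
  }
  // This sketch treats weights as optional; drop this line if your setup requires them
  if (valueWeightPercent === undefined) return errors;
  if (valueWeightPercent.length !== values.length) {
    errors.push(`${name}: ${values.length} values but ${valueWeightPercent.length} weights`);
  }
  const total = valueWeightPercent.reduce((sum, w) => sum + w, 0);
  if (total !== 100) errors.push(`${name}: weights sum to ${total}, expected 100`);
  return errors;
}

// Tally how often each value of `fieldName` appears, as a rounded percentage.
function observedDistribution(records, fieldName) {
  const counts = {};
  for (const r of records) counts[r[fieldName]] = (counts[r[fieldName]] || 0) + 1;
  const result = {};
  for (const [value, n] of Object.entries(counts)) {
    result[value] = Math.round((n / records.length) * 100);
  }
  return result;
}
```

Small batches won't match the weights exactly: with 20 records, a 10% value is expected only about twice, so compare distributions over larger batches before concluding the weights are being ignored.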

Making the Request

You'll need an API key from your account settings. Here's the curl call; replace the fields placeholder with the full array from the request above before running it:

curl -X POST https://aipersonagen.com/api/v1/personas \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "personaCount": 20,
    "format": "json",
    "projectDescription": "B2B SaaS project management tool",
    "outputLanguage": "english",
    "schemaMode": "hybrid",
    "personalPrompt": false,
    "customFields": true,
    "enableAvatar": false,
    "fields": [...]
  }'

The response comes back with a taskId. Generation is async for larger batches, so you poll the GET endpoint:

curl "https://aipersonagen.com/api/v1/personas?taskId=YOUR_TASK_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"

When success is true, the data field contains a JSON string of your generated records.
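Because data is a JSON string inside the JSON response, it has to be decoded twice. A small helper keeps that logic in one place. It relies only on the two fields described here (success and data) and assumes, as the seed script below does, that data decodes to an array; the name parsePollResult is hypothetical.

```javascript
// Hypothetical helper: decode one poll response body from
// GET /api/v1/personas?taskId=... Uses only `success` and `data`;
// returns null while the task is still running.
function parsePollResult(body) {
  if (!body || body.success !== true) return null;
  if (typeof body.data !== 'string') {
    throw new TypeError('Expected `data` to be a JSON string');
  }
  const records = JSON.parse(body.data); // second decode: string -> records
  // Assumption: the records arrive as an array, as the seed script expects
  if (!Array.isArray(records)) {
    throw new TypeError('Expected `data` to decode to an array of records');
  }
  return records;
}
```

Throwing on an unexpected shape means a format change on the API side fails the seed step immediately instead of inserting nothing.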

Wiring It Into a Seed Script

Here's a Node.js seed script that generates users and inserts them into a PostgreSQL database using the pg package. It uses the built-in fetch, so it needs Node.js 18 or later:

const { Client } = require('pg');

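async function generatePersonas(apiKey, payload, { intervalMs = 2000, maxAttempts = 60 } = {}) {
  const headers = { 'Authorization': `Bearer ${apiKey}` };
  const res = await fetch('https://aipersonagen.com/api/v1/personas', {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  // Fail fast: without a taskId there is nothing to poll
  if (!res.ok) throw new Error(`Persona request failed with HTTP ${res.status}`);
  const { taskId } = await res.json();
  if (!taskId) throw new Error('Persona request returned no taskId');

  // Poll every intervalMs, giving up after maxAttempts (about 2 minutes by default)
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await new Promise(r => setTimeout(r, intervalMs));
    const poll = await fetch(
      `https://aipersonagen.com/api/v1/personas?taskId=${encodeURIComponent(taskId)}`,
      { headers }
    );
    const result = await poll.json();
    if (result.success) return JSON.parse(result.data); // data is a JSON string
  }
  throw new Error(`Task ${taskId} not finished after ${maxAttempts} polls`);
}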

async function seed() {
  // Generate before connecting, so a failed API call can't leave a connection open
  const personas = await generatePersonas(process.env.AI_PERSONA_API_KEY, {
    personaCount: 50,
    format: 'json',
    projectDescription: 'B2B SaaS project management tool with role-based access',
    outputLanguage: 'english',
    schemaMode: 'hybrid',
    personalPrompt: false,
    customFields: true,
    enableAvatar: false,
    fields: [
      {
        id: 'f1', name: 'role', type: 'Job Role',
        description: 'User role in organization',
        values: ['admin', 'editor', 'viewer', 'billing_manager'],
        valueWeightPercent: [10, 40, 40, 10], multi: false
      },
      {
        id: 'f2', name: 'subscription_tier', type: 'Subscription Tier',
        description: 'Account plan',
        values: ['free', 'pro', 'enterprise'],
        valueWeightPercent: [60, 30, 10], multi: false
      }
    ]
  });

  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // One transaction: either the whole batch lands or none of it does
    await client.query('BEGIN');
    let inserted = 0;
    for (const p of personas) {
      // ON CONFLICT (email) requires a unique constraint or index on users.email
      const { rowCount } = await client.query(
        `INSERT INTO users (full_name, email, role, subscription_tier)
         VALUES ($1, $2, $3, $4)
         ON CONFLICT (email) DO NOTHING`,
        [p.full_name, p.email, p.role, p.subscription_tier]
      );
      inserted += rowCount; // 0 when the email already existed
    }
    await client.query('COMMIT');
    console.log(`Seeded ${inserted} of ${personas.length} generated users.`);
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    await client.end();
  }
}

seed().catch((err) => {
  console.error(err);
  process.exitCode = 1; // make npm run seed fail visibly in CI
});
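Generated records are model output, so it's safer to filter them before they reach the database than to trust every field. The sketch below keeps only records whose role and subscription_tier are among the requested values and whose full_name and email look usable. The function name partitionPersonas, the ALLOWED table, and the deliberately loose email pattern are assumptions for illustration, not part of the API.

```javascript
// Hypothetical pre-insert filter for the seed script. ALLOWED mirrors the
// `fields` definitions sent in the request payload.
const ALLOWED = {
  role: new Set(['admin', 'editor', 'viewer', 'billing_manager']),
  subscription_tier: new Set(['free', 'pro', 'enterprise']),
};
// Loose sanity check only: something@something.tld with no whitespace.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function partitionPersonas(personas) {
  const valid = [];
  const rejected = [];
  for (const p of personas) {
    const problems = [];
    if (typeof p.full_name !== 'string' || p.full_name.trim() === '') problems.push('full_name');
    if (typeof p.email !== 'string' || !EMAIL_RE.test(p.email)) problems.push('email');
    for (const [field, allowed] of Object.entries(ALLOWED)) {
      if (!allowed.has(p[field])) problems.push(field);
    }
    if (problems.length === 0) valid.push(p);
    else rejected.push({ persona: p, problems });
  }
  return { valid, rejected };
}
```

In seed(), you could call this right after generatePersonas, insert only valid, and log rejected, so a malformed batch shows up in the console rather than as a failed insert. In a real script you might derive ALLOWED from the same fields array you send, so the two can't drift apart.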

Add it to your package.json scripts (for example, "seed": "node seed.js") and npm run seed generates a fresh batch of data matching the fields you defined on every run.

Using It in Jest / Vitest Tests

If you prefer inline test data rather than a pre-seeded database, the same API works as a test fixture factory. A helper like this keeps tests clean:

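// test-helpers/persona-factory.js
// Assumes generatePersonas() and the request payload from the seed script
// have been moved into a shared module; this path and export are illustrative.
import { generatePersonas, personaPayload } from './generate-personas.js';

// Cache the promise, not the result, so concurrent callers share one request
let cachedRequest = null;

export async function getTestPersonas(count = 10) {
  if (count > personaPayload.personaCount) {
    throw new RangeError(`Requested ${count} personas; only ${personaPayload.personaCount} are generated`);
  }
  cachedRequest ??= generatePersonas(process.env.AI_PERSONA_API_KEY, personaPayload);
  const personas = await cachedRequest;
  return personas.slice(0, count);
}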

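A module-level cache like this only lasts for a single test file, because Jest and Vitest give each test file its own module registry by default. To avoid an API call per file, generate once per CI run, store the result in a temp file, and have every test file read it from there.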
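One way to follow that advice is a file-backed cache: the first caller writes the generated personas to a JSON file, and every later test file reads it back. This is a sketch; the function name loadOrGeneratePersonas, the default file name, the PERSONA_CACHE_FILE variable, and passing the generator in as a function (for example, () => generatePersonas(apiKey, payload)) are all assumptions made so the helper stays independent of the API client.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical default location; override with PERSONA_CACHE_FILE in CI.
const DEFAULT_CACHE_FILE =
  process.env.PERSONA_CACHE_FILE || path.join(os.tmpdir(), 'test-personas.json');

// Return cached personas if the file exists; otherwise call `generate`
// and write its result to disk for the next caller.
async function loadOrGeneratePersonas(generate, cacheFile = DEFAULT_CACHE_FILE) {
  try {
    return JSON.parse(await fs.promises.readFile(cacheFile, 'utf8'));
  } catch (err) {
    // Only a missing file means "generate"; a corrupt cache fails loudly
    if (err.code !== 'ENOENT') throw err;
  }
  const personas = await generate();
  // Write to a temp name, then rename, so readers don't see a half-written file
  const tmp = `${cacheFile}.${process.pid}.tmp`;
  await fs.promises.writeFile(tmp, JSON.stringify(personas));
  await fs.promises.rename(tmp, cacheFile);
  return personas;
}
```

Parallel test workers that start before the file exists may each call the API once. Running the generation as its own CI step before the tests start avoids that and gives every worker the same data.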

What This Doesn't Replace

To be direct: this approach is best for integration tests and staging environments, not unit tests. If you're testing a function that formats a name, you don't need AI-generated personas — a few hardcoded strings are faster and more predictable.

Where it earns its keep is anywhere your test quality depends on data variety: permission logic, billing edge cases, internationalization, search and filtering, or any feature where the realistic distribution of user attributes actually matters for catching bugs before production does.
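To check whether a generated batch actually exercises the attribute combinations your permission and billing logic branch on, you can compare it against the cross product of the values you requested. This is an illustration rather than an API feature, and the helper name missingCombinations is hypothetical.

```javascript
// Hypothetical helper: list every combination of the given fields' allowed
// values that does not appear in `records`.
function missingCombinations(records, allowedByField) {
  const fields = Object.keys(allowedByField);
  // Build the cross product, e.g. [['admin', 'free'], ['admin', 'pro'], ...]
  let combos = [[]];
  for (const field of fields) {
    combos = combos.flatMap((prefix) => allowedByField[field].map((v) => [...prefix, v]));
  }
  // Key each record by the same fields, in the same order
  const seen = new Set(records.map((r) => JSON.stringify(fields.map((f) => r[f]))));
  return combos
    .filter((combo) => !seen.has(JSON.stringify(combo)))
    .map((combo) => Object.fromEntries(fields.map((f, i) => [f, combo[i]])));
}
```

Rare combinations are exactly where gaps appear. With the example weights, an enterprise-tier billing_manager would be expected 50 × 0.10 × 0.10 = 0.5 times in a 50-record batch if the two fields were drawn independently, so that combination would more often be missing than present. Running this check after generation tells you when to request a larger batch or add the missing cases explicitly.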