Channel: Martech Zone

BigQuery: How To Use SQL To Create Demo PII Name, Address, Email, and Phone Number Data


Safeguarding the sensitive information of your customers is crucial. Personally Identifiable Information (PII), such as names, addresses, and phone numbers, is heavily regulated under data privacy laws like GDPR, CCPA, and HIPAA. However, developers, data analysts, and QA teams still need to work with realistic data for testing, development, or demonstrations.

For example, I’ve been developing an OpenINSIGHTS demo account to output campaigns with our retail AI customer predictions. I want to provide evidence that the data is actionable, meaning I can display actual customer records with the output. However, I don’t want to display accounts, actual names, addresses, email addresses, phone numbers, or any other PII in customer data.

Why Build Fake PII Data?

  1. Compliance with Privacy Laws: Using real PII in testing or development environments can breach regulations, resulting in hefty fines.
  2. Risk Mitigation: Real PII in lower environments increases the risk of accidental leaks.
  3. Accurate Testing: Fake but realistic data allows you to effectively test data pipelines, reports, and UIs.
  4. Demo Environments: Fake data provides a professional appearance during demos without risking user privacy.

Should Fake Data Look Realistic?

Not necessarily. While fake data must work within your systems, it should still look intentionally fake to anyone observing a demo or testing environment. This helps maintain transparency and avoid confusion.

  1. System Integrity: Internal processes, such as filtering by city, state, or zip code, often rely on valid formats and patterns. Fake data should meet these requirements to ensure the system functions correctly without errors. For example, ZIP codes should align with expected formats, while other fields like names or street addresses can be replaced with clearly fake placeholders.
  2. Functional Testing: Maintaining integrity in key fields like city, state, and zip ensures that internal filters, workflows, and validations continue to operate as expected. Meanwhile, the fake data for other fields (like names and street addresses) can help test edge cases or performance without impacting functionality.
  3. Presentation Clarity: During demos, fake data that is clearly fake (e.g., Zxy Test St.) avoids confusion while still showcasing system features. This strikes a balance between professional presentation and maintaining transparency about data usage.
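
One way to confirm that the format-sensitive fields are still valid is a quick count of rows that break the expected pattern. This is a minimal sketch: the column names state and zip_code are assumptions (the article does not name them), and both are assumed to be STRING columns.

```sql
-- Hypothetical column names: state, zip_code (not named in the article).
-- Counts rows whose values do not match the expected US formats.
SELECT
  COUNTIF(NOT IFNULL(REGEXP_CONTAINS(zip_code, r'^\d{5}(-\d{4})?$'), FALSE)) AS bad_zip_rows,
  COUNTIF(NOT IFNULL(REGEXP_CONTAINS(state, r'^[A-Z]{2}$'), FALSE)) AS bad_state_rows
FROM `project.demo.pii`;
```

Both counts should be zero if filters on those fields are expected to keep working. NULL values are counted as bad rows here.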

You can ensure secure, functional, and clear testing or demonstrations by intentionally designing fake data to look artificial while preserving system integrity in essential fields. Here’s a snapshot of what I was able to build:

Fake PII Data

SQL to Generate Fake PII Data

Here is the SQL to overwrite first_name, last_name, customer_address_line1, customer_address_line2, email_address, and phone_number with fake but well-formed values, and to build a full name field (customer_name). A second address line is added to only about 20% of households.

UPDATE `project.demo.pii`
SET 
  first_name = (
    CONCAT(
      UPPER(SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1)),
      SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1),
      SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1),
      IF(RAND() > 0.5, SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1), '')
    )
  ),
  last_name = (
    CONCAT(
      UPPER(SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1)),
      SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1),
      SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1),
      IF(RAND() > 0.5, SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1), '')
    )
  ),
  customer_address_line1 = (
    CONCAT(
      CAST(CAST(FLOOR(RAND() * 99999 + 1) AS INT64) AS STRING), " ",
      UPPER(SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1)),
      SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1),
      SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1),
      " ",
      CASE CAST(FLOOR(RAND() * 20) AS INT64)  -- 0..19, so the ELSE branch is reachable
        WHEN 0 THEN 'St'
        WHEN 1 THEN 'Ave'
        WHEN 2 THEN 'Blvd'
        WHEN 3 THEN 'Dr'
        WHEN 4 THEN 'Ln'
        WHEN 5 THEN 'Rd'
        WHEN 6 THEN 'Cir'
        WHEN 7 THEN 'Ct'
        WHEN 8 THEN 'Pl'
        WHEN 9 THEN 'Pkwy'
        WHEN 10 THEN 'Ter'
        WHEN 11 THEN 'Way'
        WHEN 12 THEN 'Sq'
        WHEN 13 THEN 'Loop'
        WHEN 14 THEN 'Trail'
        WHEN 15 THEN 'Hwy'
        WHEN 16 THEN 'Row'
        WHEN 17 THEN 'Path'
        WHEN 18 THEN 'Alley'
        ELSE 'Pass'
      END
    )
  ),
  customer_address_line2 = CASE 
    WHEN RAND() <= 0.2 THEN CONCAT(
      CASE CAST(FLOOR(RAND() * 3) AS INT64)
        WHEN 0 THEN 'Apt '
        WHEN 1 THEN 'Suite '
        ELSE 'Unit '
      END,
      CASE CAST(FLOOR(RAND() * 2) AS INT64)
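        -- Cast to INT64 before STRING; casting a FLOAT64 directly yields values like '12.0'
        WHEN 0 THEN CONCAT(UPPER(SUBSTR('ABCDEF', CAST(FLOOR(RAND() * 6) + 1 AS INT64), 1)), CAST(CAST(FLOOR(RAND() * 999 + 1) AS INT64) AS STRING))
        ELSE CAST(CAST(FLOOR(RAND() * 1000 + 1) AS INT64) AS STRING)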
      END
    )
    ELSE customer_address_line2
  END,
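  phone_number = CONCAT(
    -- Cast to INT64 before STRING; casting a FLOAT64 directly yields values like '425.0'
    '(', CAST(CAST(FLOOR(RAND() * 800 + 200) AS INT64) AS STRING), ') ',
    CAST(CAST(FLOOR(RAND() * 900 + 100) AS INT64) AS STRING), '-',
    CAST(CAST(FLOOR(RAND() * 9000 + 1000) AS INT64) AS STRING)
  )
WHERE TRUE;

-- Second pass: SET expressions read each row's pre-update values, so the fields
-- derived from the names are built only after the fake names have been written.
-- Building them in the first statement would use the original (real) names.
UPDATE `project.demo.pii`
SET
  email_address = CONCAT(
    LOWER(first_name), ".", LOWER(last_name),
    CASE CAST(FLOOR(RAND() * 3) AS INT64)
      WHEN 0 THEN '@example.com'
      WHEN 1 THEN '@testmail.com'
      ELSE '@fakemail.org'
    END
  ),
  customer_name = CONCAT(
    UPPER(SUBSTR(first_name, 1, 1)), SUBSTR(first_name, 2), ' ',
    UPPER(SUBSTR(last_name, 1, 1)), SUBSTR(last_name, 2)
  )
WHERE TRUE;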
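
Because UPDATE rewrites rows in place, it can be safer to run it against a copy of the data than against the original table. A minimal sketch, assuming a hypothetical source table named `project.source.customers` (that name is not from the article; substitute your own):

```sql
-- Hypothetical source table name; replace with your own.
-- Creates (or overwrites) the demo table as a full copy, so the UPDATE
-- only ever touches the copy and never the original customer data.
CREATE OR REPLACE TABLE `project.demo.pii` AS
SELECT *
FROM `project.source.customers`;
```

Run this before the UPDATE, and keep the demo dataset's access controls in mind: until the UPDATE finishes, the copy still holds real PII.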

Breaking Down the Code

1. Generating first_name and last_name

  • The query randomly generates fake first and last names using a combination of consonants and vowels.
  • Logic:
    • Picks a random consonant → SUBSTR('bcdfghjklmnpqrstvwxyz', ...)
    • Adds a vowel → SUBSTR('aeiou', ...)
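    • Adds a second consonant, then a trailing vowel about half of the time, giving a three- or four-letter name.
  • Names are pronounceable but random. The pattern can occasionally produce a real name (such as Bob or Sam), but the value is not tied to any actual customer.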
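
To see what this pattern produces before changing any data, the same expression can be run as a standalone SELECT. This is only a preview sketch: the sample_name alias and the row count of 10 are arbitrary choices, and no table is read or modified.

```sql
-- Preview only: generates 10 sample names without touching any table.
SELECT
  CONCAT(
    -- Capitalized leading consonant (the string holds 21 consonants)
    UPPER(SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1)),
    -- Vowel
    SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1),
    -- Second consonant
    SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1),
    -- Trailing vowel roughly half of the time
    IF(RAND() > 0.5, SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1), '')
  ) AS sample_name
FROM UNNEST(GENERATE_ARRAY(1, 10)) AS n;
```

RAND() is evaluated separately for every row and every call, so each run returns a different set of names.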

2. Creating customer_address_line1

  • Combines a random house number with a randomly generated street name and type (e.g., St, Ave, Blvd).
  • Logic:
    • Randomly selects a house number between 1 and 99,999.
    • Constructs a three-letter street name (consonant, vowel, consonant).
    • Appends a random street type from a fixed list (e.g., “Ln”, “Way”, “Trail”).
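
An alternative to a long CASE expression is to keep the street types in an array and pick a random element. The sketch below is a preview only and is not the article's query: the shortened street_types list and the sample_address_line1 alias are illustrative assumptions.

```sql
-- Preview only: five sample address lines, no table is modified.
SELECT
  CONCAT(
    CAST(CAST(FLOOR(RAND() * 99999 + 1) AS INT64) AS STRING), ' ',
    UPPER(SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1)),
    SUBSTR('aeiou', CAST(FLOOR(RAND() * 5) + 1 AS INT64), 1),
    SUBSTR('bcdfghjklmnpqrstvwxyz', CAST(FLOOR(RAND() * 21) + 1 AS INT64), 1),
    ' ',
    -- The array length drives the random range, so every entry is reachable.
    street_types[OFFSET(CAST(FLOOR(RAND() * ARRAY_LENGTH(street_types)) AS INT64))]
  ) AS sample_address_line1
FROM (SELECT ['St', 'Ave', 'Blvd', 'Dr', 'Ln', 'Rd'] AS street_types) AS t
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 5)) AS n;
```

With this shape, adding or removing a street type only means editing the array; the multiplier never has to be kept in sync with the number of entries by hand.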

3. Handling customer_address_line2

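  • Adds apartment, suite, or unit details with a 20% probability. The remaining rows keep whatever value the column already holds, so clear the column first if it contains real data.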
  • Logic:
    • Randomly picks “Apt”, “Suite”, or “Unit”.
    • Adds a number or alphanumeric identifier.
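
After the update has run, a simple aggregate can confirm how many rows ended up with a second address line. A minimal sketch, assuming the column was empty (NULL) before the update:

```sql
-- Share of rows that have a second address line after the update.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(customer_address_line2 IS NOT NULL) AS rows_with_line2,
  ROUND(SAFE_DIVIDE(COUNTIF(customer_address_line2 IS NOT NULL), COUNT(*)), 3) AS share_with_line2
FROM `project.demo.pii`;
```

On a large table the share should land near 0.2. A noticeably higher value suggests that some rows already had a second address line before the update, since those values are left in place.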

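4. Creating email_address

  • Combines first_name and last_name in lowercase, separated by a period.
  • Appends one of three placeholder domains (example.com, testmail.com, or fakemail.org). Only example.com is reserved for documentation use (RFC 2606); the other two may be registered to someone, so never send mail to these addresses.
  • Ensures the format looks like an email but is clearly fake.
  • Example: john.doe@example.com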
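
A pattern check can confirm that every email address was built from a generated name and one of the three placeholder domains. This is a sketch under the assumption that both name parts are the three- or four-letter generated values; the regular expression is mine, not the article's.

```sql
-- Counts emails that do not look like <generated>.<generated>@<placeholder domain>.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(NOT IFNULL(REGEXP_CONTAINS(
    email_address,
    r'^[a-z]{3,4}\.[a-z]{3,4}@(example\.com|testmail\.com|fakemail\.org)$'
  ), FALSE)) AS unexpected_emails
FROM `project.demo.pii`;
```

A non-zero unexpected_emails count is worth investigating: it can mean an address was assembled from a pre-existing (real) name rather than from the generated one, or that the column is NULL.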

5. Creating phone_number

  • Generates a 10-digit number formatted as (XXX) XXX-XXXX.
  • The area code (XXX) is between 200 and 999 (valid area codes start with 2–9).
  • Ensures realistic phone formatting but with fake values.
  • Example: (425) 678-1234
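
The same phone layout can also be produced with FORMAT instead of nested CONCAT and CAST calls. The following is a preview sketch, not the article's query; it keeps the same numeric ranges, and the sample_phone alias is arbitrary.

```sql
-- Preview only: five sample phone numbers, no table is modified.
-- FORMAT's %d specifier takes INT64, so each FLOAT64 result is cast first.
SELECT
  FORMAT(
    '(%d) %d-%d',
    CAST(FLOOR(RAND() * 800 + 200) AS INT64),    -- area code: 200..999
    CAST(FLOOR(RAND() * 900 + 100) AS INT64),    -- exchange: 100..999
    CAST(FLOOR(RAND() * 9000 + 1000) AS INT64)   -- line number: 1000..9999
  ) AS sample_phone
FROM UNNEST(GENERATE_ARRAY(1, 5)) AS n;
```

Keeping the template in one string makes the output layout easier to read and change than a chain of concatenated pieces.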

6. Combining customer_name

  • Formats the fake first and last names to title case (e.g., “John Smith”).
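
Since customer_name is derived from the two name columns, a consistency check is a cheap way to confirm that it reflects the fake names rather than earlier values. A minimal sketch, assuming the generated names already start with a capital letter (as the name logic produces):

```sql
-- Counts rows where the full name does not match the stored first and last names.
-- NULLs in any of the three columns are counted as mismatches.
SELECT COUNT(*) AS mismatched_rows
FROM `project.demo.pii`
WHERE IFNULL(customer_name != CONCAT(first_name, ' ', last_name), TRUE);
```

The result should be zero. Any mismatched row means customer_name was built from something other than the current first_name and last_name values.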

Final Notes

This query allows you to:

  • Generate secure, fake PII for testing.
  • Avoid compliance risks with real data.
  • Maintain data realism, ensuring effective system testing and demos.

©2024 DK New Media, LLC, All rights reserved | Disclosure

Originally Published on Martech Zone: BigQuery: How To Use SQL To Create Demo PII Name, Address, Email, and Phone Number Data

