Subqueries & CTEs

hypequery provides support for Common Table Expressions (CTEs) and raw SQL capabilities that allow you to create complex queries with subqueries.

Common Table Expressions (CTEs)

CTEs are temporary result sets that you can reference within a query. They help make complex queries more readable and maintainable.

Using QueryBuilder as a CTE

You can use another QueryBuilder instance as a CTE:

import { createQueryBuilder } from '@hypequery/core';

const builder = createQueryBuilder(schema);

// Create a subquery builder
const activeUsersSubquery = builder
  .table('users')
  .select(['id', 'user_name', 'email'])
  .where('status', 'eq', 'active');

// Use it as a CTE in the main query
const results = await builder
  .table('orders')
  .withCTE('active_users', activeUsersSubquery)
  .select([
    'orders.id', 
    'orders.total', 
    'active_users.user_name'
  ])
  .innerJoin('active_users', 'user_id', 'active_users.id')
  .execute();

This will generate SQL similar to:

WITH active_users AS (
  SELECT id, user_name, email 
  FROM users 
  WHERE status = 'active'
) 
SELECT orders.id, orders.total, active_users.user_name 
FROM orders 
INNER JOIN active_users ON orders.user_id = active_users.id

Using Raw SQL as a CTE

For more complex subqueries, you can use raw SQL strings:

const results = await builder
  .table('orders')
  .withCTE(
    'monthly_totals',
    'SELECT user_id, toStartOfMonth(created_at) as month, SUM(total) as monthly_sum FROM orders GROUP BY user_id, month'
  )
  .select([
    'orders.id',
    'orders.created_at',
    'monthly_totals.monthly_sum'
  ])
  .innerJoin('monthly_totals', 'user_id', 'monthly_totals.user_id')
  .where('orders.created_at', 'gte', 'monthly_totals.month')
  .execute();

Multiple CTEs

You can chain multiple CTEs for complex analytics:

const results = await builder
  .table('events')
  .withCTE(
    'daily_users',
    'SELECT user_id, toDate(timestamp) as day, COUNT(*) as event_count FROM events GROUP BY user_id, day'
  )
  .withCTE(
    'active_users',
    'SELECT user_id, COUNT(DISTINCT day) as active_days FROM daily_users GROUP BY user_id HAVING active_days > 7'
  )
  .select(['events.*'])
  .innerJoin('active_users', 'user_id', 'active_users.user_id')
  .execute();

Raw SQL Expressions

For complex conditions that can’t be expressed using the fluent API, you can use raw SQL expressions:

HAVING Clauses

The raw() method allows you to add custom conditions to the HAVING clause:

const results = await builder
  .table('orders')
  .select(['user_id'])
  .sum('total', 'total_spent')
  .groupBy(['user_id'])
  .raw('SUM(total) > 1000')
  .raw('COUNT(DISTINCT product_id) >= 3')
  .execute();

This will generate:

SELECT user_id, SUM(total) AS total_spent 
FROM orders 
GROUP BY user_id 
HAVING SUM(total) > 1000 AND COUNT(DISTINCT product_id) >= 3

Limitations and Workarounds

hypequery doesn’t directly support nested subqueries in WHERE clauses, but you can work around this with CTEs or raw SQL expressions:

Example: IN Subqueries

To achieve a query like:

SELECT * FROM products 
WHERE category_id IN (SELECT id FROM categories WHERE active = 1)

Use a CTE approach:

// Create a CTE for active categories
const results = await builder
  .table('products')
  .withCTE(
    'active_categories',
    'SELECT id FROM categories WHERE active = 1'
  )
  .select(['products.*'])
  .innerJoin('active_categories', 'category_id', 'active_categories.id')
  .execute();

Example: Correlated Subqueries

For more complex scenarios like correlated subqueries, you may need to use your database client directly:

import { ClickHouseConnection } from '@hypequery/clickhouse';

// For complex queries that hypequery can't easily build
const client = ClickHouseConnection.getClient();
const result = await client.query({
  query: `
    SELECT 
      u.id as user_id,
      u.user_name,
      (
        SELECT COUNT(*)
        FROM orders o
        WHERE o.user_id = u.id AND o.status = 'completed'
      ) as completed_orders
    FROM users u
    WHERE u.status = 'active'
  `,
  format: 'JSONEachRow'
});

const data = await result.json();

Use Cases

Analytics Queries

CTEs are particularly useful for analytics queries:

const results = await builder
  .table('events')
  .withCTE(
    'daily_stats',
    'SELECT toDate(timestamp) as day, COUNT(*) as events, COUNT(DISTINCT user_id) as users FROM events GROUP BY day'
  )
  .withCTE(
    'weekly_stats',
    'SELECT toStartOfWeek(day) as week, SUM(events) as events, SUM(users) as users FROM daily_stats GROUP BY week'
  )
  .select(['week', 'events', 'users'])
  .from('weekly_stats')
  .orderBy('week', 'DESC')
  .limit(10)
  .execute();

Hierarchical Data

For hierarchical data or recursive queries, use a raw SQL approach with CTEs:

const client = ClickHouseConnection.getClient();
const result = await client.query({
  query: `
    WITH RECURSIVE category_tree AS (
      SELECT id, parent_id, name, 1 as level
      FROM categories
      WHERE parent_id IS NULL
      
      UNION ALL
      
      SELECT c.id, c.parent_id, c.name, ct.level + 1
      FROM categories c
      INNER JOIN category_tree ct ON c.parent_id = ct.id
    )
    SELECT * FROM category_tree ORDER BY level, name
  `,
  format: 'JSONEachRow'
});

const data = await result.json();

Best Practices

  1. Use CTEs for readability: Break complex queries into logical parts with CTEs.

  2. Optimize CTE reuse: ClickHouse may materialize CTEs if used multiple times in a query.

  3. Consider performance: Complex subqueries can impact performance; test with representative data volumes.

  4. Type safety: When using raw SQL, you may lose type safety. Cast results appropriately.

See Also