Skip to main content

The %sql magic command allows you to execute Spark SQL queries directly within your Python or Scala notebooks. It provides a seamless way to interact with your data using SQL syntax while maintaining the ability to use the results in subsequent code cells.

Basic Usage

You can use %sql for single-line queries or %%sql for multi-line blocks.

Single-Line Query

%sql SELECT * FROM range(10)

Multi-Line Query

%%sql
SELECT
category,
COUNT(*) as count
FROM
sales_data
GROUP BY
category

Working with Results (\_sqldf)

Every successful execution of a %sql command automatically captures the result in two ways:

  • Python Variable - A Spark DataFrame named \_sqldf is created in the notebook's namespace.
  • Temporary View -  A Spark temporary view named \_sqldf is registered, allowing you to query the result in subsequent SQL cells.

Note: Every successful %sql execution overwrites the previous \_sqldf. If a query fails, the previous successful \_sqldf is preserved.

Example: Using results in Python

# After running a %sql query
df = _sqldf
print(df.count())

Example: Using results in SQL

%sql SELECT * FROM _sqldf WHERE count > 100

Variable Interpolation and Parameters

Syntasa supports two ways to pass variables from your notebook into SQL queries.

Legacy Interpolation ($var or ${var})

You can substitute scalar variables (strings, integers, floats, and booleans) directly into your SQL.

  • Python -  tbl = "my\_table" $\rightarrow$ %sql SELECT \* FROM $tbl
  • Scala -  val limit = 10 $\rightarrow$ %sql SELECT \* FROM range($limit)

Named Parameter Markers (:name)

For injection-safe and type-correct binding (requires Spark 3.4+), use the colon prefix.

%sql SELECT * FROM users WHERE user_id = :uid

The system will look for a variable named uid in your notebook and bind it to the query.

Features and Behavior

Interactive Tables

Results are rendered as interactive HTML tables.

  • Display Limit -  By default, the UI displays up to 10,000 rows.
  • Truncation - If a result exceeds the limit, a footer will notify you that the results are truncated and suggest using \_sqldf to access the full dataset.

Multi-Statement Support

You can run multiple SQL statements in a single %%sql cell by separating them with semicolons (;).

%%sql
USE my_database;
CREATE OR REPLACE TEMP VIEW recent_logs AS SELECT * FROM logs WHERE date > '2023-01-01';
SELECT COUNT(*) FROM recent_logs;

Comments

  • SQL Comments -  Standard -- (line) and /\* ... \*/ (block) comments are supported.
  • Python-style Comments -  You can use # for line comments; the system will strip them before execution to prevent Spark syntax errors.

Cross-Language Support

  • Scala Notebooks -  Scala val bindings are automatically bridged. You can define a value in Scala and immediately use it in a %sql cell as a parameter or via interpolation.
  • %run Support -  If you use %run to include another notebook, any %sql cells in the child notebook will execute correctly, and any temporary views created will be available in the parent notebook.

Error Handling

  • Syntax Errors -  If your SQL has a syntax error, the system will display the error message from Spark without interrupting the notebook kernel.
  • Unresolved Variables -  If you use a $var that isn't defined, a warning will be displayed, and the literal string $var will be sent to Spark.
  • Missing Parameters - If a :name marker is used but the variable is not found in the notebook, the query will not be sent to Spark, and an error will be shown.