The %sql magic command allows you to execute Spark SQL queries directly within your Python or Scala notebooks. It provides a seamless way to interact with your data using SQL syntax while maintaining the ability to use the results in subsequent code cells.
Basic Usage
You can use %sql for single-line queries or %%sql for multi-line blocks.
Single-Line Query
%sql SELECT * FROM range(10)
Multi-Line Query
%%sql
SELECT
category,
COUNT(*) as count
FROM
sales_data
GROUP BY
category
Working with Results (\_sqldf)
Every successful execution of a %sql command automatically captures the result in two ways:
- Python Variable - A Spark DataFrame named
\_sqldfis created in the notebook's namespace. - Temporary View - A Spark temporary view named
\_sqldfis registered, allowing you to query the result in subsequent SQL cells.
Note: Every successful %sql execution overwrites the previous \_sqldf. If a query fails, the previous successful \_sqldf is preserved.
Example: Using results in Python
# After running a %sql query
df = _sqldf
print(df.count())
Example: Using results in SQL
%sql SELECT * FROM _sqldf WHERE count > 100
Variable Interpolation and Parameters
Syntasa supports two ways to pass variables from your notebook into SQL queries.
Legacy Interpolation ($var or ${var})
You can substitute scalar variables (strings, integers, floats, and booleans) directly into your SQL.
- Python -
tbl = "my\_table"$\rightarrow$%sql SELECT \* FROM $tbl - Scala -
val limit = 10$\rightarrow$%sql SELECT \* FROM range($limit)
Named Parameter Markers (:name)
For injection-safe and type-correct binding (requires Spark 3.4+), use the colon prefix.
%sql SELECT * FROM users WHERE user_id = :uid
The system will look for a variable named uid in your notebook and bind it to the query.
Features and Behavior
Interactive Tables
Results are rendered as interactive HTML tables.
- Display Limit - By default, the UI displays up to 10,000 rows.
- Truncation - If a result exceeds the limit, a footer will notify you that the results are truncated and suggest using
\_sqldfto access the full dataset.
Multi-Statement Support
You can run multiple SQL statements in a single %%sql cell by separating them with semicolons (;).
%%sql
USE my_database;
CREATE OR REPLACE TEMP VIEW recent_logs AS SELECT * FROM logs WHERE date > '2023-01-01';
SELECT COUNT(*) FROM recent_logs;
Comments
- SQL Comments - Standard
--(line) and/\* ... \*/(block) comments are supported. - Python-style Comments - You can use
#for line comments; the system will strip them before execution to prevent Spark syntax errors.
Cross-Language Support
- Scala Notebooks - Scala
valbindings are automatically bridged. You can define a value in Scala and immediately use it in a%sqlcell as a parameter or via interpolation. - %run Support - If you use
%runto include another notebook, any%sqlcells in the child notebook will execute correctly, and any temporary views created will be available in the parent notebook.
Error Handling
- Syntax Errors - If your SQL has a syntax error, the system will display the error message from Spark without interrupting the notebook kernel.
- Unresolved Variables - If you use a
$varthat isn't defined, a warning will be displayed, and the literal string$varwill be sent to Spark. - Missing Parameters - If a
:namemarker is used but the variable is not found in the notebook, the query will not be sent to Spark, and an error will be shown.