The %sql magic command allows you to execute Spark SQL queries directly within your Python or Scala notebooks. It provides a seamless way to interact with your data using SQL syntax while maintaining the ability to use the results in subsequent code cells.

Basic Usage

You can use %sql for single-line queries or %%sql for multi-line blocks.

Single-Line Query

%sql SELECT * FROM range(10)

Multi-Line Query

%%sql
SELECT 
   category, 
   COUNT(*) as count 
FROM 
   sales_data 
GROUP BY 
   category

Working with Results (`\_sqldf`)

Every successful execution of a %sql command automatically captures the result in two ways:

Python Variable - A Spark DataFrame named \_sqldf is created in the notebook's namespace.
Temporary View - A Spark temporary view named \_sqldf is registered, allowing you to query the result in subsequent SQL cells.

Note: Every successful %sql execution overwrites the previous \_sqldf. If a query fails, the previous successful \_sqldf is preserved.

Example: Using results in Python

# After running a %sql query
df = _sqldf
print(df.count())

Example: Using results in SQL

%sql SELECT * FROM _sqldf WHERE count > 100

Variable Interpolation and Parameters

Syntasa supports two ways to pass variables from your notebook into SQL queries.

Legacy Interpolation (`$var` or `${var}`)

You can substitute scalar variables (strings, integers, floats, and booleans) directly into your SQL.

Python - tbl = "my\_table" $\rightarrow$ %sql SELECT \* FROM $tbl
Scala - val limit = 10 $\rightarrow$ %sql SELECT \* FROM range($limit)

Named Parameter Markers (`:name`)

For injection-safe and type-correct binding (requires Spark 3.4+), use the colon prefix.

%sql SELECT * FROM users WHERE user_id = :uid

The system will look for a variable named uid in your notebook and bind it to the query.

Features and Behavior

Interactive Tables

Results are rendered as interactive HTML tables.

Display Limit - By default, the UI displays up to 10,000 rows.
Truncation - If a result exceeds the limit, a footer will notify you that the results are truncated and suggest using \_sqldf to access the full dataset.

Multi-Statement Support

You can run multiple SQL statements in a single %%sql cell by separating them with semicolons (;).

%%sql
USE my_database;
CREATE OR REPLACE TEMP VIEW recent_logs AS SELECT * FROM logs WHERE date > '2023-01-01';
SELECT COUNT(*) FROM recent_logs;

Comments

SQL Comments - Standard -- (line) and /\* ... \*/ (block) comments are supported.
Python-style Comments - You can use # for line comments; the system will strip them before execution to prevent Spark syntax errors.

Cross-Language Support

Scala Notebooks - Scala val bindings are automatically bridged. You can define a value in Scala and immediately use it in a %sql cell as a parameter or via interpolation.
%run Support - If you use %run to include another notebook, any %sql cells in the child notebook will execute correctly, and any temporary views created will be available in the parent notebook.

Error Handling

Syntax Errors - If your SQL has a syntax error, the system will display the error message from Spark without interrupting the notebook kernel.
Unresolved Variables - If you use a $var that isn't defined, a warning will be displayed, and the literal string $var will be sent to Spark.
Missing Parameters - If a :name marker is used but the variable is not found in the notebook, the query will not be sent to Spark, and an error will be shown.

Single-Line Query​

Multi-Line Query​

Working with Results (\_sqldf)

Example: Using results in Python​

Example: Using results in SQL​