Skip to main content

Replace Date Range - Process Modes

In data science workflows, especially in time-partitioned datasets, managing data updates with precision is critical. Syntasa’s Replace Data Range process mode provides a reliable and controlled method to update output tables based on a specified execution date range.

This process mode is designed to ensure freshness of data for a given range, without disturbing historical data outside the range. It intelligently replaces, preserves, or add new data partitions depending on whether they exist in the input source, the output table, or both.

How 'Replace Data Range' Process Mode Works?

The Replace Data Range process mode evaluates each date partition in relation to the input source, the output table, and the specified execution date range. The behavior falls into the following four cases:

  1. Overwrite Existing Partitions
    If a date partition is present in both the input and the output table, and it lies within the execution date range, that partition in the output is overwritten, even if the data is identical. You can refer Example 1 below for better understanding.
  2. Add New Partitions
    If a date partition is found in the input source but not in the output table, and it lies within the execution date range, it is added as a new partition in the output.  You can refer Example 1 below for better understanding
  3. Retain Partition Missing in Input Source
    If a date partition is found in the output table but not in the input source, and it lies within the execution date range, it is left untouched in the output table. You can refer Example 2 below for better understanding
  4. Retain Partition Outside Job Execution Range
    Any date partition in the output table that lies outside the specified execution date range is not affected at all, regardless of its presence or absence in the input. You can refer Example 3 below for better understanding

Examples

Let's break down three practical examples with detailed before-and-after tables. For simplicity, assume the execution timestamp is**7-Apr-2025 10:00 AM**  in all examples.

Example 1

This example demonstrates basic behavior where partitions are either replaced (if already present in the output table) or added (if missing), all within the specified execution date range.

Refer to the data below to understand this example:

  • Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
  • New Input source (B) contains data for the date range: 1st Jan – 9th Jan 2025
  • Job Execution Date Range is defined as: 1st Jan – 7th Jan 2025

Existing Output Table (A)

Date PartitionDataModified Timestamp
2nd Jan 2025chrome1-Apr-2025 09:00 AM
3rd Jan 2025edge1-Apr-2025 09:00 AM
4th Jan 2025firefox1-Apr-2025 09:00 AM
5th Jan 2025safari1-Apr-2025 09:00 AM

Input Source (B)

Date PartitionData
1st Jan 2025opera
2nd Jan 2025edge
3rd Jan 2025edge
4th Jan 2025safari
5th Jan 2025safari
6th Jan 2025firefox
7th Jan 2025firefox
8th Jan 2025firefox
9th Jan 2025chrome

Resulting Output Table After Execution

Date PartitionDataModified TimestampRemarks
1st Jan 2025opera7-Apr-2025 10:00 AM➕ New Partition
2nd Jan 2025edge7-Apr-2025 10:00 AM🔁 Replaced
3rd Jan 2025edge7-Apr-2025 10:00 AM🔁 Replaced (even with identical data)
4th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced
5th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced  (even with identical data)
6th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition
7th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition

Note: The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.

Example 2

It shows replacement of partitions within the execution date range, and also illustrates how existing partitions outside the execution range remain unchanged.

Refer to the data below to understand this example:

  • Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
  • New Input source (B) contains data for the date range: 1st Jan – 9th Jan 2025
  • Job Execution Date Range is defined as: 3rd Jan – 7th Jan 2025

Existing Output Table (A)

Date PartitionDataModified Timestamp
2nd Jan 2025chrome1-Apr-2025 09:00 AM
3rd Jan 2025edge1-Apr-2025 09:00 AM
4th Jan 2025firefox1-Apr-2025 09:00 AM
5th Jan 2025safari1-Apr-2025 09:00 AM

Input Source (B)

Date PartitionData
1st Jan 2025opera
2nd Jan 2025edge
3rd Jan 2025edge
4th Jan 2025safari
5th Jan 2025safari
6th Jan 2025firefox
7th Jan 2025firefox
8th Jan 2025firefox
9th Jan 2025chrome

Resulting Output Table After Execution

Date PartitionDataModified TimestampRemarks
2nd Jan 2025chrome1-Apr-2025 09:00 AM✅ Unchanged — outside execution date range
3rd Jan 2025edge7-Apr-2025 10:00 AM🔁 Replaced (even with identical data)
4th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced
5th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced  (even with identical data)
6th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition
7th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition

Note: The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.

Example 3

This example covers replacement of partitions within the execution range, along with a case where a partition exists in the output table but is missing in the input source — it remains untouched, even though it's within the execution date range.

Refer to the data below to understand this example:

  • Existing Output Table (A) currently includes partitions for: 2nd Jan – 5th Jan 2025
  • New Input source (B) contains data for the date range: 3rd Jan – 9th Jan 2025
  • Job Execution Date Range is defined as: 1st Jan – 7th Jan 2025

Existing Output Table (A)

Date PartitionDataModified Timestamp
2nd Jan 2025chrome1-Apr-2025 09:00 AM
3rd Jan 2025edge1-Apr-2025 09:00 AM
4th Jan 2025firefox1-Apr-2025 09:00 AM
5th Jan 2025safari1-Apr-2025 09:00 AM

Input Source (B)

Date PartitionData
3rd Jan 2025edge
4th Jan 2025safari
5th Jan 2025safari
6th Jan 2025firefox
7th Jan 2025firefox
8th Jan 2025firefox
9th Jan 2025chrome

Resulting Output Table After Execution

Date PartitionDataModified TimestampRemarks
2nd Jan 2025chrome1-Apr-2025 09:00 AM✅ Unchanged — date partition not found in input source
3rd Jan 2025edge7-Apr-2025 10:00 AM🔁 Replaced (even with identical data)
4th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced
5th Jan 2025safari7-Apr-2025 10:00 AM🔁 Replaced  (even with identical data)
6th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition
7th Jan 2025firefox7-Apr-2025 10:00 AM➕ New Partition

Note:
1st Jan 2025 record is not counted as it was not part of the job execution date range.
The Remarks column is provided for explanation purposes only and is not part of the actual output table. It is included here solely to help illustrate how each partition is handled during execution.

Important Points

  • Data Replacement Happens at the Partition Level, Not Row Level:
    • The system does not check for modifications in individual records within a date partition.
    • If data exists for a partition, the entire partition is replaced rather than modifying specific rows.
  • Ensure That a Proper Date Range Is Selected:
    • Since data outside the selected date range is not affected, ensure that the correct range is chosen to avoid unintended gaps in processing. For example, if the data already exists for 1st Jan to 4th Jan 2025 in output table and you run the job for the 1st Jan to 3rd Jan 2025, the data of 4th Jan 2025 will remain unaffected.

When to Use Replace Date Range?

The Replace Date Range process mode is recommended in the following scenarios:

  • Development Workflows (When Schema Structure is Stable):

    • Suitable when testing and refining data within a specific period while keeping the table structure unchanged.
    • Allows reprocessing data for selected date ranges without affecting other periods.
  • Production Workflows (Manual Processing for Complete Data Replacement):

    • Can be used in production for manual runs when a complete replacement of data is required for a specific date range.
    • Helps in situations where updates or corrections to a limited time frame are needed.
  • Historical Data Reprocessing:

    • Useful when customers or stakeholders request reprocessing of past data due to changes in business logic, data sources, or transformation rules.
    • Ensures that outdated or incorrect data for specific periods is replaced with newly processed results.
  • When Data Outside the Processing Date Range Should Be Preserved:

    • Unlike Drop & Replace, which clears the entire table, this mode ensures that records outside the specified date range remain unchanged.