Skip to main content

[Spring Batch] Implementing Custom Constraint Writer

ยท 4 min read

Situation ๐Ÿงโ€‹

Recently, I designed a batch process that uses Upsert in PostgreSQL for a specific logic. During implementation, due to a change in business requirements, I had to add a specific column to a composite unique condition.

The issue arose from the fact that the unique constraint of the composite unique column does not prevent duplicates with null values in a specific column.

Let's take a look at an example of the problematic situation.

create table student
(
id integer not null
constraint student_pk
primary key,
name varchar,
major varchar,
constraint student_unique
unique (name, major)
);
idnamemajor
1songkorean
2kimenglish
3parkmath
4kimNULL
5kimNULL

To avoid allowing null duplicates, the idea of inserting dummy data naturally came to mind, but I felt reluctant to store meaningless data in the database. Especially if the column where null occurs stores complex data like UUID, it would be very difficult to identify meaningless values buried among other values.

Although it may be a bit cumbersome, using a unique partial index allows us to disallow null values without inserting dummy data. I decided to pursue the most ideal solution, even if it is challenging.

Solutionโ€‹

Partial Indexโ€‹

CREATE UNIQUE INDEX stu_2col_uni_idx ON student (name, major)
WHERE major IS NOT NULL;

CREATE UNIQUE INDEX stu_1col_uni_idx ON student (name)
WHERE major IS NULL;

PostgreSQL provides the functionality of partial indexes.

Partial Index : A feature that creates an index only when certain conditions are met. It allows for efficient index creation and maintenance by narrowing the scope of the index.

When a value with only name is inserted, stu_1col_uni_idx allows only one row with the same name where major is null. By creating two complementary indexes, we can skillfully prevent duplicates with null values in a specific column.

duplicate error An error occurs when trying to store a value without major

However, when there are two unique constraints like this, since only one constraint check is allowed during Upsert execution, the batch did not run as intended.

After much deliberation, I decided to check if a specific value is missing before executing the SQL and then execute the SQL that meets the conditions.

Implementing SelectConstraintWriterโ€‹

public class SelectConstraintWriter extends JdbcBatchItemWriter<Student> {

@Setter
private String anotherSql;

@Override
public void write(List<? extends Student> items) {
if (items.isEmpty()) {
return;
}

List<? extends Student> existMajorStudents = items.stream()
.filter(student -> student.getMajor() != null)
.collect(toList());

List<? extends Student> nullMajorStudents = items.stream()
.filter(student -> student.getMajor() == null)
.collect(toList());

executeSql(existMajorStudents, sql);
executeSql(nullMajorStudents, anotherSql);
}

private void executeSql(List<? extends student> students, String sql) {
if (logger.isDebugEnabled()) {
logger.debug("Executing batch with " + students.size() + " items.");
}

int[] updateCounts;

if (usingNamedParameters) {
if (this.itemSqlParameterSourceProvider == null) {
updateCounts = namedParameterJdbcTemplate.batchUpdate(sql, students.toArray(new Map[students.size()]));
} else {
SqlParameterSource[] batchArgs = new SqlParameterSource[students.size()];
int i = 0;
for (student item : students) {
batchArgs[i++] = itemSqlParameterSourceProvider.createSqlParameterSource(item);
}
updateCounts = namedParameterJdbcTemplate.batchUpdate(sql, batchArgs);
}
} else {
updateCounts = namedParameterJdbcTemplate.getJdbcOperations().execute(sql,
(PreparedStatementCallback<int[]>) ps -> {
for (student item : students) {
itemPreparedStatementSetter.setValues(item, ps);
ps.addBatch();
}
return ps.executeBatch();
});
}

if (assertUpdates) {
for (int i = 0; i < updateCounts.length; i++) {
int value = updateCounts[i];
if (value == 0) {
throw new EmptyResultDataAccessException("Item " + i + " of " + updateCounts.length
+ " did not update any rows: [" + students.get(i) + "]", 1);
}
}
}
}
}

I implemented this by overriding the write method of the JdbcBatchItemWriter that was previously used. By checking the presence of major in the code and selecting and executing the appropriate SQL, we can ensure that the Upsert statement works correctly instead of encountering a duplicateKeyException.

Here is an example of usage:

@Bean
SelectConstraintWriter studentItemWriter() {
String sql1 =
"INSERT INTO student(id, name, major) "
+ "VALUES (nextval('hibernate_sequence'), :name, :major) "
+ "ON CONFLICT (name, major) WHERE major IS NOT NULL "
+ "DO UPDATE "
+ "SET name = :name, "
+ " major = :major";

String sql2 =
"INSERT INTO student(id, name, major) "
+ "VALUES (nextval('hibernate_sequence'), :name, :major) "
+ "ON CONFLICT (name) WHERE major IS NULL "
+ "DO UPDATE "
+ "SET name = :name, "
+ " major = :major";

SelectConstraintWriter writer = new SelectConstraintWriter();
writer.setSql(sql1);
writer.setAnotherSql(sql2);
writer.setDataSource(dataSource);
writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
writer.afterPropertiesSet();
return writer;
}

Conclusionโ€‹

It's regrettable that if PostgreSQL allowed multiple constraint checks during Upsert execution, we wouldn't have needed to go to such lengths. I hope for updates in future versions.


Referenceโ€‹

create unique constraint with null columns