[Spring Batch] Implementing Custom Constraint Writer
Situation 🧐
Recently, I designed a batch process that uses Upsert
in PostgreSQL
for a specific logic. During implementation, due
to a change in business requirements, I had to add a specific column to a composite unique condition.
The issue arose from the fact that the unique constraint of the composite unique column does not prevent duplicates
with null
values in a specific column.
Let's take a look at an example of the problematic situation.
create table student
(
id integer not null
constraint student_pk
primary key,
name varchar,
major varchar,
constraint student_unique
unique (name, major)
);
id | name | major |
---|---|---|
1 | song | korean |
2 | kim | english |
3 | park | math |
4 | kim | NULL |
5 | kim | NULL |
To avoid allowing null
duplicates, the idea of inserting dummy data naturally came to mind, but I felt reluctant to
store meaningless data in the database. Especially if the column where null
occurs stores complex data like UUID
, it
would be very difficult to identify meaningless values buried among other values.
Although it may be a bit cumbersome, using a unique partial index
allows us to disallow null
values without
inserting dummy data. I decided to pursue the most ideal solution, even if it is challenging.
Solution
Partial Index
CREATE UNIQUE INDEX stu_2col_uni_idx ON student (name, major)
WHERE major IS NOT NULL;
CREATE UNIQUE INDEX stu_1col_uni_idx ON student (name)
WHERE major IS NULL;
PostgreSQL
provides the functionality of partial indexes.
Partial Index : A feature that creates an index only when certain conditions are met. It allows for efficient index creation and maintenance by narrowing the scope of the index.
When a value with only name
is inserted, stu_1col_uni_idx
allows only one row with the same name
where major
is null
. By creating two complementary indexes, we can skillfully prevent duplicates with null
values in a specific
column.
An error occurs when trying to store a value without major
However, when there are two unique constraints like this, since only one constraint check is allowed during Upsert
execution, the batch did not run as intended.
After much deliberation, I decided to check if a specific value is missing before executing the SQL and then execute the SQL that meets the conditions.
Implementing SelectConstraintWriter
public class SelectConstraintWriter extends JdbcBatchItemWriter<Student> {
@Setter
private String anotherSql;
@Override
public void write(List<? extends Student> items) {
if (items.isEmpty()) {
return;
}
List<? extends Student> existMajorStudents = items.stream()
.filter(student -> student.getMajor() != null)
.collect(toList());
List<? extends Student> nullMajorStudents = items.stream()
.filter(student -> student.getMajor() == null)
.collect(toList());
executeSql(existMajorStudents, sql);
executeSql(nullMajorStudents, anotherSql);
}
private void executeSql(List<? extends student> students, String sql) {
if (logger.isDebugEnabled()) {
logger.debug("Executing batch with " + students.size() + " items.");
}
int[] updateCounts;
if (usingNamedParameters) {
if (this.itemSqlParameterSourceProvider == null) {
updateCounts = namedParameterJdbcTemplate.batchUpdate(sql, students.toArray(new Map[students.size()]));
} else {
SqlParameterSource[] batchArgs = new SqlParameterSource[students.size()];
int i = 0;
for (student item : students) {
batchArgs[i++] = itemSqlParameterSourceProvider.createSqlParameterSource(item);
}
updateCounts = namedParameterJdbcTemplate.batchUpdate(sql, batchArgs);
}
} else {
updateCounts = namedParameterJdbcTemplate.getJdbcOperations().execute(sql,
(PreparedStatementCallback<int[]>) ps -> {
for (student item : students) {
itemPreparedStatementSetter.setValues(item, ps);
ps.addBatch();
}
return ps.executeBatch();
});
}
if (assertUpdates) {
for (int i = 0; i < updateCounts.length; i++) {
int value = updateCounts[i];
if (value == 0) {
throw new EmptyResultDataAccessException("Item " + i + " of " + updateCounts.length
+ " did not update any rows: [" + students.get(i) + "]", 1);
}
}
}
}
}
I implemented this by overriding the write
method of the JdbcBatchItemWriter
that was previously used. By checking
the presence of major
in the code and selecting and executing the appropriate SQL, we can ensure that the Upsert
statement works correctly instead of encountering a duplicateKeyException
.
Here is an example of usage:
@Bean
SelectConstraintWriter studentItemWriter() {
String sql1 =
"INSERT INTO student(id, name, major) "
+ "VALUES (nextval('hibernate_sequence'), :name, :major) "
+ "ON CONFLICT (name, major) WHERE major IS NOT NULL "
+ "DO UPDATE "
+ "SET name = :name, "
+ " major = :major";
String sql2 =
"INSERT INTO student(id, name, major) "
+ "VALUES (nextval('hibernate_sequence'), :name, :major) "
+ "ON CONFLICT (name) WHERE major IS NULL "
+ "DO UPDATE "
+ "SET name = :name, "
+ " major = :major";
SelectConstraintWriter writer = new SelectConstraintWriter();
writer.setSql(sql1);
writer.setAnotherSql(sql2);
writer.setDataSource(dataSource);
writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
writer.afterPropertiesSet();
return writer;
}
Conclusion
It's regrettable that if PostgreSQL
allowed multiple constraint checks during Upsert
execution, we wouldn't have
needed to go to such lengths. I hope for updates in future versions.