Skip to main content

Optimizing Pagination in Spring Batch with Composite Keys

· 7 min read
Haril Song
Owner, Software Engineer at 42dot

In this article, I will discuss the issues and solutions encountered when querying a table with millions of data using Spring Batch.

Environment

  • Spring Batch 5.0.1
  • PostgreSQL 11

Problem

While using JdbcPagingItemReader to query a large table, I noticed a significant slowdown in query performance over time and decided to investigate the code in detail.

Default Behavior

The following query is automatically generated and executed by the PagingQueryProvider:

SELECT *
FROM large_table
WHERE id > ?
ORDER BY id
LIMIT 1000;

In Spring Batch, when using JdbcPagingItemReader, instead of using an offset, it generates a where clause for pagination. This allows for fast retrieval of data even from tables with millions of records without any delays.

tip

Even with LIMIT, using OFFSET means reading all previous data again. Therefore, as the amount of data to be read increases, the performance degrades. For more information, refer to the article1.

Using Multiple Sorting Conditions

The problem arises when querying a table with composite keys. When a composite key consisting of 3 columns is used as the sort key, the generated query looks like this:

SELECT *
FROM large_table
WHERE ((create_at > ?) OR
(create_at = ? AND user_id > ?) OR
(create_at = ? AND user_id = ? AND content_no > ?))
ORDER BY create_at, user_id, content_no
LIMIT 1000;

However, queries with OR operations in the where clause do not utilize indexes effectively. OR operations require executing multiple conditions, making it difficult for the optimizer to make accurate decisions. When I examined the explain output, I found the following results:

Limit  (cost=0.56..1902.12 rows=1000 width=327) (actual time=29065.549..29070.808 rows=1000 loops=1)
-> Index Scan using th_large_table_pkey on large_table (cost=0.56..31990859.76 rows=16823528 width=327) (actual time=29065.547..29070.627 rows=1000 loops=1)
" Filter: ((""create_at"" > '2023-01-28 06:58:13'::create_at without time zone) OR ((""create_at"" = '2023-01-28 06:58:13'::create_at without time zone) AND ((user_id)::text > '441997000'::text)) OR ((""create_at"" = '2023-01-28 06:58:13'::create_at without time zone) AND ((user_id)::text = '441997000'::text) AND ((content_no)::text > '9070711'::text)))"
Rows Removed by Filter: 10000001
Planning Time: 0.152 ms
Execution Time: 29070.915 ms

With a query execution time close to 30 seconds, most of the data is discarded during filtering on the index, resulting in unnecessary time wastage.

Since PostgreSQL manages composite keys as tuples, writing queries using tuples allows for utilizing the advantages of Index scan even in complex where clauses.

SELECT *
FROM large_table
WHERE (create_at, user_id, content_no) > (?, ?, ?)
ORDER BY create_at, user_id, content_no
LIMIT 1000;
Limit  (cost=0.56..1196.69 rows=1000 width=327) (actual time=3.204..11.393 rows=1000 loops=1)
-> Index Scan using th_large_table_pkey on large_table (cost=0.56..20122898.60 rows=16823319 width=327) (actual time=3.202..11.297 rows=1000 loops=1)
" Index Cond: (ROW(""create_at"", (user_id)::text, (content_no)::text) > ROW('2023-01-28 06:58:13'::create_at without time zone, '441997000'::text, '9070711'::text))"
Planning Time: 0.276 ms
Execution Time: 11.475 ms

It can be observed that data is directly retrieved through the index without discarding any data through filtering.

Therefore, when the query executed by JdbcPagingItemReader uses tuples, it means that even when using composite keys as sort keys, processing can be done very quickly.

Let's dive into the code immediately.

Modifying PagingQueryProvider

Analysis

As mentioned earlier, the responsibility of generating queries lies with the PagingQueryProvider. Since I am using PostgreSQL, the PostgresPagingQueryProvider is selected and used.

image The generated query differs based on whether it includes a group by clause.

By examining SqlPagingQueryUtils's buildSortConditions, we can see how the problematic query is generated.

image

Within the nested for loop, we can see how the query is generated based on the sort key.

Customizing buildSortConditions

Having directly inspected the code responsible for query generation, I decided to modify this code to achieve the desired behavior. However, direct overriding of this code is not possible, so I created a new class called PostgresOptimizingQueryProvider and re-implemented the code within this class.

private String buildSortConditions(StringBuilder sql) {
Map<String, Order> sortKeys = getSortKeys();
sql.append("(");
sortKeys.keySet().forEach(key -> sql.append(key).append(", "));
sql.delete(sql.length() - 2, sql.length());
if (is(sortKeys, order -> order == Order.ASCENDING)) {
sql.append(") > (");
} else if (is(sortKeys, order -> order == Order.DESCENDING)) {
sql.append(") < (");
} else {
throw new IllegalStateException("Cannot mix ascending and descending sort keys"); // Limitation of tuples
}
sortKeys.keySet().forEach(key -> sql.append("?, "));
sql.delete(sql.length() - 2, sql.length());
sql.append(")");
return sql.toString();
}

Test Code

To ensure that the newly implemented section works correctly, I validated it through a test code.

@Test
@DisplayName("The Where clause generated instead of Offset is (create_at, user_id, content_no) > (?, ?, ?).")
void test() {
// given
PostgresOptimizingQueryProvider queryProvider = new PostgresOptimizingQueryProvider();
queryProvider.setSelectClause("*");
queryProvider.setFromClause("large_table");

Map<String, Order> parameterMap = new LinkedHashMap<>();
parameterMap.put("create_at", Order.ASCENDING);
parameterMap.put("user_id", Order.ASCENDING);
parameterMap.put("content_no", Order.ASCENDING);
queryProvider.setSortKeys(parameterMap);

// when
String firstQuery = queryProvider.generateFirstPageQuery(10);
String secondQuery = queryProvider.generateRemainingPagesQuery(10);

// then
assertThat(firstQuery).isEqualTo("SELECT * FROM large_table ORDER BY create_at ASC, user_id ASC, content_no ASC LIMIT 10");
assertThat(secondQuery).isEqualTo("SELECT * FROM large_table WHERE (create_at, user_id, content_no) > (?, ?, ?) ORDER BY create_at ASC, user_id ASC, content_no ASC LIMIT 10");
}

image

The successful execution confirms that it is working as intended, and I proceeded to run the batch.

image

Guy: "is it over?"

Boy: "Shut up, It'll happen again!"

-- Within the Webtoon Hive

However, the out of range error occurred, indicating that the query was not recognized as having changed.

image

It seems that the parameter injection part is not automatically recognized just because the query has changed, so let's debug again to find the parameter injection part.

JdbcOptimizedPagingItemReader

The parameter is directly created by JdbcPagingItemReader, and I found that the number of parameters to be injected into SQL is increased by iterating through getParameterList in JdbcPagingItemReader.

image

I thought I could just override this method, but unfortunately it is not possible because it is private. After much thought, I copied the entire JdbcPagingItemReader and modified only the getParameterList part.

The getParameterList method is overridden in JdbcOptimizedPagingItemReader as follows:

private List<Object> getParameterList(Map<String, Object> values, Map<String, Object> sortKeyValue) {
// ...
// Returns the parameters that need to be set in the where clause without increasing them.
return new ArrayList<>(sortKeyValue.values());
}

There is no need to add sortKeyValue, so it is directly added to parameterList and returned.

Now, let's run the batch again.

The first query is executed without requiring parameters,

2023-03-13T17:43:14.240+09:00 DEBUG 70125 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL query [SELECT * FROM large_table ORDER BY create_at ASC, user_id ASC, content_no ASC LIMIT 2000]

The subsequent query execution receives parameters from the previous query.

2023-03-13T17:43:14.253+09:00 DEBUG 70125 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [SELECT * FROM large_table WHERE (create_at, user_id, content_no) > (?, ?, ?) ORDER BY create_at ASC, user_id ASC, content_no ASC LIMIT 2000]

The queries are executed exactly as intended! 🎉

For pagination processing with over 10 million records, queries that used to take around 30 seconds now run in the range of 0.1 seconds, representing a significant performance improvement of nearly 300 times.

image

Now, regardless of the amount of data, queries can be read within milliseconds without worrying about performance degradation. 😎

Conclusion

In this article, I introduced the method used to optimize Spring Batch in an environment with composite keys. However, there is a drawback to this method: all columns that make up the composite key must have the same sorting condition. If desc or asc are mixed within the index condition generated by the composite key, a separate index must be used to resolve this issue 😢

Let's summarize today's content in one line and conclude the article.

"Avoid using composite keys as much as possible and use surrogate keys unrelated to the business."

Reference


Footnotes

  1. https://jojoldu.tistory.com/528

Improving Code Productivity with Design Patterns in O2

· 7 min read
Haril Song
Owner, Software Engineer at 42dot

This article discusses the process of using design patterns to improve the structure of the O2 project for more flexible management.

Problem

While diligently working on development, a sudden Issue was raised one day.

image

Reflecting the contents of the Issue was not difficult. However, as I delved into the code, an issue that had been put off for a while started to surface.

image

Below is the implementation of the Markdown syntax conversion code that had been previously written.

warning

Due to the length of the code, a partial excerpt is provided. For the full code, please refer to O2 plugin v1.1.1 🙏

export async function convertToChirpy(plugin: O2Plugin) {
try {
await backupOriginalNotes(plugin);
const markdownFiles = await renameMarkdownFile(plugin);
for (const file of markdownFiles) {
// remove double square brackets
const title = file.name.replace('.md', '').replace(/\s/g, '-');
const contents = removeSquareBrackets(await plugin.app.vault.read(file));
// change resource link to jekyll link
const resourceConvertedContents = convertResourceLink(plugin, title, contents);

// callout
const result = convertCalloutSyntaxToChirpy(resourceConvertedContents);

await plugin.app.vault.modify(file, result);
}

await moveFilesToChirpy(plugin);
new Notice('Chirpy conversion complete.');
} catch (e) {
console.error(e);
new Notice('Chirpy conversion failed.');
}
}

Being unfamiliar with TypeScript and Obsidian usage, I had focused more on implementing features rather than the overall design. Now, trying to add new features, it became difficult to anticipate side effects, and the code implementation lacked clear communication of the developer's intent.

To better understand the code flow, I created a graph of the current process as follows.

Although I had separated functionalities into functions, the code was still procedurally written, where the order of code lines significantly impacted the overall operation. Adding a new feature in this state would require precise implementation to avoid breaking the entire conversion process. So, where would be the correct place to implement a new feature? Most likely, the answer would be 'I need to see the code.' Currently, with most of the code written in one large file, it was almost equivalent to needing to analyze the entire code. In object-oriented terms, one could say that the Single Responsibility Principle (SRP) was not being properly followed.

This state, no matter how positively described, did not seem easy to maintain. Since the O2 plugin was created for my personal use, I could not justify producing spaghetti code that was difficult to maintain by rationalizing, 'It's because I'm not familiar with TS.'

Before resolving the Issue, I decided to first improve the structure, putting the glory aside for a moment.

What Structure Should Be Implemented?

The O2 plugin, as a syntax conversion plugin, must be capable of converting Obsidian's Markdown syntax into various formats, which is a clear requirement.

Therefore, the design should focus primarily on scalability.

Each platform logic should be modularized, and the conversion process abstracted to implement it like a template. This way, when implementing new features for supporting different platform syntaxes, developers can focus only on the small unit of implementing syntax conversion without needing to reimplement the entire flow.

Based on this, the design requirements are as follows:

  1. Strings (content of Markdown files) should be converted in order (or not) as needed.
  2. Specific conversion logic should be skippable or dynamically controllable based on external settings.
  3. Implementing new features should be simple and should have minimal or no impact on existing code.

As there is a sequence of execution, and the ability to add features in between, the Chain of Responsibility pattern seemed appropriate for this purpose.

Applying Design Patterns

Process->Process->Process->Done! : Summary of Chain of Responsibility

export interface Converter {
setNext(next: Converter): Converter;
convert(input: string): string;
}

export abstract class AbstractConverter implements Converter {
private next: Converter;

setNext(next: Converter): Converter {
this.next = next;
return next;
}

convert(input: string): string {
if (this.next) {
return this.next.convert(input);
}
return input;
}
}

The Converter interface plays a role in converting specific strings through convert(input). By specifying the next Converter to be processed with setNext, and returning the Converter again, method chaining can be used.

With the abstraction in place, the conversion logic that was previously implemented in one file was separated into individual Converter implementations, assigning responsibility for each feature. Below is an example of the CalloutConverter that separates the Callout syntax conversion logic.

export class CalloutConverter extends AbstractConverter {
convert(input: string): string {
const result = convertCalloutSyntaxToChirpy(input);
return super.convert(result);
}
}

function convertCalloutSyntaxToChirpy(content: string) {
function replacer(match: string, p1: string, p2: string) {
return `${p2}\n{: .prompt-${replaceKeyword(p1)}}`;
}

return content.replace(ObsidianRegex.CALLOUT, replacer);
}

Now, the relationships between the classes are as follows.

Now, by combining the smallest units of functionality implemented in each Converter, a chain is created to perform operations in sequence. This is why this pattern is called Chain of Responsibility.

export async function convertToChirpy(plugin: O2Plugin) {
// ...
// Create conversion chain
frontMatterConverter.setNext(bracketConverter)
.setNext(resourceLinkConverter)
.setNext(calloutConverter);

// Request the frontMatterConverter at the head to perform the conversion, and the connected converters will operate sequentially.
const result = frontMatterConverter.convert(await plugin.app.vault.read(file));
await plugin.app.vault.modify(file, result);
// ...
}

Now that the logic has been separated into appropriate responsibilities, reading the code has become much easier. When needing to add a new feature, only the necessary Converter needs to be implemented. Additionally, without needing to know how other Converters work, new features can be added through setNext. Each Converter operates independently, following the principle of encapsulation.

Finally, I checked if all tests passed and created a PR.

image

Next Step

Although the structure has improved significantly, there is one remaining drawback. In the structure linked through setNext, calling the Converter at the very front is necessary for proper operation. If a different Converter is called instead of the one at the front, the result may be different from the intended one. If, for example, a NewConverter is implemented before frontMatterConverter but frontMatterConverter.convert(input) is not modified, NewConverter will not be applied.

This is one of the aspects that developers need to pay attention to, as there is room for error, and it is one of the areas that needs improvement in the future. For instance, implementing a kind of Context to contain the Converters and executing the conversion process without directly calling the Converters could be a way to improve. This is something I plan to implement in the next version.


2023-03-12 Update

Thanks to PR, the same functionality was performed, but with a more flexible structure using composition instead of inheritance.

Conclusion

In this article, I described the process of redistributing roles and responsibilities through design patterns from a procedurally written, monolithic file to a more object-oriented and maintainable code.

info

The complete code can be found on GitHub.

Caution when setting sortKeys in JdbcItemReader

· 4 min read
Haril Song
Owner, Software Engineer at 42dot

I would like to share an issue I encountered while retrieving large amounts of data in PostgreSQL.

Problem

While using the Spring Batch JdbcPagingItemReader, I set the sortKeys as follows:

...
.selectClause("SELECT *")
.fromClause("FROM big_partitioned_table_" + yearMonth)
.sortKeys(Map.of(
"timestamp", Order.ASCENDING,
"mmsi", Order.ASCENDING,
"imo_no", Order.ASCENDING
)
)
...

Although the current table's index is set as a composite index with timestamp, mmsi, and imo_no, I expected an Index scan to occur during the retrieval. However, in reality, a Seq scan occurred. The target table contains around 200 million records, causing the batch process to show no signs of completion. Eventually, I had to forcibly shut down the batch. Why did a Seq scan occur even when querying with index conditions? 🤔

In PostgreSQL, Seq scans occur in the following cases:

  • When the optimizer determines that a Seq scan is faster due to the table having a small amount of data
  • When the data being queried is too large (more than 10% of the table), and the optimizer deems Index scan less efficient than Seq scan
    • In such cases, you can use limit to adjust the amount of data and execute an Index scan

In this case, since select * was used, there was a possibility of a Seq scan due to the large amount of data being queried. However, due to the chunk size, the query was performed with limit, so I thought that Index scan would occur continuously.

Debugging

To identify the exact cause, let's check the actual query being executed. By slightly modifying the YAML configuration, we can observe the queries executed by the JdbcPagingItemReader.

logging:
level.org.springframework.jdbc.core.JdbcTemplate: DEBUG

I reran the batch process to directly observe the queries.

SELECT * FROM big_partitioned_table_202301 ORDER BY imo_no ASC, mmsi ASC, timestamp ASC LIMIT 1000

The order of the order by clause seemed odd, so I ran it again.

SELECT * FROM big_partitioned_table_202301 ORDER BY timestamp ASC, mmsi ASC, imo_no ASC LIMIT 1000

It was evident that the order by condition was changing with each execution.

To ensure the correct order for an Index scan, the sorting conditions must be specified in the correct order. Passing a general Map to sortKeys does not guarantee the order, leading to the SQL query not executing as intended.

To maintain order, you can use a LinkedHashMap to create the sortKeys.

Map<String, Order> sortKeys = new LinkedHashMap<>();
sortKeys.put("timestamp", Order.ASCENDING);
sortKeys.put("mmsi", Order.ASCENDING);
sortKeys.put("imo_no", Order.ASCENDING);

After making this adjustment and rerunning the batch, we could confirm that the sorting conditions were specified in the correct order.

SELECT * FROM big_partitioned_table_202301 ORDER BY timestamp ASC, mmsi ASC, imo_no ASC LIMIT 1000

Conclusion

The issue of Seq scan occurring instead of an Index scan was not something that could be verified with the application's test code, so we were unaware of any potential bugs. It was only when we observed a significant slowdown in the batch process in the production environment that we realized something was amiss. During development, I had not anticipated that the order of sorting conditions could change due to the Map data structure.

Fortunately, if an Index scan does not occur due to the large amount of data being queried, the batch process would slow down significantly with the LIMIT query, making it easy to notice the issue. However, if the data volume was low and the execution speeds of Index scan and Seq scan were similar, it might have taken a while to notice the problem.

Further consideration is needed on how to anticipate and address this issue in advance. Since the order of the order by condition is often crucial, it is advisable to use LinkedHashMap over HashMap whenever possible.

[O2] Developing an Obsidian Plugin

· 7 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

Obsidian provides a graph view through links between Markdown files, making it convenient to store and navigate information. However, to achieve this, Obsidian enforces its own unique syntax in addition to the original Markdown syntax. This can lead to areas of incompatibility when reading Markdown documents from Obsidian on other platforms.

Currently, I use a Jekyll blog for posting, so when I write in Obsidian, I have to manually adjust the syntax later for blog publishing. Specifically, the workflow involves:

  • Using [[]] for file links, which is Obsidian's unique syntax
  • Resetting attachment paths, including image files
  • Renaming title.md to yyyy-MM-dd-title.md
  • Callout syntax

image Double-dashed arrows crossing layer boundaries require manual intervention.

As I use both Obsidian and Jekyll concurrently, there was a need to automate this syntax conversion process and attachment copying process.

Since Obsidian allows for functionality extension through community plugins unlike Notion, I decided to try creating one myself. After reviewing the official documentation, I found that Obsidian guides plugin development based on NodeJS. While the language options were limited, I had an interest in TypeScript, so I set up a NodeJS/TS environment to study.

Implementation Process

Naming

I first tackled the most important part of development.

It didn't take as long as I thought, as I came up with the project name 'O2' based on a sudden idea while writing a description, 'convert Obsidian syntax to Jekyll,' for the plugin.

image

Preparation for Conversion

With a suitable name in place, the next step was to determine how to convert which files.

The workflow for blog posting is as follows:

  1. Write drafts in a folder named ready.
  2. Once the manuscript is complete, copy the files, including attachments, to the Jekyll project, appropriately converting Obsidian syntax to Jekyll syntax in the process.
  3. Move the manuscript from the ready folder to published to indicate that it has been published.

I decided to program this workflow as is. However, instead of editing the original files in a Jekyll project open in VScode, I opted to create and modify copies internally in the plugin workspace to prevent modification of the original files and convert them to Jekyll syntax.

To summarize this step briefly:

  1. Copy the manuscript A.md from /ready to /published without modifying /published/A.md.
  2. Convert the title and syntax of /ready/A.md.
  3. Move /ready/yyyy-MM-dd-A.md to the path for Jekyll publishing.

Let's start the implementation.

Copying Original Files

// Get only Markdown files in the ready folder
function getFilesInReady(plugin: O2Plugin): TFile[] {
return this.app.vault.getMarkdownFiles()
.filter((file: TFile) => file.path.startsWith(plugin.settings.readyDir))
}

// Copy files to the published folder
async function copyToPublishedDirectory(plugin: O2Plugin) {
const readyFiles = getFilesInReady.call(this, plugin)
readyFiles.forEach((file: TFile) => {
return this.app.vault.copy(file, file.path.replace(plugin.settings.readyDir, plugin.settings.publishedDir))
})
}

By fetching Markdown files inside the /ready folder and replacing file.path with publishedDir, copying can be done easily.

Copying Attachments and Resetting Paths

function convertResourceLink(plugin: O2Plugin, title: string, contents: string) {
const absolutePath = this.app.vault.adapter.getBasePath()
const resourcePath = `${plugin.settings.jekyllResourcePath}/${title}`
fs.mkdirSync(resourcePath, {recursive: true})

const relativeResourcePath = plugin.settings.jekyllRelativeResourcePath

// Copy resourceDir/image.png to assets/img/<title>/image.png before changing
extractImageName(contents)?.forEach((resourceName) => {
fs.copyFile(
`${absolutePath}/${plugin.settings.resourceDir}/${resourceName}`,
`${resourcePath}/${resourceName}`,
(err) => {
if (err) {
new Notice(err.message)
}
}
)
})
// Syntax conversion
return contents.replace(ObsidianRegex.IMAGE_LINK, `![image](/${relativeResourcePath}/${title}/$1)`)
}

Attachments require moving files outside the vault, which cannot be achieved using Obsidian's default APIs. Therefore, direct file system access using fs is necessary.

info

Direct file system access implies difficulty in mobile usage, so the Obsidian official documentation guides specifying isDesktopOnly as true in manifest.json in such cases.

Before moving Markdown files to the Jekyll project, the Obsidian image link syntax is parsed to identify image filenames, which are then moved to Jekyll's resource folder so that the Markdown default image links are converted correctly, allowing attachments to be found.

Callout Syntax Conversion

Obsidian callout

> [!NOTE] callout title
> callout contents

Supported keywords: tip, info, note, warning, danger, error, etc.

Jekyll chirpy callout

> callout contents
{: .promt-info}

Supported keywords: tip, info, warning, danger

As the syntax of the two differs, regular expressions are used to substitute this part, requiring implementation of a replacer.

export function convertCalloutSyntaxToChirpy(content: string) {
function replacer(match: string, p1: string, p2: string) {
if (p1.toLowerCase() === 'note') {
p1 = 'info'
}
if (p1.toLowerCase() === 'error') {
p1 = 'danger'
}
return `${p2}\n{: .prompt-${p1.toLowerCase()}}`
}

return content.replace(ObsidianRegex.CALLOUT, replacer)
}

Unsupported keywords in Jekyll are converted to other keywords with similar roles.

Moving Completed Files

The Jekyll-based blog I currently use has a specific path where posts need to be located for publishing. Since the Jekyll project location may vary per client, custom path handling is required. I decided to set this up through a settings tab and created an input form like the one below.

image

Once all conversions are done, moving the files to the _post path in Jekyll completes the conversion process.

async function moveFilesToChirpy(plugin: O2Plugin) {
// Absolute path is needed to move files outside the vault
const absolutePath = this.app.vault.adapter.getBasePath()
const sourceFolderPath = `${absolutePath}/${plugin.settings.readyDir}`
const targetFolderPath = plugin.settings.targetPath()

fs.readdir(sourceFolderPath, (err, files) => {
if (err) throw err

files.forEach((filename) => {
const sourceFilePath = path.join(sourceFolderPath, filename)
const targetFilePath = path.join(targetFolderPath, filename)

fs.rename(sourceFilePath, targetFilePath, (err) => {
if (err) {
console.error(err)
new Notice(err.message)
throw err
}
})
})
})
}

Regular Expressions

export namespace ObsidianRegex {
export const IMAGE_LINK = /!\[\[(.*?)]]/g
export const DOCUMENT_LINK = /(?<!!)\[\[(.*?)]]/g
export const CALLOUT = /> \[!(NOTE|WARNING|ERROR|TIP|INFO|DANGER)].*?\n(>.*)/ig
}

Special syntax unique to Obsidian was handled using regular expressions for parsing. By using groups, specific parts could be extracted for conversion, making the process more convenient.

Creating a PR for Community Plugin Release

Finally, to register the plugin in the community plugin repository, I conclude by creating a PR. It is essential to adhere to community guidelines; otherwise, the PR may be rejected. Obsidian provides guidance on what to be mindful of when developing plugins, so it's crucial to follow these guidelines as closely as possible.

image

Based on previous PRs, it seems that merging takes approximately 2-4 weeks. If feedback is received later, I will make the necessary adjustments and patiently wait for the merge.

Conclusion

I thought, 'This should be a quick job, maybe done in 3 days,' but trying to implement the plugin while traveling abroad ended up taking about a week, including creating the release PR 😂

image I wonder if Kent Beck and Erich Gamma, who developed JUnit, coded like this on a plane...

Switching to TypeScript from Java or Kotlin made things challenging, as I wasn't familiar with it, and I wasn't confident if the code I was writing was best practice. However, thanks to this, I delved into JS syntax like async-await in detail, adding another technology stack to my repertoire. It's a proud feeling. This also gave me a new topic to write about.

The best part is that there's almost no need for manual work in blog posting anymore! After converting the syntax with the plugin, I only need to do a spell check before pushing to GitHub. Of course, there are still many bugs...

Moving forward, I plan to continue studying TypeScript gradually to eliminate anti-patterns in the plugin and improve the design for cleaner modules.

If you're facing similar dilemmas, contributing to the project or collaborating in other ways to build it together would be great! You're welcome anytime 😄

info

You can check out the complete code on GitHub.

Next Steps 🤔

  • Fix minor bugs
  • Support footnote syntax
  • Support image resize syntax
  • Implement transaction handling for rollback in case of errors during conversion
  • Abstract processing for adding other modules

Release 🚀

After about 6 days of code review, the PR was merged. The plugin is now available for use in the Obsidian Community plugin repository. 🎉

image

Reference

Managing Google Kubernetes Engine through Local CLI

· 3 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

While it is very convenient to be able to run kubectl through Google's Cloud Shell via the web from anywhere, there is a drawback of needing to go through the hassle of web access and authentication for simple query commands. This article shares a method for quickly managing Google Cloud Kubernetes using a local CLI.

Contents

Installing GCP CLI

First, you need to install the GCP CLI. Refer to the gcp-cli link to check for the appropriate operating system and install it.

Connection

Once the installation is complete, proceed with the authentication process using the following command:

gcloud init

You need to access the GCP Kubernetes Engine and fetch the connection information for the cluster.

GKE-connect

gke-cluster-connect-2

Copy the command for command-line access and execute it in the terminal.

gcloud container clusters get-credentials sv-dev-cluster --zone asia-northeast3-a --project {projectId}
Fetching cluster endpoint and auth data.
CRITICAL: ACTION REQUIRED: gke-gcloud-auth-plugin, which is needed for continued use of kubectl, was not found or is not executable. Install gke-gcloud-auth-plugin for use with kubectl by following https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
kubeconfig entry generated for sv-dev-cluster.

Plugin Installation

If the current Kubernetes version being used is below v1.26, you may encounter an error requesting the installation of gke-gcloud-auth-plugin. Install the plugin using the following command.

info

Prior to v1.26, client-specific code for managing authentication between the client and Google Kubernetes Engine was included in the existing versions of kubectl and custom Kubernetes clients. Starting from v1.26, this code is no longer included in the OSS kubectl. GKE users need to download and use a separate authentication plugin to generate GKE-specific tokens. The new binary, gke-gcloud-auth-plugin, extends the kubectl authentication for GKE using the Kubernetes Client-go user authentication information plugin mechanism. Since the plugin is already supported in kubectl, you can switch to this new mechanism before v1.26 is provided. - Google

gcloud components install gke-gcloud-auth-plugin
Your current Google Cloud CLI version is: 408.0.1
Installing components from version: 408.0.1

┌────────────────────────────────────────────┐
│ These components will be installed. │
├────────────────────────┬─────────┬─────────┤
│ Name │ Version │ Size │
├────────────────────────┼─────────┼─────────┤
│ gke-gcloud-auth-plugin │ 0.4.0 │ 7.1 MiB │
└────────────────────────┴─────────┴─────────┘

For the latest full release notes, please visit:
https://cloud.google.com/sdk/release_notes

Do you want to continue (Y/n)? y

╔════════════════════════════════════════════════════════════╗
╠═ Creating update staging area ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gke-gcloud-auth-plugin ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gke-gcloud-auth-plugin ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Creating backup and activating new installation ═╣
╚════════════════════════════════════════════════════════════╝

Performing post processing steps...done.

Update done!

re-run the connection command, and you should see that the cluster is connected without any error messages.

gcloud container clusters get-credentials sv-dev-cluster --zone asia-northeast3-a --project {projectId}
Fetching cluster endpoint and auth data.
kubeconfig entry generated for sv-dev-cluster.

Once the connection is successful, you will also notice changes in Docker Desktop. Specifically, new information will be displayed in the Kubernetes tab.

1.png

Afterwards, you can also directly check GKE resources locally using kubectl.

kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
my-application 1/1 1 1 20d

Conclusion

We have briefly explored efficient ways to manage GKE resources locally. Using kubectl locally enables extended features like autocomplete, making Kubernetes management much more convenient. If you are new to using GKE, I strongly recommend giving it a try.

Reference

k8s-plugin

Speeding up Test Execution, Spring Context Mocking

· 3 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

Writing test code in every project has become a common practice. As projects grow, the number of tests inevitably increases, leading to longer overall test execution times. Particularly in projects based on the Spring framework, test execution can significantly slow down due to the loading of Spring Bean contexts. This article introduces methods to address this issue.

Write All Tests as Unit Tests

Tests need to be fast. The faster they are, the more frequently they can be run without hesitation. If running all tests once takes 10 minutes, it means feedback will only come after 10 minutes.

To achieve faster tests in Spring, it is essential to avoid using @SpringBootTest. Loading all Beans causes the time to load necessary Beans to be overwhelmingly longer than executing the code for testing business logic.

@SpringBootTest
class SpringApplicationTest {

@Test
void main() {
}
}

The above code is a basic test code for running a Spring application. All Beans configured by @SpringBootTest are loaded. How can we inject only the necessary Beans for testing?

Utilizing Annotations or Mockito

By using specific annotations, only the necessary Beans for related tests are automatically loaded. This way, instead of loading all Beans through Context loading, only the truly needed Beans are loaded, minimizing test execution time.

Let's briefly look at a few annotations.

  • @WebMvcTest: Loads only Web MVC related Beans.
  • @WebFluxTest: Loads only Web Flux related Beans. Allows the use of WebTestClient.
  • @DataJpaTest: Loads only JPA repository related Beans.
  • @WithMockUser: When using Spring Security, creates a fake User, skipping unnecessary authentication processes.

Additionally, by using Mockito, complex dependencies can be easily resolved to write tests. By appropriately utilizing these two concepts, most unit tests are not overly difficult.

warning

If excessive mocking is required, there is a high possibility that the dependency design is flawed. Be cautious not to overuse mocking.

What about SpringApplication?

For SpringApplication to run, SpringApplication.run() must be executed. Instead of inefficiently loading all Spring contexts to verify the execution of this method, we can mock the SpringApplication where context loading occurs and verify only if run() is called without using @SpringBootTest.

class DemoApplicationTests {  

@Test
void main() {
try (MockedStatic<SpringApplication> springApplication = mockStatic(SpringApplication.class)) {
when(SpringApplication.run(DemoApplication.class)).thenReturn(null);

DemoApplication.main(new String[]{});

springApplication.verify(
() -> SpringApplication.run(DemoApplication.class), only()
);
}
}
}

Conclusion

In Robert C. Martin's Clean Code, Chapter 9 discusses the 'FIRST principle'.

Reflecting on the first letter, F, for Fast, as mentioned in this article, we briefly introduced considerations on speed. Once again, emphasizing the importance of fast tests, we conclude with the quote:

Tests must be fast enough. - Robert C. Martin

Reference

Fixture Monkey 0.4.x

· 3 min read
Haril Song
Owner, Software Engineer at 42dot
warning

As of May 2024, this post is no longer valid. Instead, please refer to Making Testing Easy and Convenient with Fixture Monkey.

Overview

With the update to FixtureMonkey version 0.4.x, there have been significant changes in functionality. It has only been a month since the previous post1, and there have been many modifications (ㅠ) which was a bit overwhelming, but taking comfort in the active community, I am writing a new post reflecting the updated features.

Changes

LabMonkey

An experimental feature has been added as an instance.

LabMonkey inherits from FixtureMonkey and supports existing features while adding several new methods. The overall usage is similar, so it seems that using LabMonkey instead of FixtureMonkey would be appropriate. It is said that after version 1.0.0, the functionality of LabMonkey will be deprecated, and the same features will be provided by FixtureMonkey.

private final LabMonkey fixture = LabMonkey.create();

Change in Object Creation Method

The responsibility has shifted from ArbitraryGenerator to ArbitraryIntrospector.

Record Support

Now, you can also create Record through FixtureMonkey.

public record LottoRecord(int number) {}
class LottoRecordTest {

private final LabMonkey fixture = LabMonkey.labMonkeyBuilder()
.objectIntrospector(ConstructorPropertiesArbitraryIntrospector.INSTANCE)
.build();

@Test
void shouldBetween1to45() {
LottoRecord lottoRecord = fixture.giveMeOne(LottoRecord.class);

System.out.println("lottoRecord: " + lottoRecord);

assertThat(lottoRecord).isNotNull();
}
}
lottoRecord: LottoRecord[number=-6]

By using ConstructorPropertiesArbitraryIntrospector to create objects, you can create Record objects. In version 0.3.x, the ConstructorProperties annotation was required, but now you don't need to make changes to the production code, which is quite a significant change.

In addition, various Introspectors exist to support object creation in a way that matches the shape of the object.

Plugin

private final LabMonkey fixture = LabMonkey.labMonkeyBuilder()
.objectIntrospector(ConstructorPropertiesArbitraryIntrospector.INSTANCE)
.plugin(new JavaxValidationPlugin())
.build();

Through the fluent API plugin(), you can easily add plugins. By adding JavaxValidationPlugin, you can apply Java Bean Validation functionality to create objects.

It seems like a kind of decorator pattern, making it easier to develop and apply various third-party plugins.

public record LottoRecord(
@Min(1)
@Max(45)
int number
) {
public LottoRecord {
if (number < 1 || number > 45) {
throw new IllegalArgumentException("The lotto number must be between 1 and 45.");
}
}
}
@RepeatedTest(100)
void shouldBetween1to45() {
LottoRecord lottoRecord = fixture.giveMeOne(LottoRecord.class);

assertThat(lottoRecord.number()).isBetween(1, 45);
}

Conclusion

Most of the areas that were mentioned as lacking in the previous post have been improved, and I am very satisfied with using it. But somehow, the documentation2 seems a bit lacking compared to before...

Reference

Footnotes

  1. FixtureMonkey 0.3.0 - Object Creation Strategy

  2. FixtureMonkey

Using Date Type as URL Parameter in WebFlux

· 4 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

When using time formats like LocalDateTime as URL parameters, if they do not match the default format, you may encounter an error message like the following:

Exception: Failed to convert value of type 'java.lang.String' to required type 'java.time.LocalDateTime';

What settings do you need to make to allow conversion for specific formats? This article explores the conversion methods.

Contents

Let's create a simple sample example.

public record Event(
String name,
LocalDateTime time
) {
}

This is a simple object that contains the name and occurrence time of an event, created using record.

@RestController
public class EventController {

@GetMapping("/event")
public Mono<Event> helloEvent(Event event) {
return Mono.just(event);
}

}

The handler is created using the traditional Controller model.

tip

In Spring WebFlux, you can manage requests using Router functions, but this article focuses on using @RestController as it is not about WebFlux.

Let's write a test code.

@WebFluxTest
class EventControllerTest {

@Autowired
private WebTestClient webTestClient;

@Test
void helloEvent() {
webTestClient.get().uri("/event?name=Spring&time=2021-08-01T12:00:00")
.exchange()
.expectStatus().isOk()
.expectBody()
.jsonPath("$.name").isEqualTo("Spring")
.jsonPath("$.time").isEqualTo("2021-08-01T12:00:00");
}

}

image1

When running the test code, it simulates the following request.

$ http localhost:8080/event Accept=application/stream+json name==Spring time==2021-08-01T12:00
HTTP/1.1 200 OK
Content-Length: 44
Content-Type: application/stream+json

{
"name": "Spring",
"time": "2021-08-01T12:00:00"
}

If the request is made in the default format, a successful response is received. But what if the request format is changed?

image2

image3

$ http localhost:8080/event Accept=application/stream+json name==Spring time==2021-08-01T12:00:00Z
HTTP/1.1 500 Internal Server Error
Content-Length: 131
Content-Type: application/stream+json

{
"error": "Internal Server Error",
"path": "/event",
"requestId": "ecc1792e-3",
"status": 500,
"timestamp": "2022-11-28T10:04:52.784+00:00"
}

As seen above, additional settings are required to receive responses in specific formats.

1. @DateTimeFormat

The simplest solution is to add an annotation to the field you want to convert. By defining the format you want to convert to, you can request in the desired format.

public record Event(
String name,

@DateTimeFormat(pattern = "yyyy-MM-dd'T'HH:mm:ss'Z'")
LocalDateTime time
) {
}

Running the test again will confirm that it passes successfully.

info

Changing the request format does not change the response format. Response format changes can be set using annotations like @JsonFormat, but this is not covered in this article.

While this is a simple solution, it may not always be the best. If there are many fields that need conversion, manually adding annotations can be quite cumbersome and may lead to bugs if an annotation is accidentally omitted. Using test libraries like ArchUnit1 to check for this is possible, but it increases the effort required to understand the code.

2. WebFluxConfigurer

By implementing WebFluxConfigurer and registering a formatter, you can avoid the need to add annotations to each LocalDateTime field individually.

Remove the @DateTimeFormat from Event and configure the settings as follows.

@Configuration
public class WebFluxConfig implements WebFluxConfigurer {

@Override
public void addFormatters(FormatterRegistry registry) {
DateTimeFormatterRegistrar registrar = new DateTimeFormatterRegistrar();
registrar.setUseIsoFormat(true);
registrar.registerFormatters(registry);
}
}
danger

Using @EnableWebFlux can override the mapper, causing the application to not behave as intended.2

Running the test again will show that it passes without any annotations.

image4

Applying Different Formats to Specific Fields

This is simple. Since the method of directly adding @DateTimeFormat to the field takes precedence, you can add @DateTimeFormat to the desired field.

public record Event(
String name,

LocalDateTime time,

@DateTimeFormat(pattern = "yyyy-MM-dd'T'HH")
LocalDateTime anotherTime
) {
}
    @Test
void helloEvent() {
webTestClient.get().uri("/event?name=Spring&time=2021-08-01T12:00:00Z&anotherTime=2021-08-01T12")
.exchange()
.expectStatus().isOk()
.expectBody()
.jsonPath("$.name").isEqualTo("Spring")
.jsonPath("$.time").isEqualTo("2021-08-01T12:00:00")
.jsonPath("$.anotherTime").isEqualTo("2021-08-01T12:00:00");
}

image5

tip

When the URI becomes long, using UriComponentsBuilder is a good approach.

String uri = UriComponentsBuilder.fromUriString("/event")
.queryParam("name", "Spring")
.queryParam("time", "2021-08-01T12:00:00Z")
.queryParam("anotherTime", "2021-08-01T12")
.build()
.toUriString();

Conclusion

Using WebFluxConfigurer allows for globally consistent formats. If there are multiple fields across different classes that require specific formats, using WebFluxConfigurer is much easier than applying @DateTimeFormat to each field individually. Choose the appropriate method based on the situation.

  • @DateTimeFormat: Simple to apply. Has higher precedence than global settings, allowing for targeting specific fields to use different formats.
  • WebFluxConfigurer: Relatively complex to apply, but advantageous in larger projects where consistent settings are needed. Helps prevent human errors like forgetting to add annotations to some fields compared to @DateTimeFormat.
info

You can find all the example code on GitHub.

Reference

Footnotes

  1. ArchUnit

  2. LocalDateTime is representing in array format

Precautions when using ZonedDateTime - Object.equals vs Assertions.isEqualTo

· 3 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

In Java, there are several objects that can represent time. In this article, we will discuss how time comparison is done with ZonedDateTime, which is one of the objects that contains the most information.

Different but the same time?

Let's write a simple test code to find any peculiarities.

ZonedDateTime seoulZonedTime = ZonedDateTime.parse("2021-10-10T10:00:00+09:00[Asia/Seoul]");
ZonedDateTime utcTime = ZonedDateTime.parse("2021-10-10T01:00:00Z[UTC]");

assertThat(seoulZonedTime.equals(utcTime)).isFalse();
assertThat(seoulZonedTime).isEqualTo(utcTime);

This code passes the test. Although equals returns false, isEqualTo passes. Why is that?

In reality, the two ZonedDateTime objects in the above code represent the same time. However, since ZonedDateTime internally contains LocalDateTime, ZoneOffset, and ZoneId, when compared using equals, it checks if the objects have the same values rather than an absolute time.

Therefore, equals returns false.

image1 ZonedDateTime#equals

However, it seems that isEqualTo works differently in terms of how it operates in time objects.

In fact, when comparing ZonedDateTime, isEqualTo calls ChronoZonedDateTimeByInstantComparator#compare instead of invoking ZonedDateTime's equals.

image2

image3 Comparator#compare is called.

By looking at the internal implementation, it can be seen that the comparison is done by converting to seconds using toEpochSecond(). This means that it compares absolute time through compare rather than comparing objects through equals.

Based on this, the comparison of ZonedDateTime can be summarized as follows:

equals : Compares objects

isEqualTo : Compares absolute time

Therefore, when comparing objects that include ZonedDateTime indirectly, equals is called, so if you want to compare based on the absolute value of ZonedDateTime, you need to override the equals method inside the object.

public record Event(
String name,
ZonedDateTime eventDateTime
) {
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
Event event = (Event) o;
return Objects.equals(name, event.name)
&& Objects.equals(eventDateTime.toEpochSecond(), event.eventDateTime.toEpochSecond());
}

@Override
public int hashCode() {
return Objects.hash(name, eventDateTime.toEpochSecond());
}
}
@Test
void equals() {
ZonedDateTime time1 = ZonedDateTime.parse("2021-10-10T10:00:00+09:00[Asia/Seoul]");
ZonedDateTime time2 = ZonedDateTime.parse("2021-10-10T01:00:00Z[UTC]");

Event event1 = new Event("event", time1);
Event event2 = new Event("event", time2);

assertThat(event1).isEqualTo(event2); // pass
}

Conclusion

  • If you want to compare absolute time when equals is called between ZonedDateTime, you need to convert it, such as using toEpochSecond().
  • When directly comparing ZonedDateTime with isEqualTo in test code or similar scenarios, equals is not called, and internal conversion is performed, so no separate conversion is needed.
  • If there is a ZonedDateTime inside an object, you may need to override the object's equals method as needed.

Operating Jenkins with Docker

· 3 min read
Haril Song
Owner, Software Engineer at 42dot

Overview

This article explains how to install and operate Jenkins using Docker.

Contents

Install

Docker

docker run --name jenkins-docker -d -p 8080:8080 -p 50000:50000 -v /home/jenkins:/var/jenkins_home -u root jenkins/jenkins:lts 

Mount a volume to persist Jenkins data on the host machine. Unlike TeamCity, Jenkins manages all configurations in files. Setting up a mount makes authentication information and data management much more convenient, so be sure to configure it. Common target paths are /home/jenkins or /var/lib/jenkins.

For the purpose of this article, it is assumed that the path /home/jenkins has been created.

Authentication

To ensure security and access control for both the master and nodes, create a user named 'jenkins' and proceed as follows.

Setting User Access Permissions

chown -R jenkins /var/lib/jenkins

Managing SSH Keys

If you don't have keys, generate one using ssh-keygen to prepare a private key and a public key.

When prompted for a path, enter /home/jenkins/.ssh/id_rsa to ensure the key is created under /home/jenkins/.ssh.

GitLab

In GitLab's personal settings, there is an SSH setting tab. Add the public key.

When selecting Git in the pipeline, a repository path input field is displayed. Entering an SSH path starting with git@~ will show a red error. To resolve this, create a credential. Choose SSH credential to create one, and the ID value can be a useful value, so it is recommended to enter it.

Node Configuration

Nodes are a way to efficiently distribute Jenkins roles.

To communicate with the node, generate a key on the master using ssh-keygen. If you already have one that you are using, you can reuse it.

image

  • ID: This value allows Jenkins to identify the SSH key internally, making it easier to use credentials in Jenkinsfiles, so it's best to set a meaningful value. If not set, a UUID value will be generated.
  • Username: The Linux user. Typically, 'jenkins' is used as the user, so enter 'jenkins'. Be cautious as not entering this may result in a reject key error.

Docker Access Permissions

If the docker group does not exist, create it. Usually, it is automatically created when installing Docker.

sudo groupadd docker

Grant Jenkins user permission to run Docker by running the following command.

sudo gpasswd -a jenkins docker
# Adding user jenkins to group docker
sudo chmod 666 /var/run/docker.sock

Restart the Docker daemon to apply the changes.

systemctl restart docker

You should now be able to run the docker ps command.

Restart

When updating Jenkins version or installing, removing, or updating plugins, Jenkins restarts. However, if you are managing it with Docker, the container goes down, preventing Jenkins from starting. To enable restart, you need to set a restart policy on the container.

docker update --restart=always jenkins-docker

After this, the jenkins-docker container will always remain in a running state.

Caution

When updating plugins, carefully check if they are compatible with the current version of Jenkins in operation. Mismatched versions between Jenkins and plugins can often lead to pipeline failures.

Reference

Managing Jenkins with Docker