Sequence allocation size question

Hi, I have a question on what my allocation size should be for my call to batch insert records into a table. I'm using sequence generator and I have a batch process that has to insert or update 50,000 records. I'm noticing the performance for this process isn't great and I read online I might need to adjust allocation size (currently have it as 1). So if I'm batch inserting in batches of 500 records, does that mean my allocation size should be 500?
JavaBot
JavaBot2mo ago
This post has been reserved for your question.
Hey @Kale Vivi! Please use /close or the Close Post button above when your problem is solved. Please remember to follow the help guidelines. This post will be automatically marked as dormant after 300 minutes of inactivity.
TIP: Narrow down your issue to simple and precise questions to maximize the chance that others will reply in here.
puggy
puggy2mo ago
Hi @Kale Vivi, if you're batch inserting 50,000 records in chunks of 500 and you're using a sequence generator with allocationSize set to 1, that's almost certainly the root cause of your performance issues. Setting allocationSize to 1 forces the database to fetch a new sequence value for every single record, which is incredibly inefficient for batch operations.

You should align your allocationSize with your batch size. In this case, setting allocationSize to 500 will allow the sequence generator to allocate 500 IDs in a single call, which drastically reduces the number of round trips to the database for sequence generation.

Here's how it works:
- When allocationSize is 500, the sequence generator pre-fetches 500 values in one call and assigns them to your records as needed.
- For a batch of 500 inserts, the database only hits the sequence once per batch, instead of 500 times.
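For reference, the mapping would look roughly like this (entity and sequence names here are placeholders, not from your code, and it assumes the jakarta.persistence API):

import jakarta.persistence.*;

@Entity
public class OneEntity { // hypothetical entity name

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "person_seq_gen")
    @SequenceGenerator(
            name = "person_seq_gen",
            sequenceName = "person_seq", // placeholder for your actual DB sequence
            allocationSize = 500         // pre-fetch 500 ids per sequence call
    )
    private Long id;
}

One caveat: with Hibernate's pooled optimizers, the database sequence's INCREMENT BY should match allocationSize (e.g. ALTER SEQUENCE person_seq INCREMENT BY 500), otherwise you can get id clashes or gaps.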
Kale Vivi
Kale ViviOP2mo ago
Thanks, I adjusted it to 500 first and saw a good improvement. Trying allocationSize 1000 with a batch size of 1000 now to see if it's even better
Kale Vivi
Kale ViviOP2mo ago
My other concern is that I'm also going one by one through the list to convert each domain object to an entity before persistence, and I'm worried that's also slowing performance
puggy
puggy2mo ago
Use Java streams instead. You can use parallelStream there, and it can speed up the processing a lot
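Roughly like this (placeholder names, since I haven't seen your mapper yet):

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// hypothetical helper: converts all domain objects to entities in parallel
static List<OneEntity> toEntities(Set<Person> persons) {
    return persons.parallelStream()
            .map(OneEntity::fromPerson) // your existing per-object conversion
            .collect(Collectors.toList());
}

Whether it helps depends on the mapping itself being the bottleneck; if most of the time goes into the DB writes, parallelizing the conversion won't change much.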
Kale Vivi
Kale ViviOP2mo ago
Ok let me take a look there. Here's kind of what I have so far, and I'm feeling like I'm definitely going to need to refactor this:

@PersistenceContext
private final EntityManager entityManager;

private final RepoOne repoOne;
private final RepoTwo repoTwo;
private final RepoThree repoThree;
private final RepoFour repoFour;
private final RepoFive repoFive;
private final RepoSix repoSix;
private final RepoSeven repoSeven;
private final RepoEight repoEight;
private final RepoNine repoNine;

private static final int BATCH_SIZE = 1000;

@Override
@Transactional
public void saveAll(Set<Person> persons) {
    Iterable<List<Person>> peopleBatches = ListUtils.partition(persons.stream().toList(), BATCH_SIZE);
    List<Person> newPeople = new ArrayList<>();
    List<Person> existingPeople = new ArrayList<>();
    AtomicInteger indexPersonBatches = new AtomicInteger(1);
    for (List<Person> batch : peopleBatches) {
        for (Person person : batch) {
            if (existsBy(person.getIdNumber())) {
                existingPeople.add(person);
            } else {
                newPeople.add(person);
            }
        }
        updatePeople(existingPeople);
        savePeople(newPeople);
        existingPeople.clear();
        newPeople.clear();
    }
    entityManager.flush();
    entityManager.clear();
}

private void savePeople(List<Person> newPeople) {
    for (Person person : newPeople) {
        repoOne.save(One.fromPerson(person));
        repoTwo.save(Two.fromPerson(person));
        repoThree.save(Three.fromPerson(person));
        repoFour.save(Four.fromPerson(person));
        repoFive.saveAll(Five.fromPerson(person));
        repoSix.saveAll(Six.fromPerson(person));
        repoSeven.saveAll(Seven.fromPerson(person));
        repoEight.saveAll(Eight.fromPerson(person));
        repoNine.saveAll(Nine.fromPerson(person));
    }
}

public boolean existsBy(String number) {
    return repoOne.findByIdNumber(number) != null;
}

I think the issue is the savePeople above where I go one by one. In each repo's fromPerson, that's where I take the domain object and map it to an entity field by field. Should I just use parallel streams there? It's crazy that I have to break the one object above down into smaller objects to persist in multiple tables too, which is also what I feel impacts performance.
puggy
puggy2mo ago
OK, I see. What I can suggest here: 1. Instead of checking existsBy one by one, do it for the batch of ids you already have from the split list:
private Set<String> existsBy(List<String> idNumbers) {
return new HashSet<>(repoOne.findAllByIdNumbers(idNumbers));
}
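Note that findAllByIdNumbers isn't a query Spring Data derives by name when you only want the strings back, so you'd declare it yourself. A sketch, assuming a Spring Data JPA repository with a Long key:

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface RepoOne extends JpaRepository<One, Long> {

    // returns only the id numbers that already exist, in one round trip
    @Query("select o.idNumber from One o where o.idNumber in :idNumbers")
    List<String> findAllByIdNumbers(@Param("idNumbers") List<String> idNumbers);
}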
It will reduce DB interaction from N calls to M calls, where N is the number of persons and M is the number of batches after splitting. 2. Improve the exists/not-exists logic by splitting each batch into two lists using the result from point 1:
for (List<Person> batch : peopleBatches) {
List<String> idNumbers = batch.stream()
.map(Person::getIdNumber)
.collect(Collectors.toList());
    Set<String> existingIds = existsBy(idNumbers);

List<Person> newPeople = batch.stream()
.filter(person -> !existingIds.contains(person.getIdNumber()))
.collect(Collectors.toList());

List<Person> existingPeople = batch.stream()
.filter(person -> existingIds.contains(person.getIdNumber()))
.collect(Collectors.toList());

if (!existingPeople.isEmpty()) {
updatePeople(existingPeople);
}

if (!newPeople.isEmpty()) {
savePeople(newPeople);
}
}
3. With this code, I suggest using saveAll instead of single saves. From:
private void savePeople(List<Person> newPeople) {
for (Person person : newPeople) {
repoOne.save(One.fromPerson(person));
repoTwo.save(Two.fromPerson(person));
repoThree.save(Three.fromPerson(person));
repoFour.save(Four.fromPerson(person));
repoFive.saveAll(Five.fromPerson(person));
repoSix.saveAll(Six.fromPerson(person));
repoSeven.saveAll(Seven.fromPerson(person));
repoEight.saveAll(Eight.fromPerson(person));
repoNine.saveAll(Nine.fromPerson(person));
}
}
To:
private void savePeople(List<Person> newPeople) {
repoOne.saveAll(newPeople.stream().map(One::fromPerson).collect(Collectors.toList()));
repoTwo.saveAll(newPeople.stream().map(Two::fromPerson).collect(Collectors.toList()));
repoThree.saveAll(newPeople.stream().map(Three::fromPerson).collect(Collectors.toList()));
    // ... the rest of your saves
}
Let me know if this helped speed up processing the way you need it.
Kale Vivi
Kale ViviOP2mo ago
Thanks a lot Puggy! I got distracted by writing some tests but will take a look at the above in a minute and give it a try. Thank you. Just adding here in case the above wasn't clear for savePeople(): I have a list of Person objects where Person has some single-value fields, but also nested lists of other objects (e.g. each Person has a List<Six>). That's why I had to use saveAll for those repos but not for the first several, where there's only one value per person, versus the others where a person can have multiple records (e.g. Six here). Sorry for the names (just masked them, they're bad).
puggy
puggy2mo ago
So you can move the repos that use saveAll outside of the for loop and use my suggestion for them. For the other repos you can keep using your loop 🙂
Kale Vivi
Kale ViviOP2mo ago
Not sure I follow the last part here. Let me share the builders to confirm understanding:

One entity:

public static One fromPerson(Person person) {
    return One.builder()
            .idNumber(person.getIdNumber())
            .build();
}

Six entity:

public static List<Six> fromPerson(Person person) {
    return person.getSix().stream().map(s -> Six.builder()
            .idNumber(person.getIdNumber())
            .someField(s.getSomeField())
            .build()).collect(Collectors.toList());
}

Should I just do saveAll for everything and drop the loop, or keep a loop for entities like Six here? Thanks again
puggy
puggy2mo ago
Thank you for sharing more code. Here's my suggestion:
private void savePeople(List<Person> newPeople) {
// Single saves
repoOne.saveAll(newPeople.stream().map(One::fromPerson).collect(Collectors.toList()));
repoTwo.saveAll(newPeople.stream().map(Two::fromPerson).collect(Collectors.toList()));
repoThree.saveAll(newPeople.stream().map(Three::fromPerson).collect(Collectors.toList()));
    // ... the other single-save repos from your example

// BULK
repoSix.saveAll(newPeople.stream().flatMap(person -> Six.fromPerson(person).stream()).collect(Collectors.toList()));
    // ... the other list-valued repos (Seven, Eight, Nine) the same way
}
Kale Vivi
Kale ViviOP2mo ago
Thank you, giving it a try now. You're right, the for-each loop code was slow
puggy
puggy2mo ago
Great! 🫶
Kale Vivi
Kale ViviOP2mo ago
Hey Puggy, while it is a dramatic improvement over the previous implementation, it is still slow (at least when running the job locally). Other approaches might include leveraging @Async with a thread executor, or parallel streams? Does that seem like the right way forward?
puggy
puggy2mo ago
Hey @Kale Vivi, could you please share your time expectations and what the execution time is right now?
Kale Vivi
Kale ViviOP2mo ago
It's still taking hours, and an exception is thrown with a DB IO error. DB timeouts, I guess, before the process can finish. I just want it done in 30 to 45 mins if possible?
ayylmao123xdd
ayylmao123xdd2mo ago
if you wanna insert 50k records it should take seconds to process. Try giving the process method an async annotation, and make sure it's called from another class. You should probably also not use exists-by-id, because that generates a race condition; instead you could mark the column with @Column(unique = true). And to avoid having six repositories, you could use one with a single table for all the objects, if they share a lot of similar fields. Also, from what I was testing, a 10k batch size is pretty good for performance.
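rough shape of the async part (class names made up, and it assumes @EnableAsync on a configuration class):

import java.util.Set;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class AsyncPersonImporter { // hypothetical wrapper

    private final PersonPersistenceService persistenceService; // your existing saveAll service

    public AsyncPersonImporter(PersonPersistenceService persistenceService) {
        this.persistenceService = persistenceService;
    }

    // runs on a separate thread; it has to be called from another class
    // so the call goes through Spring's proxy, otherwise @Async is ignored
    @Async
    public void importAsync(Set<Person> persons) {
        persistenceService.saveAll(persons);
    }
}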
Kale Vivi
Kale ViviOP2mo ago
Hey sorry to respond late. I was able to resolve it by adding hibernate props for batch and the @async annotation earlier. That seemed to work wonders for perf. Thanks all!
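The props I mean are along these lines (Spring Boot application.properties style; exact values are just what I tuned to):

spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

order_inserts/order_updates group the statements by table so JDBC batching actually kicks in when you're writing to several tables per person.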
Kale Vivi
Kale ViviOP2mo ago
One thing I've been thinking about though is the different approaches to handling batch updates of large data sizes. There are things like @Async, completable futures, Spring Batch, parallel streams, ... What are some considerations I should think about in understanding when to use what?
ayylmao123xdd
ayylmao123xdd2mo ago
so if you don't want to block other processes in the application, you should use the async annotation for sure. When I was doing file processing, I was getting about 10k objects saved to the DB per 0.6 seconds, but I was using the async annotation with a normal loop to generate the objects from the lines, and yes, also with Spring Batch. You would need to test if parallel streams are faster in your case, but in mine there wasn't really any advantage, and in some tests they seemed to be slower than just a normal stream/loop. As for the completable future, you can use it when you need info from the thread about whether its job has finished.
Kale Vivi
Kale ViviOP2mo ago
So in the case of async vs completable future: if I need info back, I go with CF; otherwise just use async?
ayylmao123xdd
ayylmao123xdd2mo ago
async just launches the method asynchronously in a separate thread, and completable launches it asynchronously while returning something. You can read more about it here:
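roughly the difference (made-up service, assuming @EnableAsync is configured):

import java.util.concurrent.CompletableFuture;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class ExampleService { // hypothetical class for illustration

    // fire-and-forget: runs on another thread, caller gets nothing back
    @Async
    public void doWork() {
        // long-running work
    }

    // async with a handle: the caller can join() the future or compose on it
    @Async
    public CompletableFuture<String> doWorkWithResult() {
        return CompletableFuture.completedFuture("done"); // stand-in for real work
    }
}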
ayylmao123xdd
ayylmao123xdd2mo ago
Baeldung
Guide To CompletableFuture | Baeldung
Quick and practical guide to Java 8's CompletableFuture.
ayylmao123xdd
ayylmao123xdd2mo ago
Baeldung
How To Do @Async in Spring | Baeldung
How to enable and use @Async in Spring - from the very simple config and basic usage to the more complex executors and exception handling strategies.
