Configure Bullhorn Data Replication
Bullhorn Data Replication copies data from Bullhorn databases to a Microsoft SQL Server database hosted on your organization’s network. This gives your team direct SQL access to Bullhorn data for reporting, analysis, and integration with other systems.
This article explains how Data Replication works, what you need to set it up, and how to configure and monitor it after implementation.
Common Reasons You Might Need This Article
- You are setting up Data Replication for the first time and need to understand the requirements and configuration steps.
- You want to understand how Bullhorn tracks changes and syncs them to your replicated database.
- You need to configure the application.properties file or Windows Task Scheduler for your environment.
- You want to know how to monitor your Data Replication environment and identify sync delays or errors.
- You are troubleshooting data that is not appearing in your replicated database.
- You want to understand maintenance best practices to keep your replication database healthy.
Understand How Data Replication Works
Data Replication is made up of two processes that work together to keep your database current and accurate.
Replication Process
The replication process uses an event service to detect changes in Bullhorn. When a record is added, modified, or deleted, the system creates a DataSyncObject in a Data Sync Staging database. The Data Replication client retrieves that object through the Data Sync API and applies the change to your local database. When a record is successfully updated, the DateLastSync field on that row is updated.
Self-Healing Process
The self-healing process runs as a safeguard to catch any records that the event-driven process may have missed, ensuring 100% data accuracy in your replicated database.
Every 24 hours, Data Replication compares its record counts for each entity against the counts in Bullhorn’s Data Sync API. If Bullhorn has more records than the local database, self-healing is triggered for that entity. The system then requests the missing records in batches of 100.
Self-healing for a given entity runs only when all of the following conditions are met:
- The entity was last healed at least one day ago.
- There are fewer than 10,000 unprocessed records in the Data Sync queue.
- The current time is outside of business hours.
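The count comparison, the three conditions, and the batch size described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Bullhorn's implementation: the function names are hypothetical, and the definition of business hours (weekdays, 8:00 to 18:00 local time) is an assumption, since this article does not define it.

```python
from datetime import datetime, time, timedelta

BATCH_SIZE = 100         # from the article: missing records are requested in batches of 100
MAX_QUEUE_SIZE = 10_000  # from the article: fewer than 10,000 unprocessed records


def in_business_hours(now, start=time(8, 0), end=time(18, 0)):
    """ASSUMPTION: business hours are weekdays between start and end, local time."""
    return now.weekday() < 5 and start <= now.time() < end


def should_self_heal(local_count, bullhorn_count, last_healed, queue_size, now):
    """True only when Bullhorn has more records and all three conditions hold."""
    return (
        bullhorn_count > local_count
        and now - last_healed >= timedelta(days=1)
        and queue_size < MAX_QUEUE_SIZE
        and not in_business_hours(now)
    )


def batches(missing_ids, size=BATCH_SIZE):
    """Split the missing record IDs into request-sized chunks."""
    return [missing_ids[i:i + size] for i in range(0, len(missing_ids), size)]


night = datetime(2024, 3, 5, 23, 30)  # a Tuesday, 11:30 PM
print(should_self_heal(9_750, 10_000, night - timedelta(days=2), 1_200, night))
print([len(b) for b in batches(list(range(250)))])
```

Failing any single check, such as a queue of 10,000 or more records or a run during the assumed business hours, makes `should_self_heal` return `False`. That can help explain why an entity with a known count gap has not been healed yet.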
DateLastSync vs. DateLastModified
These two date fields serve different purposes and are stored in different places.
- DateLastSync (DLS) exists only in your Data Replication database. It reflects the date and time that a record received and processed an event. If a sub-entity is updated (for example, a candidate category), the DateLastSync on the parent record (the candidate) is also updated.
- DateLastModified (DLM) exists in the Bullhorn operational database. It reflects the date and time a record was last changed through the UI or API, and is used by the Entity Model Streamer (EMS) self-healing process for many record types.
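To make the DateLastSync behavior concrete, here is a minimal in-memory sketch of applying change events. Everything in it is hypothetical: the event dictionary layout, the type names, and the `local_db` structure are illustrative stand-ins, not the actual DataSyncObject format or the client's code.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the replicated database: {entity: {record_id: row}}
local_db = {"Candidate": {}, "CandidateCategory": {}}


def apply_event(event, now=None):
    """Apply one add/modify/delete event and stamp dateLastSync on success."""
    now = now or datetime.now(timezone.utc)
    table = local_db[event["entity"]]
    if event["type"] == "DELETED":
        table.pop(event["id"], None)
    else:  # "INSERTED" or "UPDATED" (assumed type names)
        row = table.setdefault(event["id"], {})
        row.update(event["data"])
        row["dateLastSync"] = now
    # A sub-entity change also refreshes dateLastSync on its parent record.
    parent = event.get("parent")
    if parent:
        parent_entity, parent_id = parent
        if parent_id in local_db[parent_entity]:
            local_db[parent_entity][parent_id]["dateLastSync"] = now


t1 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)
apply_event({"entity": "Candidate", "type": "INSERTED", "id": 1,
             "data": {"name": "Ada"}}, now=t1)
apply_event({"entity": "CandidateCategory", "type": "INSERTED", "id": 7,
             "data": {"categoryID": 3}, "parent": ("Candidate", 1)}, now=t2)
print(local_db["Candidate"][1]["dateLastSync"])  # reflects the category event at t2
```

The candidate row itself did not change at `t2`, yet its DateLastSync moved forward. This is why DateLastSync can be later than the record's DateLastModified in Bullhorn.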
Understand Potential Sync Delays
Data Replication runs near real-time but is asynchronous, meaning a short delay between a change in Bullhorn and the update in your replicated database is expected. Common causes of delays include:
- Natural delay: A few seconds may pass between when data is updated in Bullhorn and when replication runs.
- Errors: When an error occurs, a log entry is created and the affected data is not replicated until the error is resolved.
- Fatal errors: If the server hosting Data Replication goes down, replication stops until the server is back up. Windows Task Scheduler then restarts the process automatically on its two-hour trigger. If your replicated database is mission-critical, consider implementing additional logging and monitoring within your environment.
Review System Requirements Before You Begin
Before configuring Data Replication, confirm that your environment meets these requirements. Bullhorn will work with you to determine appropriate hardware specifications based on your data volume.
| Category | Requirement | Owner |
|---|---|---|
| Java | Java 20 or higher (64-bit) | Client / Bullhorn |
| Operating System | Microsoft Windows Server 2016 or higher | Client |
| Database | Microsoft SQL Server 2022 or higher (Standard or Enterprise) | Client |
| Database User | Write and delete access; dbo full access to the environment. One user must be used for both installation and running the JAR via Task Scheduler. | Client |
| Database URL | JDBC-based SQL Server URL | Client |
| RAM | Minimum 16 GB (Bullhorn may recommend more based on volume) | Client |
| Hard Drive | Minimum 500 GB or 60% of your Bullhorn database size, whichever is larger | Client |
| Bandwidth | Minimum 1 MB upload/download speed | Client |
| Internet Access | Server must be able to reach https://bh-datamirror.s3.amazonaws.com/ | Client |
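The hard drive requirement is a simple "whichever is larger" rule. As a quick worked example, the Python sketch below computes the minimum for two sample database sizes. The function name and the sample sizes are illustrative only, and Bullhorn may recommend more based on your data volume.

```python
def minimum_disk_gb(bullhorn_db_size_gb):
    """Larger of 500 GB or 60% of the Bullhorn database size."""
    return max(500, bullhorn_db_size_gb * 60 / 100)


for size_gb in (400, 1200):
    print(f"{size_gb} GB source database -> {minimum_disk_gb(size_gb)} GB minimum")
```

For a 400 GB source database the 500 GB floor applies. For a 1,200 GB source database the 60% rule takes over and the minimum rises to 720 GB.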
Configure Data Replication
Bullhorn handles initial installation and configuration with you. A seed copy of your data is delivered via FTP. Once you download it and set it up as a SQL instance on your server, Bullhorn will walk you through the steps below.
Understand the Application Components
Data Replication runs through three JAR files that work together:
- data_replicator_manager.jar: Queries Data Sync Services for configuration, validates checksums, and manages file downloads from Bullhorn’s private Amazon S3 bucket.
- data_replicator_launcher.jar: A utility JAR that launches data-mirror-client-ems.jar and keeps it updated on an ongoing basis.
- data-mirror-client-ems.jar: The executable JAR that performs the actual replication work. The launcher downloads the latest version at startup.
Log files are generated daily. You will see files named DataMirrorClient_YYYY-MM-DD.x and DataReplicatorManager_YYYY-MM-DD.x. Log files are capped at 50 MB.
Configure the application.properties File
The application.properties file controls how Data Replication connects to Bullhorn and your local database. Some fields are set by Bullhorn; others require your input. Always have Bullhorn review any changes you make to this file.
| Field | Entered By | Value / Comments |
|---|---|---|
| rest.endpoint.retrieval | BH | Endpoint based on your data center: |
| rest.endpoint.retrieval.corp.id | BH | Your Bullhorn Corporation ID number |
| thread.sleep.seconds | BH | 3 |
| max.eventWorker.threads | BH | 16 |
| subscription.name | BH | CorpName_CorpId_DateCreated |
| subscription.entities | BH | Leave blank unless instructed otherwise. If set, update the subscription.name so a new subscription is created. |
| subscription.daystogoback | BH | Days since the seed was generated. Runs one time during initial setup to populate data since the seed. Maximum is 30 days. |
| spring.datasource.url | Client | jdbc:sqlserver://{DB_SERVER};lockTimeout=5000;instanceName={DB_INSTANCE};databaseName={DB_NAME} (set lockTimeout to prevent database locks from blocking replication; 5000, or 5 seconds, is the default, and you can adjust it as needed) |
| spring.datasource.username | Client | SQL Server username |
| spring.datasource.password | Client | SQL Server password (SQL Server Authentication required) |
| auth.clientId | BH | Unique client ID |
| auth.clientSecret | BH | Unique client secret |
| auth.clientPassword | BH | Unique client password |
| reInitializeOnStartup | BH | True: System checks and adds any required tables, views, or stored procedures before syncing. False: Skips re-initialization and begins syncing immediately. |
| dataReplicatorPath | Client | Sets where files are downloaded on the server. To use defaults, enter: ./data-mirror-client-ems.jar |
Any changes to the application.properties file should be reviewed by Bullhorn before the process is restarted. This file controls access to your Data Replication environment.
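Before sending the file to Bullhorn for review, it can help to run a quick local check of the client-entered fields. The Python sketch below is one way to do that, under stated assumptions: it parses only simple key=value lines (not the full Java properties syntax), the checks are illustrative rather than Bullhorn's validation rules, and every value in `sample` is a placeholder.

```python
# Fields the table above marks as entered by the client.
REQUIRED_CLIENT_KEYS = (
    "spring.datasource.url",
    "spring.datasource.username",
    "spring.datasource.password",
    "dataReplicatorPath",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)  # the JDBC URL itself contains '='
        props[key.strip()] = value.strip()
    return props


def check_client_fields(props):
    """Return a list of problems found in the client-entered fields."""
    problems = [f"missing value: {k}" for k in REQUIRED_CLIENT_KEYS if not props.get(k)]
    url = props.get("spring.datasource.url", "")
    if url and not url.startswith("jdbc:sqlserver://"):
        problems.append("spring.datasource.url is not a SQL Server JDBC URL")
    if url and "lockTimeout=" not in url:
        problems.append("spring.datasource.url has no lockTimeout")
    return problems


# Placeholder values for illustration only.
sample = """
spring.datasource.url=jdbc:sqlserver://DBHOST;lockTimeout=5000;instanceName=SQL01;databaseName=BullhornDR
spring.datasource.username=replication_user
spring.datasource.password=
dataReplicatorPath=./data-mirror-client-ems.jar
"""
print(check_client_fields(parse_properties(sample)))
```

Here the only reported problem is the empty password. An empty list means these basic checks passed. It does not replace Bullhorn's review.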
Set Up Windows Task Scheduler
Task Scheduler runs the replication process automatically. Keep the task disabled until replication is ready to begin. To configure it:
- Open Task Scheduler (search for "Task Scheduler" in Windows Search).
- In the Action menu, select Create Task.
- On the General tab, enter a name for the task and set yourself as the user.
- On the Triggers tab, select New and configure the trigger to run every two hours.
- On the Actions tab, select New and set the following:
  - Action: Start a program
  - Program/script: Path to java.exe (example: C:\DataMirror\jdk-20\bin\java.exe)
  - Add arguments: -Duser.timezone=EST5EDT -Xms512m -Xmx4096m -jar C:\DataMirror\data-replicator-manager.jar
  - Start in: Base folder path (example: C:\DataMirror\)
- On the Conditions tab, configure conditions as directed by Bullhorn.
- On the Settings tab, configure settings as directed by Bullhorn.
- Click OK to save the task.
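The Program/script and Add arguments fields together form a single command line. The Python sketch below assembles that command from the example values in the steps above, which can be useful for checking paths for typos before you enter them. The function name is hypothetical, and the paths are this article's examples, not required locations.

```python
from pathlib import PureWindowsPath


def build_launch_command(base_dir, java_exe, timezone="EST5EDT", xms="512m", xmx="4096m"):
    """Return the java command as a list: executable first, then its arguments."""
    base = PureWindowsPath(base_dir)
    return [
        str(PureWindowsPath(java_exe)),
        f"-Duser.timezone={timezone}",  # JVM time zone
        f"-Xms{xms}",                   # initial heap size
        f"-Xmx{xmx}",                   # maximum heap size
        "-jar",
        str(base / "data-replicator-manager.jar"),
    ]


cmd = build_launch_command(r"C:\DataMirror", r"C:\DataMirror\jdk-20\bin\java.exe")
print(" ".join(cmd))
```

The first element corresponds to Program/script and the rest to Add arguments. The Start in field corresponds to the working directory, `C:\DataMirror\` in this example.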
Monitor Your Data Replication Environment
After setup, you are responsible for monitoring your server, database, and replication process to ensure it is running as expected. When you first go live, there may be a period where replication is catching up to real-time.
Check the Logs Regularly
Review the daily log files for errors. Log files are located in your Data Replication directory and are named DataMirrorClient_YYYY-MM-DD.x. The DataReplicatorManager_YYYY-MM-DD.x log tracks version changes and updates.
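A small script can make the daily log review less manual. The Python sketch below finds the log files for a given date by the naming pattern above and lists lines that contain an error marker. Two things here are assumptions: that the `.x` suffix is a short alphanumeric token, and that error lines contain the text `ERROR` (the log line format is not documented in this article). `LOG_DIR` is an example path.

```python
import re
from datetime import date
from pathlib import Path

# Matches DataMirrorClient_YYYY-MM-DD.x and DataReplicatorManager_YYYY-MM-DD.x
LOG_NAME = re.compile(r"^(DataMirrorClient|DataReplicatorManager)_(\d{4}-\d{2}-\d{2})\.\w+$")


def logs_for_day(log_dir, day):
    """Return the log files in log_dir whose name carries the given date."""
    stamp = day.isoformat()
    return sorted(
        p for p in Path(log_dir).iterdir()
        if (m := LOG_NAME.match(p.name)) and m.group(2) == stamp
    )


def error_lines(path, marker="ERROR"):
    """Yield (line_number, text) for lines containing the marker (ASSUMED format)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if marker in line:
                yield number, line.rstrip()


LOG_DIR = Path(r"C:\DataMirror")  # example location; use your Data Replication directory
if LOG_DIR.is_dir():
    for log in logs_for_day(LOG_DIR, date.today()):
        for number, text in error_lines(log):
            print(f"{log.name}:{number}: {text}")
```

Scheduling a check like this daily, and alerting when it prints anything, is one lightweight way to add the extra monitoring suggested for mission-critical environments.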
Run a Sync Status Query
Use the following SQL query to check the most recent sync date for each entity in your replicated database. If you know data has been updated in Bullhorn recently and the sync date does not reflect that, contact Bullhorn Support and include the query results.
Sync Status Query
-- Run this in your replicated database, and set @table_catalog to that database's name.
DECLARE @table_catalog sysname = N'' -- <-- UPDATE TO DB NAME
DECLARE @SQL nvarchar(max) = N''
DECLARE @tableList table (Num int IDENTITY (1,1), TABLE_NAME sysname)
-- Collect every base table that has a dateLastSync column.
INSERT INTO @tableList (TABLE_NAME)
SELECT DISTINCT ts.TABLE_NAME FROM INFORMATION_SCHEMA.TABLES ts
JOIN INFORMATION_SCHEMA.COLUMNS cs ON ts.TABLE_SCHEMA = cs.TABLE_SCHEMA AND ts.TABLE_NAME = cs.TABLE_NAME
WHERE ts.TABLE_TYPE = 'BASE TABLE' AND ts.TABLE_CATALOG = @table_catalog
AND cs.COLUMN_NAME = 'dateLastSync'
DECLARE @endLoop int = (SELECT MAX(Num) FROM @tableList)
DECLARE @Pointer int = 1
-- Build one SELECT per table; <= ensures the last table in the list is included.
WHILE @Pointer <= @endLoop BEGIN
    DECLARE @tableName sysname = (SELECT TABLE_NAME FROM @tableList WHERE Num = @Pointer)
    SET @SQL = @SQL + ' SELECT ''' + @tableName + ''' AS entity, MAX(dateLastSync) AS dateLastSync FROM ' + QUOTENAME(@tableName) + ' WITH (NOLOCK) UNION ALL '
    SET @Pointer = @Pointer + 1
END
-- Strip the trailing UNION ALL, then run. Skip if no matching tables were found.
IF LEN(@SQL) > 9 BEGIN
    SET @SQL = LEFT(@SQL, LEN(@SQL)-9)
    EXECUTE (@SQL)
END
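Once you have the query output, a short script can flag entities whose latest sync looks old. The Python sketch below assumes you have loaded the results as (entity, dateLastSync) pairs. The 24-hour threshold and the sample rows are illustrative assumptions, not Bullhorn guidance.

```python
from datetime import datetime, timedelta


def stale_entities(rows, now, max_lag=timedelta(hours=24)):
    """Return (entity, lag) pairs whose latest sync is older than max_lag.

    rows: iterable of (entity, dateLastSync) tuples, as returned by the query.
    dateLastSync may be None for a table that has no rows.
    """
    stale = []
    for entity, last_sync in rows:
        if last_sync is None:
            continue  # empty table: nothing to compare
        lag = now - last_sync
        if lag > max_lag:
            stale.append((entity, lag))
    # Largest lag first, so the most delayed entities appear at the top.
    return sorted(stale, key=lambda pair: pair[1], reverse=True)


# Illustrative rows, not real query output.
now = datetime(2024, 3, 5, 9, 0)
rows = [
    ("Candidate", datetime(2024, 3, 5, 8, 59)),
    ("Placement", datetime(2024, 3, 1, 2, 0)),
    ("Tearsheet", None),
]
for entity, lag in stale_entities(rows, now):
    print(entity, lag)
```

An old sync date is not an error on its own, because an entity that rarely changes in Bullhorn will legitimately show an older DateLastSync. Treat a flagged entity as a prompt to confirm whether recent changes were expected.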
Maintain Your Data Replication Database
Keeping your replication database healthy requires proactive maintenance. The recommendations below are tailored for single-source data replication systems and are designed to protect data integrity, minimize latency, and prevent performance issues over time.
Manage Data and Log File Growth
Uncontrolled file growth can degrade performance and cause unexpected downtime. Set auto-growth as a backstop, not a primary sizing strategy.
- Configure auto-growth with a fixed size increment rather than a percentage-based increment. Avoid increments over 1 GB, as large growth events can temporarily impact performance.
- Monitor file growth over time to predict future needs, then manually increase file sizes during off-peak hours before they are needed.
- Confirm that Instant File Initialization is configured on the server.
- Disable auto-shrink. If it is currently enabled, run: ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF;
- Schedule index rebuilds based on fragmentation at least weekly.
- Update statistics at least weekly.
- Run integrity checks (corruption detection) at least weekly.
Plan for Database Restoration
Data Replication is treated as an ETL database, meaning the source system can stream missing data back in. Bullhorn can restore data through several methods:
- Data inserts
- Self-Heal
- Self-Seed
- A backup copy of the ATS system database
Bullhorn’s timeline for restoring data ranges from a minimum of 2 business days (self-heal or self-seed) to up to 2 weeks (database backup). It is strongly recommended that you maintain your own comprehensive backup and recovery strategy based on how critical Data Replication is to your business operations.
Allocate Memory Appropriately
Allocate as much available server memory as possible to the SQL instance, reserving only a few GB for the operating system itself.
- SQL Server Standard Edition (2016 through 2022) supports up to 128 GB of buffer pool memory per SQL instance, plus 32 GB for in-memory OLTP and 32 GB for columnstore indexes (if used).
- SQL Server Enterprise Edition is limited only by the operating system’s memory capacity.
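As a rough worked example of this guidance, the Python sketch below estimates a value for SQL Server's "max server memory (MB)" setting. The 4 GB operating system reserve is an assumption standing in for "a few GB", and capping Standard Edition at 128 GB is a simplification of the per-instance limit described above. Confirm the final value against your own workload.

```python
STANDARD_BUFFER_POOL_CAP_GB = 128  # Standard Edition per-instance limit cited above


def max_server_memory_mb(total_ram_gb, os_reserve_gb=4, standard_edition=True):
    """Estimate 'max server memory (MB)': total RAM minus an assumed OS reserve."""
    available_gb = max(total_ram_gb - os_reserve_gb, 0)
    if standard_edition:
        available_gb = min(available_gb, STANDARD_BUFFER_POOL_CAP_GB)
    return available_gb * 1024


for ram_gb in (16, 256):
    print(f"{ram_gb} GB server -> {max_server_memory_mb(ram_gb)} MB")
```

On the 16 GB minimum server this leaves 12 GB (12,288 MB) for SQL Server. On a 256 GB server running Standard Edition the estimate stops at the 128 GB cap, while `standard_edition=False` models Enterprise Edition, which has no such cap.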
Configure Parallelism Options
Review and configure the max degree of parallelism setting based on your hardware. Microsoft’s guidance for this setting is available in the SQL Server documentation.
Use the Database Tuning Advisor
Run the Database Tuning Advisor at least quarterly. As new entities are added and transaction patterns shift, the advisor can surface index and statistics recommendations that improve replication performance.
Troubleshooting
If data in your replicated database does not match Bullhorn
Run the sync status query in the Monitor section above. If the DateLastSync for an entity is significantly behind and you know changes have occurred in Bullhorn, contact Bullhorn Support and include the query output.
If the replication process stops running
Check the log files for error messages. Windows Task Scheduler is configured to restart the process automatically every two hours. If the process is still not running after two hours, verify that the Task Scheduler task is enabled and that the java.exe path on the Actions tab is correct. Also confirm that the password for the scheduled task user has not expired or changed.
If checksums do not match on startup
If the data_replicator_manager.jar detects a checksum mismatch on a downloaded file, it deletes the file and notifies Bullhorn automatically. If this happens repeatedly, contact Bullhorn Support.
FAQs
How often does Data Replication sync?
Replication runs near real-time using an event-driven process. There may be a delay of a few seconds under normal conditions. Longer delays occur if an error is logged or the hosting server restarts.
Who is responsible for monitoring the replication environment?
The client is responsible for monitoring the server, database server, and replication process. Bullhorn assists with setup and can help troubleshoot issues, but ongoing operational monitoring is the client’s responsibility.
What is the difference between DateLastSync and DateLastModified?
DateLastSync (DLS) exists only in your replication database and tracks when a record was last processed by the replication client. DateLastModified (DLM) exists in the Bullhorn operational database and tracks when a record was last changed through the UI or API.
Can Bullhorn restore data if my replication database is corrupted or behind?
Yes. Bullhorn can restore data through self-heal, self-seed, or a database backup copy. Restoration timelines range from 2 business days to 2 weeks depending on the method. Maintain your own backup strategy if Data Replication is mission-critical.
What happens if I change the application.properties file?
Always have Bullhorn review any changes before restarting the process. This file controls how your environment connects to Bullhorn, and an incorrect setting can stop replication entirely.
What should I do if the self-healing process is not triggering?
Self-healing runs only when all three conditions are met: the entity was last healed at least one day ago, fewer than 10,000 records are in the unprocessed queue, and the current time is outside of business hours. If you believe records are consistently missing, contact Bullhorn Support with the sync status query results.