Main Organiser

Julius Centre University of Malaya

Co-organiser

Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya

Supported by

University of Malaya

LINKING TWO DATABASES; SOME SOFTWARE CONSIDERATIONS

Authors

Rohayu Sarani, Hizal Hanis Hashim, Tassha Hilda

Institution

Malaysian Institute of Road Safety Research

Abstract

Objectives: This paper highlights some experience with using different software (SPSS, Microsoft Access, and Stata) for deterministic matching on a unique identifier in record-linking procedures.

Methods: Two databases were available for linking through collaborative work on the Road Traffic Injury Prevention Study (R-TRIPS). The linking procedure involved all road accident data from the police and the injury data set from six hospitals in the Klang Valley. The Malaysian identity card (IC) number was identified as the key identifier for merging the two databases. Both databases required segmentation by IC type and data cleaning. The IC column was standardized in Microsoft Excel and copied to SPSS. The data had to be sorted before the 'Add Variable' function could be used in SPSS. For linking in Microsoft Access, the databases were converted to Microsoft Access format using StatTransfer software. Duplicates were detected using Stata and Microsoft Access.
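
The steps described in the Methods (standardize the identifier, detect duplicates, then link on exact equality) can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors ran in SPSS, Access, or Stata. It assumes that a valid IC number reduces to 12 digits once separators are removed; the field names (`ic`, `report`, `case`), the function names, and the sample records are all hypothetical.

```python
import re
from collections import Counter


def standardize_ic(raw):
    """Keep digits only; return the 12-digit string, or None if it is not 12 digits.

    Assumption: a usable IC number has exactly 12 digits after cleaning.
    """
    digits = re.sub(r"\D", "", str(raw))
    return digits if len(digits) == 12 else None


def find_duplicates(records, key="ic"):
    """Return the set of standardized IC numbers that occur more than once."""
    counts = Counter(standardize_ic(r[key]) for r in records)
    return {ic for ic, n in counts.items() if ic is not None and n > 1}


def link_exact(police, hospital, key="ic"):
    """Deterministic linkage: pair records whose standardized IC is identical."""
    index = {}
    for p in police:
        ic = standardize_ic(p[key])
        if ic is not None:
            index.setdefault(ic, []).append(p)
    matches = []
    for h in hospital:
        ic = standardize_ic(h[key])
        for p in index.get(ic, []):
            matches.append((p, h))
    return matches


# Hypothetical sample records, for illustration only.
police = [
    {"report": "P1", "ic": "800101-14-5678"},
    {"report": "P2", "ic": "900202 10 1234"},
    {"report": "P3", "ic": "A1234567"},  # not a 12-digit IC, so it is set aside
]
hospital = [
    {"case": "H1", "ic": "800101145678"},
    {"case": "H2", "ic": "850505-08-4321"},
    {"case": "H3", "ic": "800101-14-5678"},  # same person recorded twice
]

linked = link_exact(police, hospital)
duplicate_ics = find_duplicates(hospital)
print(len(linked), sorted(duplicate_ics))
```

Because the match is exact, any difference in how the two sources write the identifier (hyphens, spaces, lost leading zeros) turns a true match into a miss, which is why the standardization step comes first. Duplicates left in either source produce more than one linked pair for the same person.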

Results: Of the 35,996 hospital records used, SPSS produced 1,194 exact matches, while Microsoft Access produced slightly more (1,211).
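
For a sense of scale, the reported counts can be turned into match rates. The short sketch below assumes the 35,996 hospital records are the denominator for both tools, which the abstract implies but does not state explicitly; the variable names are illustrative.

```python
hospital_records = 35996
spss_matches = 1194
access_matches = 1211

# Match rate as a percentage of hospital records, rounded to two decimals.
spss_rate = round(100 * spss_matches / hospital_records, 2)
access_rate = round(100 * access_matches / hospital_records, 2)

# Absolute difference in matched cases between the two tools.
extra_matches = access_matches - spss_matches

print(spss_rate, access_rate, extra_matches)
```

Under that assumption, both tools link roughly 3.3% of the hospital records, and the two results differ by 17 cases.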

Conclusion: Data standardization is important before any exact-match linking can be done. In SPSS, the data standardization, cleaning, and sorting required before merging take a large amount of time. The difference in matched cases between SPSS and Microsoft Access is small (17 cases).