According to a 2008 survey, most Americans (up to 80%) rely on the Internet to find health information when making their healthcare decisions, and as of 2008 the Internet rivaled physicians as a source of such information. With this heightened reliance, an increasing number of patients are turning to the Internet for emotional support and to acquire clinical knowledge for self-care. The massive production of social media enabled by Web 2.0 has generated a wealth of clinical knowledge and provides an efficient platform for patients to support each other. This platform, also known as a Health Social Network or Health 2.0, has fuelled great interest and shown considerable potential to empower patients’ self-care. Prominent examples include PatientsLikeMe and the IBM Patient Empowerment System. These newly emerged patient-driven health care services aim to harmonize the plethora of existing frameworks and new ideas, and to serve as an established communication channel for information exchange and better collaboration among patients and doctors. The services provided by health social networks include (a) emotional support and information sharing, (b) physician Q&As, and (c) self-tracking of a condition, its symptoms, treatment options and other biological information.
The advancement of the Internet and the proliferation of social media such as blogs and social networking sites have become so overwhelming that it is difficult to keep abreast of current affairs and the growing volume of information. Given the vast amount of information on the Internet, conventional searches by keyword or description have become laborious. Medical assessments and discussions are often rich in jargon, which makes it hard for new patients to explore them and find relevant communities. In particular, mental health descriptions and medical assessment reports contain sensitive information. Most existing document similarity measures, such as the popular cosine-based similarity measure and latent semantic analysis (LSA), are therefore not suitable, since they require either the entire source medical report or a large number of features that are confidential and could give away patients’ sensitive information.
We have developed a method for generating a small number of relevant keywords or codes that distinguish patients by their health conditions. These keywords or codes can be used to measure the similarity of patients’ underlying health conditions while reducing the risk of revealing sensitive personal information.
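As a rough illustration of comparing patients through a handful of codes rather than full reports, the following Python sketch scores two patients by the overlap of their code sets. It is not the method developed in this work: Jaccard overlap is swapped in as a simple stand-in measure, and the function name `jaccard_similarity` and the example codes are hypothetical.

```python
def jaccard_similarity(codes_a, codes_b):
    """Overlap of two small code sets: |A & B| / |A | B| (assumed stand-in measure)."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        # Two empty profiles carry no evidence of similarity.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical code sets; no text from the underlying reports is needed.
patient_1 = {"anxiety", "insomnia", "fatigue"}
patient_2 = {"anxiety", "insomnia", "migraine"}
patient_3 = {"asthma"}

print(jaccard_similarity(patient_1, patient_2))  # 0.5 (2 shared codes out of 4 distinct)
print(jaccard_similarity(patient_1, patient_3))  # 0.0 (no shared codes)
```

Only the short code sets are exchanged in this sketch, which is the property the paragraph above relies on to limit exposure of sensitive details.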