EmbeddingGemma-300m trained on vectorranger/med (Filtered medical Q/A from MIRIAD)
This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the med dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: med
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
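The modules after the transformer can be pictured with a small NumPy sketch: mean pooling over non-padding tokens, two bias-free linear layers (768 to 3072, then 3072 to 768) with identity activation, and L2 normalization. This is only an illustration of the data flow; the function name `embed_head`, the random weights, and the toy inputs are assumptions, not the model's actual parameters or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_head(token_states, attention_mask, w_up, w_down):
    """Mean-pool token states, apply two bias-free linear layers, L2-normalize."""
    mask = attention_mask[..., None].astype(token_states.dtype)    # (batch, seq, 1)
    pooled = (token_states * mask).sum(axis=1) / mask.sum(axis=1)  # (batch, 768)
    hidden = pooled @ w_up                                         # (batch, 3072)
    out = hidden @ w_down                                          # (batch, 768)
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Random stand-ins for transformer outputs and learned weights (hypothetical values).
token_states = rng.normal(size=(2, 5, 768))
attention_mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
w_up = rng.normal(size=(768, 3072)) * 0.02
w_down = rng.normal(size=(3072, 768)) * 0.02

embeddings = embed_head(token_states, attention_mask, w_up, w_down)
print(embeddings.shape)
```

Because the last step normalizes every vector to unit length, cosine similarity between two embeddings reduces to a plain dot product.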
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("vectorranger/embeddinggemma-300m-medical-300k")
# Run inference
queries = [
"What are the benefits of using laparoscopy in the management of blunt abdominal trauma?\n",
]
documents = [
'The use of laparoscopy in trauma has lagged behind in the otherwise rapid progression of this groundbreaking surgical tool. Although reports exist of the use of laparoscopy for the diagnosis of hemoperitoneum as far back as the 1920s, there is still a paucity of literature on this subject to this day. 1, 2 There is no doubt that this is related to the nature of trauma. There is often anxiety and concern to optimize the patient with the quickest possible intervention. It should be stated early in this discourse that there is no role for laparoscopy in the management of the patient with abdominal trauma who is hemodynamically unstable. The priority in this situation follows the standard life-saving principles of resuscitation, with quick access for hemostasis, which must in those situations be open surgery. Associated extraabdominal injuries like head injuries may also be worsened by the hemodynamic effects of carbon dioxide pneumoperitoneum and may preclude laparoscopy. The gasless laparoscopy technique has been described to attenuate this as well as to prevent air embolism and also pneumothorax in patients with occult diaphragmatic injuries. 3 Laparoscopy can be safely used when an intraabdominal injury is suspected in a patient, i.e., hemodynamically stable. These are patients with a systolic blood pressure of >100 mm Hg, diastolic blood pressure of >60 mm Hg, a heart rate of <110 beats per minute, and crystalloid resuscitation requirements of <2 L. 4 The objective of this review is to determine the scope of the diagnostic and therapeutic uses of laparoscopy in blunt abdominal trauma, and also to delineate the benefits, complications, as well as prospects of laparoscopy in patients with blunt abdominal trauma.\n\n The PubMed search engine was used to search for peer-reviewed articles. The keywords entered were laparoscopy, blunt, abdominal, and trauma. The search was filtered to include only articles written in the last 5 years. 
All 55 articles obtained from the database were then reviewed for relevance and sample size. Case reports were excluded.\n\n Several articles discussed the uses of laparoscopy in blunt abdominal trauma. The role of laparoscopy as the most sensitive detector of a breach of the peritoneum in penetrating abdominal trauma is immediately apparent. 5 It is instructive that the authors reviewed equally acknowledged the role of laparoscopy in diagnosis in blunt abdominal trauma. Johnson et al 5 started their study on the established premise that diagnostic laparoscopy (DL) had decreased the rate of nontherapeutic laparotomies in patients with penetrating abdominal injuries. They sort to determine whether DL similarly lowered nontherapeutic laparotomy in blunt abdominal injury. They found that coupled with diagnostic computed tomography (CT) scan, DL yielded a nontherapeutic laparotomy rate of 0% in patients with blunt abdominal trauma. They concluded that when combined with CT scan, DL is a useful tool in the initial evaluation of patients with blunt abdominal trauma. Lee et al 6 had similar findings demonstrating that the use of laparoscopy in patients with abdominal trauma safely decreased the laparotomy rate. 14, 15 Lin et al 16 have described a new approach for management of high-grade splenic injury laparoscopically. They, however, emphasize the need for adequate training on laparoscopy in trauma.\n\n Evaluation of diagnostic tools in blunt abdominal trauma remains a contemporary issue to clarify the need for appropriate surgical intervention. 17 This study clearly describes the safety of DL as an approach in blunt abdominal trauma. With the increasing trend for limited intervention in appropriately selected hemodynamically stable patients with blunt abdominal trauma, the role of DL is brought to the fore. 
[18] [19] [20] As minimal access surgery becomes more prominent, laparoscopic surgeons should equally remain aware of the potential complications that could arise when this approach is adopted in the management of patients with blunt abdominal trauma.\n\n \n\n Laparoscopy can be safely used both diagnostically and therapeutically in hemodynamically stable patients with blunt abdominal trauma.',
"The study here complements those results; demonstrating that texture-based analysis of the StO 2 -contrast may yield similar statistical differences between response groups (P ¼ 0.044). Grey-level co-occurrence matrices analyses here, provided discriminant features by using volumetric tumour analysis, in addition to second-order statistical analyses that examined the pixel-by-pixel relationships of tumour heterogeneities within the parametric maps. Measures of spatial heterogeneity in tumour Table 2A reports the percentage of the statistical power. The numbers inside parentheses in this column indicate the number of non-responders (n2) required in this study to achieve a statistical power of minimum 80% in case that the number of responders (n1) is fixed at 27. physiology as conducted here, could potentially provide good characterisation of biological traits that influence tumour response to treatment. Such features include tumour hypoxia (Hockel and Vaupel, 2001) , and haematological characteristics such as blood flow and vascular density (Folkman, 2002) . These features have been shown to influence tumour cell proliferation and metabolism, and therefore may also affect chemosensitivity (Folkman, 2002) .\n\n The use of such measures better reflects tumour physiology, which is not homogeneous but rather spatially heterogeneous.\n\n Additionally, multiparametric analysis resulted in sensitive and specific combined markers for response classification. Logistic regression analysis demonstrated B10% improvement in all performance measures by using pairwise features compared to the case of using only one single feature. However, the naive Bayes and k-NN did not show a significant improvement. This may be related to the small sample size used and peaking phenomena (Jain et al, 2000) . Features into the pairwise models included: HbO 2 -cor, HbO 2 -hom, Hb-cor, HbO 2 -con, Hb-hom, and Hb-con. 
Individually, those non-texture DOS parameters were previously correlated to tumour vasculature (Intes, 2005) . Additionally, the heterogenic tumour vasculature has been linked to mediating drug resistance; caused by structural scaffolds that inhibit effective drug delivery (Teicher et al, 1990; Galmarini et al, 2000; Tredan et al, 2007) . These include poor vascular flow, increased interstitial fluid, and a tightly bound cellular matrix that may constrain drugs from reaching into the tumour stroma thereby affecting the efficacy of chemotherapies.\n\n In comparison to other studies, texture analysis of MRI ( spectroscopy (Sadeghi-Naini et al, 2014) , and DOS (Sadeghi-Naini et al, 2015) images have been used to assess and monitor chemotherapy response in breast tumours during the course of treatment. Textural analysis of pretreatment MRI-based kinetic maps have indicated positive results for predicting chemotherapy response in 'triple-negative' breast tumours (Golden et al, 2013) . Those results also strongly suggest that pretreatment tumour heterogeneity can influence drug resistance (Golden et al, 2013) . Other similar studies have examined texture features of dynamic contrast-enhanced MRI images to predict NAC response (Ahmed et al, 2013; Teruel et al, 2014) . Results have indicated significant differences in GLCM texture features between responders and non-responders at pretreatment (Ahmed et al, 2013) and have reported an increase in textural heterogeneity caused by necrotic tumour areas (Ahmed et al, 2013) . Those studies demonstrated comparable frameworks to the present study. Specifically, that heterogeneous tumour features caused by pathophysiology, and initial biochemical composition might play an important role in chemoresistance.\n\n In terms of novelty, the results indicate that selecting volumetric tumour-based ROIs may improve the method for DOS texture analysis to predict NAC response. 
Additionally, we compared the performance of several classification methods and found that using naive Bayes classifier demonstrated high accuracy in predicting chemotherapy treatment response. The preliminary work in this study highlights an important phase in the 'imaging biomarker roadmap' outlined by Cancer Research UK (CRUK) and the European Organisation for Research and Treatment of Cancer (EORTC) (O'Connor et al, 2017) . Diffuse optical spectroscopybased biomarkers have surpassed the initial translational gap outlined within this roadmap; specifically, as a useful tool in medical research (O'Connor et al, 2017) .",
'When the motor end plates are reinnervated electromyography shows polyphasic action potentials 15 .\n\n In circumstances where electrophysiological studies do not detect a loss of axonal continuity or Wallerian degeneration it is advisable to have a period of "watchful waiting" with regular nerve conduction studies to confirm that nerve transmission is not deteriorating 15, 16 .\n\n In any of the cases described above, patients presenting with a nerve injury should always be referred to a specialist in order to start the most appropriate treatment as early as possible.\n\n In facial surgery nerve injuries have been reported following procedures such as blepharoplasties, rhinoplasties, genioplasties and most commonly in rhytidectomies 16 . There have been some distressing reports of blindness following blepharoplasties. Data collected regarding rhinoplasties has reported cases of sensory loss of the nose-tip and injuries resulting from genioplasties have caused anesthesia or dysesthesia affecting the lips, chin and in some cases, paresthesia or paralysis of the lower lip. However rhytidectomies are the commonest cause of facial nerve injuries. Patients can present with paresis with loss of function of the facial nerve-an event which can have a significant psychological impact for the patient 14 .\n\n The majority of nerve injuries following rhytidectomies show sensory loss with the great auricular nerve being the most commonly affected. This is followed by injuries resulting in loss of motor function affecting in decreasing order the following divisions of the facial nerve: temporal, marginal mandibular, buccal and zygomatic.\n\n There are some reports that rhytidectomies performed endoscopically on the upper third and upper half of the face can lead to complications such as transitory paresis of the temporal and zygomatic branches of the facial nerve showing recovery within six months after the procedure. 
When the procedure is carried out using ultrasound-assisted liposuction the incidence of motor nerve injuries is 7.6% (affecting the marginal mandibular branch) 17 .\n\n Although an uncommon outcome from aesthetic surgery of the neck, injury to the spinal accessory nerve has been documented following cervicofacial lift and is most likely due to scar formation developing around the nerve. 32 About 20% of injuries affecting the motor function of the facial nerve following rhytidectomies fail to show any spontaneous recovery of function.\n\n The facial nerve and its branches travel along the anteromedial aspect of the parotid gland, running in a deep plane towards the superficial muscular and aponeurotic system (SMAS). The facial muscles are therefore innervated by the facial nerve from a deep position with the exception of the muscles elevating the corner of the mouth: buccinator and mentalis. With this in mind it is therefore necessary to perform a superficial dissection of the SMAS in order to avoid nerve-related complications 2, 14, 16 . Fig. 1 . Zeckel´s nerve risk zones during face lift; major to minor risk; 1= great auricular nerve, 2= frontal branch of facial nerve, 3= marginal branch of facial nerve, 4= buccal branch of facial nerve, 5= supraorbital nerve, 6= infrorbital nerve, 7= mental nerve Furthermore dissections of the posterior aspect of the sternocleidomastoid muscle ought to be undertaken with caution from beneath the mastoid process where the great auricular nerve runs more superficially thus increasing the risk of injury. Care must therefore be taken when using electrocautery while dissecting the superficial nerves.\n\n Permanent damage to the nerve results in hypoesthesia or, in patients with a neuroma, painful dysesthesia in the lower two thirds of the ear and the skin of the neck and cheek. The temporal branch of the facial nerve poses the greatest risk of motor damage followed by the marginal mandibular and buccal branches. 
In terms of anatomical regions, the temporofrontal region, the angle of the mandible and the pre-parotid region are the riskiest areas in terms of nerve injury 4, 8 .\n\n The temporal branch of the facial nerve is the thickest and is located anterior and caudal to the frontal branch of the superficial temporal artery in 91% of cases. Seckel locates the temporal branch in an area he describes as Facial Zone 2, where the nerve branch originates below the parotid gland at the level of the zygomatic arch before innervating the frontal muscle. Injury to the nerve results in paralysis of this muscle but orbicular function remains intact owing to the dual innervation it receives from the inferior zygomatic branches.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8161, 0.0512, 0.0688]])
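For semantic search, the similarity scores are typically turned into a ranking. The following is a minimal sketch of that step using NumPy only; `top_k` is a hypothetical helper, and the 3-dimensional toy vectors stand in for the real 768-dimensional embeddings.

```python
import numpy as np

def top_k(query_embedding, document_embeddings, k=2):
    """Return (index, score) pairs for the k most similar documents."""
    q = query_embedding / np.linalg.norm(query_embedding)
    d = document_embeddings / np.linalg.norm(document_embeddings, axis=1, keepdims=True)
    scores = d @ q                   # cosine similarity per document
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real query and document embeddings.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 1.0],
])

ranking = top_k(query, docs, k=2)
print(ranking)
```

With real embeddings, `query` would be one row of `query_embeddings` and `docs` would be `document_embeddings` from the example above.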
Evaluation
Metrics
Information Retrieval
- Dataset: med-eval-500q-10kd
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.878 |
| cosine_accuracy@3 | 0.966 |
| cosine_accuracy@5 | 0.986 |
| cosine_accuracy@10 | 0.99 |
| cosine_precision@1 | 0.878 |
| cosine_precision@3 | 0.322 |
| cosine_precision@5 | 0.1972 |
| cosine_precision@10 | 0.099 |
| cosine_recall@1 | 0.878 |
| cosine_recall@3 | 0.966 |
| cosine_recall@5 | 0.986 |
| cosine_recall@10 | 0.99 |
| cosine_ndcg@10 | 0.9394 |
| cosine_mrr@10 | 0.9225 |
| cosine_map@100 | 0.9229 |
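The values in the table appear consistent with each query having exactly one relevant passage: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (0.966 / 3 = 0.322, 0.986 / 5 = 0.1972, 0.99 / 10 = 0.099). The sketch below computes these metrics under that single-relevant-document assumption; the function names and the toy rankings are hypothetical and are not taken from the evaluator's code.

```python
def retrieval_metrics(ranked_ids, relevant_id, k):
    """Metrics for one query that has exactly one relevant document."""
    top = ranked_ids[:k]
    hit = relevant_id in top
    rank = top.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": (1.0 if hit else 0.0) / k,  # at most one hit among k results
        "recall": 1.0 if hit else 0.0,           # one relevant document in total
        "mrr": 1.0 / rank if hit else 0.0,
    }

def average(metric_dicts):
    """Average each metric over all queries."""
    keys = metric_dicts[0].keys()
    return {key: sum(m[key] for m in metric_dicts) / len(metric_dicts) for key in keys}

# Hypothetical rankings: (retrieved document ids in order, the relevant id).
runs = [
    (["d1", "d7", "d3"], "d1"),
    (["d4", "d2", "d9"], "d2"),
    (["d5", "d6", "d8"], "d0"),
    (["d3", "d1", "d2"], "d2"),
]

summary = average([retrieval_metrics(ranked, rel, k=3) for ranked, rel in runs])
print(summary)
```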
Training Details
Training Dataset
med
- Dataset: med at 7d144a0
- Size: 307,200 training samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:

| | question | passage_text |
|---|---|---|
| type | string | string |
| details | min: 9 tokens, mean: 22.67 tokens, max: 53 tokens | min: 484 tokens, mean: 969.41 tokens, max: 1515 tokens |
- Samples:

| question | passage_text |
|---|---|
| What are the potential challenges for patients transitioning from once or twice-daily basal insulin injections to a more complex regimen with multiple daily injections of bolus insulin? | However, the immediate transition from once-or twice-daily injection of basal insulin to a complex regimen consisting of 4 -5 total daily injections with the addition of three injections of bolus insulin may be initially challenging for some patients, thus making the stepwise approach a more attractive option. The success of the MDI approach outlined above also assumes consistent carbohydrate intake at each meal. For patients who wish to vary their carbohydrate intake from meal to meal and day to day, carbohydrate counting is recommended. Bergenstal et al. conducted a study in patients with T2DM that compared the effectiveness of a simple algorithm to adjust bolus insulin dosing based on a weekly average of pre-meal SMBG levels versus an algorithm based on mealtime carbohydrate counting (41). Both approaches resulted in similar levels of glycemic control (approximately a 1.5% reduction in HbA 1C ), with a low risk for severe hypoglycemia (4.9 versus 8.0 events/patient-year, respectiv... |
| What is the preferred approach for the treatment of Alzheimer's disease? | 20 The risk Alzheimer's disease cannot be cured, it can only be managed and doing so is a big challenge. If the patient has no significant comorbidities, the patient will be relatively healthy in the physical sense and will be ambulatory until late in the progression of the disease. But for both patient and the patient's caretakers, the cognitive deficits and the behavioral and emotional issues associated with Alzheimer's disease -which in many instances are inextricably linked -can create enormously frustrating situations. It is impossible to improve a patient's memory, executive function, language difficulties; cognitive deficits cannot be reversed; the patient's condition always declines; and behavioral and emotional problems will always occur. But with skillful application of environmental and psychosocial interventions and targeted, judicious use of medications, the patient who has Alzheimer's disease can be safely and effectively cared for. Drug therapy for the treatment o... |
| What are the common symptoms and misdiagnoses associated with paradoxical vocal cord motion (PVCM)? | P aradoxical vocal cord motion (PVCM) is a rare disease that is characterized by vocal cord adduction during inspiration and/or FIGURE 2. The well-capsulated mass was easily enucleated after mobilization of the facial nerve. expiration. It was first described 1 by Dunglison and was named as Bhysteric croup[ in 1842. The initial presentation may include shortness of breath, wheezing, respiratory stridor, or breathy dysphonia. 2 Paradoxical vocal cord motion may be expedited by exercise and emotional mood, and it is usually misdiagnosed and mistreated as asthma. It affects mainly children and young adults within middle age. It has a reported 2:1 female predominance. 1 In its long-term treatment, speech therapy, psychologic counseling, or other modalities may be used to avoid reattack. 2 There are many treatment modalities for acute attack of PVCM, including reassurance and onsite maneuvers, benzodiazepines, heliox (gaseous mixture of oxygen and helium), and nebulized lignocaine. 1 In t... |

- Loss: CachedMultipleNegativesRankingLoss with these parameters:
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 16,
      "gather_across_devices": false
  }
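The loss treats every other passage in a batch as a negative for a given question: cosine similarities are multiplied by the scale (20.0 here) and fed to a cross-entropy in which the matching passage is the target class. The following NumPy sketch shows only that objective under these assumptions; it does not implement the gradient caching that the "Cached" variant adds to process the batch in mini-batches (16 here), and `mnr_loss` and the toy vectors are hypothetical.

```python
import numpy as np

def mnr_loss(query_emb, passage_emb, scale=20.0):
    """In-batch-negatives ranking loss: row i's positive is passage i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                            # (batch, batch) scaled cosine
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal

rng = np.random.default_rng(0)
passages = rng.normal(size=(4, 8))
aligned_queries = passages + 0.05 * rng.normal(size=(4, 8))  # each query near its passage
shuffled_queries = aligned_queries[::-1]                     # every query paired with a wrong passage

loss_aligned = mnr_loss(aligned_queries, passages)
loss_shuffled = mnr_loss(shuffled_queries, passages)
print(loss_aligned, loss_shuffled)
```

With a per-device batch size of 192 and no gathering across devices, each question would be contrasted against 191 in-batch negatives.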
Evaluation Dataset
med
- Dataset: med at 7d144a0
- Size: 36,720 evaluation samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:

| | question | passage_text |
|---|---|---|
| type | string | string |
| details | min: 10 tokens, mean: 22.84 tokens, max: 59 tokens | min: 481 tokens, mean: 972.34 tokens, max: 1460 tokens |
- Samples:

| question | passage_text |
|---|---|
| How can the use of autogenous bone, demineralized freeze-dried bone, or hydroxyapatite support the function of membranes in guided bone regeneration procedures? | All the other patients had excellent results. In the group of donor site and other cavernous defects, there was one perforation after a week, and Guided bone regeneration with titanium membranes 313 the procedure failed. The membrane was exposed and most of the graft material had disappeared five months postoperatively. In the group of peri-implant defects there were three exposed membranes (two weeks to five months after surgery), and one of them failed. Peri-implantitis had been treated by curettage and grafting with Algipore and a titanium membrane was used to cover it. There were most problems in the onlay graft group, eight membranes were exposed one week to five months postoperatively. In three cases (exposed at 2-4 weeks) there was considerable loss of the grafted material, and in one of these patients a fixture applied at the same time was lost. The remaining patients had satisfactory results. Routinely used non-absorbable PTFE membranes or resorbable membranes (Bio Gide®... |
| How does L-T4 treatment work for hypothyroidism and what factors can affect its bioavailability? | The desire for better individualized treatment for hypothyroid patients has led to research to clarify the role of genetic polymorphisms on L-T4 bioavailability. P-gp is a well-known transport pro-tein found mostly in the cellular membrane of different cell types in the intestine, kidney, blood-brain barrier and parathyroid glands (Thiebaut et al., 1987; Borst and Schinkel, 1997) . P-gp, an ATPdependent efflux transporter, acts as a physiological barrier by extruding a wide range of substances, from xenobiotics to endogenous compounds such as pesticides, anticancer drugs, antibiotics, cardiac glycosides, small proteins and hormones (Schinkel, 1997) . P-gp is encoded by the MDR1 gene, which is located in the region 7q21.12 of chromosome 7 in humans (Wolking et al., 2015) . MDR1 has a crucial role in drug disposition, and genetic polymorphisms in this gene might alter the pharmacokinetics and bioavailability of a diverse range of P-gp substrates (Kurose et al., 2008) . Although many va... |
| Can pharmacological agents be used to induce myocardial preconditioning and protect against ischemia-reperfusion injury? | The landmark study by Murry et al. 14 exposed anesthetized, open-chest dogs to four cycles of 5 min coronary artery occlusions followed by 5 min of reperfusion before the onset of 40 min of coronary occlusion and 4 days of reperfusion. The animals receiving the 'IPC' displayed significantly smaller infarct sizes when compared with the control animals. The original paper by Murry et al. 14 has been cited over 3200 times, demonstrating the importance of this paradoxical discovery that ischaemia protects from itself. Since this remarkable discovery in 1986, there has been a plethora of experimental investigations to define the cellular and molecular signals and pathways that elicit the reduction in infarct size. Numerous studies have provided tremendous insights into the mechanisms of IPC in a variety of animal species including both in vitro and in vivo model systems. Please see the elegant reviews by Downey et al., 17 Das and Das, 18 Hausenloy et al., 19 and Bolli et al. 20 for a de... |

- Loss: CachedMultipleNegativesRankingLoss with these parameters:
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 16,
      "gather_across_devices": false
  }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 192
- per_device_eval_batch_size: 192
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
- load_best_model_at_end: True
- prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- batch_sampler: no_duplicates
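The `prompts` setting maps each dataset column to a text prefix, so the model sees prefixed text rather than the raw question or passage. The sketch below illustrates that mapping, assuming the prefix is simply concatenated in front of the text; `apply_prompt` is a hypothetical helper, not a library function.

```python
# Prompt prefixes copied from the hyperparameters above, keyed by dataset column.
PROMPTS = {
    "question": "task: search result | query: ",
    "passage_text": "title: none | text: ",
}

def apply_prompt(column, text):
    """Prepend the column's prompt prefix to the raw text (assumed plain concatenation)."""
    return PROMPTS[column] + text

prompted_question = apply_prompt(
    "question",
    "What is the preferred approach for the treatment of Alzheimer's disease?",
)
prompted_passage = apply_prompt(
    "passage_text",
    "Alzheimer's disease cannot be cured, it can only be managed.",
)
print(prompted_question)
print(prompted_passage)
```

At inference time, queries and documents should be encoded with the same prefixes used in training, otherwise the inputs differ from what the model was fine-tuned on.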
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 192
- per_device_eval_batch_size: 192
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | med-eval-500q-10kd_cosine_ndcg@10 |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.8560 |
| 0.0058 | 20 | 0.1737 | - | - |
| 0.0116 | 40 | 0.0878 | - | - |
| 0.0174 | 60 | 0.0703 | - | - |
| 0.0232 | 80 | 0.0621 | - | - |
| 0.0290 | 100 | 0.0549 | - | - |
| 0.0349 | 120 | 0.0469 | - | - |
| 0.0407 | 140 | 0.0429 | - | - |
| 0.0465 | 160 | 0.0458 | - | - |
| 0.0523 | 180 | 0.0392 | - | - |
| 0.0581 | 200 | 0.0462 | 0.0491 | 0.9318 |
| 0.0639 | 220 | 0.0446 | - | - |
| 0.0697 | 240 | 0.049 | - | - |
| 0.0755 | 260 | 0.039 | - | - |
| 0.0813 | 280 | 0.0567 | - | - |
| 0.0871 | 300 | 0.0534 | - | - |
| 0.0929 | 320 | 0.053 | - | - |
| 0.0988 | 340 | 0.0568 | - | - |
| 0.1046 | 360 | 0.0589 | - | - |
| 0.1104 | 380 | 0.052 | - | - |
| 0.1162 | 400 | 0.0499 | 0.0532 | 0.9101 |
| 0.1220 | 420 | 0.0527 | - | - |
| 0.1278 | 440 | 0.0523 | - | - |
| 0.1336 | 460 | 0.0542 | - | - |
| 0.1394 | 480 | 0.0518 | - | - |
| 0.1452 | 500 | 0.0485 | - | - |
| 0.1510 | 520 | 0.0517 | - | - |
| 0.1568 | 540 | 0.0586 | - | - |
| 0.1626 | 560 | 0.0611 | - | - |
| 0.1685 | 580 | 0.0502 | - | - |
| 0.1743 | 600 | 0.056 | 0.0493 | 0.9145 |
| 0.1801 | 620 | 0.0536 | - | - |
| 0.1859 | 640 | 0.0584 | - | - |
| 0.1917 | 660 | 0.0494 | - | - |
| 0.1975 | 680 | 0.0499 | - | - |
| 0.2033 | 700 | 0.0496 | - | - |
| 0.2091 | 720 | 0.0578 | - | - |
| 0.2149 | 740 | 0.0454 | - | - |
| 0.2207 | 760 | 0.0586 | - | - |
| 0.2265 | 780 | 0.0466 | - | - |
| 0.2324 | 800 | 0.0538 | 0.0474 | 0.9287 |
| 0.2382 | 820 | 0.0463 | - | - |
| 0.2440 | 840 | 0.0376 | - | - |
| 0.2498 | 860 | 0.0478 | - | - |
| 0.2556 | 880 | 0.0406 | - | - |
| 0.2614 | 900 | 0.0463 | - | - |
| 0.2672 | 920 | 0.0546 | - | - |
| 0.2730 | 940 | 0.0417 | - | - |
| 0.2788 | 960 | 0.0448 | - | - |
| 0.2846 | 980 | 0.0483 | - | - |
| 0.2904 | 1000 | 0.0437 | 0.0438 | 0.9176 |
| 0.2963 | 1020 | 0.0411 | - | - |
| 0.3021 | 1040 | 0.0446 | - | - |
| 0.3079 | 1060 | 0.0405 | - | - |
| 0.3137 | 1080 | 0.0429 | - | - |
| 0.3195 | 1100 | 0.047 | - | - |
| 0.3253 | 1120 | 0.0413 | - | - |
| 0.3311 | 1140 | 0.0436 | - | - |
| 0.3369 | 1160 | 0.0386 | - | - |
| 0.3427 | 1180 | 0.0326 | - | - |
| 0.3485 | 1200 | 0.0402 | 0.0413 | 0.9290 |
| 0.3543 | 1220 | 0.0412 | - | - |
| 0.3602 | 1240 | 0.0354 | - | - |
| 0.3660 | 1260 | 0.0419 | - | - |
| 0.3718 | 1280 | 0.037 | - | - |
| 0.3776 | 1300 | 0.0405 | - | - |
| 0.3834 | 1320 | 0.0403 | - | - |
| 0.3892 | 1340 | 0.0337 | - | - |
| 0.3950 | 1360 | 0.0386 | - | - |
| 0.4008 | 1380 | 0.0368 | - | - |
| 0.4066 | 1400 | 0.037 | 0.0396 | 0.9238 |
| 0.4124 | 1420 | 0.0355 | - | - |
| 0.4182 | 1440 | 0.0387 | - | - |
| 0.4240 | 1460 | 0.0405 | - | - |
| 0.4299 | 1480 | 0.0477 | - | - |
| 0.4357 | 1500 | 0.0417 | - | - |
| 0.4415 | 1520 | 0.0346 | - | - |
| 0.4473 | 1540 | 0.0371 | - | - |
| 0.4531 | 1560 | 0.0391 | - | - |
| 0.4589 | 1580 | 0.0364 | - | - |
| 0.4647 | 1600 | 0.0379 | 0.0360 | 0.9394 |
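Since `load_best_model_at_end` is enabled, the evaluation rows in the log determine which checkpoint is kept. The card does not state which metric was used for that choice, so the sketch below simply checks both candidates, validation loss and NDCG@10, on the logged evaluation steps. The tuple layout and variable names are illustrative assumptions; the numbers are copied from the table.

```python
# (step, validation loss, cosine NDCG@10) for the rows where evaluation ran.
eval_rows = [
    (200, 0.0491, 0.9318),
    (400, 0.0532, 0.9101),
    (600, 0.0493, 0.9145),
    (800, 0.0474, 0.9287),
    (1000, 0.0438, 0.9176),
    (1200, 0.0413, 0.9290),
    (1400, 0.0396, 0.9238),
    (1600, 0.0360, 0.9394),
]
baseline_ndcg = 0.8560  # row with epoch -1, evaluated before training

best_by_loss = min(eval_rows, key=lambda row: row[1])  # lowest validation loss
best_by_ndcg = max(eval_rows, key=lambda row: row[2])  # highest NDCG@10
ndcg_gain = round(best_by_ndcg[2] - baseline_ndcg, 4)

print(best_by_loss[0], best_by_ndcg[0], ndcg_gain)
```

Both criteria point to the same step in this log, and its NDCG@10 matches the value reported in the evaluation table.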
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.2
- Transformers: 4.57.2
- PyTorch: 2.9.0+cu126
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Model tree for vectorranger/embeddinggemma-300m-medical-300k
- Base model: google/embeddinggemma-300m
- Dataset used to train: vectorranger/med