EmbeddingGemma-300m trained on vectorranger/med (Filtered medical Q/A from MIRIAD)

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the med dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: med (vectorranger/med)
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
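
The module list above can be read as a small pipeline: mean pooling over token vectors, two bias-free linear projections with identity activation, and L2 normalization. The following is a minimal pure-Python sketch of that data flow, not the library's implementation. The helper names, the toy sizes (2 -> 3 -> 2 instead of 768 -> 3072 -> 768) and the weights are assumptions made up for illustration.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (mean-token pooling)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def dense_no_bias(vector, weight):
    """Linear map without bias or activation; weight is (out_features, in_features)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_vectors, attention_mask, w_up, w_down):
    pooled = mean_pool(token_vectors, attention_mask)  # module (1)
    hidden = dense_no_bias(pooled, w_up)               # module (2): 768 -> 3072 in the real model
    projected = dense_no_bias(hidden, w_down)          # module (3): 3072 -> 768 in the real model
    return l2_normalize(projected)                     # module (4)

# Hypothetical toy inputs; the third token is padding and is ignored by the mask.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
embedding = embed(tokens, mask, w_up, w_down)
```

Because the last module normalizes the output, every embedding has unit length, so cosine similarity between two embeddings reduces to their dot product.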

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vectorranger/embeddinggemma-300m-medical-300k")
# Run inference
queries = [
    "What are the benefits of using laparoscopy in the management of blunt abdominal trauma?\n",
]
documents = [
    'The use of laparoscopy in trauma has lagged behind in the otherwise rapid progression of this groundbreaking surgical tool. Although reports exist of the use of laparoscopy for the diagnosis of hemoperitoneum as far back as the 1920s, there is still a paucity of literature on this subject to this day. 1, 2 There is no doubt that this is related to the nature of trauma. There is often anxiety and concern to optimize the patient with the quickest possible intervention. It should be stated early in this discourse that there is no role for laparoscopy in the management of the patient with abdominal trauma who is hemodynamically unstable. The priority in this situation follows the standard life-saving principles of resuscitation, with quick access for hemostasis, which must in those situations be open surgery. Associated extraabdominal injuries like head injuries may also be worsened by the hemodynamic effects of carbon dioxide pneumoperitoneum and may preclude laparoscopy. The gasless laparoscopy technique has been described to attenuate this as well as to prevent air embolism and also pneumothorax in patients with occult diaphragmatic injuries. 3 Laparoscopy can be safely used when an intraabdominal injury is suspected in a patient, i.e., hemodynamically stable. These are patients with a systolic blood pressure of >100 mm Hg, diastolic blood pressure of >60 mm Hg, a heart rate of <110 beats per minute, and crystalloid resuscitation requirements of <2 L. 4 The objective of this review is to determine the scope of the diagnostic and therapeutic uses of laparoscopy in blunt abdominal trauma, and also to delineate the benefits, complications, as well as prospects of laparoscopy in patients with blunt abdominal trauma.\n\n The PubMed search engine was used to search for peer-reviewed articles. The keywords entered were laparoscopy, blunt, abdominal, and trauma. The search was filtered to include only articles written in the last 5 years. 
All 55 articles obtained from the database were then reviewed for relevance and sample size. Case reports were excluded.\n\n Several articles discussed the uses of laparoscopy in blunt abdominal trauma. The role of laparoscopy as the most sensitive detector of a breach of the peritoneum in penetrating abdominal trauma is immediately apparent. 5 It is instructive that the authors reviewed equally acknowledged the role of laparoscopy in diagnosis in blunt abdominal trauma. Johnson et al 5 started their study on the established premise that diagnostic laparoscopy (DL) had decreased the rate of nontherapeutic laparotomies in patients with penetrating abdominal injuries. They sort to determine whether DL similarly lowered nontherapeutic laparotomy in blunt abdominal injury. They found that coupled with diagnostic computed tomography (CT) scan, DL yielded a nontherapeutic laparotomy rate of 0% in patients with blunt abdominal trauma. They concluded that when combined with CT scan, DL is a useful tool in the initial evaluation of patients with blunt abdominal trauma. Lee et al 6 had similar findings demonstrating that the use of laparoscopy in patients with abdominal trauma safely decreased the laparotomy rate. 14, 15 Lin et al 16 have described a new approach for management of high-grade splenic injury laparoscopically. They, however, emphasize the need for adequate training on laparoscopy in trauma.\n\n Evaluation of diagnostic tools in blunt abdominal trauma remains a contemporary issue to clarify the need for appropriate surgical intervention. 17 This study clearly describes the safety of DL as an approach in blunt abdominal trauma. With the increasing trend for limited intervention in appropriately selected hemodynamically stable patients with blunt abdominal trauma, the role of DL is brought to the fore. 
[18] [19] [20] As minimal access surgery becomes more prominent, laparoscopic surgeons should equally remain aware of the potential complications that could arise when this approach is adopted in the management of patients with blunt abdominal trauma.\n\n \n\n Laparoscopy can be safely used both diagnostically and therapeutically in hemodynamically stable patients with blunt abdominal trauma.',
    "The study here complements those results; demonstrating that texture-based analysis of the StO 2 -contrast may yield similar statistical differences between response groups (P ¼ 0.044). Grey-level co-occurrence matrices analyses here, provided discriminant features by using volumetric tumour analysis, in addition to second-order statistical analyses that examined the pixel-by-pixel relationships of tumour heterogeneities within the parametric maps. Measures of spatial heterogeneity in tumour Table 2A reports the percentage of the statistical power. The numbers inside parentheses in this column indicate the number of non-responders (n2) required in this study to achieve a statistical power of minimum 80% in case that the number of responders (n1) is fixed at 27. physiology as conducted here, could potentially provide good characterisation of biological traits that influence tumour response to treatment. Such features include tumour hypoxia (Hockel and Vaupel, 2001) , and haematological characteristics such as blood flow and vascular density (Folkman, 2002) . These features have been shown to influence tumour cell proliferation and metabolism, and therefore may also affect chemosensitivity (Folkman, 2002) .\n\n The use of such measures better reflects tumour physiology, which is not homogeneous but rather spatially heterogeneous.\n\n Additionally, multiparametric analysis resulted in sensitive and specific combined markers for response classification. Logistic regression analysis demonstrated B10% improvement in all performance measures by using pairwise features compared to the case of using only one single feature. However, the naive Bayes and k-NN did not show a significant improvement. This may be related to the small sample size used and peaking phenomena (Jain et al, 2000) . Features into the pairwise models included: HbO 2 -cor, HbO 2 -hom, Hb-cor, HbO 2 -con, Hb-hom, and Hb-con. 
Individually, those non-texture DOS parameters were previously correlated to tumour vasculature (Intes, 2005) . Additionally, the heterogenic tumour vasculature has been linked to mediating drug resistance; caused by structural scaffolds that inhibit effective drug delivery (Teicher et al, 1990; Galmarini et al, 2000; Tredan et al, 2007) . These include poor vascular flow, increased interstitial fluid, and a tightly bound cellular matrix that may constrain drugs from reaching into the tumour stroma thereby affecting the efficacy of chemotherapies.\n\n In comparison to other studies, texture analysis of MRI ( spectroscopy (Sadeghi-Naini et al, 2014) , and DOS (Sadeghi-Naini et al, 2015) images have been used to assess and monitor chemotherapy response in breast tumours during the course of treatment. Textural analysis of pretreatment MRI-based kinetic maps have indicated positive results for predicting chemotherapy response in 'triple-negative' breast tumours (Golden et al, 2013) . Those results also strongly suggest that pretreatment tumour heterogeneity can influence drug resistance (Golden et al, 2013) . Other similar studies have examined texture features of dynamic contrast-enhanced MRI images to predict NAC response (Ahmed et al, 2013; Teruel et al, 2014) . Results have indicated significant differences in GLCM texture features between responders and non-responders at pretreatment (Ahmed et al, 2013) and have reported an increase in textural heterogeneity caused by necrotic tumour areas (Ahmed et al, 2013) . Those studies demonstrated comparable frameworks to the present study. Specifically, that heterogeneous tumour features caused by pathophysiology, and initial biochemical composition might play an important role in chemoresistance.\n\n In terms of novelty, the results indicate that selecting volumetric tumour-based ROIs may improve the method for DOS texture analysis to predict NAC response. 
Additionally, we compared the performance of several classification methods and found that using naive Bayes classifier demonstrated high accuracy in predicting chemotherapy treatment response. The preliminary work in this study highlights an important phase in the 'imaging biomarker roadmap' outlined by Cancer Research UK (CRUK) and the European Organisation for Research and Treatment of Cancer (EORTC) (O'Connor et al, 2017) . Diffuse optical spectroscopybased biomarkers have surpassed the initial translational gap outlined within this roadmap; specifically, as a useful tool in medical research (O'Connor et al, 2017) .",
    'When the motor end plates are reinnervated electromyography shows polyphasic action potentials 15 .\n\n In circumstances where electrophysiological studies do not detect a loss of axonal continuity or Wallerian degeneration it is advisable to have a period of "watchful waiting" with regular nerve conduction studies to confirm that nerve transmission is not deteriorating 15, 16 .\n\n In any of the cases described above, patients presenting with a nerve injury should always be referred to a specialist in order to start the most appropriate treatment as early as possible.\n\n In facial surgery nerve injuries have been reported following procedures such as blepharoplasties, rhinoplasties, genioplasties and most commonly in rhytidectomies 16 . There have been some distressing reports of blindness following blepharoplasties. Data collected regarding rhinoplasties has reported cases of sensory loss of the nose-tip and injuries resulting from genioplasties have caused anesthesia or dysesthesia affecting the lips, chin and in some cases, paresthesia or paralysis of the lower lip. However rhytidectomies are the commonest cause of facial nerve injuries. Patients can present with paresis with loss of function of the facial nerve-an event which can have a significant psychological impact for the patient 14 .\n\n The majority of nerve injuries following rhytidectomies show sensory loss with the great auricular nerve being the most commonly affected. This is followed by injuries resulting in loss of motor function affecting in decreasing order the following divisions of the facial nerve: temporal, marginal mandibular, buccal and zygomatic.\n\n There are some reports that rhytidectomies performed endoscopically on the upper third and upper half of the face can lead to complications such as transitory paresis of the temporal and zygomatic branches of the facial nerve showing recovery within six months after the procedure. 
When the procedure is carried out using ultrasound-assisted liposuction the incidence of motor nerve injuries is 7.6% (affecting the marginal mandibular branch) 17 .\n\n Although an uncommon outcome from aesthetic surgery of the neck, injury to the spinal accessory nerve has been documented following cervicofacial lift and is most likely due to scar formation developing around the nerve. 32 About 20% of injuries affecting the motor function of the facial nerve following rhytidectomies fail to show any spontaneous recovery of function.\n\n The facial nerve and its branches travel along the anteromedial aspect of the parotid gland, running in a deep plane towards the superficial muscular and aponeurotic system (SMAS). The facial muscles are therefore innervated by the facial nerve from a deep position with the exception of the muscles elevating the corner of the mouth: buccinator and mentalis. With this in mind it is therefore necessary to perform a superficial dissection of the SMAS in order to avoid nerve-related complications 2, 14, 16 . Fig. 1 . Zeckel´s nerve risk zones during face lift; major to minor risk; 1= great auricular nerve, 2= frontal branch of facial nerve, 3= marginal branch of facial nerve, 4= buccal branch of facial nerve, 5= supraorbital nerve, 6= infrorbital nerve, 7= mental nerve Furthermore dissections of the posterior aspect of the sternocleidomastoid muscle ought to be undertaken with caution from beneath the mastoid process where the great auricular nerve runs more superficially thus increasing the risk of injury. Care must therefore be taken when using electrocautery while dissecting the superficial nerves.\n\n Permanent damage to the nerve results in hypoesthesia or, in patients with a neuroma, painful dysesthesia in the lower two thirds of the ear and the skin of the neck and cheek. The temporal branch of the facial nerve poses the greatest risk of motor damage followed by the marginal mandibular and buccal branches. 
In terms of anatomical regions, the temporofrontal region, the angle of the mandible and the pre-parotid region are the riskiest areas in terms of nerve injury 4, 8 .\n\n The temporal branch of the facial nerve is the thickest and is located anterior and caudal to the frontal branch of the superficial temporal artery in 91% of cases. Seckel locates the temporal branch in an area he describes as Facial Zone 2, where the nerve branch originates below the parotid gland at the level of the zygomatic arch before innervating the frontal muscle. Injury to the nerve results in paralysis of this muscle but orbicular function remains intact owing to the dual innervation it receives from the inferior zygomatic branches.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8161, 0.0512, 0.0688]])
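
For semantic search, the similarity row is usually turned into a ranking. The sketch below is a pure-Python illustration and not part of the Sentence Transformers API: `cosine` and `rank_documents` are hypothetical helper names, and the scores are copied from the example output above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(scores):
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Similarity scores of the single query against the three documents.
scores = [0.8161, 0.0512, 0.0688]
ranking = rank_documents(scores)
print(ranking)  # [0, 2, 1]

# Toy vectors to show the cosine helper; these are not model outputs.
toy_similarity = cosine([0.6, 0.8], [1.0, 0.0])
```

The laparoscopy passage ranks first, and the two unrelated passages score close to zero.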

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.878
cosine_accuracy@3 0.966
cosine_accuracy@5 0.986
cosine_accuracy@10 0.99
cosine_precision@1 0.878
cosine_precision@3 0.322
cosine_precision@5 0.1972
cosine_precision@10 0.099
cosine_recall@1 0.878
cosine_recall@3 0.966
cosine_recall@5 0.986
cosine_recall@10 0.99
cosine_ndcg@10 0.9394
cosine_mrr@10 0.9225
cosine_map@100 0.9229
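
The precision values above equal accuracy divided by k (0.966 / 3 = 0.322, 0.986 / 5 = 0.1972, 0.99 / 10 = 0.099), and recall equals accuracy. That pattern is what you get when each query has exactly one relevant passage; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows. The function name and the example ranks are hypothetical.

```python
import math

def retrieval_metrics(ranks, k):
    """Compute @k metrics when each query has exactly one relevant document.

    ranks[i] is the 1-based rank of the relevant document for query i,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    found = [r for r in ranks if r is not None and r <= k]
    accuracy = len(found) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results is relevant
        "recall": accuracy,         # the single relevant document is found or not
        "mrr": sum(1 / r for r in found) / n,
        # With one relevant document the ideal DCG is 1, so nDCG equals DCG.
        "ndcg": sum(1 / math.log2(r + 1) for r in found) / n,
    }

# Hypothetical ranks for five queries; the last query missed its passage.
ranks = [1, 1, 2, 4, None]
metrics_at_3 = retrieval_metrics(ranks, k=3)
```

With these toy ranks, three of five queries hit within the top 3, so accuracy@3 and recall@3 are 0.6 and precision@3 is 0.2.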

Training Details

Training Dataset

med

  • Dataset: med at 7d144a0
  • Size: 307,200 training samples
  • Columns: question and passage_text
  • Approximate statistics based on the first 1000 samples:
    column        type    min         mean           max
    question      string  9 tokens    22.67 tokens   53 tokens
    passage_text  string  484 tokens  969.41 tokens  1515 tokens
  • Samples:
    question passage_text
    What are the potential challenges for patients transitioning from once or twice-daily basal insulin injections to a more complex regimen with multiple daily injections of bolus insulin?
    However, the immediate transition from once-or twice-daily injection of basal insulin to a complex regimen consisting of 4 -5 total daily injections with the addition of three injections of bolus insulin may be initially challenging for some patients, thus making the stepwise approach a more attractive option. The success of the MDI approach outlined above also assumes consistent carbohydrate intake at each meal. For patients who wish to vary their carbohydrate intake from meal to meal and day to day, carbohydrate counting is recommended.

    Bergenstal et al. conducted a study in patients with T2DM that compared the effectiveness of a simple algorithm to adjust bolus insulin dosing based on a weekly average of pre-meal SMBG levels versus an algorithm based on mealtime carbohydrate counting (41). Both approaches resulted in similar levels of glycemic control (approximately a 1.5% reduction in HbA 1C ), with a low risk for severe hypoglycemia (4.9 versus 8.0 events/patient-year, respectiv...
    What is the preferred approach for the treatment of Alzheimer's disease?
    20 The risk

    Alzheimer's disease cannot be cured, it can only be managed and doing so is a big challenge. If the patient has no significant comorbidities, the patient will be relatively healthy in the physical sense and will be ambulatory until late in the progression of the disease. But for both patient and the patient's caretakers, the cognitive deficits and the behavioral and emotional issues associated with Alzheimer's disease -which in many instances are inextricably linked -can create enormously frustrating situations.

    It is impossible to improve a patient's memory, executive function, language difficulties; cognitive deficits cannot be reversed; the patient's condition always declines; and behavioral and emotional problems will always occur. But with skillful application of environmental and psychosocial interventions and targeted, judicious use of medications, the patient who has Alzheimer's disease can be safely and effectively cared for.

    Drug therapy for the treatment o...
    What are the common symptoms and misdiagnoses associated with paradoxical vocal cord motion (PVCM)?
    P aradoxical vocal cord motion (PVCM) is a rare disease that is characterized by vocal cord adduction during inspiration and/or FIGURE 2. The well-capsulated mass was easily enucleated after mobilization of the facial nerve.

    expiration. It was first described 1 by Dunglison and was named as Bhysteric croup[ in 1842. The initial presentation may include shortness of breath, wheezing, respiratory stridor, or breathy dysphonia. 2 Paradoxical vocal cord motion may be expedited by exercise and emotional mood, and it is usually misdiagnosed and mistreated as asthma. It affects mainly children and young adults within middle age. It has a reported 2:1 female predominance. 1 In its long-term treatment, speech therapy, psychologic counseling, or other modalities may be used to avoid reattack. 2 There are many treatment modalities for acute attack of PVCM, including reassurance and onsite maneuvers, benzodiazepines, heliox (gaseous mixture of oxygen and helium), and nebulized lignocaine. 1 In t...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
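
The parameters above describe a ranking loss over in-batch negatives: each question is scored against every passage in the batch with cosine similarity, the scores are multiplied by `scale` (20.0), and cross-entropy is taken with the question's own passage as the correct class. The sketch below is a pure-Python illustration of that objective with hypothetical function names. It omits the gradient caching that, as I understand the cited paper, the "Cached" variant adds so that a batch of 192 can be processed in mini-batches of 16 with less memory.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i
    and all other passages in the batch act as negatives."""
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [scale * cos_sim(q, p) for p in passage_vecs]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-dimensional embeddings, not model outputs.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = mnr_loss(queries, passages)         # each query matches its passage
shuffled_loss = mnr_loss(queries, passages[::-1])  # positives point the wrong way
```

The aligned batch yields a loss near zero, while the shuffled batch yields a loss near 20, which shows how the scale factor sharpens the penalty for ranking a negative above the positive.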
    

Evaluation Dataset

med

  • Dataset: med at 7d144a0
  • Size: 36,720 evaluation samples
  • Columns: question and passage_text
  • Approximate statistics based on the first 1000 samples:
    column        type    min         mean           max
    question      string  10 tokens   22.84 tokens   59 tokens
    passage_text  string  481 tokens  972.34 tokens  1460 tokens
  • Samples:
    question passage_text
    How can the use of autogenous bone, demineralized freeze-dried bone, or hydroxyapatite support the function of membranes in guided bone regeneration procedures?
    All the other patients had excellent results.

    In the group of donor site and other cavernous defects, there was one perforation after a week, and Guided bone regeneration with titanium membranes 313 the procedure failed. The membrane was exposed and most of the graft material had disappeared five months postoperatively. In the group of peri-implant defects there were three exposed membranes (two weeks to five months after surgery), and one of them failed. Peri-implantitis had been treated by curettage and grafting with Algipore and a titanium membrane was used to cover it.

    There were most problems in the onlay graft group, eight membranes were exposed one week to five months postoperatively. In three cases (exposed at 2-4 weeks) there was considerable loss of the grafted material, and in one of these patients a fixture applied at the same time was lost. The remaining patients had satisfactory results.

    Routinely used non-absorbable PTFE membranes or resorbable membranes (Bio Gide®...
    How does L-T4 treatment work for hypothyroidism and what factors can affect its bioavailability?
    The desire for better individualized treatment for hypothyroid patients has led to research to clarify the role of genetic polymorphisms on L-T4 bioavailability. P-gp is a well-known transport pro-tein found mostly in the cellular membrane of different cell types in the intestine, kidney, blood-brain barrier and parathyroid glands (Thiebaut et al., 1987; Borst and Schinkel, 1997) . P-gp, an ATPdependent efflux transporter, acts as a physiological barrier by extruding a wide range of substances, from xenobiotics to endogenous compounds such as pesticides, anticancer drugs, antibiotics, cardiac glycosides, small proteins and hormones (Schinkel, 1997) . P-gp is encoded by the MDR1 gene, which is located in the region 7q21.12 of chromosome 7 in humans (Wolking et al., 2015) .

    MDR1 has a crucial role in drug disposition, and genetic polymorphisms in this gene might alter the pharmacokinetics and bioavailability of a diverse range of P-gp substrates (Kurose et al., 2008) . Although many va...
    Can pharmacological agents be used to induce myocardial preconditioning and protect against ischemia-reperfusion injury?
    The landmark study by Murry et al.

    14 exposed anesthetized, open-chest dogs to four cycles of 5 min coronary artery occlusions followed by 5 min of reperfusion before the onset of 40 min of coronary occlusion and 4 days of reperfusion. The animals receiving the 'IPC' displayed significantly smaller infarct sizes when compared with the control animals. The original paper by Murry et al.

    14 has been cited over 3200 times, demonstrating the importance of this paradoxical discovery that ischaemia protects from itself. Since this remarkable discovery in 1986, there has been a plethora of experimental investigations to define the cellular and molecular signals and pathways that elicit the reduction in infarct size. Numerous studies have provided tremendous insights into the mechanisms of IPC in a variety of animal species including both in vitro and in vivo model systems. Please see the elegant reviews by Downey et al., 17 Das and Das, 18 Hausenloy et al., 19 and Bolli et al. 20 for a de...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 192
  • per_device_eval_batch_size: 192
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
  • batch_sampler: no_duplicates
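
The `prompts` entry means that, during training, each column's text was prefixed with a fixed string before tokenization. The sketch below shows the resulting input strings in plain Python. `apply_prompt` is a hypothetical helper, not a library function, and it is an assumption that `encode_query` and `encode_document` in the usage example apply the same two prefixes at inference time.

```python
# Prompt strings copied from the hyperparameters above.
PROMPTS = {
    "question": "task: search result | query: ",
    "passage_text": "title: none | text: ",
}

def apply_prompt(column, text):
    """Prefix the text with the prompt configured for its dataset column."""
    return PROMPTS[column] + text

query_input = apply_prompt(
    "question",
    "What is the preferred approach for the treatment of Alzheimer's disease?",
)
passage_input = apply_prompt(
    "passage_text",
    "Alzheimer's disease cannot be cured, it can only be managed.",
)
print(query_input)
print(passage_input)
```

If you embed text without the matching prefix, the inputs differ from what the model saw in training, so retrieval quality may drop.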

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 192
  • per_device_eval_batch_size: 192
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss med-eval-500q-10kd_cosine_ndcg@10
-1 -1 - - 0.8560
0.0058 20 0.1737 - -
0.0116 40 0.0878 - -
0.0174 60 0.0703 - -
0.0232 80 0.0621 - -
0.0290 100 0.0549 - -
0.0349 120 0.0469 - -
0.0407 140 0.0429 - -
0.0465 160 0.0458 - -
0.0523 180 0.0392 - -
0.0581 200 0.0462 0.0491 0.9318
0.0639 220 0.0446 - -
0.0697 240 0.049 - -
0.0755 260 0.039 - -
0.0813 280 0.0567 - -
0.0871 300 0.0534 - -
0.0929 320 0.053 - -
0.0988 340 0.0568 - -
0.1046 360 0.0589 - -
0.1104 380 0.052 - -
0.1162 400 0.0499 0.0532 0.9101
0.1220 420 0.0527 - -
0.1278 440 0.0523 - -
0.1336 460 0.0542 - -
0.1394 480 0.0518 - -
0.1452 500 0.0485 - -
0.1510 520 0.0517 - -
0.1568 540 0.0586 - -
0.1626 560 0.0611 - -
0.1685 580 0.0502 - -
0.1743 600 0.056 0.0493 0.9145
0.1801 620 0.0536 - -
0.1859 640 0.0584 - -
0.1917 660 0.0494 - -
0.1975 680 0.0499 - -
0.2033 700 0.0496 - -
0.2091 720 0.0578 - -
0.2149 740 0.0454 - -
0.2207 760 0.0586 - -
0.2265 780 0.0466 - -
0.2324 800 0.0538 0.0474 0.9287
0.2382 820 0.0463 - -
0.2440 840 0.0376 - -
0.2498 860 0.0478 - -
0.2556 880 0.0406 - -
0.2614 900 0.0463 - -
0.2672 920 0.0546 - -
0.2730 940 0.0417 - -
0.2788 960 0.0448 - -
0.2846 980 0.0483 - -
0.2904 1000 0.0437 0.0438 0.9176
0.2963 1020 0.0411 - -
0.3021 1040 0.0446 - -
0.3079 1060 0.0405 - -
0.3137 1080 0.0429 - -
0.3195 1100 0.047 - -
0.3253 1120 0.0413 - -
0.3311 1140 0.0436 - -
0.3369 1160 0.0386 - -
0.3427 1180 0.0326 - -
0.3485 1200 0.0402 0.0413 0.9290
0.3543 1220 0.0412 - -
0.3602 1240 0.0354 - -
0.3660 1260 0.0419 - -
0.3718 1280 0.037 - -
0.3776 1300 0.0405 - -
0.3834 1320 0.0403 - -
0.3892 1340 0.0337 - -
0.3950 1360 0.0386 - -
0.4008 1380 0.0368 - -
0.4066 1400 0.037 0.0396 0.9238
0.4124 1420 0.0355 - -
0.4182 1440 0.0387 - -
0.4240 1460 0.0405 - -
0.4299 1480 0.0477 - -
0.4357 1500 0.0417 - -
0.4415 1520 0.0346 - -
0.4473 1540 0.0371 - -
0.4531 1560 0.0391 - -
0.4589 1580 0.0364 - -
0.4647 1600 0.0379 0.0360 0.9394
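
With `load_best_model_at_end: True`, the checkpoint kept at the end is the best one seen at an evaluation step. The card does not say which metric was used for that choice, so the sketch below checks both candidates from the table, validation loss and NDCG@10. The rows are copied from the evaluation steps above and the variable names are hypothetical.

```python
# (step, validation loss, med-eval-500q-10kd_cosine_ndcg@10) at each evaluation step.
eval_rows = [
    (200, 0.0491, 0.9318),
    (400, 0.0532, 0.9101),
    (600, 0.0493, 0.9145),
    (800, 0.0474, 0.9287),
    (1000, 0.0438, 0.9176),
    (1200, 0.0413, 0.9290),
    (1400, 0.0396, 0.9238),
    (1600, 0.0360, 0.9394),
]

best_by_loss = min(eval_rows, key=lambda row: row[1])
best_by_ndcg = max(eval_rows, key=lambda row: row[2])
print(best_by_loss[0], best_by_ndcg[0])  # 1600 1600

# Samples processed by the last logged step at per_device_train_batch_size 192,
# assuming a single device and no gradient accumulation.
samples_seen = 1600 * 192
print(samples_seen)  # 307200
```

Both criteria select step 1600, whose NDCG@10 of 0.9394 matches the value in the Metrics section, and the sample count matches the training set size listed above.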

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.2
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}