EmbeddingGemma-300m trained on vectorranger/med (Filtered medical Q/A from MIRIAD)

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the med dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google/embeddinggemma-300m
Maximum Sequence Length: 2048 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- med
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
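
As a rough illustration of the data flow through modules (1) to (4), the sketch below applies mean pooling, two bias-free linear projections with identity activation, and L2 normalization to toy token vectors. The tiny dimensions (4 and 8 standing in for 768 and 3072), the random weights, and the helper names are assumptions for illustration only; they are not the model's parameters, and the real pooling layer also accounts for the attention mask.

```python
import math
import random

def mean_pool(token_vectors):
    """Average token vectors element-wise (pooling_mode_mean_tokens)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def dense(vector, weights):
    """Bias-free linear layer with identity activation; weights is out x in."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def l2_normalize(vector):
    """Scale the vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

random.seed(0)
HIDDEN, EXPANDED = 4, 8  # toy stand-ins for 768 and 3072 (assumption)
tokens = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(5)]
w_up = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(EXPANDED)]
w_down = [[random.uniform(-1, 1) for _ in range(EXPANDED)] for _ in range(HIDDEN)]

# Pooling -> Dense (up) -> Dense (down) -> Normalize
embedding = l2_normalize(dense(dense(mean_pool(tokens), w_up), w_down))
print(len(embedding), sum(x * x for x in embedding))
```

Because the final vector has unit length, cosine similarity between two embeddings reduces to a plain dot product.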

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (full repository id, including the namespace)
model = SentenceTransformer("vectorranger/embeddinggemma-300m-medical-300k")
# Run inference
queries = [
    "What are the benefits of using laparoscopy in the management of blunt abdominal trauma?\n",
]
documents = [
    'The use of laparoscopy in trauma has lagged behind in the otherwise rapid progression of this groundbreaking surgical tool. Although reports exist of the use of laparoscopy for the diagnosis of hemoperitoneum as far back as the 1920s, there is still a paucity of literature on this subject to this day. 1, 2 There is no doubt that this is related to the nature of trauma. There is often anxiety and concern to optimize the patient with the quickest possible intervention. It should be stated early in this discourse that there is no role for laparoscopy in the management of the patient with abdominal trauma who is hemodynamically unstable. The priority in this situation follows the standard life-saving principles of resuscitation, with quick access for hemostasis, which must in those situations be open surgery. Associated extraabdominal injuries like head injuries may also be worsened by the hemodynamic effects of carbon dioxide pneumoperitoneum and may preclude laparoscopy. The gasless laparoscopy technique has been described to attenuate this as well as to prevent air embolism and also pneumothorax in patients with occult diaphragmatic injuries. 3 Laparoscopy can be safely used when an intraabdominal injury is suspected in a patient, i.e., hemodynamically stable. These are patients with a systolic blood pressure of >100 mm Hg, diastolic blood pressure of >60 mm Hg, a heart rate of <110 beats per minute, and crystalloid resuscitation requirements of <2 L. 4 The objective of this review is to determine the scope of the diagnostic and therapeutic uses of laparoscopy in blunt abdominal trauma, and also to delineate the benefits, complications, as well as prospects of laparoscopy in patients with blunt abdominal trauma.\n\n The PubMed search engine was used to search for peer-reviewed articles. The keywords entered were laparoscopy, blunt, abdominal, and trauma. The search was filtered to include only articles written in the last 5 years. 
All 55 articles obtained from the database were then reviewed for relevance and sample size. Case reports were excluded.\n\n Several articles discussed the uses of laparoscopy in blunt abdominal trauma. The role of laparoscopy as the most sensitive detector of a breach of the peritoneum in penetrating abdominal trauma is immediately apparent. 5 It is instructive that the authors reviewed equally acknowledged the role of laparoscopy in diagnosis in blunt abdominal trauma. Johnson et al 5 started their study on the established premise that diagnostic laparoscopy (DL) had decreased the rate of nontherapeutic laparotomies in patients with penetrating abdominal injuries. They sort to determine whether DL similarly lowered nontherapeutic laparotomy in blunt abdominal injury. They found that coupled with diagnostic computed tomography (CT) scan, DL yielded a nontherapeutic laparotomy rate of 0% in patients with blunt abdominal trauma. They concluded that when combined with CT scan, DL is a useful tool in the initial evaluation of patients with blunt abdominal trauma. Lee et al 6 had similar findings demonstrating that the use of laparoscopy in patients with abdominal trauma safely decreased the laparotomy rate. 14, 15 Lin et al 16 have described a new approach for management of high-grade splenic injury laparoscopically. They, however, emphasize the need for adequate training on laparoscopy in trauma.\n\n Evaluation of diagnostic tools in blunt abdominal trauma remains a contemporary issue to clarify the need for appropriate surgical intervention. 17 This study clearly describes the safety of DL as an approach in blunt abdominal trauma. With the increasing trend for limited intervention in appropriately selected hemodynamically stable patients with blunt abdominal trauma, the role of DL is brought to the fore. 
[18] [19] [20] As minimal access surgery becomes more prominent, laparoscopic surgeons should equally remain aware of the potential complications that could arise when this approach is adopted in the management of patients with blunt abdominal trauma.\n\n \n\n Laparoscopy can be safely used both diagnostically and therapeutically in hemodynamically stable patients with blunt abdominal trauma.',
    "The study here complements those results; demonstrating that texture-based analysis of the StO 2 -contrast may yield similar statistical differences between response groups (P ¼ 0.044). Grey-level co-occurrence matrices analyses here, provided discriminant features by using volumetric tumour analysis, in addition to second-order statistical analyses that examined the pixel-by-pixel relationships of tumour heterogeneities within the parametric maps. Measures of spatial heterogeneity in tumour Table 2A reports the percentage of the statistical power. The numbers inside parentheses in this column indicate the number of non-responders (n2) required in this study to achieve a statistical power of minimum 80% in case that the number of responders (n1) is fixed at 27. physiology as conducted here, could potentially provide good characterisation of biological traits that influence tumour response to treatment. Such features include tumour hypoxia (Hockel and Vaupel, 2001) , and haematological characteristics such as blood flow and vascular density (Folkman, 2002) . These features have been shown to influence tumour cell proliferation and metabolism, and therefore may also affect chemosensitivity (Folkman, 2002) .\n\n The use of such measures better reflects tumour physiology, which is not homogeneous but rather spatially heterogeneous.\n\n Additionally, multiparametric analysis resulted in sensitive and specific combined markers for response classification. Logistic regression analysis demonstrated B10% improvement in all performance measures by using pairwise features compared to the case of using only one single feature. However, the naive Bayes and k-NN did not show a significant improvement. This may be related to the small sample size used and peaking phenomena (Jain et al, 2000) . Features into the pairwise models included: HbO 2 -cor, HbO 2 -hom, Hb-cor, HbO 2 -con, Hb-hom, and Hb-con. 
Individually, those non-texture DOS parameters were previously correlated to tumour vasculature (Intes, 2005) . Additionally, the heterogenic tumour vasculature has been linked to mediating drug resistance; caused by structural scaffolds that inhibit effective drug delivery (Teicher et al, 1990; Galmarini et al, 2000; Tredan et al, 2007) . These include poor vascular flow, increased interstitial fluid, and a tightly bound cellular matrix that may constrain drugs from reaching into the tumour stroma thereby affecting the efficacy of chemotherapies.\n\n In comparison to other studies, texture analysis of MRI ( spectroscopy (Sadeghi-Naini et al, 2014) , and DOS (Sadeghi-Naini et al, 2015) images have been used to assess and monitor chemotherapy response in breast tumours during the course of treatment. Textural analysis of pretreatment MRI-based kinetic maps have indicated positive results for predicting chemotherapy response in 'triple-negative' breast tumours (Golden et al, 2013) . Those results also strongly suggest that pretreatment tumour heterogeneity can influence drug resistance (Golden et al, 2013) . Other similar studies have examined texture features of dynamic contrast-enhanced MRI images to predict NAC response (Ahmed et al, 2013; Teruel et al, 2014) . Results have indicated significant differences in GLCM texture features between responders and non-responders at pretreatment (Ahmed et al, 2013) and have reported an increase in textural heterogeneity caused by necrotic tumour areas (Ahmed et al, 2013) . Those studies demonstrated comparable frameworks to the present study. Specifically, that heterogeneous tumour features caused by pathophysiology, and initial biochemical composition might play an important role in chemoresistance.\n\n In terms of novelty, the results indicate that selecting volumetric tumour-based ROIs may improve the method for DOS texture analysis to predict NAC response. 
Additionally, we compared the performance of several classification methods and found that using naive Bayes classifier demonstrated high accuracy in predicting chemotherapy treatment response. The preliminary work in this study highlights an important phase in the 'imaging biomarker roadmap' outlined by Cancer Research UK (CRUK) and the European Organisation for Research and Treatment of Cancer (EORTC) (O'Connor et al, 2017) . Diffuse optical spectroscopybased biomarkers have surpassed the initial translational gap outlined within this roadmap; specifically, as a useful tool in medical research (O'Connor et al, 2017) .",
    'When the motor end plates are reinnervated electromyography shows polyphasic action potentials 15 .\n\n In circumstances where electrophysiological studies do not detect a loss of axonal continuity or Wallerian degeneration it is advisable to have a period of "watchful waiting" with regular nerve conduction studies to confirm that nerve transmission is not deteriorating 15, 16 .\n\n In any of the cases described above, patients presenting with a nerve injury should always be referred to a specialist in order to start the most appropriate treatment as early as possible.\n\n In facial surgery nerve injuries have been reported following procedures such as blepharoplasties, rhinoplasties, genioplasties and most commonly in rhytidectomies 16 . There have been some distressing reports of blindness following blepharoplasties. Data collected regarding rhinoplasties has reported cases of sensory loss of the nose-tip and injuries resulting from genioplasties have caused anesthesia or dysesthesia affecting the lips, chin and in some cases, paresthesia or paralysis of the lower lip. However rhytidectomies are the commonest cause of facial nerve injuries. Patients can present with paresis with loss of function of the facial nerve-an event which can have a significant psychological impact for the patient 14 .\n\n The majority of nerve injuries following rhytidectomies show sensory loss with the great auricular nerve being the most commonly affected. This is followed by injuries resulting in loss of motor function affecting in decreasing order the following divisions of the facial nerve: temporal, marginal mandibular, buccal and zygomatic.\n\n There are some reports that rhytidectomies performed endoscopically on the upper third and upper half of the face can lead to complications such as transitory paresis of the temporal and zygomatic branches of the facial nerve showing recovery within six months after the procedure. 
When the procedure is carried out using ultrasound-assisted liposuction the incidence of motor nerve injuries is 7.6% (affecting the marginal mandibular branch) 17 .\n\n Although an uncommon outcome from aesthetic surgery of the neck, injury to the spinal accessory nerve has been documented following cervicofacial lift and is most likely due to scar formation developing around the nerve. 32 About 20% of injuries affecting the motor function of the facial nerve following rhytidectomies fail to show any spontaneous recovery of function.\n\n The facial nerve and its branches travel along the anteromedial aspect of the parotid gland, running in a deep plane towards the superficial muscular and aponeurotic system (SMAS). The facial muscles are therefore innervated by the facial nerve from a deep position with the exception of the muscles elevating the corner of the mouth: buccinator and mentalis. With this in mind it is therefore necessary to perform a superficial dissection of the SMAS in order to avoid nerve-related complications 2, 14, 16 . Fig. 1 . Zeckel´s nerve risk zones during face lift; major to minor risk; 1= great auricular nerve, 2= frontal branch of facial nerve, 3= marginal branch of facial nerve, 4= buccal branch of facial nerve, 5= supraorbital nerve, 6= infrorbital nerve, 7= mental nerve Furthermore dissections of the posterior aspect of the sternocleidomastoid muscle ought to be undertaken with caution from beneath the mastoid process where the great auricular nerve runs more superficially thus increasing the risk of injury. Care must therefore be taken when using electrocautery while dissecting the superficial nerves.\n\n Permanent damage to the nerve results in hypoesthesia or, in patients with a neuroma, painful dysesthesia in the lower two thirds of the ear and the skin of the neck and cheek. The temporal branch of the facial nerve poses the greatest risk of motor damage followed by the marginal mandibular and buccal branches. 
In terms of anatomical regions, the temporofrontal region, the angle of the mandible and the pre-parotid region are the riskiest areas in terms of nerve injury 4, 8 .\n\n The temporal branch of the facial nerve is the thickest and is located anterior and caudal to the frontal branch of the superficial temporal artery in 91% of cases. Seckel locates the temporal branch in an area he describes as Facial Zone 2, where the nerve branch originates below the parotid gland at the level of the zygomatic arch before innervating the frontal muscle. Injury to the nerve results in paralysis of this muscle but orbicular function remains intact owing to the dual innervation it receives from the inferior zygomatic branches.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8161, 0.0512, 0.0688]])
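
For semantic search, the similarity scores are typically turned into a ranking. The following is a minimal sketch that sorts the scores printed above for the single query; `rank_documents` is a hypothetical helper, not part of the Sentence Transformers API, and the scores are copied by hand rather than recomputed.

```python
def rank_documents(scores, top_k=3):
    """Return (document_index, score) pairs sorted by descending score."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Scores for the single query, copied from the printed tensor.
query_scores = [0.8161, 0.0512, 0.0688]
ranking = rank_documents(query_scores)
print(ranking)
# [(0, 0.8161), (2, 0.0688), (1, 0.0512)]
```

The laparoscopy passage (index 0) ranks first by a wide margin, while the two unrelated passages score near zero.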

Evaluation

Metrics

Information Retrieval

Dataset: med-eval-500q-10kd
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.878
cosine_accuracy@3	0.966
cosine_accuracy@5	0.986
cosine_accuracy@10	0.99
cosine_precision@1	0.878
cosine_precision@3	0.322
cosine_precision@5	0.1972
cosine_precision@10	0.099
cosine_recall@1	0.878
cosine_recall@3	0.966
cosine_recall@5	0.986
cosine_recall@10	0.99
cosine_ndcg@10	0.9394
cosine_mrr@10	0.9225
cosine_map@100	0.9229
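
The reported values are consistent with each query having exactly one relevant passage: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (0.966 / 3 = 0.322, 0.986 / 5 = 0.1972, 0.99 / 10 = 0.099). The sketch below computes these metrics under that single-relevant-document assumption. The function name and the toy ranks are hypothetical and are not taken from the evaluation data.

```python
def single_relevant_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    ranks holds the 1-based rank of the one relevant document per query,
    or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n
    precision = hits / (n * k)  # at most one of the k results is relevant
    recall = hits / n           # one relevant document per query
    mrr = sum(1 / r for r in ranks if r is not None and r <= k) / n
    return accuracy, precision, recall, mrr

toy_ranks = [1, 1, 2, 4, None]  # hypothetical ranks for five queries
acc, prec, rec, mrr = single_relevant_metrics(toy_ranks, k=3)
print(acc, prec, rec, mrr)

# Check the relationship against the reported table values.
print(abs(0.966 / 3 - 0.322) < 1e-9, abs(0.986 / 5 - 0.1972) < 1e-9)
```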

Training Details

Training Dataset

med

Dataset: med at 7d144a0
Size: 307,200 training samples
Columns: question and passage_text
Approximate statistics based on the first 1000 samples:

	question	passage_text
type	string	string
details	min: 9 tokens mean: 22.67 tokens max: 53 tokens	min: 484 tokens mean: 969.41 tokens max: 1515 tokens

Samples:

question	passage_text
`What are the potential challenges for patients transitioning from once or twice-daily basal insulin injections to a more complex regimen with multiple daily injections of bolus insulin?`	However, the immediate transition from once-or twice-daily injection of basal insulin to a complex regimen consisting of 4 -5 total daily injections with the addition of three injections of bolus insulin may be initially challenging for some patients, thus making the stepwise approach a more attractive option. The success of the MDI approach outlined above also assumes consistent carbohydrate intake at each meal. For patients who wish to vary their carbohydrate intake from meal to meal and day to day, carbohydrate counting is recommended. Bergenstal et al. conducted a study in patients with T2DM that compared the effectiveness of a simple algorithm to adjust bolus insulin dosing based on a weekly average of pre-meal SMBG levels versus an algorithm based on mealtime carbohydrate counting (41). Both approaches resulted in similar levels of glycemic control (approximately a 1.5% reduction in HbA 1C ), with a low risk for severe hypoglycemia (4.9 versus 8.0 events/patient-year, respectiv...
`What is the preferred approach for the treatment of Alzheimer's disease?`	20 The risk Alzheimer's disease cannot be cured, it can only be managed and doing so is a big challenge. If the patient has no significant comorbidities, the patient will be relatively healthy in the physical sense and will be ambulatory until late in the progression of the disease. But for both patient and the patient's caretakers, the cognitive deficits and the behavioral and emotional issues associated with Alzheimer's disease -which in many instances are inextricably linked -can create enormously frustrating situations. It is impossible to improve a patient's memory, executive function, language difficulties; cognitive deficits cannot be reversed; the patient's condition always declines; and behavioral and emotional problems will always occur. But with skillful application of environmental and psychosocial interventions and targeted, judicious use of medications, the patient who has Alzheimer's disease can be safely and effectively cared for. Drug therapy for the treatment o...
`What are the common symptoms and misdiagnoses associated with paradoxical vocal cord motion (PVCM)?`	P aradoxical vocal cord motion (PVCM) is a rare disease that is characterized by vocal cord adduction during inspiration and/or FIGURE 2. The well-capsulated mass was easily enucleated after mobilization of the facial nerve. expiration. It was first described 1 by Dunglison and was named as Bhysteric croup[ in 1842. The initial presentation may include shortness of breath, wheezing, respiratory stridor, or breathy dysphonia. 2 Paradoxical vocal cord motion may be expedited by exercise and emotional mood, and it is usually misdiagnosed and mistreated as asthma. It affects mainly children and young adults within middle age. It has a reported 2:1 female predominance. 1 In its long-term treatment, speech therapy, psychologic counseling, or other modalities may be used to avoid reattack. 2 There are many treatment modalities for acute attack of PVCM, including reassurance and onsite maneuvers, benzodiazepines, heliox (gaseous mixture of oxygen and helium), and nebulized lignocaine. 1 In t...

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}
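
To make the parameters concrete, here is a minimal sketch of the in-batch ranking objective that this loss family optimizes: cosine similarities between every question and every passage in the batch are multiplied by `scale`, and a cross-entropy loss treats the paired passage as the correct class. This is a plain-Python illustration, not the library implementation. The cached variant is described as computing the same objective in chunks of `mini_batch_size` to reduce memory use, which this sketch does not reproduce. The toy vectors and the function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (similarity_fct: cos_sim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(anchors)

questions = [[1.0, 0.0], [0.0, 1.0]]  # toy question embeddings
aligned = [[0.9, 0.1], [0.1, 0.9]]    # each passage matches its question
swapped = [[0.1, 0.9], [0.9, 0.1]]    # passages paired with the wrong question
loss_aligned = in_batch_ranking_loss(questions, aligned)
loss_swapped = in_batch_ranking_loss(questions, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each question is closest to its own passage and large when the pairs are swapped. A larger batch supplies more in-batch negatives per question.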

Evaluation Dataset

med

Dataset: med at 7d144a0
Size: 36,720 evaluation samples
Columns: question and passage_text
Approximate statistics based on the first 1000 samples:

	question	passage_text
type	string	string
details	min: 10 tokens mean: 22.84 tokens max: 59 tokens	min: 481 tokens mean: 972.34 tokens max: 1460 tokens

Samples:

question	passage_text
`How can the use of autogenous bone, demineralized freeze-dried bone, or hydroxyapatite support the function of membranes in guided bone regeneration procedures?`	All the other patients had excellent results. In the group of donor site and other cavernous defects, there was one perforation after a week, and Guided bone regeneration with titanium membranes 313 the procedure failed. The membrane was exposed and most of the graft material had disappeared five months postoperatively. In the group of peri-implant defects there were three exposed membranes (two weeks to five months after surgery), and one of them failed. Peri-implantitis had been treated by curettage and grafting with Algipore and a titanium membrane was used to cover it. There were most problems in the onlay graft group, eight membranes were exposed one week to five months postoperatively. In three cases (exposed at 2-4 weeks) there was considerable loss of the grafted material, and in one of these patients a fixture applied at the same time was lost. The remaining patients had satisfactory results. Routinely used non-absorbable PTFE membranes or resorbable membranes (Bio Gide®...
`How does L-T4 treatment work for hypothyroidism and what factors can affect its bioavailability?`	The desire for better individualized treatment for hypothyroid patients has led to research to clarify the role of genetic polymorphisms on L-T4 bioavailability. P-gp is a well-known transport pro-tein found mostly in the cellular membrane of different cell types in the intestine, kidney, blood-brain barrier and parathyroid glands (Thiebaut et al., 1987; Borst and Schinkel, 1997) . P-gp, an ATPdependent efflux transporter, acts as a physiological barrier by extruding a wide range of substances, from xenobiotics to endogenous compounds such as pesticides, anticancer drugs, antibiotics, cardiac glycosides, small proteins and hormones (Schinkel, 1997) . P-gp is encoded by the MDR1 gene, which is located in the region 7q21.12 of chromosome 7 in humans (Wolking et al., 2015) . MDR1 has a crucial role in drug disposition, and genetic polymorphisms in this gene might alter the pharmacokinetics and bioavailability of a diverse range of P-gp substrates (Kurose et al., 2008) . Although many va...
`Can pharmacological agents be used to induce myocardial preconditioning and protect against ischemia-reperfusion injury?`	The landmark study by Murry et al. 14 exposed anesthetized, open-chest dogs to four cycles of 5 min coronary artery occlusions followed by 5 min of reperfusion before the onset of 40 min of coronary occlusion and 4 days of reperfusion. The animals receiving the 'IPC' displayed significantly smaller infarct sizes when compared with the control animals. The original paper by Murry et al. 14 has been cited over 3200 times, demonstrating the importance of this paradoxical discovery that ischaemia protects from itself. Since this remarkable discovery in 1986, there has been a plethora of experimental investigations to define the cellular and molecular signals and pathways that elicit the reduction in infarct size. Numerous studies have provided tremendous insights into the mechanisms of IPC in a variety of animal species including both in vitro and in vivo model systems. Please see the elegant reviews by Downey et al., 17 Das and Das, 18 Hausenloy et al., 19 and Bolli et al. 20 for a de...

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 192
per_device_eval_batch_size: 192
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
load_best_model_at_end: True
prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
batch_sampler: no_duplicates
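
The `prompts` entry maps each dataset column to a string that is prepended to the text of that column during training. A minimal sketch of that prefixing follows; `apply_prompt` is a hypothetical helper for illustration, not a library function, and the example question is taken from the training samples.

```python
# Prompt strings copied from the hyperparameters.
PROMPTS = {
    "question": "task: search result | query: ",
    "passage_text": "title: none | text: ",
}

def apply_prompt(column, text):
    """Prefix text with the prompt registered for its dataset column."""
    return PROMPTS[column] + text

prompted_query = apply_prompt(
    "question",
    "What is the preferred approach for the treatment of Alzheimer's disease?",
)
print(prompted_query)
```

At inference time, `encode_query` and `encode_document` are intended to apply the matching prompts when the model configuration defines them, so queries and documents should be encoded with the same prefixes they were trained with.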

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 192
per_device_eval_batch_size: 192
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
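
With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate rises linearly from zero to `learning_rate` over the first 10% of steps and then decays linearly back to zero. The sketch below reproduces that shape in plain Python. The total step count of 1000 is a hypothetical value chosen for readability, not the number of steps in this run, and the function is an illustration rather than the trainer's code.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

TOTAL_STEPS = 1000  # hypothetical, for illustration only
schedule = {step: linear_schedule_lr(step, TOTAL_STEPS) for step in (0, 50, 100, 550, 1000)}
for step, lr in schedule.items():
    print(step, lr)
```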

Training Logs

Epoch	Step	Training Loss	Validation Loss	med-eval-500q-10kd_cosine_ndcg@10
-1	-1	-	-	0.8560
0.0058	20	0.1737	-	-
0.0116	40	0.0878	-	-
0.0174	60	0.0703	-	-
0.0232	80	0.0621	-	-
0.0290	100	0.0549	-	-
0.0349	120	0.0469	-	-
0.0407	140	0.0429	-	-
0.0465	160	0.0458	-	-
0.0523	180	0.0392	-	-
0.0581	200	0.0462	0.0491	0.9318
0.0639	220	0.0446	-	-
0.0697	240	0.049	-	-
0.0755	260	0.039	-	-
0.0813	280	0.0567	-	-
0.0871	300	0.0534	-	-
0.0929	320	0.053	-	-
0.0988	340	0.0568	-	-
0.1046	360	0.0589	-	-
0.1104	380	0.052	-	-
0.1162	400	0.0499	0.0532	0.9101
0.1220	420	0.0527	-	-
0.1278	440	0.0523	-	-
0.1336	460	0.0542	-	-
0.1394	480	0.0518	-	-
0.1452	500	0.0485	-	-
0.1510	520	0.0517	-	-
0.1568	540	0.0586	-	-
0.1626	560	0.0611	-	-
0.1685	580	0.0502	-	-
0.1743	600	0.056	0.0493	0.9145
0.1801	620	0.0536	-	-
0.1859	640	0.0584	-	-
0.1917	660	0.0494	-	-
0.1975	680	0.0499	-	-
0.2033	700	0.0496	-	-
0.2091	720	0.0578	-	-
0.2149	740	0.0454	-	-
0.2207	760	0.0586	-	-
0.2265	780	0.0466	-	-
0.2324	800	0.0538	0.0474	0.9287
0.2382	820	0.0463	-	-
0.2440	840	0.0376	-	-
0.2498	860	0.0478	-	-
0.2556	880	0.0406	-	-
0.2614	900	0.0463	-	-
0.2672	920	0.0546	-	-
0.2730	940	0.0417	-	-
0.2788	960	0.0448	-	-
0.2846	980	0.0483	-	-
0.2904	1000	0.0437	0.0438	0.9176
0.2963	1020	0.0411	-	-
0.3021	1040	0.0446	-	-
0.3079	1060	0.0405	-	-
0.3137	1080	0.0429	-	-
0.3195	1100	0.047	-	-
0.3253	1120	0.0413	-	-
0.3311	1140	0.0436	-	-
0.3369	1160	0.0386	-	-
0.3427	1180	0.0326	-	-
0.3485	1200	0.0402	0.0413	0.9290
0.3543	1220	0.0412	-	-
0.3602	1240	0.0354	-	-
0.3660	1260	0.0419	-	-
0.3718	1280	0.037	-	-
0.3776	1300	0.0405	-	-
0.3834	1320	0.0403	-	-
0.3892	1340	0.0337	-	-
0.3950	1360	0.0386	-	-
0.4008	1380	0.0368	-	-
0.4066	1400	0.037	0.0396	0.9238
0.4124	1420	0.0355	-	-
0.4182	1440	0.0387	-	-
0.4240	1460	0.0405	-	-
0.4299	1480	0.0477	-	-
0.4357	1500	0.0417	-	-
0.4415	1520	0.0346	-	-
0.4473	1540	0.0371	-	-
0.4531	1560	0.0391	-	-
0.4589	1580	0.0364	-	-
0.4647	1600	0.0379	0.0360	0.9394
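
The evaluation rows can be summarized with a few lines of Python. The sketch below copies the rows that carry a validation loss from the table and finds the best step by each criterion. The row with step -1 appears to be the evaluation of the base model before training and is used here as the baseline, which is an assumption about how the log was produced.

```python
# (step, validation_loss, ndcg@10) rows copied from the training log table.
eval_rows = [
    (200, 0.0491, 0.9318), (400, 0.0532, 0.9101), (600, 0.0493, 0.9145),
    (800, 0.0474, 0.9287), (1000, 0.0438, 0.9176), (1200, 0.0413, 0.9290),
    (1400, 0.0396, 0.9238), (1600, 0.0360, 0.9394),
]
BASELINE_NDCG = 0.8560  # row with step -1 (assumed: before any training)

best_by_loss = min(eval_rows, key=lambda row: row[1])
best_by_ndcg = max(eval_rows, key=lambda row: row[2])
gain = best_by_ndcg[2] - BASELINE_NDCG
print(best_by_loss[0], best_by_ndcg[0], round(gain, 4))
# 1600 1600 0.0834
```

Both criteria select step 1600, whose NDCG@10 of 0.9394 matches the value reported in the Metrics table.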

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.1.2
Transformers: 4.57.2
PyTorch: 2.9.0+cu126
Accelerate: 1.12.0
Datasets: 4.0.0
Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Model size: 0.3B params (Safetensors, tensor type F32)
