@astellon
Last active June 5, 2022 12:53
A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder http://arxiv.org/abs/2201.09875v1
A COMPARISON OF DISCRETE AND SOFT SPEECH UNITS FOR IMPROVED VOICE CONVERSION http://arxiv.org/abs/2111.02392v1
A Configurable Multilingual Model is All You Need to Recognize All Languages http://arxiv.org/abs/2107.05876v1
A Framework for Private Communication with Secret Block Structure http://arxiv.org/abs/2110.04345v1
A GENERAL FRAMEWORK FOR DISTRIBUTED INFERENCE WITH UNCERTAIN MODELS http://arxiv.org/abs/2011.10669v1
A GENERALIZED HIERARCHICAL NONNEGATIVE TENSOR DECOMPOSITION http://arxiv.org/abs/2109.14820v2
A likelihood ratio based domain adaptation method for E2E models http://arxiv.org/abs/2201.03655v1
A Maximal Correlation Approach to Imposing Fairness in Machine Learning http://arxiv.org/abs/2012.15259v1
A METHOD TO REVEAL SPEAKER IDENTITY IN DISTRIBUTED ASR TRAINING, AND HOW TO COUNTER IT http://arxiv.org/abs/2104.07815v1
A NEW DATA AUGMENTATION METHOD FOR INTENT CLASSIFICATION ENHANCEMENT AND ITS APPLICATION ON SPOKEN CONVERSATION DATASETS http://arxiv.org/abs/2202.10137v1
A NOVEL 1D STATE SPACE FOR EFFICIENT MUSIC RHYTHMIC ANALYSIS http://arxiv.org/abs/2111.00704v2
A STUDY OF THE ROBUSTNESS OF RAW WAVEFORM BASED SPEAKER EMBEDDINGS UNDER MISMATCHED CONDITIONS http://arxiv.org/abs/2110.04265v2
A TIME ENCODING APPROACH TO TRAINING SPIKING NEURAL NETWORKS http://arxiv.org/abs/2110.06735v1
A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer http://arxiv.org/abs/2110.08598v2
ACCELERATED INTRAVASCULAR ULTRASOUND IMAGING USING DEEP REINFORCEMENT LEARNING http://arxiv.org/abs/2201.09522v1
ADAPTIVE GROUP TESTING WITH MISMATCHED MODELS http://arxiv.org/abs/2110.02265v1
ADIMA: ABUSE DETECTION IN MULTILINGUAL AUDIO http://arxiv.org/abs/2202.07991v1
ADVERSARIAL ROBUSTNESS BY DESIGN THROUGH ANALOG COMPUTING AND SYNTHETIC GRADIENTS http://arxiv.org/abs/2101.02115v1
Adversarial sample detection for speaker verification by neural vocoders http://arxiv.org/abs/2107.00309v4
AECMOS: A SPEECH QUALITY ASSESSMENT METRIC FOR ECHO IMPAIRMENT http://arxiv.org/abs/2110.03010v3
AERIAL BASE STATION PLACEMENT LEVERAGING RADIO TOMOGRAPHIC MAPS http://arxiv.org/abs/2109.07372v2
AISHELL-NER: NAMED ENTITY RECOGNITION FROM CHINESE SPEECH http://arxiv.org/abs/2202.08533v1
AMICABLE EXAMPLES FOR INFORMED SOURCE SEPARATION http://arxiv.org/abs/2110.05059v2
AN APPROACH TO MISPRONUNCIATION DETECTION AND DIAGNOSIS WITH ACOUSTIC, PHONETIC AND LINGUISTIC (APL) EMBEDDINGS http://arxiv.org/abs/2110.07274v2
AN ASYMPTOTICALLY OPTIMAL APPROXIMATION OF THE CONDITIONAL MEAN CHANNEL ESTIMATOR BASED ON GAUSSIAN MIXTURE MODELS http://arxiv.org/abs/2111.11064v1
An Embarrassingly Simple Model for Dialogue Relation Extraction http://arxiv.org/abs/2012.13873v2
AN INFORMATION MAXIMIZATION BASED BLIND SOURCE SEPARATION APPROACH FOR DEPENDENT AND INDEPENDENT SOURCES http://arxiv.org/abs/2205.00794v1
AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION http://arxiv.org/abs/2110.02878v3
ANALYZING THE ROBUSTNESS OF UNSUPERVISED SPEECH RECOGNITION http://arxiv.org/abs/2110.03509v5
APPLYING DIFFERENTIAL PRIVACY TO TENSOR COMPLETION http://arxiv.org/abs/2110.00539v4
APPROACHES TOWARD PHYSICAL AND GENERAL VIDEO ANOMALY DETECTION http://arxiv.org/abs/2112.07661v1
ASSEM-VC: REALISTIC VOICE CONVERSION BY ASSEMBLING MODERN SPEECH SYNTHESIS TECHNIQUES http://arxiv.org/abs/2104.00931v2
ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION http://arxiv.org/abs/2104.07213v2
ATTRIBUTABLE WATERMARKING OF SPEECH GENERATIVE MODELS http://arxiv.org/abs/2202.08900v2
AUDIOCLIP: EXTENDING CLIP TO IMAGE, TEXT AND AUDIO http://arxiv.org/abs/2106.13043v1
AUGMENTING MOLECULAR DEEP GENERATIVE MODELS WITH TOPOLOGICAL DATA ANALYSIS REPRESENTATIONS http://arxiv.org/abs/2106.04464v2
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization http://arxiv.org/abs/2108.04692v1
AUTOMATIC DJ TRANSITIONS WITH DIFFERENTIABLE AUDIO EFFECTS AND GENERATIVE ADVERSARIAL NETWORKS http://arxiv.org/abs/2110.06525v2
BLIND EXTRACTION OF EQUITABLE PARTITIONS FROM GRAPH SIGNALS http://arxiv.org/abs/2203.05407v1
BLIND REVERBERATION TIME ESTIMATION IN DYNAMIC ACOUSTIC CONDITIONS http://arxiv.org/abs/2202.11790v1
BLOOM-NET: BLOCKWISE OPTIMIZATION FOR MASKING NETWORKS TOWARD SCALABLE AND EFFICIENT SPEECH ENHANCEMENT http://arxiv.org/abs/2111.09372v2
BONA FIDE RIESZ PROJECTIONS FOR DENSITY ESTIMATION http://arxiv.org/abs/2204.13606v1
BUILDING ROBUST SPOKEN LANGUAGE UNDERSTANDING BY CROSS ATTENTION BETWEEN PHONEME SEQUENCE AND ASR HYPOTHESIS http://arxiv.org/abs/2203.12067v1
Camera Calibration through Camera Projection Loss http://arxiv.org/abs/2110.03479v3
CAN AUDIO CAPTIONS BE EVALUATED WITH IMAGE CAPTION METRICS? http://arxiv.org/abs/2110.04684v2
CAPITALIZATION NORMALIZATION FOR LANGUAGE MODELING WITH AN ACCURATE AND EFFICIENT HIERARCHICAL RNN MODEL http://arxiv.org/abs/2202.08171v1
CASCADING BANDIT UNDER DIFFERENTIAL PRIVACY http://arxiv.org/abs/2105.11126v2
CAUSAL LINEAR TOPOLOGICAL FILTERS OVER A 2-SIMPLEX http://arxiv.org/abs/2110.02362v1
CLIMATE AND WEATHER: INSPECTING DEPRESSION DETECTION VIA EMOTION RECOGNITION http://arxiv.org/abs/2204.14099v2
Cloning one's voice using very limited data in the wild http://arxiv.org/abs/2110.03347v2
CLSEG: Contrastive Learning of Story Ending Generation http://arxiv.org/abs/2202.09049v1
COGNITIVE CODING OF SPEECH http://arxiv.org/abs/2110.04241v1
COLLABORATIVE OBJECT DETECTORS ADAPTIVE TO BANDWIDTH AND COMPUTATION http://arxiv.org/abs/2105.00591v2
Compressive Scanning Transmission Electron Microscopy http://arxiv.org/abs/2112.11955v1
CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT http://arxiv.org/abs/2202.05256v1
Considering user agreement in learning to predict the aesthetic quality http://arxiv.org/abs/2110.06956v1
CONTEXT MODELING WITH EVIDENCE FILTER FOR MULTIPLE CHOICE QUESTION ANSWERING http://arxiv.org/abs/2010.02649v1
CONTEXTUAL ADAPTERS FOR PERSONALIZED SPEECH RECOGNITION IN NEURAL TRANSDUCERS http://arxiv.org/abs/2205.13660v1
CONTINUOUS SPEECH SEPARATION WITH RECURRENT SELECTIVE ATTENTION NETWORK http://arxiv.org/abs/2110.14838v1
CONTRASTIVE PREDICTION STRATEGIES FOR UNSUPERVISED SEGMENTATION AND CATEGORIZATION OF PHONEMES AND WORDS http://arxiv.org/abs/2110.15909v2
CONVOLUTIONAL FILTERING IN SIMPLICIAL COMPLEXES http://arxiv.org/abs/2201.12584v1
CSI CLUSTERING WITH VARIATIONAL AUTOENCODING http://arxiv.org/abs/2111.09758v2
Data Agnostic Filter Gating for Efficient Deep Networks http://arxiv.org/abs/2010.15041v1
DEEP DETERMINISTIC INDEPENDENT COMPONENT ANALYSIS FOR HYPERSPECTRAL UNMIXING http://arxiv.org/abs/2202.02951v2
DEEP HASHING WITH HASH CENTER UPDATE FOR EFFICIENT IMAGE RETRIEVAL http://arxiv.org/abs/2106.06269v1
DEEP IMPULSE RESPONSES: ESTIMATING AND PARAMETERIZING FILTERS WITH DEEP NETWORKS http://arxiv.org/abs/2202.03416v1
DEEP ITERATIVE PHASE RETRIEVAL FOR PTYCHOGRAPHY http://arxiv.org/abs/2202.10573v1
DEEP LEARNING FOR LOCATION BASED BEAMFORMING WITH NLOS CHANNELS http://arxiv.org/abs/2201.01386v1
DEEP LEARNING FOR PROMINENCE DETECTION IN CHILDREN'S READ SPEECH http://arxiv.org/abs/2110.14273v1
Deformable VisTR: Spatio temporal deformable attention for video instance segmentation http://arxiv.org/abs/2203.06318v1
DEMON: IMPROVED NEURAL NETWORK TRAINING WITH MOMENTUM DECAY http://arxiv.org/abs/1910.04952v4
DEPTH PRUNING WITH AUXILIARY NETWORKS FOR TINYML http://arxiv.org/abs/2204.10546v1
DETECTING BACKDOOR ATTACKS AGAINST POINT CLOUD CLASSIFIERS http://arxiv.org/abs/2110.10354v1
DICTIONARY LEARNING WITH UNIFORM SPARSE REPRESENTATIONS FOR ANOMALY DETECTION http://arxiv.org/abs/2201.03869v1
DIFFERENTIABLE DIGITAL SIGNAL PROCESSING MIXTURE MODEL FOR SYNTHESIS PARAMETER EXTRACTION FROM MIXTURE OF HARMONIC SOUNDS http://arxiv.org/abs/2202.00200v1
DIFFERENTIABLE PROGRAMMING A LA MOREAU http://arxiv.org/abs/2012.15458v1
DIFFERENTIABLE WAVETABLE SYNTHESIS http://arxiv.org/abs/2111.10003v4
DIGRAPH SIGNAL PROCESSING WITH GENERALIZED BOUNDARY CONDITIONS http://arxiv.org/abs/2005.09762v3
DIRECT DESIGN OF BIQUAD FILTER CASCADES WITH DEEP LEARNING BY SAMPLING RANDOM POLYNOMIALS http://arxiv.org/abs/2110.03691v2
DISTRIBUTED GRAPH LEARNING WITH SMOOTH DATA PRIORS http://arxiv.org/abs/2112.05887v1
DISTRIBUTED LINK SPARSIFICATION FOR SCALABLE SCHEDULING USING GRAPH NEURAL NETWORKS http://arxiv.org/abs/2203.14339v1
DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING http://arxiv.org/abs/2110.06691v2
DO YOU LIVE A HEALTHY LIFE? ANALYZING LIFESTYLE BY VISUAL LIFE LOGGING http://arxiv.org/abs/2011.12102v1
DYNAMICALLY PRUNING SEGFORMER FOR EFFICIENT SEMANTIC SEGMENTATION http://arxiv.org/abs/2111.09499v1
ECONOMICS OF SEMANTIC COMMUNICATION SYSTEM IN WIRELESS POWERED INTERNET OF THINGS http://arxiv.org/abs/2110.01423v1
EFFECT OF NOISE SUPPRESSION LOSSES ON SPEECH DISTORTION AND ASR PERFORMANCE http://arxiv.org/abs/2111.11606v1
EFFICIENT SEQUENCE TRAINING OF ATTENTION MODELS USING APPROXIMATIVE RECOMBINATION http://arxiv.org/abs/2110.09245v2
EFFICIENT UNIVERSAL SHUFFLE ATTACK FOR VISUAL OBJECT TRACKING http://arxiv.org/abs/2203.06898v1
EFFICIENTLY AND GLOBALLY SOLVING JOINT BEAMFORMING AND COMPRESSION PROBLEM IN THE COOPERATIVE CELLULAR NETWORK VIA LAGRANGIAN DUALITY http://arxiv.org/abs/2110.05085v2
EMGSE: ACOUSTIC/EMG FUSION FOR MULTIMODAL SPEECH ENHANCEMENT http://arxiv.org/abs/2202.06507v1
ENHANCING AND DISSECTING CROWD COUNTING BY SYNTHETIC DATA http://arxiv.org/abs/2201.08992v1
ENHANCING UTILITY IN THE WATCHDOG PRIVACY MECHANISM http://arxiv.org/abs/2110.04724v1
ENVIRONMENTAL SOUND EXTRACTION USING ONOMATOPOEIC WORDS http://arxiv.org/abs/2112.00209v4
EPIGRAPHICAL RELAXATION FOR MINIMIZING LAYERED MIXED NORMS http://arxiv.org/abs/2008.04565v2
ESPNET-SLU: ADVANCING SPOKEN LANGUAGE UNDERSTANDING THROUGH ESPNET http://arxiv.org/abs/2111.14706v2
ESTIMATING THE CONFIDENCE OF SPEECH SPOOFING COUNTERMEASURE http://arxiv.org/abs/2110.04775v2
EVALUATION OF ORTHOGONAL CHIRP DIVISION MULTIPLEXING FOR AUTOMOTIVE INTEGRATED SENSING AND COMMUNICATIONS http://arxiv.org/abs/2111.04975v1
EVALUATION OF VIDEO CODING FOR MACHINES WITHOUT GROUND TRUTH http://arxiv.org/abs/2205.06519v1
EXPLAINING DEEP LEARNING MODELS FOR SPOOFING AND DEEPFAKE DETECTION WITH SHAPLEY ADDITIVE EXPLANATIONS http://arxiv.org/abs/2110.03309v1
Exploiting Language Model for Efficient Linguistic Steganalysis http://arxiv.org/abs/2107.12168v3
EXPLORING HETEROGENEOUS CHARACTERISTICS OF LAYERS IN ASR MODELS FOR MORE EFFICIENT TRAINING http://arxiv.org/abs/2110.04267v2
Exploring Machine Speech Chain for Domain Adaptation http://arxiv.org/abs/2104.03815v1
Factorized Neural Transducer for Efficient Language Model Adaptation http://arxiv.org/abs/2110.01500v5
Fast Graph Filters for Decentralized Subspace Projection http://arxiv.org/abs/2011.07579v1
FAST GRAPH SAMPLING FOR SHORT VIDEO SUMMARIZATION USING GERSHGORIN DISC ALIGNMENT http://arxiv.org/abs/2110.11420v2
FAST MULTISCALE DIFFUSION ON GRAPHS http://arxiv.org/abs/2104.14652v1
FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR http://arxiv.org/abs/2110.04057v2
Feature Imitating Networks http://arxiv.org/abs/2110.04831v2
Federated Learning Challenges and Opportunities: An Outlook http://arxiv.org/abs/2202.00807v1
FilterAugment: An Acoustic Environmental Data Augmentation Method http://arxiv.org/abs/2110.03282v4
FLDP: Flexible strategy for local differential privacy http://arxiv.org/abs/2203.14875v1
FORENSIC ANALYSIS AND LOCALIZATION OF MULTIPLY COMPRESSED MP3 AUDIO USING TRANSFORMERS http://arxiv.org/abs/2203.16499v2
FrAUG: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals http://arxiv.org/abs/2202.05912v1
FUSING ASR OUTPUTS IN JOINT TRAINING FOR SPEECH EMOTION RECOGNITION http://arxiv.org/abs/2110.15684v2
GENERALIZATION ABILITY OF MOS PREDICTION NETWORKS http://arxiv.org/abs/2110.02635v3
GENERALIZED TIME DOMAIN VELOCITY VECTOR http://arxiv.org/abs/2110.06304v4
GENERALIZING AUC OPTIMIZATION TO MULTICLASS CLASSIFICATION FOR AUDIO SEGMENTATION WITH LIMITED TRAINING DATA http://arxiv.org/abs/2110.14425v1
GENERATING DISENTANGLED ARGUMENTS WITH PROMPTS: A SIMPLE EVENT EXTRACTION FRAMEWORK THAT WORKS http://arxiv.org/abs/2110.04525v2
GRAPH SIGNAL PROCESSING: VERTEX MULTIPLICATION http://arxiv.org/abs/2007.04723v1
Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism http://arxiv.org/abs/2110.08717v1
HGCN: HARMONIC GATED COMPENSATION NETWORK FOR SPEECH ENHANCEMENT http://arxiv.org/abs/2201.12755v2
HISTOKT: CROSS KNOWLEDGE TRANSFER IN COMPUTATIONAL PATHOLOGY http://arxiv.org/abs/2201.11246v1
HODGELETS: LOCALIZED SPECTRAL REPRESENTATIONS OF FLOWS ON SIMPLICIAL COMPLEXES http://arxiv.org/abs/2109.08728v1
HOW CAN A COGNITIVE RADAR MASK ITS COGNITION? http://arxiv.org/abs/2110.08608v1
HOW NEURAL PROCESSES IMPROVE GRAPH LINK PREDICTION http://arxiv.org/abs/2109.14894v1
Identification of Edge Disconnections in Networks Based on Graph Filter Outputs http://arxiv.org/abs/2102.06428v2
IMPORTANTAUG: A DATA AUGMENTATION AGENT FOR SPEECH http://arxiv.org/abs/2112.07156v2
IMPROVED META LEARNING FOR LOW RESOURCE SPEECH RECOGNITION http://arxiv.org/abs/2205.06182v1
IMPROVING ADVERSARIAL WAVEFORM GENERATION BASED SINGING VOICE CONVERSION WITH HARMONIC SIGNALS http://arxiv.org/abs/2201.10130v1
IMPROVING BIRD CLASSIFICATION WITH UNSUPERVISED SOUND SEPARATION http://arxiv.org/abs/2110.03209v1
Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents http://arxiv.org/abs/2202.05971v1
Improving Factored Hybrid HMM Acoustic Modeling without State Tying http://arxiv.org/abs/2201.09692v1
IMPROVING FEATURE GENERALIZABILITY WITH MULTITASK LEARNING IN CLASS INCREMENTAL LEARNING http://arxiv.org/abs/2204.12915v1
IMPROVING LYRICS ALIGNMENT THROUGH JOINT PITCH DETECTION http://arxiv.org/abs/2202.01646v1
Improving Maximum Likelihood Difference Scaling method to measure inter content scale http://arxiv.org/abs/2203.13186v1
IMPROVING NOISE ROBUSTNESS OF CONTRASTIVE SPEECH REPRESENTATION LEARNING WITH SPEECH RECONSTRUCTION http://arxiv.org/abs/2110.15430v1
IMPROVING SOURCE SEPARATION BY EXPLICITLY MODELING DEPENDENCIES BETWEEN SOURCES http://arxiv.org/abs/2203.15140v1
IMPROVING THE FUSION OF ACOUSTIC AND TEXT REPRESENTATIONS IN RNN-T http://arxiv.org/abs/2201.10240v1
Increasing Loudness in Audio Signals: a perceptually motivated approach to preserve audio quality http://arxiv.org/abs/2202.08183v1
INCREMENTAL USER EMBEDDING MODELING FOR PERSONALIZED TEXT CLASSIFICATION http://arxiv.org/abs/2202.06369v1
INFERGRAD: IMPROVING DIFFUSION MODELS FOR VOCODER BY CONSIDERING INFERENCE IN TRAINING http://arxiv.org/abs/2202.03751v1
INTEGRATING TEXT INPUTS FOR TRAINING AND ADAPTING RNN TRANSDUCER ASR MODELS http://arxiv.org/abs/2202.13155v1
INTERPRETING INTERMEDIATE CONVOLUTIONAL LAYERS IN UNSUPERVISED ACOUSTIC WORD CLASSIFICATION http://arxiv.org/abs/2110.02375v2
ISOMETRIC MT: NEURAL MACHINE TRANSLATION FOR AUTOMATIC DUBBING http://arxiv.org/abs/2112.08682v3
ITOWAVE: ITO STOCHASTIC DIFFERENTIAL EQUATION IS ALL YOU NEED FOR WAVE GENERATION http://arxiv.org/abs/2201.12519v2
Joint calibration and mapping of satellite altimetry data using trainable variational models http://arxiv.org/abs/2110.03405v1
JOINT INFERENCE OF MULTIPLE GRAPHS WITH HIDDEN VARIABLES FROM STATIONARY GRAPH SIGNALS http://arxiv.org/abs/2110.03666v2
JOINT LEARNING OF FEATURE EXTRACTION AND COST AGGREGATION FOR SEMANTIC CORRESPONDENCE http://arxiv.org/abs/2204.02164v2
JOINT SPEECH RECOGNITION AND AUDIO CAPTIONING http://arxiv.org/abs/2202.01405v1
JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR MULTILINGUAL ASR http://arxiv.org/abs/2111.08137v1
LABEL PROPAGATION ACROSS GRAPHS: NODE CLASSIFICATION USING GRAPH NEURAL TANGENT KERNELS http://arxiv.org/abs/2110.03763v1
LDNET: UNIFIED LISTENER DEPENDENT MODELING IN MOS PREDICTION FOR SYNTHETIC SPEECH http://arxiv.org/abs/2110.09103v1
Learnable Hypergraph Laplacian for Hypergraph Learning http://arxiv.org/abs/2106.05701v1
LEARNABLE NONLINEAR COMPRESSION FOR ROBUST SPEAKER VERIFICATION http://arxiv.org/abs/2202.05236v1
LEARNING CONTINUOUS REPRESENTATION OF AUDIO FOR ARBITRARY SCALE SUPER RESOLUTION http://arxiv.org/abs/2111.00195v2
LEARNING DECOUPLING FEATURES THROUGH ORTHOGONALITY REGULARIZATION http://arxiv.org/abs/2203.16772v1
Learning Expanding Graphs for Signal Interpolation http://arxiv.org/abs/2203.07966v1
LEARNING MUSIC AUDIO REPRESENTATIONS VIA WEAK LANGUAGE SUPERVISION http://arxiv.org/abs/2112.04214v2
LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES http://arxiv.org/abs/2202.03007v1
LEARNING TO INTEGRATE VISION DATA INTO ROAD NETWORK DATA http://arxiv.org/abs/2112.10624v2
LEARNINGS FROM FEDERATED LEARNING IN THE REAL WORLD http://arxiv.org/abs/2202.03925v1
LEVERAGING LOCAL TEMPORAL INFORMATION FOR MULTIMODAL SCENE CLASSIFICATION http://arxiv.org/abs/2110.13992v1
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition http://arxiv.org/abs/2110.03435v1
LiteHAR: LIGHTWEIGHT HUMAN ACTIVITY RECOGNITION FROM WIFI SIGNALS WITH RANDOM CONVOLUTION KERNELS http://arxiv.org/abs/2201.09310v1
LOCALIZATION BASED SEQUENTIAL GROUPING FOR CONTINUOUS SPEECH SEPARATION http://arxiv.org/abs/2107.06853v1
LOCUNET: FAST URBAN POSITIONING USING RADIO MAPS AND DEEP LEARNING http://arxiv.org/abs/2202.00738v2
LOW COMPLEXITY EQUALIZATION FOR AFDM IN DOUBLY DISPERSIVE CHANNELS http://arxiv.org/abs/2203.01875v2
L-SpEx: Localized Target Speaker Extraction http://arxiv.org/abs/2202.09995v1
MASKED ACOUSTIC UNIT FOR MISPRONUNCIATION DETECTION AND CORRECTION http://arxiv.org/abs/2108.05517v2
Matching Point Sets with Quantum Circuit Learning http://arxiv.org/abs/2102.06697v2
MAXIMIZING AUDIO EVENT DETECTION MODEL PERFORMANCE ON SMALL DATASETS THROUGH KNOWLEDGE TRANSFER, DATA AUGMENTATION, AND PRETRAINING: AN ABLATION STUDY http://arxiv.org/abs/2202.03514v1
METRICGAN-U: UNSUPERVISED SPEECH ENHANCEMENT/ DEREVERBERATION BASED ONLY ON NOISY/ REVERBERATED SPEECH http://arxiv.org/abs/2110.05866v1
MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION http://arxiv.org/abs/2111.04734v2
MIXTURE MODEL AUTO-ENCODERS: DEEP CLUSTERING THROUGH DICTIONARY LEARNING http://arxiv.org/abs/2110.04683v2
MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations http://arxiv.org/abs/2203.02385v1
Modeling Intention, Emotion and External World in Dialogue Systems http://arxiv.org/abs/2202.06476v1
MULTI-ACCDOA: LOCALIZING AND DETECTING OVERLAPPING SOUNDS FROM THE SAME CLASS WITH AUXILIARY DUPLICATING PERMUTATION INVARIANT TRAINING http://arxiv.org/abs/2110.07124v2
Multichannel Speech Enhancement without Beamforming http://arxiv.org/abs/2110.13130v2
MULTIMODAL DEPRESSION CLASSIFICATION USING ARTICULATORY COORDINATION FEATURES AND HIERARCHICAL ATTENTION BASED TEXT EMBEDDINGS http://arxiv.org/abs/2202.06238v1
MULTIPLE OFFSETS MULTILATERATION: A NEW PARADIGM FOR SENSOR NETWORK CALIBRATION WITH UNSYNCHRONIZED REFERENCE NODES http://arxiv.org/abs/2205.11299v1
MULTISCALE CROWD COUNTING AND LOCALIZATION BY MULTITASK POINT SUPERVISION http://arxiv.org/abs/2202.09942v1
Multitask Gaussian Process with Hierarchical Latent Interactions http://arxiv.org/abs/1808.01132v7
MUSIC ENHANCEMENT VIA IMAGE TRANSLATION AND VOCODING http://arxiv.org/abs/2204.13289v1
MUSIC SOURCE SEPARATION WITH DEEP EQUILIBRIUM MODELS http://arxiv.org/abs/2110.06494v2
NEAREST SUBSPACE SEARCH IN THE SIGNED CUMULATIVE DISTRIBUTION TRANSFORM SPACE FOR 1D SIGNAL CLASSIFICATION http://arxiv.org/abs/2110.05606v2
Neural Architecture Search for Speech Emotion Recognition http://arxiv.org/abs/2203.16928v1
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet http://arxiv.org/abs/2202.11169v1
No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds http://arxiv.org/abs/2203.02502v1
NONVERBAL SOUND DETECTION FOR DISORDERED SPEECH http://arxiv.org/abs/2202.07750v1
ON DATA AUGMENTATION FOR GAN TRAINING http://arxiv.org/abs/2006.05338v3
ON FEDERATED LEARNING WITH ENERGY HARVESTING CLIENTS http://arxiv.org/abs/2202.06105v1
ON IDENTIFIABLE POLYTOPE CHARACTERIZATION FOR POLYTOPIC MATRIX FACTORIZATION http://arxiv.org/abs/2204.11534v1
On Language Model Integration for RNN Transducer based Speech Recognition http://arxiv.org/abs/2110.06841v2
ON LOSS FUNCTIONS AND EVALUATION METRICS FOR MUSIC SOURCE SEPARATION http://arxiv.org/abs/2202.07968v1
ON STABILITY AND CONVERGENCE OF DISTRIBUTED FILTERS http://arxiv.org/abs/2102.11250v1
ON THE ACQUISITION OF STATIONARY SIGNALS USING UNIFORM ADCS http://arxiv.org/abs/2202.05143v2
ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS http://arxiv.org/abs/2110.01147v2
ON THE STABILITY OF LOW PASS GRAPH FILTER WITH A LARGE NUMBER OF EDGE REWIRES http://arxiv.org/abs/2110.07234v1
ONE TTS ALIGNMENT TO RULE THEM ALL http://arxiv.org/abs/2108.10447v1
OPTIMIZING THE CONSUMPTION OF SPIKING NEURAL NETWORKS WITH ACTIVITY REGULARIZATION http://arxiv.org/abs/2204.01460v1
PARAMETRIC MODELS FOR DOA TRAJECTORY LOCALIZATION http://arxiv.org/abs/2204.09647v1
PEER COLLABORATIVE LEARNING FOR POLYPHONIC SOUND EVENT DETECTION http://arxiv.org/abs/2110.03511v1
PERSONALIZED AUTOMATIC SPEECH RECOGNITION TRAINED ON SMALL DISORDERED SPEECH DATASETS http://arxiv.org/abs/2110.04612v1
Personalized PageRank Graph Attention Networks http://arxiv.org/abs/2205.14259v1
PERSONALIZED SPEECH ENHANCEMENT: NEW MODELS AND COMPREHENSIVE EVALUATION http://arxiv.org/abs/2110.09625v1
PHASE CONTINUITY: LEARNING DERIVATIVES OF PHASE SPECTRUM FOR SPEECH ENHANCEMENT http://arxiv.org/abs/2202.11918v1
PHONOLOGY RECOGNITION IN AMERICAN SIGN LANGUAGE http://arxiv.org/abs/2110.00453v1
PIXINWAV: RESIDUAL STEGANOGRAPHY FOR HIDING PIXELS IN AUDIO http://arxiv.org/abs/2106.09814v1
POPO: PESSIMISTIC OFFLINE POLICY OPTIMIZATION http://arxiv.org/abs/2012.13682v2
Power allocation for wireless federated learning using graph neural networks http://arxiv.org/abs/2111.07480v2
PRIVACY ATTACKS FOR AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS IN A FEDERATED LEARNING FRAMEWORK http://arxiv.org/abs/2111.03777v2
PRIVACY SENSITIVE SPEECH ANALYSIS USING FEDERATED LEARNING TO ASSESS DEPRESSION http://arxiv.org/abs/2205.00111v2
PROGRESSIVE CONTINUAL LEARNING FOR SPOKEN KEYWORD SPOTTING http://arxiv.org/abs/2201.12546v2
PROTOTYPE LEARNING FOR INTERPRETABLE RESPIRATORY SOUND ANALYSIS http://arxiv.org/abs/2110.03536v4
PSEUDO STRONG LABELS FOR LARGE SCALE WEAKLY SUPERVISED AUDIO TAGGING http://arxiv.org/abs/2204.13430v1
PSLA: IMPROVING AUDIO TAGGING WITH PRETRAINING, SAMPLING, LABELING, AND AGGREGATION http://arxiv.org/abs/2102.01243v3
QUANTUM FEDERATED LEARNING WITH QUANTUM DATA http://arxiv.org/abs/2106.00005v1
RADAR TARGET DETECTION AIDED BY RECONFIGURABLE INTELLIGENT SURFACES http://arxiv.org/abs/2104.00768v3
REAL ADDITIVE MARGIN SOFTMAX FOR SPEAKER VERIFICATION http://arxiv.org/abs/2110.09116v1
REAL-M: TOWARDS SPEECH SEPARATION ON REAL MIXTURES http://arxiv.org/abs/2110.10812v1
RECOVERY OF GRAPH SIGNALS FROM SIGN MEASUREMENTS http://arxiv.org/abs/2109.12576v1
Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure http://arxiv.org/abs/2204.12112v1
RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT http://arxiv.org/abs/2202.01094v3
RESIDUAL RECOVERY ALGORITHM FOR MODULO SAMPLING http://arxiv.org/abs/2110.03335v1
RETRIEVING SPEAKER INFORMATION FROM PERSONALIZED ACOUSTIC MODELS FOR SPEECH RECOGNITION http://arxiv.org/abs/2111.04194v1
R-G2P: EVALUATING AND ENHANCING ROBUSTNESS OF GRAPHEME TO PHONEME CONVERSION BY CONTROLLED NOISE INTRODUCING AND CONTEXTUAL INFORMATION INCORPORATION http://arxiv.org/abs/2202.11194v1
ROBUST CLASSIFICATION WITH FLEXIBLE DISCRIMINANT ANALYSIS IN HETEROGENEOUS DATA http://arxiv.org/abs/2201.02967v1
RTSNET: DEEP LEARNING AIDED KALMAN SMOOTHING http://arxiv.org/abs/2110.04717v2
SAFEGUARDING UAV NETWORKS THROUGH INTEGRATED SENSING, JAMMING, AND COMMUNICATIONS http://arxiv.org/abs/2110.04733v1
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays http://arxiv.org/abs/2111.08192v2
SA-SDR: A NOVEL LOSS FUNCTION FOR SEPARATION OF MEETING STYLE DATA http://arxiv.org/abs/2110.15581v2
Scattering Statistics of Generalized Spatial Poisson Point Processes http://arxiv.org/abs/1902.03537v2
SCORE DIFFICULTY ANALYSIS FOR PIANO PERFORMANCE EDUCATION BASED ON FINGERING http://arxiv.org/abs/2203.13010v1
S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement http://arxiv.org/abs/2111.08387v1
SEED: SOUND EVENT EARLY DETECTION VIA EVIDENTIAL UNCERTAINTY http://arxiv.org/abs/2202.02441v2
SIGNAL PROCESSING ON CELL COMPLEXES http://arxiv.org/abs/2110.05614v2
Simple Attention Module based Speaker Verification with Iterative noisy label detection http://arxiv.org/abs/2110.06534v1
SIMPLICIAL CONVOLUTIONAL NEURAL NETWORKS http://arxiv.org/abs/2110.02585v1
SKETCHED RT3D: HOW TO RECONSTRUCT BILLIONS OF PHOTONS PER SECOND http://arxiv.org/abs/2203.00952v1
SLUE: NEW BENCHMARK TASKS FOR SPOKEN LANGUAGE UNDERSTANDING EVALUATION ON NATURAL SPEECH http://arxiv.org/abs/2111.10367v2
SOUND EVENT DETECTION GUIDED BY SEMANTIC CONTEXTS OF SCENES http://arxiv.org/abs/2110.03243v3
SOUND EVENT DETECTION: A TUTORIAL http://arxiv.org/abs/2107.05463v1
SOURCE MIXING AND SEPARATION ROBUST AUDIO STEGANOGRAPHY http://arxiv.org/abs/2110.05054v2
SOURCE SEPARATION BY STEERING PRETRAINED MUSIC MODELS http://arxiv.org/abs/2110.13071v1
SPATIAL ACTIVE NOISE CONTROL BASED ON INDIVIDUAL KERNEL INTERPOLATION OF PRIMARY AND SECONDARY SOUND FIELDS http://arxiv.org/abs/2202.04807v1
SPATIAL DATA AUGMENTATION WITH SIMULATED ROOM IMPULSE RESPONSES FOR SOUND EVENT LOCALIZATION AND DETECTION http://arxiv.org/abs/2110.06501v2
SPATIAL MIXUP: DIRECTIONAL LOUDNESS MODIFICATION AS DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION http://arxiv.org/abs/2110.06126v1
SPEAKER GENERATION http://arxiv.org/abs/2111.05095v1
SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION http://arxiv.org/abs/2202.09082v1
SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION http://arxiv.org/abs/2205.04433v1
SPEECH TASKS RELEVANT TO SLEEPINESS DETERMINED WITH DEEP TRANSFER LEARNING http://arxiv.org/abs/2111.14684v1
SPELL MY NAME: KEYWORD BOOSTED SPEECH RECOGNITION http://arxiv.org/abs/2110.02791v1
STABILITY ANALYSIS OF UNFOLDED WMMSE FOR POWER ALLOCATION http://arxiv.org/abs/2110.07471v2
STABILITY OF NEURAL NETWORKS ON MANIFOLDS TO RELATIVE PERTURBATIONS http://arxiv.org/abs/2110.04702v1
STABLE AND TRANSFERABLE WIRELESS RESOURCE ALLOCATION POLICIES VIA MANIFOLD NEURAL NETWORKS http://arxiv.org/abs/2110.04706v1
STUDY OF POSITIONAL ENCODING APPROACHES FOR AUDIO SPECTROGRAM TRANSFORMERS http://arxiv.org/abs/2110.06999v1
Subjective and Objective Quality Assessment of Mobile Gaming Video http://arxiv.org/abs/2103.05099v1
Supervised Learning based Sparse Channel Estimation for RIS aided Communications http://arxiv.org/abs/2202.11997v1
TARGETDROP: A TARGETED REGULARIZATION METHOD FOR CONVOLUTIONAL NEURAL NETWORKS http://arxiv.org/abs/2010.10716v1
THE DAWN OF QUANTUM NATURAL LANGUAGE PROCESSING http://arxiv.org/abs/2110.06510v1
THE MIRRORNET : LEARNING AUDIO SYNTHESIZER CONTROLS INSPIRED BY SENSORIMOTOR INTERACTION http://arxiv.org/abs/2110.05695v4
Threshold Independent Evaluation of Sound Event Detection Scores http://arxiv.org/abs/2201.13148v1
T-NGA: TEMPORAL NETWORK GRAFTING ALGORITHM FOR LEARNING TO PROCESS SPIKING AUDIO SENSOR EVENTS http://arxiv.org/abs/2202.03204v1
TO CATCH A CHORUS, VERSE, INTRO, OR ANYTHING ELSE: ANALYZING A SONG WITH STRUCTURAL FUNCTIONS http://arxiv.org/abs/2205.14700v1
TORCHAUDIO: BUILDING BLOCKS FOR AUDIO AND SPEECH PROCESSING http://arxiv.org/abs/2110.15018v2
TOWARDS A COMMON SPEECH ANALYSIS ENGINE http://arxiv.org/abs/2203.00613v1
TOWARDS EXPRESSIVE SPEAKING STYLE MODELLING WITH HIERARCHICAL CONTEXT INFORMATION FOR MANDARIN SPEECH SYNTHESIS http://arxiv.org/abs/2203.12201v2
TOWARDS IDENTITY PRESERVING NORMAL TO DYSARTHRIC VOICE CONVERSION http://arxiv.org/abs/2110.08213v1
Towards Interpretability of Speech Pause in Dementia Detection using Adversarial Learning http://arxiv.org/abs/2111.07454v1
TOWARDS LEARNING UNIVERSAL AUDIO REPRESENTATIONS http://arxiv.org/abs/2111.12124v2
TOWARDS MEASURING FAIRNESS IN SPEECH RECOGNITION: CASUAL CONVERSATIONS DATASET TRANSCRIPTIONS http://arxiv.org/abs/2111.09983v1
TOWARDS REDUCING THE NEED FOR SPEECH TRAINING DATA TO BUILD SPOKEN LANGUAGE UNDERSTANDING SYSTEMS http://arxiv.org/abs/2203.00006v1
TOWARDS SPEAKER AGE ESTIMATION WITH LABEL DISTRIBUTION LEARNING http://arxiv.org/abs/2202.11424v1
TRAINING STABLE GRAPH NEURAL NETWORKS THROUGH CONSTRAINED LEARNING http://arxiv.org/abs/2110.03576v2
UNDERWATER IMAGE ENHANCEMENT VIA LEARNING WATER TYPE DESENSITIZED REPRESENTATIONS http://arxiv.org/abs/2102.00676v2
UNROLLING PARTICLES: UNSUPERVISED LEARNING OF SAMPLING DISTRIBUTIONS http://arxiv.org/abs/2110.02915v1
UNSUPERVISED SPEECH ENHANCEMENT WITH SPEECH RECOGNITION EMBEDDING AND DISENTANGLEMENT LOSSES http://arxiv.org/abs/2111.08678v2
Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content http://arxiv.org/abs/2203.12053v1
USING MULTIPLE REFERENCE AUDIOS AND STYLE EMBEDDING CONSTRAINTS FOR SPEECH SYNTHESIS http://arxiv.org/abs/2110.04451v1
VISION TRANSFORMER EQUIPPED WITH NEURAL RESIZER ON FACIAL EXPRESSION RECOGNITION TASK http://arxiv.org/abs/2204.02181v1
VOCALSOUND: A DATASET FOR IMPROVING HUMAN VOCAL SOUNDS RECOGNITION http://arxiv.org/abs/2205.03433v1
VOCBENCH: A NEURAL VOCODER BENCHMARK FOR SPEECH SYNTHESIS http://arxiv.org/abs/2112.03099v1
VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK http://arxiv.org/abs/2102.02599v2
VU-BERT: A UNIFIED FRAMEWORK FOR VISUAL DIALOG http://arxiv.org/abs/2202.10787v1
WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP http://arxiv.org/abs/2110.11499v2
WAVEBENDER GAN: AN ARCHITECTURE FOR PHONETICALLY MEANINGFUL SPEECH MANIPULATION http://arxiv.org/abs/2202.10973v1
WEARABLE SELD DATASET: DATASET FOR SOUND EVENT LOCALIZATION AND DETECTION USING WEARABLE DEVICES AROUND HEAD http://arxiv.org/abs/2202.08458v1
When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing http://arxiv.org/abs/2203.03550v1
Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning http://arxiv.org/abs/2201.12712v1
WORD ORDER DOES NOT MATTER FOR SPEECH RECOGNITION http://arxiv.org/abs/2110.05994v2