ElkarOla
Language and speech technologies at the service of a smart, industrial, inclusive and multilingual territory
Learn more

What it is

What is the ElkarOla project?

ElkarOla is a two-year (2015-2016) strategic research project in the field of language and speech technologies. The consortium carrying it out is made up of the Elhuyar Foundation, the Ixa and Aholab research groups of the University of the Basque Country (EHU), the Vicomtech-IK4 technology centre and the Tecnalia Research & Innovation foundation. The Basque Government's Department of Economic Development and Infrastructure funds part of the project's budget, and the project has the support of the Department of Culture and Language Policy.

ElkarOla works in the field of language and speech technologies. Examples of these technologies include translation tools (machine translation, translation memories...), information management (search engines, information extraction, sentiment analysis...), language resources (dictionaries, corpora, proofing tools...) and speech tools (speech recognition, speech synthesis...). The project develops these technologies mainly for Basque, but also for other languages of the region and its surroundings.

Although it is mainly research-oriented, ElkarOla carries out applied research as well as basic research, gives weight to technology transfer, and goes as far as bringing tools to society and to market. The application areas are many, with an emphasis on the RIS3 areas: Advanced Manufacturing, Biosciences and Health, and Territory (language industry, tourism, culture, education...).

Another of ElkarOla's goals is to build knowledge and a qualified critical mass, so that the future of R&D&I in this strategic line can be tackled autonomously. To that end, training courses, postgraduate master's programmes, specialised seminars and doctoral courses are organised in the field of language and speech technologies. In addition, to achieve the project's goals, the project partners have priority relationships with many companies in the application areas, as well as a community of scientists and researchers with the necessary critical mass and international standing.

Consortium

Who makes up the ElkarOla project consortium?

Elhuyar Foundation

IXA Group

University of the Basque Country

Aholab Group

University of the Basque Country

Vicomtech-IK4

Technology centre

Tecnalia Research & Innovation

Background

Previous projects of the ElkarOla consortium

ElkarOla is not the consortium's first strategic research project on language and speech technologies. Four others came before it: Hizking21 (2002-2004), whose task was to put Basque on a par with other languages as regards language and voice technologies; AnHitz (2006-2008), which aimed to do research on those technologies from the perspective of ambient intelligence; BerbaTek (2009-2011), geared towards application in the language industry; and Ber2Tek (2012-2014), which, like BerbaTek, centred on the language industry.

Hizking XXI

2002-2004

AnHitz

2006-2008

BerbaTek

2009-2011

Ber2Tek

2012-2014

Results

Some of the results of the ElkarOla project

Besides basic research, the ElkarOla project also does development and transfer, and a number of resources, tools and services have come out of it. Here are some examples:

  • Talaia: The Talaia tool tracks what is being said in several languages on websites and social networks about a topic, company, event or product and, using opinion-mining technology, displays the opinions by polarity (positive, negative, neutral...). This link shows what was published on the website of the newspaper Berria for the 2016 elections in the Basque Autonomous Community.
  • Ondarebideak: This technology offers users new ways to access, use and interpret digital heritage. The heritage held by libraries, museums and archives is increasingly digitised, but those digital repositories are not easy to visit, view or access. With Ondarebideak, experts will create “digital exhibitions” that visitors can follow. It also lets users create their own routes, opening the way to personal readings and interpretations of cultural works.
  • Speech recognition on the Elhuyar dictionaries website: Using Aholab's speech technology, the Elhuyar dictionaries website now offers the option of searching in Basque by voice.
  • Multilingual guides: This technology makes it possible to offer the notes on signs or panels in museums, tourist routes or heritage trails in many languages, beyond the physical space, using the user's mobile phone. By scanning a QR code on the panel, visitors can read or listen to the panel's explanations on their phone in their own language. The explanations can be translated by machine translation and the audio generated by speech synthesis. It has been deployed in a number of museums, buildings and town routes.
  • Xenda: Websites with a lot of content (such as media outlets, corporate websites, intranets, document collections...) need suitable search and navigation tools. Xenda offers several advanced search functions: cross-lingual search, lemmatisation, autocomplete, spelling correction... It can also show related multilingual content and detect the entities that are displayed as tags. Among others, Xenda has been integrated into the media outlets of Hekimen.
  • Irakurle digitala (digital reader): Many students have reading difficulties for a variety of reasons: dyslexia, visual impairment, language-related disorders... This tool can help in the reading and writing process. The texts students need to work on, which may be in several languages (Basque, Spanish, English) and formats (PDF, HTML, DOC, ODT), can be listened to with this tool, sentence by sentence, with natural tone and sound and with the pronunciation appropriate to each language.
  • e-Rolda: e-ROLda is a tool for browsing the information in the BVI lexicon and the EPEC-RolSem corpus. On entering the system, the user is given general information and the option to run searches over it. Searches can be made by several general features: i) a Basque verb, ii) a particular sense of a Basque verb or iii) a PB-VN verb sense. The tool also offers more specific searches in the corpus.
  • Konbitzul: The Konbitzul database contains phraseological units made up of verbs and nouns, together with their equivalents, from Spanish to Basque and from Basque to Spanish. Besides being a phraseological dictionary, it also stores additional linguistic information, prepared mainly for use in machine translation systems.
  • ixaKat: ixaKat is a modular chain of language processors for the automatic processing of Basque, under development in the IXA group at the University of the Basque Country.
  • AnalHitza: Using language technologies, AnalHitza analyses collections of texts with the aim of giving researchers in the Humanities linguistic data that is reliable and easy to manipulate. With ANALHITZA, texts written in the following languages can be analysed: Basque, English and Spanish.
  • COMPRESS-EUS: COMPRESS-EUS is a system for collecting summaries, built to study how well different users summarise. It collects two kinds of summary: a) extractive summaries, in which the most important parts of the text are highlighted without changing the text; and b) abstractive summaries, in which the user rewrites a short text as they see fit.
  • NeoTerm: The NeoTerm tool generates compound words in the medical domain, in Basque.
  • Transkit: Automatic transcription and subtitling. Based on Vicomtech-IK4's speech recognition technology, it enables the automatic generation of rich transcriptions and subtitles for spoken content. Besides transcription, it also automatically performs language detection, insertion of punctuation and capitalisation marks, speaker segmentation and identification, and subtitle segmentation [1].
  • EITB bilingual corpus: Vicomtech-IK4, EITB and Mondragon Lingua have released the first Basque-Spanish bilingual corpus in the news domain, with more than half a million aligned comparable sentences. The corpus was compiled from news produced by EITB, within several research projects carried out with the support of the Basque Government. The content is strongly comparable, and this new resource will be very helpful for research and development on this under-resourced language pair. The three partners are preparing a larger version of the corpus, using Vicomtech-IK4's tools for aligning documents [2] and sentences [3].
  • Speech database for bertsolaritza synthesis: We have built a database from recordings and data obtained from Bertsozale Elkartea, containing 2094 recordings by 189 bertsolaris, all with the sung text and in most cases also the melody and metre used. The portion corresponding to each bertsolari was labelled automatically with a diarization system and then segmented at phoneme level by forced alignment [4].
  • Bertsokantari: Starting from a statistical speech synthesis system, a singing synthesis system has been developed. To do so, phoneme durations and the pitch curve had to be modified to respect the rhythm of the melody. In addition, a manually tuned vibrato control was added to improve naturalness, along with a spectral transformation to compensate for the differences in spectral tilt between the singing and the speaking voice. An interface developed in PureData was used to control the system [5].
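
As a rough illustration of the duration and pitch adjustment mentioned for Bertsokantari, the sketch below stretches the phoneme durations of one syllable to fill a note and derives a target pitch from the note. It is not the project's implementation: the names `Note`, `midi_to_hz` and `fit_syllable_to_note` are hypothetical, and both the uniform stretching and the equal-tempered MIDI-to-frequency mapping are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One note of the melody (hypothetical structure for this sketch)."""
    midi_pitch: int    # MIDI note number; 69 is A4
    duration_s: float  # how long the note is held, in seconds


def midi_to_hz(midi_pitch):
    """Equal-tempered frequency of a MIDI note, with A4 (69) at 440 Hz."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def fit_syllable_to_note(phoneme_durations_s, note):
    """Scale phoneme durations so the syllable lasts exactly as long as the note.

    Assumption: every phoneme is stretched by the same factor. A real singing
    synthesiser may well treat vowels and consonants differently.
    Returns the new durations and the flat target pitch in Hz.
    """
    total = sum(phoneme_durations_s)
    if total <= 0:
        raise ValueError("syllable must have a positive total duration")
    scale = note.duration_s / total
    stretched = [d * scale for d in phoneme_durations_s]
    return stretched, midi_to_hz(note.midi_pitch)


if __name__ == "__main__":
    # A two-phoneme syllable spoken in 0.2 s, sung on A4 for 0.4 s.
    durations, f0 = fit_syllable_to_note([0.05, 0.15], Note(midi_pitch=69, duration_s=0.4))
    print(durations, f0)
```

A flat pitch per note is the simplest possible target; the vibrato control described above would add a slow oscillation on top of it.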

Besides these, there are also many other results from the previous projects.

[1] A. Álvarez, H. Arzelus, S. Prieto, A. del Pozo, "Rich Transcription and Automatic Subtitling for Basque and Spanish", IberSPEECH 2016

[2] Thierry Etchegoyhen and Andoni Azpeitia (2016) A Portable Method for Parallel and Comparable Document Alignment. Baltic Journal of Modern Computing, 4(2):243–255, 2016. Special Issue: Proceedings of EAMT 2016.

[3] Thierry Etchegoyhen and Andoni Azpeitia (2016) Set-Theoretic Alignment for Comparable Corpora. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, Volume 1: Long Papers, pages 2009–2018.

[4] X. Sarasola, E. Navas, D. Tavárez, D. Erro, I. Saratxaga, I. Hernáez, “A singing voice database in Basque for statistical singing synthesis of bertsolaritza” Proc. of LREC 2016.

[5] E. Del Blanco, I. Hernáez, E. Navas, X. Sarasola, D. Erro, “Bertsokantari: a TTS based singing synthesis system” Proc. Interspeech, 2016.

Demos

Some demos that can be built with ElkarOla project technologies

Territory

VADI: conversational agent for customer service

Biosciences and Health

Search engine for health-domain terms and relations

Advanced Manufacturing

Real-time remote assistance between an expert and a worker

Publications

Articles published in scientific conferences and journals

  1. D. Lindemann and I. San Vicente, “Building Corpus-based Frequency Lemma Lists”, Procedia – Social and Behavioral Sciences, vol. 198, pp. 266–277, Jul. 2015.
  2. D. Lindemann and I. San Vicente. 2015. “Corpusetan oinarritutako hiztegi elebidun berria sortzen”. In Proceedings of IkerGazte: Nazioarteko ikerketa euskaraz. Durango, Basque Country, 2015/05.
  3. Zubiaga, A., San Vicente, I., Gamallo, P., Pichel, J. R., Alegria, I., Aranberri, N., Ezeiza A., Fresno, V. (2015). “TweetLID: a benchmark for tweet language identification”. Language Resources and Evaluation (2015). DOI: 10.1007/s10579-015-9317-4.
  4. I. Alegria, N. Aranberri, P. Comas, V. Fresno, P. Gamallo, L. Padró, I. S. Vicente, J. Turmo, and A. Zubiaga. (2015). “TweetNorm: a benchmark for lexical normalization of Spanish tweets.” Language Resources and Evaluation, volume 49, issue 4, pp. 883-905. DOI: 10.1007/s10579-015-9315-6.
  5. I. San Vicente, X. Saralegi, and R. Agerri. 2015. “EliXa: A modular and flexible ABSA platform”. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, 2015/06/04, pp. 748–752. [poster]
  6. I. San Vicente and X. Saralegi Urizar. 2015. “Sentimenduen analisirako lexikoen sorkuntza”. In Proceedings of IkerGazte: Nazioarteko ikerketa euskaraz. Durango, Basque Country, 2015/05. [poster]
  7. A. Gurrutxaga, I. Alegria, and X. Artola. 2015. “Euskarazko izena+aditza konbinazioak corpusetik automatikoki erauztea eta idiomatikotasunaren arabera karakterizatzea”. In Proceedings of IkerGazte: Nazioarteko ikerketa euskaraz. Durango, Basque Country, 2015/05.
  8. I. Leturia. 2015. “Weba euskarazko corpus gisa”. In Proceedings of IkerGazte: Nazioarteko ikerketa euskaraz. Durango, Basque Country, 2015/05.
  9. Mikel Artetxe, Eneko Agirre, Iñaki Alegria, Gorka Labaka 2015. Analyzing English-Spanish Named-Entity enhanced Machine Translation. Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-9). NAACL2015
  10. Eneko Agirre, Iñaki Alegria, Nora Aranberri, Mikel Artetxe, Ander Barrena, António Branco, Arantza Diaz de Ilarraza, Koldo Gojenola, Gorka Labaka, Arantxa Otegi and Kepa Sarasola 2015. Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches. Procesamiento del Lenguaje natural, vol. 55, pp. 169-172. ISSN: 1135-5948
  11. Artetxe M., Labaka G., Sarasola K. 2015. Building hybrid machine translation systems by using an EBMT preprocessor to create partial translations. Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT2015), pp. 11-18, Antalya, Turkey
  12. A. Minard, M. Speranza, R. Urizar, B. Altuna, M. van Erp, A. Schoen, and C. van Son, “MEANTIME, the NewsReader Multilingual Event and Time Corpus,” (submitted)
  13. A. Minard, M. Speranza, E. Agirre, I. Aldabe, M. van Erp, B. Magnini, G. Rigau, and R. Urizar, “Semeval-2015 task 4: timeline: cross-document event ordering,” in Proceedings of the 9th international workshop on semantic evaluation (semeval 2015), Denver, Colorado, 2015, pp. 778-786.
  14. Aldabe I., Larrañaga M., Maritxalar M., Arruarte A., Elorriaga J.A. 2015. Domain Module Building from Textbooks: Integrating Automatic Exercise Generation. 17th International Conference on Artificial Intelligence in Education (AIED 2015)
  15. Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252-263. ISBN 978-1-941643-40-2.
  16. Agirre E., Aldabe I., Lopez de Lacalle O., Lopez-Gazpio I., Maritxalar M. 2015. Erantzunen kalifikazio automatikorako lehen urratsak. EKAIA: Euskal Herriko Unibertsitateko zientzi eta teknologi aldizkaria. DOI: 10.1387/ekaia.14530. ISSN: 0214-9001.
  17. Eneko Agirre, Aitor Gonzalez-Agirre, Iñigo Lopez-Gazpio, Montse Maritxalar, German Rigau, Larraitz Uria 2015. UBC: Cubes for English Semantic Textual Similarity and Supervised Approaches for Interpretable STS. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 178-183. ISBN 978-1-941643-40-2.
  18. Aranzabe M., Atutxa A., Bengoetxea K., Díaz de Ilarraza A., Goenaga I., Gojenola K., Uria L. 2015. Automatic Conversion of the Basque Dependency Treebank to Universal Dependencies. Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk, Adam Przepiórkowski (eds), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 233-241. Institute of Computer Science of the Polish Academy of Sciences, Warszawa, Poland. ISBN: 978-83-63159-18-4
  19. Atutxa A., Ezeiza N., Goenaga I., Gojenola K. 2015. Experiments on Semi-supervised Dependency Parsing of a Morphologically Rich Language. 6th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2015)
  20. Lopez de Lacalle O., Agirre E. 2015a. Crowdsourced Word Sense Annotations and Difficult Words and Examples. The 11th International Conference on Computational Semantics (IWCS-2015)
  21. Lopez de Lacalle O., Agirre E. 2015b. A Methodology for Word Sense Disambiguation at 90% based on large-scale CrowdSourcing. Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*Sem-2015)
  22. Lopez de Lacalle M., Aldabe I., Laparra E., Rigau G. (Revised and Resubmitted). Automatically extending the semantic interoperability between predicate resources. Language Resources and Evaluation. Q4 JCR 0.619
  23. Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J. 2015. Methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labelled at predicate level following the PropBank/Verbnet model. Edward Vanhoutte (ed.) Digital Scholarship in the Humanities, Volume 30, Number 2, 1-23. Oxford University Press (Online ISSN 2055-768X - Print ISSN 2055-7671) doi: 10.1093/llc/fqv001
  24. Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J. 2015. EPEC-RolSem: Ingelesezko PropBank-VerbNet eredura etiketatutako euskarazko corpusa. Erabakiak, egokitzapenak eta berezitasunak. Maria-José Ezeizabarrena & Ricardo Gómez (eds.). Eridenen du zerzaz kontenta: sailkideen omenaldia Henrike Knörr irakasleari (1947-2008), pp. 179-206, Bilbao: Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua. ISBN: 978-84-9082-092-6.
  25. E. Laparra, I. Aldabe, and G. Rigau, “Document level time-anchoring for timeline extraction,” in Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (acl-ijcnlp 2015), Beijing, China, 2015.
  26. R. Izquierdo, A. Suárez, and G. Rigau, “Word vs. class-based word sense disambiguation,” Journal of artificial intelligence research, vol. 54, pp. 83-122, 2015.
  27. R. Agerri, X. Artola, Z. Beloki, G. Rigau, and A. Soroa, “Big data for natural language processing: a streaming approach,” Knowledge-based systems, 2015. doi:http://dx.doi.org/10.1016/j.knosys.2014.11.007
  28. P. Vossen, E. Laparra, I. Aldabe, and G. Rigau, “Interoperability for cross-lingual and cross-document event detection,” in Proceedings of the 3rd workshop on events: definition, detection, coreference, and representation. events workshop at naacl-hlt 2015, Denver, Colorado, 2015.
  29. R. Segers, P. Vossen, M. Rospocher, L. Serafini, E. Laparra, and G. Rigau, “ESO: a frame based ontology for events and implied situations,” in Proceedings of maplex 2015, Yamagata, Japan, 2015.
  30. E. Laparra, I. Aldabe, and G. Rigau, “From timelines to storylines: a preliminary proposal for evaluating narratives,” in Proceedings of the 1st workshop on computing news storylines (cnews 2015) at the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (acl-ijcnlp 2015), Beijing, China, 2015.
  31. Arantxa Otegi, Nerea Ezeiza, Iakes Goenaga, Gorka Labaka. Efficient and Easy NLP Processing with Basque IXA Pipeline (submitted)
  32. SALABERRI, Haritz; ARREGI, Olatz; ZAPIRAIN, Benat. bRol: The Parser of Syntactic and Semantic Dependencies for Basque. In Proceedings of Recent Advances in Natural Language Processing, pages 555–562, Hissar, Bulgaria, Sep 7–9 2015.
  33. SALABERRI, Haritz; ARREGI, Olatz; ZAPIRAIN, Beñat. IXAGroupEHUSpaceEval:(X-Space) A WordNet-based approach towards the automatic recognition of spatial information following the ISO-Space annotation scheme. In Proceedings of the 9th International Workshop on Semantic Evaluation. 2015. p. 856-861.
  34. Nora Aranberri, Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Iñaki Alegria, Eneko Agirre 2016. Tectogrammar-based machine translation for English-Spanish and English-Basque. SEPLN, 56, 73-80
  35. Aranberri N., Labaka G., Díaz de Ilarraza A. and Sarasola K. 2015. Exploiting portability to build an RBMT prototype for a new source language. Proceedings of the 18th Annual Conference of the European Association for Machine Translation, EAMT-2015, pp. 3-10, Antalya, Turkey
  36. Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Michael Ustaszewski, Nora Aranberri and Eneko Agirre 2015. Deep-syntax TectoMT for English-Spanish MT. Proceedings of the 1st Deep Machine Translation Workshop (DMTW 2015), pages 55–63, Praha, Czech Republic,
  37. Aranberri N. 2015. SMT error analysis and mapping to syntactic, semantic and structural fixes. Proceedings of SSST-9, Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation , pages 30–38, Denver, Colorado, June 4, 2015.
  38. Gaudio, Rosa Del, Aljoscha Burchardt, Nora Aranberri, António Branco and Martin Popel, 2015. Report on the Embedding and Evaluation of the Second MT Pilot. Deliverable D3.10, Version 1.6, QTLeap Project.
  39. Aranberri N., Díaz de Ilarraza A., Labaka G., Sarasola K. 2015. Ebaluatoia 2014: ingelesa-euskara itzultzaile automatikoen konparazioa. X. Informatikari Euskaldunen Bilkura (IEB2015) Udako Euskal Unibertsitatea. Donostia
  40. Alegria I., Artetxe M., Labaka G., Sarasola K. 2015. EHU at TweetMT: Adapting MT Engines for Formal Tweets. CEUR-WS, Vol-1445, 20-24 Proceedings of the Tweet Translation Workshop 2015 co-located with 31st Conference of the Spanish Society for Natural Language Processing (SEPLN 2015) ISSN 1613-0073.
  41. Iñaki Alegria, Nora Aranberri, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martínez Garcia, Iñaki San Vicente, Antonio Toral, Arkaitz Zubiaga 2015. Overview of TweetMT: A Shared Task on Machine Translation of Tweets at SEPLN 2015. CEUR-WS, Vol-1445, 8-19 Proceedings of the Tweet Translation Workshop 2015 co-located with 31st Conference of the Spanish Society for Natural Language Processing (SEPLN 2015) ISSN 1613-0073
  42. Iñurrieta U. 2015. Translation of Spanish Multiword Expressions into Basque: linguistic analysis and detection experiment. Actas del XXXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. ISBN: 978-84-608-1989-9.
  43. Iñurrieta U. 2015. Konbitzul: euskarazko eta gaztelaniazko izen+aditz konbinazioen datu-basea. IkerGazte: nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma, 32-38. ISBN: 978-84-8438-540-
  44. Casillas A., Díaz de Ilarraza A., Gojenola K., Oronoz M., Perez A. 2015 Computer aided classification of diagnostic terms in Spanish Expert Systems with Applications, Volume 42, Issue 6, 15 April 2015, Pages 2949-2958
  45. Maite Oronoz, Koldo Gojenola, Alicia Pérez, Arantza Díaz de Ilarraza, Arantza Casillas 2015. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions. Journal of Biomedical Informatics, Volume 56, August 2015, Pages 318–332
  46. Sara Santiso, Arantza Casillas, Alicia Pérez, Maite Oronoz, Koldo Gojenola, 2015. Document-level adverse drug reaction event extraction on electronic health records in Spanish. Procesamiento del Lenguaje Natural, Volume 56, Pages 49–56.
  47. Sanchez, J., Saratxaga, I., Hernaez, I., Navas, E., Erro, D., & Raitio, T. (2015). Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information. IEEE Transactions on Information Forensics and Security, 10(4), pp. 810–820. ISSN: 1556-6013.
  48. Erro, D., Alonso, A., Serrano, L., Navas, E., Hernaez, I. Interpretable parametric voice conversion functions based on Gaussian mixture models and constrained transformations. Computer Speech & Language, vol. 30(1), pp. 3-15, 2015. ISSN: 0885-2308
  49. Castán, D., Tavarez, D., Lopez-Otero, P., Franco-Pedroso, J., Delgado, H., Navas, E., Docio-Fernández, L., Ramos, D., Serrano, J., Ortega, A., Lleida, E. Albayzín-2014 evaluation: audio segmentation and classification in broadcast news domains. EURASIP Journal on Audio, Speech, and Music Processing, 2015:33. ISSN: 1687-4722. Impact factor 2014: 0.386 (213/249, Q4)
  50. Erro, D., Hernaez, I., Alonso, A., García-Lorenzo, D., Navas, E., Ye, J., Arzelus, H., Jauk, I., Hy, N. Q., Magariños, C., Pérez-Ramón, R., Sulír, M., Tian, X., Wang, X. Personalized synthetic voices for speaking impaired: website and app, Proceedings of Interspeech, pp. 1251-1254. ISSN: 1990-9770
  51. Alonso A., Erro D., Navas E., Hernaez I. Speaker Adaptation using only Vocalic Segments via Frequency Warping. Proceedings of Interspeech 2015, pp. 2764-2768. ISSN: 1990-9770
  52. Sanchez, J., Saratxaga, I., Hernaez, I., Navas, E., Erro, D. The AHOLAB RPS SSD spoofing challenge 2015 submission, Proceedings of Interspeech, pp. 2042-2046. ISSN: 1990-9770
  53. A. García Pablos, M. Cuadros Oller, G. Rigau Claramunt, V3: Unsupervised Aspect Based Sentiment Analysis for SemEval2015 Task 12, Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
  54. A. García Pablos, M. Cuadros Oller, G. Rigau Claramunt, Unsupervised Word Polarity Tagging by Exploiting Continuous Word Representations, Revista de Procesamiento de Lenguaje Natural 2015
  55. A. Álvarez, H. Arzelus, A. del Pozo, C. Mendes, M. Raffaelli, T. Luís, S. Paulo, N. Piccinini, J. Neto, Automating live and batch subtitling of multimedia contents for several European languages, Multimedia Tools and Applications Journal, 2015
  56. M. Serras, N. Perez, M. I. Torres, A. del Pozo, R. Justo, Topic Classifier for Customer Service Dialog Systems, 18th International Conference on Text, Speech and Dialogue, TSD 2015
  57. I. Saratxaga, J. Sanchez, Z. Wu, I. Hernaez, E. Navas Synthetic speech detection using phase information Speech Communication, 2016; 30 - 41
  58. D. Erro, Two-Band Radial Postfiltering in Cepstral Domain with Application to Speech Synthesis IEEE Signal Processing Letters, 2016; 23(2), 202 - 206 -1070-9908
  59. D. Erro, A. Alonso, L. Serrano, D. Tavarez, I. Odriozola, X. Sarasola, E. del Blanco, J. Sanchez, I. Saratxaga, E. Navas, I. Hernaez ML Parameter Generation with a Reformulated MGE Training Criterion — Participation in the Voice Conversion Challenge 2016 Proceedings of Interspeech, 2016; 1662 - 1666
  60. E. del Blanco, I. Hernaez, E. Navas, X. Sarasola, D. Erro Bertsokantari: a TTS Based Singing Synthesis System Proceedings of Interspeech, 2016; 1240 - 1244
  61. J. Sanchez, I. Hernaez, I. Saratxaga Use of the harmonic phase in synthetic speech detection Proceeding of Iberspeech, 2016; 331 - 338
  62. A. Valdivielso, D. Erro, I. Hernaez Reversible speech de-identification using parametric transformations and watermarking Lecture Notes in Computer Science (LNCS), 2016; 10077, 266 - 275 -978-3-319-49168-4
  63. A. Alonso, D. Erro, E. Navas, I. Hernaez Study of the Effect of Reducing Training Data in Speech Synthesis Adaptation Based on Frequency Warping Lecture Notes in Computer Science (LNCS), 2016; 10077, 3 - 13 -978-3-319-49168-4
  64. A. Pierard, D. Erro, I. Hernaez, E. Navas, T. Dutoit Speech Synthesis Models to Overcome the Scarcity of Training Data Lecture Notes in Computer Science (LNCS), 2016; 10077, 73 - 83 -978-3-319-49168-4
  65. D. Erro, I. Hernaez, L. Serrano, I. Saratxaga, E. Navas Objective comparison of four GMM-based methods for PMA-to-speech conversion Lecture Notes in Computer Science (LNCS), 2016; 10077, 24 - 32 -978-3-319-49168-4
  66. L. Serrano, D. Tavarez, I. Odriozola, I. Hernáez, I. Saratxaga Aholab system for Albayzin 2016 Search-on-Speech Evaluation Proceeding of Iberspeech, 2016; 33 - 42
  67. D. Tavárez, X. Sarasola, E. Navas, L. Serrano, A. Alonso, I. Saratxaga, I. Hernaez Aholab Speaker Diarization System for Albayzin 2016 Evaluation Campaign Proceeding of Iberspeech, 2016; 9 - 18
  68. X. Sarasola, E. Navas, D. Tavárez, D. Erro, I. Saratxaga, I. Hernáez A singing voice database in Basque for statistical singing synthesis of bertsolaritza Proceedings of LREC, 2016; 756 - 759
  69. A. Alvarez, M. Balenciaga, A. del Pozo, H. Arzelus, A. Matamala, C. D. Martínez-Hinarejos, “Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles”, LREC 2016
  70. T. Etchegoyhen, A. Azpeitia, N. Perez, “Exploiting a Large Strongly Comparable Corpus”, LREC 2016
  71. T. Etchegoyhen, A. Azpeitia, “A Portable Method for Parallel and Comparable Document Alignment”, EAMT 2016
  72. A. Azpeitia, T. Etchegoyhen, “Set-Theoretic Alignment for Comparable Corpora”, ACL 2016
  73. M. Serras, N. Perez, M. I. Torres, A. del Pozo, “Entropy-Driven Dialog for Topic Classification: Detecting and Tackling Uncertainty”, IWSDS 2016
  74. A. Minard, M. Speranza, R. Urizar, B. Altuna, M. van Erp, A. Schoen, and C. van Son (2016) “MEANTIME, the NewsReader Multilingual Event and Time Corpus,” Proceedings of LREC 2016.
  75. Altuna Begoña; Aranzabe, Maria Jesús; Díaz de Ilarraza, Arantza (2016) “Euskarazko denbora-informazioaren tratamendu automatikoa TimeMLren eta HeidelTimeren bidez” Ekaia, 30, 153-165. Euskal Herriko Unibertsitateko Argitalpen Zerbitzua. ISSN: 0214-9001 e-ISSN: 2444-3255
  76. Altuna Begoña; Aranzabe, Maria Jesús; Díaz de Ilarraza, Arantza (2016) “Adapting TimeML to Basque: Event annotation”. Proceedings of CICLING 2016
  77. Lopez de Lacalle O., Agirre E. 2015a. Crowdsourced Word Sense Annotations and Difficult Words and Examples. The 11th International Conference on Computational Semantics (IWCS-2015)
  78. Lopez de Lacalle O., Agirre E. 2015b. A Methodology for Word Sense Disambiguation at 90% based on large-scale CrowdSourcing. Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*Sem-2015)
  79. López de Lacalle M., Aldabe I., Laparra E. and Rigau G. Predicate Matrix. Automatically extending the semantic interoperability between predicate resources. Language Resources and Evaluation. ISSN: 1574-0218. Volume 50(2), pages 263-289. 2016. http://dx.doi.org/10.1007/s10579-016-9348-5.
  80. Atutxa, Aitziber; Ezeiza, Nerea, Goenaga Iakes; Gojenola, Koldo (2015). Experiments on Semi-supervised Dependency Parsing of a Morphologically Rich Language. 6th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2015)
  81. Otegi A., Ezeiza N., Goenaga I., Labaka G. (2016). A Modular Chain of NLP Tools for Basque. Proceedings of the 19th International Conference on Text, Speech and Dialogue - TSD 2016, Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pp. 93-100. ISBN 978-3-319-45509-9
  82. Bengoetxea, Kepa; Gojenola, Koldo (2016). Euskararako analizatzaile sintaktiko-estatistikoa hobetzeko teknikak. Ekaia, 19-25, ISSN: 0214-9001
  83. Urruzola, Jasone. Euskarazko espazio-egituren etiketatzea: lehen urratsak. Supervisors: Izaskun Aldezabal and Ainara Estarrona. http://ixa.si.ehu.es/master/master_tesiak
  84. Salaberri, Haritz; Arregi, Olatz; Zapirain, Beñat. Euskarazko gertaeren etiketatze automatikoa. Submitted to Ikergazte 2017.
  85. Agerri, R., Rigau, G. (2016). Robust multilingual Named Entity Recognition with shallow semi-supervised features. Artificial Intelligence, 238, pp 63-82.
  86. Piek Vossen, Rodrigo Agerri, Itziar Aldabe, Agata Cybulska, Marieke van Erp, Antske Fokkens, Egoitz Laparra, Anne-Lyse Minard, Alessio Palmero Aprosio, German Rigau, Marco Rospocher, Roxane Segers (2016) NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news. Knowledge-Based Systems, 110, pages 60-85.
  87. Agerri A., Aldabe I., Laparra E., Rigau G., Fokkens A., Huijgen P., van Erp M., Izquierdo R., Vossen P., Minard A. and Magnini B (2016) Multilingual Event Detection using the NewsReader pipelines. Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability at the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia.
  88. Otegi A., Ezeiza N., Goenaga I., Labaka G. (2016). A Modular Chain of NLP Tools for Basque. Proceedings of the 19th International Conference on Text, Speech and Dialogue - TSD 2016, Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pp. 93-100. ISBN 978-3-319-45509-9
  89. E. Laparra, I. Aldabe, and G. Rigau, “Document level time-anchoring for timeline extraction,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Beijing, China, 2015.
  90. R. Izquierdo, A. Suárez, and G. Rigau, “Word vs. class-based word sense disambiguation,” Journal of Artificial Intelligence Research, vol. 54, pp. 83-122, 2015.
  91. R. Agerri, X. Artola, Z. Beloki, G. Rigau, and A. Soroa, “Big data for natural language processing: a streaming approach,” Knowledge-Based Systems, 2015.
  92. P. Vossen, E. Laparra, I. Aldabe, and G. Rigau, “Interoperability for cross-lingual and cross-document event detection,” in Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation. EVENTS workshop at NAACL-HLT 2015, Denver, Colorado, 2015.
  93. R. Segers, P. Vossen, M. Rospocher, L. Serafini, E. Laparra, and G. Rigau, “ESO: a frame based ontology for events and implied situations,” in Proceedings of MAPLEX 2015, Yamagata, Japan, 2015.
  94. E. Laparra, I. Aldabe, and G. Rigau, “From timelines to storylines: a preliminary proposal for evaluating narratives,” in Proceedings of the 1st Workshop on Computing News Storylines (CNewS 2015) at the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Beijing, China, 2015.
  95. Vossen P., Agerri R., Aldabe I., Cybulska A., van Erp M., Fokkens A., Laparra E., Minard A., Palmero A., Rigau G., Rospocher M., Segers R. (2016) “NewsReader: using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news” Knowledge-Based Systems. ISSN 0950-7051. Volume 110, pages 60-85.
  96. Agerri R. and Rigau G. (2016) “Robust multilingual named entity recognition with shallow semi-supervised features” Artificial Intelligence. ISSN: 0004-3702. Volume 238, pages 63-82.
  97. López de Lacalle M., Aldabe I., Laparra E., Rigau G. (2016) Predicate Matrix. Automatically extending the semantic interoperability between predicate resources. Language Resources and Evaluation. ISSN: 1574-0218. Volume 50(2), pages 263-289
  98. Rospocher M., van Erp M., Vossen P., Fokkens A., Aldabe I., Rigau G., Soroa A., Ploeger T., Bogaard T. (2016) “Building Event-Centric Knowledge Graphs from News”. Journal of Web Semantics. ISSN: 1570-8268. Volumes 37-38, pages 132-151
  99. López de Lacalle M., Laparra E., Aldabe I., Rigau G. (2016) “A Multilingual Predicate Matrix” Proceedings of the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  100. Segers R., Rospocher M., Vossen P., Laparra E., Rigau G., Minard A. (2016) “The Event and Implied Situation Ontology (ESO): Application and Evaluation”. Proceedings of the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  101. Postma M., Izquierdo R., Agirre E., Rigau G., Vossen P. (2016) “Addressing the MFS bias in WSD systems”. Proceedings of the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  102. Garcia-Pablos A., Cuadros M., Rigau G. (2016) “A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings”. Proceedings of the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  103. Agerri R., Aldabe I., Laparra E., Rigau G., Fokkens A., Huijgen P., van Erp M., Izquierdo R., Vossen P., Minard A., Magnini B. (2016) “Multilingual Event Detection using the NewsReader pipelines”. Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability at the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  104. Segers R., Laparra E., Rospocher M., Vossen P., Rigau G., Ilievsky F. (2016) “The Predicate Matrix and the Event and Implied Situation Ontology: Making More of Events”. Proceedings of the 8th Global WordNet Conference (GWC 2016). Bucharest, Romania
  105. Kattenberg M., Beloki Z., Soroa A., Artola X., Fokkens A., Huygen P., Verstoep K. (2016). “Two architectures for parallel processing for huge amounts of text” Proceedings of the 10th Language Resources and Evaluation Conference (LREC’16). Portorož, Slovenia
  106. Otegi A., Ezeiza N., Goenaga I., Labaka G. (2016). “A Modular Chain of NLP Tools for Basque” Proceedings of the 19th International Conference on Text, Speech and Dialogue - TSD 2016, Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pp. 93-100. ISBN 978-3-319-45509-9
  107. Aldabe I., Larrañaga M., Maritxalar M., Arruarte A., Elorriaga J.A. 2015. Domain Module Building from Textbooks: Integrating Automatic Exercise Generation. 17th International Conference on Artificial Intelligence in Education (AIED 2015)
  108. Agirre, Eneko, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252-263. ISBN 978-1-941643-40-2.
  109. Agirre E., Aldabe I., Lopez de Lacalle O., Lopez-Gazpio I., Maritxalar M. 2015. Erantzunen kalifikazio automatikorako lehen urratsak. EKAIA: Euskal Herriko Unibertsitateko zientzi eta teknologi aldizkaria. DOI: 10.1387/ekaia.14530. ISSN: 0214-9001.
  110. Agirre, E., Aitor Gonzalez-Agirre, Iñigo Lopez-Gazpio, Montse Maritxalar, German Rigau, Larraitz Uria 2015. UBC: Cubes for English Semantic Textual Similarity and Supervised Approaches for Interpretable STS. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 178-183. ISBN 978-1-941643-40-2.
  111. Agirre E., Gonzalez Agirre A., Lopez-Gazpio I., Maritxalar M., Rigau G., Uria L. 2016. SemEval-2016 Task 2: Interpretable Semantic Textual Similarity. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), pp. 512-524. ISBN 978-1-941643-95-2
  112. Agirre E., Lopez-Gazpio I., Maritxalar M. 2016. iUBC at SemEval-2016 - Task 2: RNNs and LSTMs for interpretable STS. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), pp. 771-776. ISBN 978-1-941643-95-2
  113. Nora Aranberri, Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Iñaki Alegria, Eneko Agirre 2016. Tectogrammar-based machine translation for English-Spanish and English-Basque. SEPLN, 56, 73-80
  114. Aranberri N., Labaka G., Díaz de Ilarraza A. and Sarasola K. 2015. Exploiting portability to build an RBMT prototype for a new source language. Proceedings of the 18th Annual Conference of the European Association for Machine Translation, EAMT-2015, pp. 3-10, Antalya, Turkey
  115. Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Michael Ustaszewski, Nora Aranberri and Eneko Agirre 2015. Deep-syntax TectoMT for English-Spanish MT. Proceedings of the 1st Deep Machine Translation Workshop (DMTW 2015), pages 55–63, Prague, Czech Republic
  116. Aranberri N. 2015. SMT error analysis and mapping to syntactic, semantic and structural fixes. Proceedings of SSST-9, Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 30–38, Denver, Colorado, June 4, 2015.
  117. Gaudio, Rosa Del, Aljoscha Burchardt, Nora Aranberri, António Branco and Martin Popel, 2015. Report on the Embedding and Evaluation of the Second MT Pilot. Deliverable D3.10, Version 1.6, QTLeap Project.
  118. Nora Aranberri, Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Iñaki Alegria, Eneko Agirre 2016. Tectogrammar-based machine translation for English-Spanish and English-Basque. SEPLN, 56, 73-80
  119. Aranberri N., Labaka G., Díaz de Ilarraza A. and Sarasola K. 2015. Exploiting portability to build an RBMT prototype for a new source language. Proceedings of the 18th Annual Conference of the European Association for Machine Translation, EAMT-2015, pp. 3-10, Antalya, Turkey
  120. Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Michael Ustaszewski, Nora Aranberri and Eneko Agirre 2015. Deep-syntax TectoMT for English-Spanish MT. Proceedings of the 1st Deep Machine Translation Workshop (DMTW 2015), pages 55–63, Prague, Czech Republic
  121. Aranberri N. 2015. SMT error analysis and mapping to syntactic, semantic and structural fixes. Proceedings of SSST-9, Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 30–38, Denver, Colorado, June 4, 2015.
  122. Gaudio, Rosa Del, Aljoscha Burchardt, Nora Aranberri, António Branco and Martin Popel, 2015. Report on the Embedding and Evaluation of the Second MT Pilot. Deliverable D3.10, Version 1.6, QTLeap Project.
  123. Artetxe M., Labaka G., Saedi C., Rodrigues J., Silva J., Branco A., Agirre E. (2016) Adding syntactic structure to bilingual terminology for improved domain adaptation. Proceedings of the 2nd Deep Machine Translation Workshop (DMTW 2016). ISBN 978-80-88132-02-8
  124. San Vicente, I., Alegria, I., Aranberri, N., España-Bonet, C., Gamallo, P., Gonçalo Oliveira, H., Martínez, E., Toral, A., Zubiaga, A. (2016) TweetMT: A parallel microblog corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
  125. O. Perez-de-Viñaspre and G. Labaka (2016) IXA Biomedical Translation System. Biomedical Translation Task. Proceedings of WMT16
  126. Labaka G., Alegria I. and Sarasola K. (2016) Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
  127. Mikel Artetxe, Eneko Agirre, Iñaki Alegria, Gorka Labaka 2015. Analyzing English-Spanish Named-Entity enhanced Machine Translation. Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-9). NAACL2015
  128. Eneko Agirre, Iñaki Alegria, Nora Aranberri, Mikel Artetxe, Ander Barrena, António Branco, Arantza Diaz de Ilarraza, Koldo Gojenola, Gorka Labaka, Arantxa Otegi and Kepa Sarasola 2015. Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches. Procesamiento del Lenguaje natural, vol. 55, pp. 169-172. ISSN: 1135-5948
  129. Artetxe M., Labaka G., Sarasola K. 2015. Building hybrid machine translation systems by using an EBMT preprocessor to create partial translations. Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT2015), pp. 11-18, Antalya, Turkey
  130. R. Gaudio, G. Labaka, E. Agirre, P. Osenova, K. Simov, M. Popel, D. Oele, G. van Noord, L. Gomes, J. Rodrigues, S. Neale, J. Silva, A. Querido, N. Rendeiro and A. Branco (2016) SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task. First conference on machine translation (WMT16)
  131. Nora Aranberri, Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Iñaki Alegria, Eneko Agirre (2016) Tectogrammar-based machine translation for English-Spanish and English-Basque. SEPLN, 56, 73-80. ISSN: 1135-5948
  132. Nora Aranberri, Eleftherios Avramidis, Aljoscha Burchardt, Ondrej Klejch, Martin Popel and Maja Popovic (2016) Tools and Guidelines for Principled Machine Translation Development. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
  133. Nora Aranberri, Gorka Labaka, Arantza Díaz de Ilarraza and Kepa Sarasola (2016) Ebaluatoia: crowd evaluation for English-Basque machine translation. Language Resources and Evaluation, pp. 1-32
  134. Aranberri N., Díaz de Ilarraza A., Labaka G., Sarasola K. 2015. Ebaluatoia 2014: ingelesa-euskara itzultzaile automatikoen konparazioa. X. Informatikari Euskaldunen Bilkura (IEB2015) Udako Euskal Unibertsitatea. Donostia
  135. Alegria I., Artetxe M., Labaka G., Sarasola K. 2015. EHU at TweetMT: Adapting MT Engines for Formal Tweets. CEUR-WS, Vol-1445, 20-24. Proceedings of the Tweet Translation Workshop 2015 co-located with 31st Conference of the Spanish Society for Natural Language Processing (SEPLN 2015). ISSN 1613-0073.
  136. Iñaki Alegria, Nora Aranberri, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martínez Garcia, Iñaki San Vicente, Antonio Toral, Arkaitz Zubiaga 2015. Overview of TweetMT: A Shared Task on Machine Translation of Tweets at SEPLN 2015. CEUR-WS, Vol-1445, 8-19. Proceedings of the Tweet Translation Workshop 2015 co-located with 31st Conference of the Spanish Society for Natural Language Processing (SEPLN 2015). ISSN 1613-0073
  137. Iñurrieta U. 2015. Translation of Spanish Multiword Expressions into Basque: linguistic analysis and detection experiment. Actas del XXXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. ISBN: 978-84-608-1989-9.
  138. Iñurrieta U. 2015. Konbitzul: euskarazko eta gaztelaniazko izen+aditz konbinazioen datu-basea. IkerGazte: nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma, 32-38. ISBN: 978-84-8438-540-0
  139. Iñurrieta U., Aduriz I., Díaz de Ilarraza A., Labaka G., Sarasola K. (2016) Ez burua hautsi, Matxin! Elhuyar aldizkaria 323.
  140. Iñurrieta U., Díaz de Ilarraza A., Labaka G., Sarasola K., Aduriz I., Carroll J. (2016) Using Linguistic Data for Verb-Noun Combination Identification. Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016)
  141. Iñurrieta U., Aduriz I., Díaz de Ilarraza A., Labaka G., Sarasola K. (2016) Analysing linguistic information about word combinations for a Spanish-Basque Rule-Based Machine Translation system. Multiword Units in Machine Translation and Translation Technologies, John Benjamins Publishing Company (to be published)
  142. Iñurrieta U., Aduriz I., Díaz de Ilarraza A., Labaka G., Sarasola K. (2016) Izen+aditz konbinazioen itzulpenaz eta tratamendu konputazionalaz. Senez 47. http://www.eizie.eus/Argitalpenak/Senez/20161103
  143. Casillas A., Díaz de Ilarraza A., Gojenola K., Oronoz M., Perez A. 2015. Computer aided classification of diagnostic terms in Spanish. Expert Systems with Applications, Volume 42, Issue 6, 15 April 2015, Pages 2949-2958. JCR Impact Factor: 2.240, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE: Q1.
  144. Maite Oronoz, Koldo Gojenola, Alicia Pérez, Arantza Díaz de Ilarraza, Arantza Casillas 2015. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions. Journal of Biomedical Informatics, Volume 56, August 2015, Pages 318–332. JCR 2014: 2.126, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS: Q1
  145. Weegar R., Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, Koldo Gojenola (2016). The impact of simple feature engineering in multilingual medical NER. Coling (Clinical NLP). Osaka, Japan, December 11-17, 2016.
  146. Sara Santiso, Arantza Casillas, Alicia Pérez, Maite Oronoz, Koldo Gojenola, 2015. Document-level adverse drug reaction event extraction on electronic health records in Spanish. Procesamiento del Lenguaje Natural, Volume 56, Pages 49–56.
  147. Alicia Pérez, Arantza Casillas, Koldo Gojenola (2016). Fully unsupervised low-dimensional representation of adverse drug reaction events through distributional semantics. CoLing (BioTxtM).
  148. Arantza Casillas, Koldo Gojenola, Alicia Pérez, Maite Oronoz (2016). Clinical text mining for efficient extraction of drug-allergy reactions. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, Dec 15-18, 2016.
  149. Arantza Casillas, Arantza Diaz de Ilarraza, Kike Fernandez, Koldo Gojenola, Maite Oronoz, Alicia Pérez, Sara Santiso (2016). IXAmed-IE: on-line medical entity identification and ADR event extraction in Spanish. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, Dec 15-18, 2016.
  150. Casillas A., Pérez A., Oronoz M., Gojenola K., Santiso S. (2016). Learning to extract adverse drug reaction events from electronic health records in Spanish. Expert Systems with Applications, Volume 61, 1 November 2016. JCR Impact Factor: 2.981 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE: Q1 5-Year Impact Factor: 2.879 SCImago Journal Rank (SJR): 1.487
  151. Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J. 2015. Methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labelled at predicate level following the PropBank/Verbnet model. Edward Vanhoutte (ed.) Digital Scholarship in the Humanities, Volume 30, Number 2, 1-23. Oxford University Press (Online ISSN 2055-768X - Print ISSN 2055-7671) doi: 10.1093/llc/fqv001
  152. Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J. 2015. EPEC-RolSem: Ingelesezko PropBank-VerbNet eredura etiketatutako euskarazko corpusa. Erabakiak, egokitzapenak eta berezitasunak. Maria-José Ezeizabarrena & Ricardo Gómez (eds.). Eridenen du zerzaz kontenta: sailkideen omenaldia Henrike Knörr irakasleari (1947-2008), pp. 179-206, Bilbao: Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua. ISBN: 978-84-9082-092-6.
  153. Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J. (2016) Methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labelled at predicate level following the PropBank/Verbnet model. Edward Vanhoutte (ed.) Digital Scholarship in the Humanities (2016) 31 (3): 470-492. Published by Oxford University Press on behalf of EADH: The European Association for Digital Humanities (Online ISSN 2055-768X - Print ISSN 2055-7671)
  154. Saralegi, X., Agirre, E., and Alegria, I. (2016). Evaluating translation quality and CLIR performance of query sessions. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28.
  155. I. San Vicente and X. Saralegi. 2016. Polarity lexicon building: to what extent is the manual effort worth? In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28.
  156. Soria, C., Russo, I., Quochi, V., Hicks, D., Gurrutxaga, A., Sarhimaa, A., and Tuomisto, M. 2016. Fostering digital representation of EU regional and minority languages: the digital language diversity project. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28.
  157. Zubiaga, A., San Vicente, I., Gamallo, P., Pichel, J. R., Alegria, I., Aranberri, N., Ezeiza A., Fresno, V. (2016). TweetLID: a benchmark for tweet language identification. Language Resources and Evaluation, volume 50, issue 4. pp. 729-766. DOI: 10.1007/s10579-015-9317-4
  158. Lindemann, D., and I. San Vicente. Bilingual Dictionary Drafting: Connecting Basque Word Senses to Multilingual Equivalents. In Proceedings of EURALEX 2016, 898–905. Tbilisi: Tbilisi State University, 2016.

Media appearances

Media appearances of the ElkarOla project